Clouds are opaque. By their very nature, real-world clouds are formed of liquid droplets, frozen crystals and other suspended atmospheric particles. Computing clouds, by contrast, are essentially virtualized entities that shouldn’t be opaque, but they often are, especially when they have been provisioned and spun up by IT teams that don’t adequately annotate their status, breadth of services and scope of functionality. To make matters worse, when teams start to create so-called “shadow IT” cloud services that do not adhere to core technology team strategy and mandates, an organization’s cloud stack gets even more opaque… and even a little hazy.
Computing clouds should be clearer, more clinically defined computing entities, so what’s going on?
You Spent How Much?
Kai Wombacher, product manager at Kubecost, tells the story of how he left the cloudy hot tap running back when he was a rookie software engineer. Straight out of college and working as an AI/ML engineer at a tiny startup, he deployed a deep neural network for a customer on their equally tiny cloud infrastructure. Only at the end of the monthly billing cycle did he learn that his deployment was responsible for cloud costs of something like $10,000 a week.
“Someone more senior than me had to spend their time asking Google Cloud Platform to forgive my mistake. To their credit, they did. But trust me when I say that it wasn’t a good feeling,” he said. “If I’d known at the time that my mistake also had a tremendous carbon impact (from the energy spent running datacenter hardware inefficiently on my behalf), I’d have felt even worse. If I’d had the visibility to understand the resources I was using and the true compute and carbon costs I was responsible for, I’d have done things differently.”
But a lot has changed in the last decade with cloud. We’ve been through an era when cloud application performance management gave rise to a huge swathe of APM tools, some of which forked and skewed into worthy open source projects such as Kubecost’s own OpenCost… and some of which evolved into higher-tier observability platforms, which now benefit from the additional helping hand of artificial intelligence functions. Today, the environmental harm associated with inefficient cloud utilization is a large part of the current drive to champion more (ecologically and technologically) sustainable software engineering.
Spend-To-Carbon Is Linear
Wombacher proposes that, in general, the relationship between cloud spending and carbon impact is linear. Some cloud computing nodes or assets may offer slight exceptions, but generally speaking the more computationally intensive a workload, the more carbon-intensive it is. Industry evangelists and analysts in this space suggest that many cloud engineering teams deploying cloud workloads don’t have much visibility into the expenses behind their decisions (hence the rise of FinOps on many levels, as previously discussed here, although this is not a tale of FinOps), nor do they yet understand that compute and carbon costs go hand in hand.
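To make that linearity concrete, here is a minimal sketch in Python, assuming a purely illustrative spend-to-carbon coefficient; real figures vary by region, provider, hardware and workload mix, and are not published as a single per-dollar number:

```python
# Minimal sketch of the "spend-to-carbon is linear" idea.
# The coefficient below is an illustrative assumption, not a published figure.
KG_CO2E_PER_DOLLAR = 0.5  # assumed: kg of CO2-equivalent per dollar of cloud spend

def estimate_carbon_kg(monthly_cloud_spend_usd: float) -> float:
    """Rough carbon estimate under a linear spend-to-carbon assumption."""
    return monthly_cloud_spend_usd * KG_CO2E_PER_DOLLAR

if __name__ == "__main__":
    for spend in (1_000, 10_000, 40_000):
        print(f"${spend:>7,} / month  ->  ~{estimate_carbon_kg(spend):,.0f} kg CO2e")
```

The point is not the coefficient itself, but that once spend is visible, a first-order carbon estimate falls out of the same data.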
“The stakes of this lack of visibility continue to increase,” said Wombacher, speaking at a cloud sustainability panel to press and analysts this month. “To train more complex AI/ML models, organizations increasingly harness graphical processing units, which are more expensive, consume more power… and come with far higher carbon costs. Yet, we are (fortunately) also seeing a major and sincere trend toward organizations becoming more environmentally conscious. Teams are seeking out the tools and processes to understand not just dollar costs but their actual carbon costs as well. Even organizations with small cloud environments are often shocked to learn how many kilograms of carbon pollution stems from their workloads.”
Wholewheat Peace-Loving Citizens
When software application development teams do gain real-time visibility into both types of costs (along with the ability to predict costs), very often simply seeing the data is enough to spur change and drive optimization that reduces both spending and carbon emissions.
Plus, let’s face it, software engineers are typically wholewheat peace-loving planetary citizens who would rather hug a tree than anyone in a suit, so developers are more driven to reduce carbon emissions than they are to save dollars, pounds and pennies.
“At the most recent Cloud Native Computing Foundation KubeCon EU conference in Paris this April 2024, I had multiple conversations with engineering leaders that all revealed the same interesting information. Each of these folks said that they would have a much easier time convincing their teams to make changes if they knew the carbon impact. Between saving the environment and saving money, the environment is a far more motivating carrot for developers. Whereas discussions about spending can bog down in bureaucratic resistance, concern for the planet clearly demands that everyone work together to do their part,” said Wombacher.
Common Carbon Cloud Culprits
When teams gain real-time visibility into their cloud and Kubernetes infrastructure usage, they are likely to see several common areas in need of optimization. Inefficient container requests are always the number one issue. Close behind, inefficient cloud node sizing and the lack of an autoscaler (technology that adds or removes instances of managed cloud resources based upon a pre-defined system operations policy) result in workloads requesting far more resources than necessary.
What often happens is that developers think “I just want my application to run” and they have a bias towards assigning outsized resources to make sure it does. For example, a workload might request 100 millicores of CPU – setting aside those resources and absorbing the full compute and carbon costs – when the workload has a maximum usage of just 30 millicores. Even leaving a comfortable cushion, bringing those resource requests in line with reality results in huge reductions to an organization’s bills and carbon footprint. Teams could also often take better advantage of Spot Instances on AWS [a discounted “special offer” tier priced below the on-demand cloud level that works by using otherwise unused capacity] to reduce cost and carbon burdens.
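As a hedged, back-of-the-envelope sketch of that right-sizing arithmetic, the Python below mirrors the 100-millicore-requested versus 30-millicore-used example; the per-millicore cost and carbon rates are assumptions invented for illustration, not provider or Kubecost figures:

```python
# Right-sizing sketch: compare a padded request against observed peak usage.
# All unit rates below are illustrative assumptions, not real provider numbers.
REQUESTED_MILLICORES = 100      # what the workload asks for
PEAK_USAGE_MILLICORES = 30      # what it actually peaks at
CUSHION = 1.25                  # keep 25% headroom above the observed peak

USD_PER_MILLICORE_MONTH = 0.02      # assumed unit cost
KG_CO2E_PER_MILLICORE_MONTH = 0.01  # assumed unit carbon

right_sized = int(PEAK_USAGE_MILLICORES * CUSHION)
saved = REQUESTED_MILLICORES - right_sized

print(f"Right-sized request: {right_sized}m (was {REQUESTED_MILLICORES}m)")
print(f"Per replica, per month: save ~${saved * USD_PER_MILLICORE_MONTH:.2f} "
      f"and ~{saved * KG_CO2E_PER_MILLICORE_MONTH:.2f} kg CO2e")
```

Multiply that by hundreds of replicas across dozens of services and the effect of a simple request audit on both the bill and the footprint becomes clear.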
“Finally, there’s an oddity of cloud computing that makes it easy to delete a cluster and believe you’ve deleted everything associated with it… but, instead, still leave part of the cluster running and burning resources for no reason,” lamented Wombacher. “The visibility to detect that waste and tools to automate cloud optimization serve as an antidote to workload bloat and make it far easier to remain continuously vigilant about efficiency.”
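As one illustration of that half-deleted-cluster problem, a small script along these lines can flag block storage volumes left unattached after a teardown; it is a sketch that assumes AWS credentials are already configured and uses boto3’s standard EC2 calls, rather than any particular vendor tool:

```python
# Sketch: list EBS volumes in the 'available' (unattached) state, a common
# leftover that keeps billing and consuming resources after a cluster is deleted.
import boto3

def find_orphaned_volumes(region: str = "us-east-1"):
    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_volumes")
    orphans = []
    for page in paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        orphans.extend(page["Volumes"])
    return orphans

if __name__ == "__main__":
    for vol in find_orphaned_volumes():
        print(f"{vol['VolumeId']}: {vol['Size']} GiB, created {vol['CreateTime']}")
```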
Down To Disk-Level Detail
Wombacher concluded his analysis of the cloud visibility forecast by saying that he believes that simply introducing the ability to observe cloud resource usage can drive a significant reduction in an organization’s carbon footprint. Why is he so sure? Because he says he’s seen it happen: when organizations have the tools and processes to break down carbon costs by namespace, application, team and even individual workload, so that they can track the carbon impact of their nodes, disks, network and other aspects of their infrastructure, the opportunities for greener practices become hard to ignore.
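To show what breaking carbon costs down by namespace might look like in practice, here is a minimal sketch over hypothetical per-workload usage records; the records and the carbon rate are assumptions invented for illustration, not output from any particular tool:

```python
from collections import defaultdict

# Hypothetical per-workload records: (namespace, workload, CPU core-hours this month)
USAGE = [
    ("payments",  "api",       1200.0),
    ("payments",  "worker",     800.0),
    ("analytics", "etl",       3000.0),
    ("analytics", "dashboard",  150.0),
]

KG_CO2E_PER_CORE_HOUR = 0.0005  # assumed carbon rate for this sketch

by_namespace = defaultdict(float)
for namespace, _workload, core_hours in USAGE:
    by_namespace[namespace] += core_hours * KG_CO2E_PER_CORE_HOUR

for namespace, kg in sorted(by_namespace.items(), key=lambda kv: -kv[1]):
    print(f"{namespace:<10} ~{kg:.2f} kg CO2e")
```

Rolled up by team, application or node in the same way, that kind of breakdown is what makes the “hard to ignore” moment possible.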
To get a picture of green clouds today (in the color sense, not the sensible sustainability sense) we need to use the web to find an image that is probably created using generative AI. Coincidentally, it may be the use of gen-AI that drives the next phase of growth in this space. The weather forecast ahead is cloudy, but with increasingly clear spells.