Most cloud-cost conversations start in the wrong place: a finance spreadsheet, three months late, with no context and plenty of finger-pointing. By the time a quarterly spend review lands on an engineering team's desk, the workloads that drove the overspend have already been refactored, the engineers who built them have moved on to other projects, and the natural response is defensiveness rather than curiosity. The teams that close the gap do not have bigger FinOps budgets or more exotic tooling — they treat cost with the same discipline they apply to latency: instrument it, set thresholds, alert on regressions, and iterate. The difference between a team spending $50k/month on a workload and a disciplined team running the same workload for $30k is almost never tooling. It is tagging hygiene, a sensible commitment strategy, Kubernetes resource discipline, and the cultural norm of asking "what does this cost to run?" before code ships. This post covers all four, with enough operational detail to use on Monday morning.
- 27% of cloud spend wasted (Flexera 2025 average)
- ~$182B estimated annual waste (derived: $675B spend × 27%)
- 65% of Kubernetes workloads under-utilize CPU (CNCF / FinOps Foundation, 2024)
- 84% of organizations struggle to manage cloud spend (Flexera 2025)

Source: Flexera State of the Cloud, 2025; CNCF / FinOps Foundation survey, 2024
The Baseline Problem Is Worse Than You Think
The 27% waste figure from Flexera's 2025 State of the Cloud report has barely moved across three consecutive annual surveys. That flatness is more informative than the number itself: awareness without a system changes nothing. The industry has known about cloud waste for over a decade; it persists because the incentives inside engineering organizations do not naturally align with cost efficiency. Speed of delivery is rewarded and measured. Cost efficiency is invisible until it triggers a finance conversation, by which point the window for cheap correction has long closed.
Understanding where waste concentrates is the starting point for any serious optimization effort. Based on CNCF and FinOps Foundation survey data, the approximate distribution breaks down like this:
| Waste category | Approx. share of total waste |
|---|---|
| Oversized instances and Kubernetes pods | 35–45% |
| Idle and zombie resources (stopped VMs, orphaned volumes, stale snapshots) | 20–30% |
| Wrong commitment tier (on-demand pricing for steady-state workloads) | 15–25% |
| Unused PaaS services and licences | 10–15% |
These are survey-derived estimates with wide error bars. Your actual breakdown depends on workload mix and how long your estate has gone without active cost hygiene. The consistent finding across sources is that rightsizing and commitment strategy together address at least half of recoverable waste in most environments. Both are engineering problems with engineering solutions — no new tooling required to start.
One number worth internalizing: 84% of organizations in Flexera's 2025 survey cited managing cloud spend as a significant challenge. The problem is not that engineers do not care — it is that cost feedback is too slow, too coarse, and too disconnected from the work that created it. That is a system design problem, and system design problems have system design solutions.
Where Kubernetes Hides the Waste
Kubernetes makes waste structurally easy to hide, for a specific reason: the scheduler allocates resources based on requests, not actual usage. When a developer sets requests.cpu: "2", the scheduler finds a node with 2 CPUs available and places the pod there — those 2 CPUs are allocated whether the container uses 2 vCPUs or 0.05 vCPUs. Actual consumption is invisible to the bin-packing algorithm.
The result, measured across production fleets by the CNCF and FinOps Foundation in 2024, is stark: Kubernetes clusters average roughly 10% CPU utilization relative to provisioned capacity. Around 65% of workloads use less than half their requested CPU; only about 7% have resource requests that accurately reflect actual consumption.
A concrete illustration. A Java service with this manifest:
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        cpu: "4"
        memory: "8Gi"

...running with actual steady-state usage of approximately 0.2 vCPU and 800 MiB is occupying half a 4-vCPU node for a workload that could comfortably share a node with nine identical replicas. On a 20-node cluster, accurately sized requests could reduce the required node count to 4 or 5, a direct 75–80% reduction in compute cost for that workload, with no change to application code.
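The node-count arithmetic can be sketched in a few lines. This is a rough illustration, not a scheduler model: it assumes a 4-vCPU node with 16 GiB of memory, 40 replicas, and a right-sized request of 400m CPU and 1 GiB (all hypothetical figures chosen to match the example), and it ignores system-reserved capacity and DaemonSets.

```python
def pods_per_node(node_cpu_m, node_mem_mib, req_cpu_m, req_mem_mib):
    """Pods that fit on one node when the scheduler packs by requests only."""
    return min(node_cpu_m // req_cpu_m, node_mem_mib // req_mem_mib)


def nodes_needed(replicas, per_node):
    """Ceiling division: nodes required to place all replicas."""
    return -(-replicas // per_node)


# Assumed node shape: 4 vCPU (4000m) and 16 GiB (16384 MiB).
NODE_CPU_M, NODE_MEM_MIB = 4000, 16384

# Requests from the manifest above: 2 CPU, 4Gi.
oversized = pods_per_node(NODE_CPU_M, NODE_MEM_MIB, 2000, 4096)
# Hypothetical right-sized requests: 400m CPU, 1Gi.
rightsized = pods_per_node(NODE_CPU_M, NODE_MEM_MIB, 400, 1024)

# 40 replicas is an assumed fleet size.
print(nodes_needed(40, oversized), nodes_needed(40, rightsized))
```

Under these assumptions the same 40 replicas need 20 nodes with the original requests and 4 with the right-sized ones.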
Start with LimitRange defaults. Before running VPA, set namespace-level LimitRange objects so new workloads land with sensible upper bounds rather than unbounded allocations:
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-limits
      namespace: production
    spec:
      limits:
        - type: Container
          default:
            cpu: "500m"
            memory: "256Mi"
          defaultRequest:
            cpu: "100m"
            memory: "128Mi"
          max:
            cpu: "4"
            memory: "8Gi"

Then run VPA in recommendation mode. Deploy VPA with updateMode: "Off" and collect data for at least two weeks before touching any manifests:
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: api-service-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-service
      updatePolicy:
        updateMode: "Off"

Apply VPA recommendations as the new baseline with 20–30% headroom on CPU. Tools like Goldilocks (Fairwinds, open-source) automate the recommendation dashboard across all namespaces and output suggested manifests directly. The discipline is: collect real data, apply it, repeat quarterly.
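As a small sketch of the headroom step: the helper below takes a VPA target in millicores and adds a percentage of headroom, rounding up to the next 10m. The function name, the 25% default, and the rounding step are illustrative assumptions, not part of VPA or Goldilocks.

```python
def request_with_headroom(vpa_target_millicores: int, headroom_pct: int = 25) -> int:
    """VPA target plus headroom, rounded up to the next 10 millicores.

    Integer arithmetic avoids floating-point surprises near round values.
    """
    scaled = vpa_target_millicores * (100 + headroom_pct)  # hundredths of a millicore
    return -(-scaled // 1000) * 10  # ceiling to a multiple of 10m


# Hypothetical VPA targets for two containers.
for target in (200, 130):
    print(f"{target}m -> {request_with_headroom(target)}m")
```

Rounding up rather than to the nearest value keeps the applied request at or above the intended headroom.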
Make Cost Observable, Not Reportable
The canonical FinOps failure mode is architectural: cost lives in the finance team's Cost Explorer export, updated monthly, reviewed quarterly, and never co-located with the dashboards engineers look at when something breaks. Fixing this requires two things — tagging discipline and dashboard co-location — and neither requires a FinOps platform purchase.
Tagging is the foundation and it is non-negotiable. Every billable resource must carry at minimum: team, service, environment, and cost-center. Enforce this at the infrastructure-as-code level, not with post-hoc audits. In Terraform, consolidate required tags in a local and apply them to every resource:
    locals {
      common_tags = {
        team        = var.team_name
        service     = var.service_name
        environment = var.environment
        cost-center = var.cost_center
        managed-by  = "terraform"
      }
    }

    resource "aws_instance" "app" {
      ami           = var.ami_id
      instance_type = var.instance_type
      tags          = local.common_tags
    }

Pair this with a Conftest policy in CI that fails the plan on missing required tags:
    package main

    # Evaluated against the JSON plan (terraform show -json).
    # This rule covers one resource type and one tag; repeat it for the
    # other required tags and for every taggable resource type you use.
    # Rego v0 syntax; on OPA 1.0+ write: deny contains msg if { ... }
    deny[msg] {
      resource := input.resource_changes[_]
      resource.type == "aws_instance"
      not resource.change.after.tags.service
      msg := sprintf("Resource %v is missing required tag: service", [resource.address])
    }

Track tagging coverage rate (the percentage of total monthly spend attributable to a team + service tag pair) as a leading operational metric. Target above 90%. Anything below 80% means a large fraction of your estate is invisible to allocation, which makes every downstream analysis unreliable and every chargeback conversation unfair. This number is easy to pull from AWS Cost and Usage Reports or GCP BigQuery billing export, and it should be on a dashboard, not buried in a monthly audit.
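The coverage metric itself is a simple ratio. The sketch below assumes billing line items have already been loaded as dictionaries with a `cost` field and a `tags` mapping; that shape is a hypothetical simplification, not the actual Cost and Usage Report or BigQuery export schema.

```python
def tagging_coverage(line_items):
    """Percentage of spend that carries both a team tag and a service tag."""
    total = sum(item["cost"] for item in line_items)
    if total == 0:
        return 0.0
    attributed = sum(
        item["cost"]
        for item in line_items
        if item.get("tags", {}).get("team") and item.get("tags", {}).get("service")
    )
    return 100.0 * attributed / total


# Hypothetical month of billing data.
items = [
    {"cost": 600.0, "tags": {"team": "payments", "service": "api"}},
    {"cost": 300.0, "tags": {"team": "search", "service": "indexer"}},
    {"cost": 100.0, "tags": {"team": "search"}},  # missing service tag
]
print(f"{tagging_coverage(items):.1f}% of spend is attributable")
```

Weighting by spend rather than by resource count matters: one untagged database can outweigh hundreds of correctly tagged small resources.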
Dashboard co-location changes behavior. A service-level Grafana dashboard that shows latency, error rate, and cost-per-request in the same view changes the conversation. OpenCost (an open-source CNCF project, at the incubating stage as of late 2024) exposes a REST API and Prometheus metrics for per-workload cost allocation inside Kubernetes:
    # OpenCost cost allocation, last 7 days, grouped by app label
    curl "http://opencost.svc:9003/allocation/compute?window=7d&aggregate=label:app&accumulate=false"

Feed that into a Grafana panel alongside your SLO panels. The goal is a single glance that answers: did this service's cost-per-request regress this week, and by how much? Cost only gets acted on when engineers see it in the same context as reliability. Separate dashboards create separate mental models, and separate mental models mean cost always loses to latency and error rate.
Commitment Strategy: The Three-Tier Model
On-demand pricing is the cloud provider's highest-margin product, and the default is for you to stay there indefinitely. The antidote is a three-tier commitment model applied by workload predictability:
| Tier | Workload type | Approx. savings vs on-demand | Risk |
|---|---|---|---|
| Reserved Instances / Savings Plans | Predictable, always-on baseline | 30–60% | Capital locked for 1–3 years |
| Spot / Preemptible | Interruptible: batch jobs, CI runners, ML training | 60–90% | Instance reclaimed with ~2 min notice |
| On-demand | Genuinely spiky, short-lived, or stateful | 0% (baseline) | None; this is the safety net |
The discipline is: measure actual steady-state usage (median over 30 days, not peak), purchase reservations to cover that capacity, and let spot and on-demand absorb the variable remainder. Nothing more, nothing less.
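A minimal sketch of that sizing rule, assuming you already have a series of instance counts sampled over the last 30 days (the sample data and function name here are hypothetical):

```python
from statistics import median


def commitment_plan(instance_counts):
    """Reserve the median running capacity; leave the rest to spot and on-demand."""
    baseline = int(median(instance_counts))  # steady state, deliberately not the peak
    peak = max(instance_counts)
    return {"reserved": baseline, "variable_at_peak": peak - baseline}


# Hypothetical samples: mostly ~20 instances with one traffic spike.
samples = [18, 20, 20, 22, 35]
print(commitment_plan(samples))
```

Sizing to the median means a single spike does not inflate the commitment, which is the failure mode of sizing to peak.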
Worked example. A service runs 20 m5.xlarge instances continuously in us-east-1. Using approximate AWS public pricing (2025):
- On-demand rate: approximately $0.192/hour per instance
- 1-year no-upfront Reserved Instance rate: approximately $0.119/hour per instance
- Annual on-demand cost: 20 x $0.192 x 8,760 hours = approximately $33,638
- Annual reserved cost: 20 x $0.119 x 8,760 hours = approximately $20,849
- Annual saving: approximately $12,800, or roughly 38%, on this one service
Across a 50-service estate with similar patterns, that scales to roughly $640,000 annually (50 × ~$12,800), with no code changes and no architectural work. Commitment discipline is the highest-leverage single action in most FinOps programs.
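The worked example is easy to reproduce and adapt to your own rates. The sketch below uses the approximate hourly prices quoted above; substitute current pricing for your region and instance type before relying on the output.

```python
HOURS_PER_YEAR = 8760


def annual_cost(instances: int, hourly_rate: float) -> float:
    """Cost of running a fixed number of instances continuously for a year."""
    return instances * hourly_rate * HOURS_PER_YEAR


on_demand = annual_cost(20, 0.192)  # approximate on-demand rate from the example
reserved = annual_cost(20, 0.119)   # approximate 1-year no-upfront RI rate
saving = on_demand - reserved

print(f"on-demand: ${on_demand:,.0f}")
print(f"reserved:  ${reserved:,.0f}")
print(f"saving:    ${saving:,.0f} ({saving / on_demand:.0%})")
print(f"50 similar services: ${50 * saving:,.0f}")
```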
AWS Compute Savings Plans are generally easier to manage than per-instance RI purchases because they apply automatically across instance families, sizes, and regions, and also cover Fargate and Lambda usage. GCP Committed Use Discounts work independently on vCPU and memory, which provides flexibility when resizing workloads. Azure Reserved VM Instances work per instance family. All three providers offer roughly equivalent economics; what varies is ease of management. The operational habit that makes this sustainable is a monthly or quarterly coverage check:
    # AWS CLI: check Savings Plans coverage for the previous month
    # (the End date is exclusive, so use the first day of the following month)
    aws ce get-savings-plans-coverage \
      --time-period Start=2025-04-01,End=2025-05-01 \
      --granularity MONTHLY \
      --query "SavingsPlansCoverages[].Coverage.CoveragePercentage"

If coverage is below 60%, you are leaving significant money on the table. If it is above 90%, verify your workloads have not shrunk since the last commitment purchase: over-committed reservations on retired workloads are a real cost, just less visible than on-demand overspend.
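If the coverage check runs on a schedule, the two thresholds can be turned into a simple triage step. This is only a sketch: the function name and return labels are made up for illustration, and the 60% and 90% cut-offs are the rules of thumb from the paragraph above.

```python
def coverage_action(coverage_pct: float) -> str:
    """Map a commitment coverage percentage to a suggested follow-up."""
    if coverage_pct < 60:
        return "under-committed: review steady-state usage and buy coverage"
    if coverage_pct > 90:
        return "high coverage: confirm workloads have not shrunk"
    return "ok"


for pct in (45.0, 75.0, 96.0):
    print(pct, "->", coverage_action(pct))
```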
Workload optimization and waste reduction is the top priority of FinOps practitioners — ahead of accurate forecasting, full cost allocation, and unit economics combined.
Guardrails Without Gates: Cost in the Pipeline
The instinct to prevent cost overruns by adding approval gates to deploys is counterproductive. Gates slow delivery, add friction, and teach engineers to route around the process. The right model is guardrails: automatic visibility, selective automation for obvious wins, and human judgment reserved for genuine decisions.
Pre-merge cost diffs with Infracost. Run Infracost as a CI step so the estimated monthly cost change appears as a pull-request comment before merge:
    # On the base branch: generate a baseline estimate
    infracost breakdown --path ./infra --format json --out-file infracost-base.json
    # On the PR branch: diff the proposed infrastructure against that baseline
    infracost diff --path ./infra --compare-to infracost-base.json

A diff that adds $800/month of infrastructure is not automatically blocked; the team decides. But it is visible before merge, when context is highest and course-correction is cheapest. This is the shift-left principle applied to cost: the earlier the feedback, the lower the cost of the fix.
Post-deploy anomaly detection at the service level. AWS Cost Anomaly Detection, GCP Budget Alerts, and Azure Cost Alerts all support tag-scoped thresholds. Configure them at the service level, not the account level — account-level alerts are too coarse to drive specific action. A useful starting threshold: alert when a service's daily spend is more than 30% above its 90-day rolling average. That fires when a developer spins up a large RDS instance for testing and forgets to clean it up, not just when the entire account has a bad month.
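    {
      "AnomalyMonitor": {
        "MonitorName": "payments-service-monitor",
        "MonitorType": "CUSTOM",
        "MonitorSpecification": {
          "Tags": { "Key": "service", "Values": ["payments"] }
        }
      },
      "AnomalySubscription": {
        "SubscriptionName": "payments-team-alert",
        "ThresholdExpression": {
          "Dimensions": {
            "Key": "ANOMALY_TOTAL_IMPACT_PERCENTAGE",
            "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
            "Values": ["30"]
          }
        }
      }
    }

This sketch is abridged: a real subscription also needs MonitorArnList, Subscribers, and Frequency, and the service tag value shown is a placeholder. Connect alerts to the owning team's Slack channel, not a shared FinOps inbox. The person who sees the alert should be the person with the context and the permissions to fix it.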
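The starting threshold described above (daily spend more than 30% above the 90-day rolling average) can be prototyped outside any provider tooling. The sketch below is a plain rolling-average check on a list of daily spend figures. It is not how AWS Cost Anomaly Detection evaluates anomalies internally, and the function name and data are hypothetical.

```python
def is_spend_anomaly(daily_history, today, threshold_pct=30.0, window=90):
    """True if today's spend exceeds the rolling average by more than threshold_pct."""
    recent = daily_history[-window:]
    if not recent:
        return False  # no baseline yet, so nothing to compare against
    baseline = sum(recent) / len(recent)
    return today > baseline * (1 + threshold_pct / 100)


# Hypothetical service that normally costs about $100/day.
history = [100.0] * 90
print(is_spend_anomaly(history, 129.0))  # within tolerance
print(is_spend_anomaly(history, 160.0))  # e.g. a forgotten test database
```

Running this per service tag, rather than per account, is what makes the resulting alert specific enough to act on.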
Showback Before Chargeback
Showback
- Teams see their spend in dashboards
- No budget consequence for overspend
- Builds cost literacy before accountability
- Safe to deploy before tagging coverage is complete
Chargeback
- Spend deducted from team budgets
- Requires 85%+ tagging coverage to work fairly
- Effective only when teams have agency to act on it
- Counterproductive before allocation data is trustworthy
The sequencing of showback versus chargeback matters more than most FinOps implementations acknowledge. Get it wrong and you either create a metric people ignore — showback with no organizational follow-through — or a political battle that poisons the culture: chargeback imposed before teams have the data and tools to act on it.
The decision gate for moving from showback to chargeback has three conditions: tagging coverage above approximately 85%; per-service cost dashboards that are actually reviewed by the owning team; and teams that have the organizational permission and technical access to resize instances, adjust reservations, and clean up orphaned resources. Chargeback before agency is blame by accounting — it creates resentment without producing action.
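The three conditions can be written down as an explicit checklist so the decision is not made by feel. This is an illustrative sketch; the class and field names are hypothetical, and two of the three inputs are judgment calls that someone has to assess honestly.

```python
from dataclasses import dataclass


@dataclass
class ChargebackReadiness:
    tagging_coverage_pct: float          # share of spend with team + service tags
    dashboards_reviewed_by_owners: bool  # per-service dashboards actually reviewed
    teams_can_act: bool                  # permission and access to resize and clean up

    def ready(self, min_coverage_pct: float = 85.0) -> bool:
        """All three conditions must hold before moving from showback to chargeback."""
        return (
            self.tagging_coverage_pct > min_coverage_pct
            and self.dashboards_reviewed_by_owners
            and self.teams_can_act
        )


org = ChargebackReadiness(92.0, True, False)
print(org.ready())  # coverage is fine, but teams lack agency
```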
Regardless of which model you run, the leading metric is tagging coverage rate, tracked weekly. An organization with 95% coverage and showback is in a substantially better position than one with 60% coverage and a chargeback model, because untagged spend cannot be attributed, owned, or improved. Fix the tagging first; debate the accountability model second.
When you do make the transition to chargeback, run a shadow period first: operate the financial allocation for one quarter without deducting from team budgets. Use that quarter to surface attribution errors, contested allocations, and tagging gaps. Fix them before the dollars carry real consequences. Moving directly from no allocation to enforced chargeback is one of the most reliable ways to make engineers hostile to cost culture permanently.
The Culture Layer: Blameless Means Attributable
- 01 Allocate. Tag every resource: team, service, environment, cost-center. Target 90% coverage. Nothing downstream is reliable without this baseline.
- 02 Observe. Surface cost dashboards alongside SLO panels. Add cost-per-request to Grafana using OpenCost. Make spend visible to the people who create it.
- 03 Target. Set anomaly alerts scoped to service tags. Define what a cost regression looks like before one happens, not after the bill arrives.
- 04 Commit. Baseline steady-state capacity, purchase reservations or savings plans. Run a quarterly coverage review to catch drift as workloads evolve.
- 05 Rightsize. Apply VPA recommendations for Kubernetes workloads. Terminate idle VMs and orphaned volumes. This is continuous toil, not a one-time cleanup.
- 06 Automate. Add Infracost to CI. Schedule non-prod environments. Automate rightsizing pipelines. This is the Walk-to-Run transition.
Source: FinOps Foundation Crawl / Walk / Run model
The tooling is genuinely the easy 20%. The other 80% is making cost a normal, blameless part of engineering conversations — and that requires deliberate process changes, not just dashboard access.
Design reviews with cost as a first-class question. The right time to ask "what does this cost to run at 10x current load?" is in the design review, not six months after the bill arrives. For most services this is a back-of-envelope calculation: estimated request volume, infrastructure shape, a few minutes with a cost estimator. If the answer is "we have no idea," that is a gap to close before the PR merges. The estimate does not need to be precise — an order-of-magnitude answer is enough to catch the worst decisions before they are deployed at scale.
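A back-of-envelope estimate of this kind fits in a few lines. The sketch below assumes a horizontally scaled service where each instance handles a fixed request rate. The throughput figure, hourly rate, and 730-hour month are placeholder assumptions to replace with your own numbers.

```python
import math


def monthly_cost_at_load(requests_per_sec, rps_per_instance, hourly_rate, hours_per_month=730):
    """Rough monthly compute cost for a service that scales out linearly."""
    instances = math.ceil(requests_per_sec / rps_per_instance)
    return instances * hourly_rate * hours_per_month


current_rps = 500          # assumed current peak load
capacity_per_instance = 100  # assumed requests/sec one instance can serve
rate = 0.192               # approximate hourly rate; substitute your own

today = monthly_cost_at_load(current_rps, capacity_per_instance, rate)
at_10x = monthly_cost_at_load(current_rps * 10, capacity_per_instance, rate)
print(f"today: ${today:,.0f}/month, at 10x: ${at_10x:,.0f}/month")
```

The model ignores storage, data transfer, and managed services, which is acceptable for an order-of-magnitude check but not for a budget.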
Cost regressions belong in retros. When a service's cost-per-request increases by more than 20% week-over-week without a corresponding increase in traffic, that belongs in the next retro alongside reliability incidents. Not as blame — as "what changed, what did we learn, how do we detect this faster next time?" This is the same post-mortem culture applied to a different signal. The team that treats a $15k/month cost spike with the same rigor as a 1-hour partial outage will close both problems faster than the team that only treats one as serious.
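One way to make the retro trigger mechanical is to compare cost-per-request week over week, which normalizes for traffic automatically. The function names and weekly figures below are hypothetical, and the 20% default mirrors the threshold in the paragraph above.

```python
def cost_per_request(cost: float, requests: int) -> float:
    return cost / requests


def needs_retro(prev_cost, prev_requests, curr_cost, curr_requests, threshold_pct=20.0):
    """True if cost-per-request rose by more than threshold_pct week over week."""
    prev = cost_per_request(prev_cost, prev_requests)
    curr = cost_per_request(curr_cost, curr_requests)
    change_pct = (curr - prev) / prev * 100
    return change_pct > threshold_pct


# Spend up 30% with flat traffic: a regression worth discussing.
print(needs_retro(1000.0, 1_000_000, 1300.0, 1_000_000))
# Spend up 30% because traffic is up 30%: unit cost is unchanged.
print(needs_retro(1000.0, 1_000_000, 1300.0, 1_300_000))
```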
Blameless means attributable, not invisible. A blameless cost culture does not mean hiding who built what. It means treating cost regressions as system failures rather than individual failures. The five-whys process that applies to production incidents applies equally here: a cost spike is a signal that something changed unexpectedly, and the interesting question is what system property failed to surface it earlier — missing anomaly alert, no cost panel on the dashboard, resource requests never reviewed after initial deployment.
The FinOps Foundation's Crawl/Walk/Run maturity model is a useful internal benchmark. Crawl means limited tagging, no per-service visibility, all cost management reactive. Walk means consistent allocation above 80%, regular per-service review, and committed coverage above 50%. Run means proactive: anomaly detection firing in near-real-time, Infracost in every infrastructure PR, VPA recommendations reviewed and applied on a regular cycle, committed coverage above 70%. Most organizations move from Crawl to Walk in three to six months with consistent effort. Walk to Run takes another six to twelve. Neither transition happens from a single initiative — it requires cost to be a standing agenda item alongside reliability and velocity, not a project that gets closed out when the savings target is hit.
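For a quick self-assessment, the thresholds in the paragraph above can be reduced to a rough classifier. This is a deliberate simplification for illustration: the FinOps Foundation model is qualitative and assessed per capability, and the two boolean inputs here collapse several practices into a single yes or no.

```python
def maturity_stage(allocation_pct, commitment_pct, per_service_review, proactive_automation):
    """Rough Crawl/Walk/Run placement using the thresholds described above."""
    walk = allocation_pct > 80 and commitment_pct > 50 and per_service_review
    run = walk and commitment_pct > 70 and proactive_automation
    if run:
        return "Run"
    if walk:
        return "Walk"
    return "Crawl"


print(maturity_stage(85, 55, True, False))
print(maturity_stage(95, 75, True, True))
print(maturity_stage(60, 20, False, False))
```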