Cloud migration ROI: why optimisation debt compounds — and how to break the cycle

Lift-and-shift migration is often the right short-term call. The risk profile is low, the timeline is predictable, and it avoids re-architecting under deadline pressure. The problem is not the decision itself — it is what gets assumed away at the same time. When you move workloads as-is, you move their waste too: oversized instances provisioned for peak capacity, always-on non-production environments, on-demand pricing for stable workloads that could be on reserved capacity, and manual operations processes that consume engineering hours every week. None of that disappears by crossing the cloud boundary. It just converts from a sunk capital cost into a recurring line item, billed by the hour.

This is what we call optimisation debt: the gap between what you are paying and what you could be paying if the estate were sized and operated correctly. It is not a planning failure; it is the predictable outcome of migrating under deadline pressure and then moving on to the next project. The teams that come out ahead treat optimisation as a funded first-class workstream rather than a someday backlog item. This post explains why that distinction matters, what the evidence shows, and which specific actions break the cycle.

Why the first cloud bill looks like your data centre

On-premises infrastructure is procured in lumpy increments. You buy servers for peak capacity plus a buffer, the cost is largely fixed, and idle CPU is essentially free because the hardware is already paid for. When you migrate without changing how you size or operate workloads, you reproduce that same procurement logic in an environment where billing is granular. Every idle CPU-hour now appears on an invoice.

The structural reasons the first bill is high are predictable without needing to inspect any particular estate:

Oversized instances. On-prem baselines are often measured at peak, or at the P99 traffic event that justified the capacity purchase. If you size a VM to P99 and migrate it as-is, you run that capacity continuously — at 3am on a Tuesday, during maintenance windows, and through every period of normal traffic.

Always-on non-production environments. Development, staging, and QA environments that shared physical hardware now each have dedicated instances. Without a shutdown policy, they run 24 hours a day, seven days a week, at roughly 5% utilisation.

On-demand pricing for stable workloads. On-demand compute is expensive for workloads that run around the clock. AWS Compute Savings Plans, Azure Reserved VM Instances, and GCP Committed Use Discounts reduce compute costs by 30–60% for stable workloads — but taking the commitment requires knowing your post-migration baseline, which takes 30–60 days of data to establish. Until then, you pay on-demand rates.

Manual operations carried over. If deployments still require human intervention and runbooks still describe manual steps, the operational overhead that justified the cloud move has not been reduced. It has been relocated.

Egress costs absent from the estimate. Data transfer between cloud regions, from cloud to on-premises during any hybrid transition period, and from cloud to end users over the public internet are billed separately. They routinely appear as a surprise in the first bill if data flows were not mapped during the assessment phase.

The scale of the problem: what the data shows

These are not edge cases that affect poorly managed estates. Flexera's 2025 State of the Cloud report — which surveyed over 700 cloud decision-makers and practitioners — found that organisations estimated 27% of their cloud spend was wasted. That figure held at 27–32% consistently from 2019 through 2025. The 2026 report saw it tick up to 29%, with the increase attributed to new AI workloads being provisioned before utilisation patterns are understood — a pattern that mirrors exactly what happens immediately after a migration.

Separately, Cast.ai's 2025 Kubernetes Cost Benchmark — based on telemetry from over 2,100 organisations running workloads across AWS, GCP, and Azure throughout 2024 — found that applications used an average of 10% of their provisioned CPU. Memory utilisation averaged 23%. Average CPU utilisation actually declined from 13% the previous year as teams provisioned increasingly generous defaults.

Cloud waste and utilisation benchmarks across the industry

| Metric | Value | Scope and source |
|---|---|---|
| Cloud spend wasted | 29% | Industry median, Flexera 2026 |
| Average CPU utilisation | 10% | Kubernetes clusters, Cast.ai 2025 |
| Average memory utilisation | 23% | Kubernetes clusters, Cast.ai 2025 |
| Organisations that struggle to manage spend | 84% | Flexera 2025 |

Source: Flexera State of the Cloud 2026; Cast.ai Kubernetes Cost Benchmark 2025

Provisioned versus actual resource utilisation across Kubernetes clusters

| Resource | Provisioned | Actually used |
|---|---|---|
| CPU | 100% | ~10% |
| Memory | 100% | ~23% |

Source: Cast.ai Kubernetes Cost Benchmark 2025 — 2,100+ organisations, Jan–Dec 2024

Ten percent CPU utilisation across thousands of production clusters is not a measurement anomaly. It is the direct consequence of provisioning for peak, not automating scaling, and not revisiting resource requests after the initial deployment.

A realistic four-phase migration timeline

The most common planning mistake is scoping the project as two phases, assess and migrate, and treating optimisation as something the operations team will pick up afterward. In practice, all four phases need budget committed up front, because the work in each one creates the preconditions for the next.

All four phases need dedicated budget — optimisation cannot live in the backlog

01 Assess. Inventory workloads, map dependencies and data flows, and establish a real cost baseline rather than a forecast built from sticker prices. Profile CPU, memory, and I/O at P50 and P95, not just at peak. Identify candidates for managed services versus rehost. Map egress flows before they become a billing surprise.

02 Migrate. Move workloads using the lowest-risk strategy that still fits the target architecture. Run a parallel cut-over period with traffic splitting on stateful or business-critical workloads. Instrument observability before you move, not after. Confirm the 30-day baseline clock starts on day one of production traffic.

03 Optimise. Rightsize based on 30–60 days of post-migration telemetry. Purchase reserved capacity once the baseline is known. Configure autoscaling on variable-traffic workloads. Schedule non-production environments to shut down outside business hours. This is where the savings live, and it must be a funded sprint, not a backlog item.

04 Operate. Install ongoing cost governance: a tagging policy enforced in CI/CD, cost ownership by team with a shared dashboard, anomaly alerting wired to a Slack channel that someone reads, and a monthly rightsizing review. Without this, spend drifts upward as engineers add resources and never revisit them.

Source: ClimsTech Engineering practice

Refactoring in flight: the highest-ROI changes

You do not need to re-architect everything during a migration. But a small number of deliberate changes made in flight — rather than promised for later — have a disproportionate impact on the long-term bill and on the cost of future changes.

Containerise the workloads with variable traffic

Services with uneven traffic are the clearest case for containerisation during migration rather than after. A VM sized to peak runs that peak capacity continuously. A container in an autoscaling group runs at median and scales out toward peak within minutes, or faster when spare node capacity is already available. The billing logic is fundamentally different.

The threshold question is straightforward: does this workload have meaningful traffic variance across the day or week? If yes, a container on Kubernetes, ECS, or a comparable managed compute surface is a better migration target than a VM. If it is a stable batch job or a monolith with flat traffic, a VM or a managed PaaS service may be the right call. The decision should be made per workload, not applied as a blanket policy.
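One way to make the threshold question concrete is to compare peak load with median load over a representative week. The sketch below is illustrative only: the function names and the 2.0 cut-off are assumptions, not values from the benchmarks cited in this post, and the right threshold depends on how quickly a given workload can scale.

```python
from statistics import median

def peak_to_median_ratio(hourly_requests):
    """Return peak load divided by median load for a series of hourly request counts."""
    if not hourly_requests:
        raise ValueError("need at least one sample")
    mid = median(hourly_requests)
    if mid <= 0:
        raise ValueError("median load must be positive")
    return max(hourly_requests) / mid

def suggest_target(hourly_requests, variance_threshold=2.0):
    """Suggest a migration target. The 2.0 threshold is an illustrative assumption."""
    ratio = peak_to_median_ratio(hourly_requests)
    if ratio >= variance_threshold:
        return "autoscaled containers"
    return "VM or managed PaaS"

# Hypothetical hourly request counts for two workloads.
flat = [100, 105, 98, 102, 101, 99]
spiky = [100, 120, 90, 400, 110, 95]

print(suggest_target(flat))
print(suggest_target(spiky))
```

A ratio close to 1 means a fixed-size VM wastes little capacity; a high ratio means a peak-sized VM would sit mostly idle.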

Move stateful data onto managed services

Running databases on VMs carries hidden costs: storage operations, patching cycles, backup management, point-in-time recovery setup, and the engineering hours spent responding to incidents at 2am. Managed services — RDS, Cloud SQL, DynamoDB, Azure Database for PostgreSQL — price in those operational costs but remove the toil. The total cost comparison is often closer than it appears; the operational load reduction is real and measurable.

The migration path matters more than the destination. Use logical replication to move data live wherever possible. AWS Database Migration Service, GCP Database Migration Service, and Azure Database Migration Service all support continuous replication for common engines, which shrinks the cutover to a short, planned switch and removes most of the maintenance-window risk on business-critical data.

Infrastructure as code from day one

An environment that exists as clicked-together console state cannot be easily reproduced, audited, resized, or cleaned up. An environment in Terraform or Pulumi can be spun up in minutes, deleted when the test completes, reviewed in a pull request, and parameterised so non-production environments get smaller defaults automatically.

# Every environment is reproducible from a single apply.
# Non-production gets smaller defaults and a tighter replica count.
module "service" {
  source       = "./modules/gke-service"
  name         = "checkout"
  environment  = var.environment
 
  min_replicas = var.environment == "production" ? 3 : 1
  max_replicas = var.environment == "production" ? 40 : 5
 
  cpu_request  = "250m"
  cpu_limit    = "1000m"
  mem_request  = "256Mi"
  mem_limit    = "512Mi"
}

Wire autoscaling before the migration lands

Autoscaling is not a post-migration optimisation task. If you configure it during the migration — sizing baseline instances for median load and letting the scaler handle peak — you avoid ever running at an oversized baseline. The Cast.ai 2026 Kubernetes Optimization Report is instructive here: automated rightsizing cut provisioned CPUs by half across the clusters studied, and OOM kill rates dropped rather than rose. The reliability risk of rightsizing with proper telemetry is lower than the cost of not doing it.
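Sizing the baseline for median load and the ceiling for peak comes down to one division per bound. The sketch below assumes CPU-based scaling and works in integer millicores to avoid floating-point rounding; the function names, the 60% target, and the example loads are hypothetical.

```python
def replicas_for_load(load_millicores, request_millicores, target_pct):
    """Replicas needed so the load sits at or below target_pct of the summed CPU requests."""
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    usable_per_replica = request_millicores * target_pct  # millicores x percent
    # Integer ceiling division: -(-a // b) rounds up without using floats.
    return max(1, -(-load_millicores * 100 // usable_per_replica))

def autoscaling_bounds(p50_millicores, peak_millicores, request_millicores, target_pct=60):
    """Return (min_replicas, max_replicas): the baseline covers P50, the ceiling covers peak."""
    min_replicas = replicas_for_load(p50_millicores, request_millicores, target_pct)
    max_replicas = replicas_for_load(peak_millicores, request_millicores, target_pct)
    return min_replicas, max_replicas

# Hypothetical service: 1.5 vCPU at median, 9 vCPU at peak, 250m requested per replica.
bounds = autoscaling_bounds(1500, 9000, 250)
print(bounds)
```

The two values map onto the minimum and maximum replica settings of whichever autoscaler is in use. Memory-bound services need the same calculation against memory requests.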

| Lift-and-shift (as migrated) | Refactor in flight (target state) |
|---|---|
| VM sized to P99 traffic peak | Container or managed PaaS sized to P50 with autoscaling to peak |
| Non-production runs 24/7 at 5% utilisation | Non-production scheduled off outside business hours from week one |
| Manual deployments, no rollback automation | IaC-driven deployment with automated rollback on health check failure |
| On-demand pricing from day one | Reserved capacity committed after 60-day baseline period |
| First bill mirrors the on-premises spend | Bill drops at 60–90 days without a dedicated post-migration sprint |

The structural difference between the two migration approaches

Source: ClimsTech Engineering practice

TCO: the five levers that actually move the number

Total cost of ownership comparisons between cloud and on-premises frequently mislead because they compare sticker price per instance-hour against amortised server cost per CPU-hour, and neither figure captures the full picture. A more useful model tracks five levers independently and asks whether the migration moves each one in the right direction.

| Lever | On-prem baseline | Lift-and-shift cloud | Optimised cloud |
|---|---|---|---|
| Compute utilisation | 30–40% typical | 10–15% (no autoscaling) | 50–70% (autoscaling + rightsized) |
| Capacity pricing | Fixed CAPEX | On-demand (roughly 1.4–2.5x the committed rate) | Savings Plan or CUD (30–60% off on-demand) |
| Storage cost | Flat provisioned CAPEX | Flat provisioned, billed per GB | Tiered lifecycle (hot / warm / cold) |
| Egress cost | Internal network only | Variable, often unmodelled | Architecturally minimised |
| Operational hours | High: patching, runbooks, on-call | Carried over unchanged | Reduced via managed services and IaC |

The columns compound. Optimised cloud wins not because any single lever is decisive, but because all five move simultaneously.

Worked example: rightsizing a 20-node application tier.

Suppose a team migrates 20 application servers, each running at 12% average CPU utilisation, in line with Cast.ai's benchmark for manually managed clusters. The rehost target is 20 x m5.2xlarge (8 vCPU, 32 GB) at on-demand pricing. At approximately $0.384 per instance-hour in us-east-1, that is $7.68/hr, or roughly $67,300 per year (8,760 hours) on compute alone.

After a 60-day baseline period, the picture becomes clear:

Median load is roughly 1 vCPU per server, or about 19 vCPU across the tier. A rightsize to m5.large (2 vCPU, 8 GB), assuming memory fits, in an autoscaling group targeting 50–70% CPU covers P50 with roughly 16 nodes at a 60% target and scales out further for peak events.
A 1-year Compute Savings Plan on that baseline reduces the committed spend by approximately 17–30% depending on payment option (no-upfront versus full-upfront).
The 4 non-production nodes in the original 20 sit idle outside business hours, which is roughly 65% of the week. Scheduled scaling to zero during those periods reduces their cost to about 35% of the always-on figure.

Conservative annualised estimate after these changes: baseline compute reduces by 60–75% from the lift-and-shift figure. The exact number depends on traffic shape, the Savings Plan discount at the time of commitment, and how aggressively the autoscaler is tuned — but the structural direction is consistent across every instance type and region. The savings are not incremental. They are categorical.
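The arithmetic behind the worked example can be checked with a short script. This is a sketch under stated assumptions: it uses the 12% average utilisation as a proxy for median load, derives the baseline node count from that figure at a 60% CPU target instead of taking it as given, assumes an m5.large on-demand rate of $0.096 per hour, and ignores the extra cost of peak scale-out hours.

```python
HOURS_PER_YEAR = 8760

def annual_cost(nodes, hourly_rate, discount=0.0):
    """Annual cost in USD for nodes running all year, after an optional fractional discount."""
    return nodes * hourly_rate * HOURS_PER_YEAR * (1 - discount)

def nodes_for_median(servers, vcpu_per_server, avg_util_pct, vcpu_per_node, target_pct):
    """Nodes needed to carry the aggregate average load at the target CPU utilisation."""
    busy_millicores = servers * vcpu_per_server * 1000 * avg_util_pct // 100
    usable_millicores = vcpu_per_node * 1000 * target_pct // 100
    return -(-busy_millicores // usable_millicores)  # integer ceiling division

# Lift-and-shift: 20 x m5.2xlarge at the on-demand rate quoted in the text.
lift_and_shift = annual_cost(20, 0.384)

# Rightsized baseline on m5.large (2 vCPU) at a 60% CPU target.
baseline_nodes = nodes_for_median(20, 8, 12, 2, 60)
baseline_on_demand = annual_cost(baseline_nodes, 0.096)  # assumed m5.large rate
baseline_committed = annual_cost(baseline_nodes, 0.096, discount=0.17)  # low end of 17-30%

reduction = 1 - baseline_on_demand / lift_and_shift

print(f"lift-and-shift:      ${lift_and_shift:,.0f}/yr")
print(f"baseline nodes:      {baseline_nodes}")
print(f"baseline on-demand:  ${baseline_on_demand:,.0f}/yr")
print(f"baseline committed:  ${baseline_committed:,.0f}/yr")
print(f"reduction (pre-SP):  {reduction:.0%}")
```

With these inputs the sketch gives a 16-node baseline and an 80% reduction before any Savings Plan discount. Peak scale-out hours add cost on top of the baseline, which is why the 60–75% range quoted above is the more conservative figure.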

Real-world pitfalls and their fixes

Pitfall 1 — Egress costs land as a surprise

Cloud providers charge for data leaving their network. Migrations with a hybrid transition period — traffic flowing between on-premises systems and cloud — can generate egress bills that exceed the compute cost in the first month. The fix is to map your data flows during the assessment phase, not after the invoice arrives. AWS VPC Flow Logs, GCP Network Topology, and Azure Network Watcher surface inter-service and external data flows before you commit to a migration architecture. If a workload sends large volumes of data to on-premises systems or to third-party services in other regions, that flow needs to be costed explicitly in the migration estimate.
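Costing a data flow explicitly can be as simple as multiplying mapped volumes by a per-GB rate. In the sketch below the flow names, volumes, and rates are all placeholder assumptions; real rates vary by provider, region pair, and volume tier, so substitute figures from the current price list.

```python
# Placeholder per-GB rates in USD. These are assumptions for illustration only.
ASSUMED_RATE_PER_GB = {
    "internet": 0.09,
    "inter_region": 0.02,
    "to_on_prem": 0.09,
}

def monthly_egress_cost(flows):
    """Return (per-flow costs, total) for an iterable of (name, kind, gb_per_month)."""
    per_flow = {}
    for name, kind, gb_per_month in flows:
        if kind not in ASSUMED_RATE_PER_GB:
            raise KeyError(f"no rate configured for flow kind {kind!r}")
        per_flow[name] = gb_per_month * ASSUMED_RATE_PER_GB[kind]
    return per_flow, sum(per_flow.values())

# Hypothetical flows identified during the assessment phase.
flows = [
    ("reporting-sync", "to_on_prem", 20_000),
    ("dr-replica", "inter_region", 5_000),
    ("public-api", "internet", 3_000),
]

per_flow, total = monthly_egress_cost(flows)
for name, cost in per_flow.items():
    print(f"{name}: ${cost:,.2f}/month")
print(f"total: ${total:,.2f}/month")
```

Adding this line to the migration estimate, even with rough volumes, is what prevents the first-invoice surprise.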

Pitfall 2 — Rightsizing without a telemetry baseline

Reducing instance sizes without validating against real production traffic patterns causes latency spikes and out-of-memory kills. The Cast.ai data showing improved reliability after rightsizing is predicated on automated rightsizing with telemetry-driven CPU and memory profiles — not on guessing. The fix is instrument first, rightsize second. Establish a 30-day baseline of P50, P95, and P99 resource utilisation before touching instance sizes.

# kubectl top reports a point-in-time snapshot, not history. Run this on a
# schedule (for example from cron) for 30 days before any rightsizing changes.
# Sorting by CPU descending puts the heaviest consumers first.
kubectl top pods -n production --sort-by=cpu --no-headers \
  >> /var/log/k8s-resource-baseline-$(date +%Y%m%d).log

Prometheus, scraping the kubelet's cAdvisor metrics alongside kube-state-metrics, gives you the fuller picture: use recording rules to aggregate P50 and P95 CPU and memory by deployment over 30 days, then size requests to P95 with 20% headroom.
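The sizing rule itself, P95 plus 20% headroom, is easy to express once the samples are exported. The sketch below uses a nearest-rank percentile over a list of CPU samples in millicores; the function names and the sample data are hypothetical, and a production setup would normally compute the percentile inside the metrics system.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]

def recommended_request(samples_millicores, pct=95, headroom_pct=20):
    """CPU request in millicores: the chosen percentile plus headroom, rounded up."""
    base = percentile(samples_millicores, pct)
    return -(-base * (100 + headroom_pct) // 100)  # integer ceiling division

# Hypothetical usage samples for one deployment: 100m to 290m in steps of 10m.
samples = list(range(100, 300, 10))

p50 = percentile(samples, 50)
p95 = percentile(samples, 95)
request = recommended_request(samples)
print(p50, p95, request)
```

The same function applies to memory samples in MiB. Sizing to P95 rather than P99 assumes the autoscaler or burst capacity absorbs the rare spikes above it.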

Pitfall 3 — Non-production environments running continuously

The fastest recoverable win in a freshly migrated estate is non-production environments that no one turned off. Development, staging, and QA environments frequently represent 20–30% of compute spend and run at 5% utilisation overnight and on weekends. The fix is automated scheduling at the infrastructure layer, not a policy document:

# Scale the staging Auto Scaling group to zero outside business hours.
# Scale back up at the start of the working day.
resource "aws_autoscaling_schedule" "staging_off" {
  scheduled_action_name  = "staging-scale-down"
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
  recurrence             = "0 20 * * 1-5"
  time_zone              = "Europe/London"
  autoscaling_group_name = aws_autoscaling_group.staging.name
}
 
resource "aws_autoscaling_schedule" "staging_on" {
  scheduled_action_name  = "staging-scale-up"
  min_size               = 1
  max_size               = 5
  desired_capacity       = 2
  recurrence             = "0 8 * * 1-5"
  time_zone              = "Europe/London"
  autoscaling_group_name = aws_autoscaling_group.staging.name
}

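A similar pattern applies elsewhere, although the mechanism differs: GKE node pools can shrink to a minimum node count of zero once workloads are scaled down, ECS services accept Application Auto Scaling scheduled actions, and RDS instances can be stopped and started on a schedule through the StopDBInstance and StartDBInstance APIs. Note that a stopped RDS instance is started again automatically after seven days.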
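The saving from a weekday schedule follows directly from the hours it leaves the environment running. The helper below is a minimal sketch with a hypothetical function name; it assumes the same start and stop hour on every working day, as in the 08:00 to 20:00 Monday-to-Friday schedule above.

```python
HOURS_PER_WEEK = 7 * 24

def weekly_on_fraction(start_hour, stop_hour, days_per_week):
    """Fraction of the week an environment runs under a fixed daily schedule."""
    if not 0 <= start_hour < stop_hour <= 24:
        raise ValueError("expected 0 <= start_hour < stop_hour <= 24")
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    return (stop_hour - start_hour) * days_per_week / HOURS_PER_WEEK

on_fraction = weekly_on_fraction(8, 20, 5)
print(f"running {on_fraction:.1%} of the week, off {1 - on_fraction:.1%}")
```

For that schedule the environment runs 60 of 168 hours, which is where the "about 35% of the always-on cost" figure comes from.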

Pitfall 4 — Missing the reserved capacity window

Most teams know they should buy reserved capacity and delay the commitment because the workload might change. The risk is real, but the cost of waiting is guaranteed. Every month on on-demand rates for a stable baseline is a month of paying roughly 1.4–2.5x the committed rate, given discounts of 30–60%. The practical fix: after 60 days of post-migration telemetry, commit to reserved pricing for your P10 baseline, the compute level you run at or above 90% of the time. Prefer Compute Savings Plans (AWS) or flexible, spend-based Committed Use Discounts (GCP) over instance-specific reservations: they apply across instance families and regions and tolerate architecture changes far better.
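Both numbers in that argument can be derived from telemetry: the P10 baseline from hourly node counts, and the monthly cost of waiting from the discount forgone. The sketch below is illustrative; the function names, the sample node counts, the $0.096 hourly rate, and the 30% discount are assumptions.

```python
def p10_baseline(hourly_node_counts):
    """Nearest-rank 10th percentile: roughly 90% of hours run at or above this level."""
    if not hourly_node_counts:
        raise ValueError("need at least one sample")
    ordered = sorted(hourly_node_counts)
    rank = max(1, (10 * len(ordered) + 99) // 100)  # integer ceiling of 10% of n
    return ordered[rank - 1]

def monthly_cost_of_waiting(nodes, on_demand_rate, discount, hours_per_month=730):
    """Extra monthly spend from staying on-demand instead of taking the committed rate."""
    return nodes * on_demand_rate * hours_per_month * discount

# Hypothetical hourly node counts from the post-migration baseline period.
counts = [5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12, 12, 14, 16, 18, 20]

baseline = p10_baseline(counts)
share_at_or_above = sum(c >= baseline for c in counts) / len(counts)
waiting = monthly_cost_of_waiting(baseline, 0.096, 0.30)

print(f"commit to {baseline} nodes ({share_at_or_above:.0%} of hours at or above)")
print(f"cost of waiting: ${waiting:,.2f}/month")
```

Committing at P10 rather than at the median keeps the commitment below actual usage in almost every hour, so a moderate drop in load does not leave paid-for capacity unused.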

Pitfall 5 — Tag debt accumulates from day one

Without a consistent resource tagging policy enforced at creation time, cost analysis is impossible below the account level. You cannot assign spend to teams, services, or environments. You cannot set meaningful budgets. You cannot identify orphaned resources. The fix is to enforce tags at the infrastructure layer — not as a post-migration cleanup exercise, which consistently ends up around 60% complete and drifts further over time.

# Merge required tags into every resource via a local.
# The precondition below fails the plan when team or cost_centre is empty.
locals {
  required_tags = {
    environment  = var.environment
    team         = var.team
    service      = var.service
    cost_centre  = var.cost_centre
  }
}
 
resource "aws_instance" "app" {
  ami           = data.aws_ami.app.id
  instance_type = var.instance_type
 
  tags = merge(local.required_tags, {
    Name = "${var.service}-${var.environment}"
  })
 
  lifecycle {
    precondition {
      condition     = length(var.team) > 0 && length(var.cost_centre) > 0
      error_message = "team and cost_centre must be non-empty strings. Required for spend attribution."
    }
  }
}

Pair this with a CI check that validates the required tag variables are set before terraform plan runs in any environment.
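A complementary check can run after the plan and inspect what Terraform actually intends to create. The sketch below assumes the plan has been exported with `terraform show -json` and loaded into a dictionary; it reads the `resource_changes` list from that output. The required tag set mirrors the locals above, and the example plan is hand-written for illustration. Tag values that are only known at apply time will not appear in the plan and would be reported as missing by this simple version.

```python
REQUIRED_TAGS = {"environment", "team", "service", "cost_centre"}

def missing_tags(plan):
    """Map resource address -> sorted list of required tags that are absent or empty."""
    problems = {}
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        # Skip deletions (after is None) and resource types without a tags attribute.
        if after is None or "tags" not in after:
            continue
        tags = after.get("tags") or {}
        missing = sorted(t for t in REQUIRED_TAGS if not tags.get(t))
        if missing:
            problems[rc["address"]] = missing
    return problems

# Hand-written example in the shape of `terraform show -json` output.
example_plan = {
    "resource_changes": [
        {"address": "aws_instance.app",
         "change": {"actions": ["create"],
                    "after": {"tags": {"environment": "staging", "team": "payments",
                                       "service": "checkout", "cost_centre": "cc-104"}}}},
        {"address": "aws_instance.worker",
         "change": {"actions": ["create"],
                    "after": {"tags": {"environment": "staging", "service": "checkout"}}}},
        {"address": "aws_iam_role.app",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

problems = missing_tags(example_plan)
for address, tags in problems.items():
    print(f"{address}: missing {', '.join(tags)}")
```

In a pipeline, a non-empty result would fail the job before apply, so untagged resources never reach the account.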

The ongoing discipline: FinOps as a team practice

Getting to an optimised cloud estate is a project. Staying there is a discipline. Cloud spend has a natural upward drift: new services are provisioned at generous defaults, old services are never decommissioned, storage grows without a lifecycle policy, and AI experimentation workloads are provisioned without sizing guardrails. The Flexera 2026 data capturing the first increase in waste percentage in five years points directly at this pattern: new workload categories get added before the team has developed intuition for their cost profile.

The minimum viable FinOps practice for a mid-sized engineering organisation looks like this:

Tag policy enforced in CI/CD. Every merge that adds or changes a resource must carry the required tag set. This is the foundation of everything else: without it, no subsequent analysis has workload-level granularity.

Cost ownership by team. Each team sees their own spend in a shared dashboard, updated daily. Grafana with a cloud billing data source, AWS Cost Explorer embedded in a team portal, or a dedicated FinOps tool like CloudHealth or Apptio Cloudability all work. When a team can see that their staging environment costs more than their production environment, they fix it without being asked.

Anomaly alerting wired to a channel someone reads. AWS Cost Anomaly Detection, GCP Budget Alerts, and Azure Cost Management all support threshold-based and ML-driven anomaly alerting. A single unexpected spike that triggers a Slack notification and gets investigated the same day costs a fraction of one discovered in the monthly finance review.

Monthly rightsizing review. Thirty minutes per team, once per month. Look at the top 10 resources by spend and compare their utilisation against the sizing. This routine consistently surfaces resources that were oversized at launch and never revisited.
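The review agenda can be generated rather than assembled by hand. The sketch below ranks resources by spend and flags those under a utilisation threshold; the field names, the sample data, and the 30% threshold are assumptions, and the input would normally come from a billing export joined with utilisation metrics.

```python
def review_candidates(resources, top_n=10, util_threshold=0.30):
    """Names of the top_n resources by spend whose average CPU utilisation is below the threshold."""
    by_spend = sorted(resources, key=lambda r: r["monthly_spend"], reverse=True)
    return [r["name"] for r in by_spend[:top_n] if r["avg_cpu_util"] < util_threshold]

# Hypothetical export: monthly spend in USD and average CPU utilisation as a fraction.
resources = [
    {"name": "checkout-api", "monthly_spend": 4200, "avg_cpu_util": 0.55},
    {"name": "search-indexer", "monthly_spend": 3900, "avg_cpu_util": 0.08},
    {"name": "staging-db", "monthly_spend": 2100, "avg_cpu_util": 0.04},
    {"name": "batch-reports", "monthly_spend": 300, "avg_cpu_util": 0.02},
]

candidates = review_candidates(resources, top_n=3)
print(candidates)
```

Limiting the list to the top spenders keeps the review inside thirty minutes: a small resource at 2% utilisation matters less than a large one at 8%.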

Reserved capacity renewal review. One-year Savings Plans and Committed Use Discounts expire. Put a calendar reminder 60 days before each commitment ends to revalidate the baseline against current usage data before renewing at the previous commitment level.

Clusters running at 10% CPU utilisation are not a performance problem — they are a procurement policy that was never updated for elastic compute.

— Cast.ai Kubernetes Cost Benchmark 2025

The underlying point is this: optimisation debt is not a sign of bad engineering during the migration. It is the predictable outcome of migrating under deadline pressure and then moving the team to the next project. The difference between organisations that pay it down within six months and those that carry it for three years is almost never technical capability. It is whether they treated optimisation as a deliverable with a sprint and a budget, or as a promise made to finance during the business case.

What to remember

Budget the optimisation sprint before the migration starts — not as a backlog item. It will compete with feature deadlines and lose every quarter unless it has its own allocation.
Baseline CPU and memory at P50, P95, and P99 for 30–60 days post-migration before rightsizing anything. Rightsize without telemetry and you either overshoot (reliability impact) or undershoot (no savings).
Commit to reserved capacity (Savings Plans, CUDs) at the 60-day mark on your P10 baseline. The cost of waiting on on-demand rates is guaranteed; the risk of a workload changing is manageable.
Enforce resource tags in CI/CD at the infrastructure layer from day one. Post-migration tag cleanup consistently ends up around 60% complete and drifts further without structural enforcement.
Containerise and autoscale the workloads with variable traffic during the migration, not after. Automated rightsizing with telemetry improves reliability rather than harming it, as the Cast.ai data cited above shows.
Schedule non-production environments off outside business hours at the infrastructure layer from week one. They frequently represent 20–30% of compute spend and run at 5% utilisation overnight.
Install the minimum viable FinOps discipline: tag enforcement in CI/CD, cost ownership by team, anomaly alerting on the same day, and a monthly 30-minute rightsizing review. Without the cadence, the savings erode.