Cloud repatriation: when leaving the cloud actually pays

The cloud repatriation debate generates far more heat than light. Every few months a team publishes exit numbers, and two camps form: cloud-is-always-too-expensive and cloud-is-the-only-sane-choice. Both camps are making the same mistake — treating a workload-level cost-structure question as an identity question. The correct frame is narrower and more productive: does this specific workload, at its current scale, have the demand profile that rented infrastructure is designed to serve? If the answer is clearly no, repatriation deserves a serious engineering analysis, not a religious objection. If the answer is yes, the cloud earns its premium and you should invest in making it cheaper rather than escaping it.

The 37signals numbers, three years on

The most-cited cloud-exit story has now accumulated a meaningful track record. 37signals — the company behind Basecamp and HEY email — began their cloud exit in late 2022 with an initial hardware order of approximately $600,000 in Dell servers. By the end of 2023, that hardware had paid for itself entirely. By 2024, they reported their annual cloud bill had dropped from the original $3.2M run-rate to $1.3M — a saving of just under $2M in a single year. Their five-year projection, per DHH's own public accounting, sits above $10M in cumulative savings.

37signals annual infrastructure run-rate — cloud vs owned hardware

Public cloud (original run-rate): $3.2M/yr
2024 blended (cloud + owned): $1.3M/yr

Source: DHH / 37signals, public blog posts, 2023–2025

The hardware they built: 20 Dell servers across two data centres, totalling 4,000 vCPUs, 7,680 GB of RAM, and 384 TB of NVMe storage. They have also announced plans to replace AWS S3 with a dual-datacenter Pure Storage configuration — roughly 18 PB of capacity — at a hardware cost roughly equivalent to a single year of their AWS S3 bill.

These numbers are real and publicly documented. They are also workload-specific. 37signals runs software products with large, mostly predictable load, operated by a team that has run its own servers for nearly two decades. The same move by a team without that profile will not produce the same numbers.

Cloud waste is structural, not accidental

Before modelling rent-vs-own, it is worth understanding why cloud bills run high even before you consider workload fit. The Flexera 2025 State of the Cloud Report found that 27% of cloud spend was wasted — unused reservations, idle instances, over-provisioned resources, orphaned storage. The 2026 edition put that figure at 29%, the first year-on-year increase in five years, attributed largely to rapidly expanding AI workloads that teams are still learning to right-size. Separately, 84% of organisations in Flexera's 2025 survey reported struggling to manage cloud spend, and actual cloud budgets were exceeding planned limits by an average of 17%.

Estimated share of cloud budget wasted

Total cloud budget: 100%
Wasted spend (unused/over-provisioned): ~29%

Source: Flexera State of the Cloud 2026

The over-provisioning problem is particularly stubborn in containerised environments. Cast.ai's 2024 Kubernetes benchmark found average CPU utilisation running at approximately 13% of provisioned capacity — roughly 87 cents of every dollar spent on compute buying headroom rather than work. Some of that headroom is legitimate (burst capacity, safety margin for latency SLOs), but most is inertia: engineers size for worst-case, clusters do not auto-scale down aggressively, and no one revisits the numbers between planning cycles.

Building the real TCO model

Most rent-vs-own analyses fail because they compare the cloud bill against a hardware quote. A real model needs to account for at least six cost dimensions over a three-to-five year horizon.

| Cost dimension | Cloud | Owned hardware |
|---|---|---|
| Compute | Pay-per-use, elastic, no capex | One-time capex, linear depreciation |
| Storage | Per-GB/month + IOPS + API calls | Capex + maintenance; near-zero marginal cost |
| Egress | Per-GB billed ($0.08–$0.09/GB on AWS) | Near-zero within colo; peering costs vary |
| People (ops, on-call, SRE) | Lower: vendor manages hypervisor layer | Higher: hardware failure, capacity planning |
| Facilities (power, colo, network) | Included in cloud pricing | Separate line item; $50–$200/kW/month typical |
| DR and redundancy | Simple (multi-AZ), expensive | Requires second site; adds meaningful capex |

Worked example — storage-heavy analytics workload:

Assume a data analytics platform storing 500 TB, reading approximately 20 TB per day out to application servers, with steady-state compute of around 200 cores and 2 TB RAM. Volume is flat year-over-year.

Cloud cost estimate (AWS, reserved 1-year pricing):

S3 storage: 500 TB x $0.023/GB/month = approx. $11,500/month
Egress: 20 TB/day x 30 days x $0.085/GB = approx. $51,000/month
EC2 compute (approximately 8 x r6i.8xlarge, reserved; 256 vCPUs and 2 TiB RAM in total): approx. $8,000/month
Subtotal after typical commitment discounts: approx. $71,000/month, or approx. $850,000/year

Owned hardware estimate (three-year horizon, US colo):

Storage servers (3 x 200 TB NVMe): approx. $180,000 one-time capex
Compute cluster (4 x 48-core, 512 GB RAM servers): approx. $90,000 capex
Colo hosting (2 racks, power, cross-connect): approx. $4,500/month
Hardware maintenance contract: approx. $18,000/year
Additional 0.5 FTE SRE time (partial allocation, $200K fully-loaded): approx. $100,000/year
Year 1 total: $270,000 capex + $172,000 opex = approx. $442,000
Years 2 and 3: approx. $172,000/year opex only

Three-year totals:

Cloud: $850,000 x 3 = $2,550,000
Owned: $270,000 + ($172,000 x 3) = $786,000

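That is roughly $1.76M in savings over three years, driven almost entirely by eliminating egress fees and right-sizing storage economics. Note the assumption: flat, predictable load. If volume were expected to triple over three years, the hardware would have to be bought ahead of that growth, and capacity sized for year three would sit partly idle in year one. The cloud bill would grow too, but only as usage actually materialised, so the savings case would rest on forecast accuracy rather than on flat-load arithmetic, and a wrong forecast can erase or reverse the margin.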
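The arithmetic above can be captured in a few lines so that each assumption becomes an explicit, adjustable input. The following is a minimal Python sketch, not a complete TCO tool: the class and field names are invented for illustration, the figures are the ones from the worked example, and it assumes decimal units (1 TB = 1,000 GB) and a 30-day month. Because it carries unrounded monthly figures, its totals land slightly below the rounded ones quoted in the text.

```python
from dataclasses import dataclass

GB_PER_TB = 1_000  # assumption: decimal units, as in the egress line above


@dataclass
class CloudCosts:
    storage_tb: float
    storage_usd_per_gb_month: float
    egress_tb_per_day: float
    egress_usd_per_gb: float
    compute_usd_per_month: float

    def monthly(self) -> float:
        """Monthly cloud bill: storage + egress (30-day month) + compute."""
        storage = self.storage_tb * GB_PER_TB * self.storage_usd_per_gb_month
        egress = self.egress_tb_per_day * 30 * GB_PER_TB * self.egress_usd_per_gb
        return storage + egress + self.compute_usd_per_month


@dataclass
class OwnedCosts:
    capex_usd: float
    colo_usd_per_month: float
    maintenance_usd_per_year: float
    people_usd_per_year: float

    def annual_opex(self) -> float:
        """Recurring yearly cost: colo + maintenance + people."""
        return (self.colo_usd_per_month * 12
                + self.maintenance_usd_per_year
                + self.people_usd_per_year)

    def total(self, years: int) -> float:
        """Up-front capex plus opex over the horizon (no hardware refresh)."""
        return self.capex_usd + self.annual_opex() * years


# Inputs taken from the worked example.
cloud = CloudCosts(500, 0.023, 20, 0.085, 8_000)
owned = OwnedCosts(180_000 + 90_000, 4_500, 18_000, 100_000)

years = 3
cloud_total = cloud.monthly() * 12 * years
owned_total = owned.total(years)

print(f"cloud  {years}-yr: ${cloud_total:,.0f}")
print(f"owned  {years}-yr: ${owned_total:,.0f}")
print(f"saving {years}-yr: ${cloud_total - owned_total:,.0f}")
```

Varying `egress_tb_per_day` or `people_usd_per_year` shows quickly which inputs the result is most sensitive to.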

Does your workload qualify?

The worked example above fits a specific profile. Before running numbers, check whether your workload shares it.

Cloud-native fit — stay

Demand is spiky, seasonal, or genuinely unpredictable
Product is pre-PMF or growing at double-digit monthly rates
Engineering team under roughly 15 people
Multi-region presence needed without existing colo relationships
Managed services (RDS, BigQuery, SageMaker, Kafka) deliver real leverage

Repatriation candidate — model it

Steady, predictable baseline load; flat growth curve
Storage- or egress-heavy workload (data platforms, media delivery, ML training)
Team already operates or has operated infrastructure
Three-year TCO shows at least 40% savings after people and facilities costs
Compliance or data-sovereignty constraints that dedicated colo handles cleanly

A workload-level decision, not a company-wide religion

Source: ClimsTech Engineering analysis

The practical threshold: a three-year owned cost under 60% of the three-year cloud cost, after accounting for the realistic operational burden. At margins thinner than that, the operational risk and execution cost rarely justify the move.
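The threshold is easy to turn into a check that also reports how much cost overrun the owned side can absorb before the case stops clearing it. This is a small illustrative sketch: the function name and return keys are invented, and the inputs are the rounded three-year totals from the worked example.

```python
def repatriation_check(cloud_total: float, owned_total: float,
                       max_ratio: float = 0.60) -> dict:
    """Compare multi-year totals against the owned/cloud ratio threshold."""
    if cloud_total <= 0:
        raise ValueError("cloud_total must be positive")
    ratio = owned_total / cloud_total
    # Largest owned total that would still sit at the threshold.
    ceiling = max_ratio * cloud_total
    return {
        "ratio": ratio,
        "clears_threshold": ratio < max_ratio,
        "overrun_headroom": ceiling - owned_total,
    }


result = repatriation_check(cloud_total=2_550_000, owned_total=786_000)
print(f"owned/cloud ratio: {result['ratio']:.0%}")
print(f"clears 60% threshold: {result['clears_threshold']}")
print(f"owned-side overrun headroom: ${result['overrun_headroom']:,.0f}")
```

A large headroom figure means the decision survives optimistic estimates of people or facilities cost; a small one means the model needs tighter inputs before anyone orders hardware.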

Architecture is sometimes the bigger lever

The 37signals story is about rented vs. owned infrastructure. A 2023 story from Amazon's own Prime Video team makes a different but equally important point: sometimes the cost problem is architectural, not infrastructural.

Prime Video's audio/video monitoring service was originally built as a distributed step-function pipeline — individual Lambda functions processing video frames and passing state through S3 as intermediate storage. At production scale, the cost of moving large media payloads between isolated functions became prohibitive. The team refactored the service into a monolithic process that handled the entire pipeline in-process. Their reported result: infrastructure costs fell by over 90%, while scaling characteristics actually improved — and the service stayed entirely in AWS.

The lesson is not that monoliths beat microservices. It is that the cost structure of your architecture matters as much as the cost structure of your infrastructure. A decomposition that looks sensible at the function-call level can generate massive costs at the data-movement level. If your cloud bill is dominated by data-transfer charges between internal services, the architecture deserves examination before the cloud contract does.

Executing the migration if you decide to proceed

If the TCO model clears your threshold and you have the operational capability, the following sequence avoids the common failure modes.

Cloud repatriation execution sequence

01. Audit and baseline
Export three months of cloud billing data at the resource level. Tag every instance, bucket, and data-transfer line item to a specific service. The top 10 cost drivers almost always tell you whether repatriation will move the needle.

02. Right-size before you spec hardware
Run the workload in the cloud at its actual steady-state consumption for four to eight weeks with monitoring. Deploy VPA in recommendation mode for Kubernetes workloads. The hardware spec must be based on measured utilisation, not the over-provisioned cloud allocation.

03. Spec hardware to measured load
Size at 1.5x peak measured load (not cloud allocation). Add 20% headroom for the hardware's depreciation window. Buy for three years, not five: the server market moves, and refreshing sooner is cheaper than over-buying now.

04. Build and validate in parallel
Run the owned environment alongside the cloud environment under real production traffic for at least four weeks. Validate throughput, latency percentiles, backup and restore procedures, and hardware failure handling before a single production byte is cut over.

05. Migrate with rollback ready
Route a small percentage of traffic to the owned environment first. Automated rollback to the cloud environment must be defined and tested before any traffic moves. Keep the cloud environment warm and ready to accept full traffic for at least 60 days post-migration.

06. Wind down cloud commitments deliberately
Reserved instances and Savings Plans have termination rules. Egress costs spike during the migration overlap period. Plan the cloud wind-down schedule as carefully as the hardware ramp-up; it is easy to pay double for six months if this is treated as an afterthought.

Source: ClimsTech Engineering
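The sizing rule in step 03 can be sketched as a small calculation. This is one possible reading of the rule, with the 1.5x peak multiplier and the 20% headroom applied multiplicatively; the function name is invented, the measured peak values are made-up inputs, and the server shape (48 cores, 512 GB) is borrowed from the worked example.

```python
import math


def servers_needed(peak_cores: float, peak_ram_gb: float,
                   cores_per_server: int, ram_gb_per_server: int,
                   peak_multiplier: float = 1.5,
                   lifetime_headroom: float = 0.20) -> dict:
    """Size a fleet from measured peak load, not from cloud allocation."""
    # Assumption: the two margins compound (1.5 x 1.2 = 1.8x measured peak).
    factor = peak_multiplier * (1 + lifetime_headroom)
    target_cores = peak_cores * factor
    target_ram_gb = peak_ram_gb * factor
    by_cpu = math.ceil(target_cores / cores_per_server)
    by_ram = math.ceil(target_ram_gb / ram_gb_per_server)
    return {
        "target_cores": target_cores,
        "target_ram_gb": target_ram_gb,
        "servers_by_cpu": by_cpu,
        "servers_by_ram": by_ram,
        # The tighter of the two constraints decides the order size.
        "servers": max(by_cpu, by_ram),
    }


# Hypothetical measurement: 90 cores and 900 GB RAM at peak.
plan = servers_needed(peak_cores=90, peak_ram_gb=900,
                      cores_per_server=48, ram_gb_per_server=512)
print(plan)
```

Reporting the CPU-bound and RAM-bound counts separately makes it obvious when a different server shape would fit the workload better than simply adding machines.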

The parallel-run step is where most teams compress time and pay for it. Four weeks is a minimum for stateless services; eight weeks is the right budget for anything processing customer data or running with stateful storage.

Right-sizing tooling patterns

Measuring actual steady-state utilisation before specifying hardware:

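# Export hourly CPU utilisation for one EC2 instance over the last 90 days.
# get-metric-statistics returns at most 1,440 datapoints per call, so 90 days of
# hourly data (2,160 points) is fetched in three 30-day windows and merged.
# Run for every instance in the target workload; redirect output to a file.
# Uses GNU date; on macOS/BSD replace -d "N days ago" with -v-Nd.

for offset in 90 60 30; do
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-0abc1234567890def \
    --start-time "$(date -u -d "${offset} days ago" +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u -d "$((offset - 30)) days ago" +%Y-%m-%dT%H:%M:%SZ)" \
    --period 3600 \
    --statistics Average Maximum \
    --output json
done | jq -s '[.[].Datapoints[] | {ts: .Timestamp, avg: .Average, max: .Maximum}] | sort_by(.ts)'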
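Once the datapoints are exported, a short script can reduce them to the handful of numbers that matter for sizing. The sketch below assumes the JSON shape produced by the `jq` filter above (a list of objects with `ts`, `avg`, and `max`); the function names and the inline sample data are illustrative, and the percentile uses the simple nearest-rank method.

```python
import json
import math


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("no datapoints")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise(points: list) -> dict:
    """Reduce hourly datapoints to mean, p95-of-max, and absolute peak."""
    avgs = [p["avg"] for p in points]
    maxes = [p["max"] for p in points]
    return {
        "mean_avg_pct": sum(avgs) / len(avgs),
        "p95_of_hourly_max_pct": percentile(maxes, 95),
        "peak_pct": max(maxes),
    }


# Tiny made-up sample; in practice, json.load() the exported file instead.
sample = json.loads("""
[
  {"ts": "2025-01-01T00:00:00Z", "avg": 10.0, "max": 30.0},
  {"ts": "2025-01-01T01:00:00Z", "avg": 12.0, "max": 35.0},
  {"ts": "2025-01-01T02:00:00Z", "avg": 14.0, "max": 40.0},
  {"ts": "2025-01-01T03:00:00Z", "avg": 20.0, "max": 90.0}
]
""")

summary = summarise(sample)
print(summary)
```

A wide gap between the mean and the peak suggests bursty load that may be better left on elastic capacity; a narrow gap is the steady profile that owned hardware serves well.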

For Kubernetes workloads, Vertical Pod Autoscaler in recommendation mode gives you the equivalent signal without touching the running configuration:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: workload-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-workload
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: "100m"
        memory: 128Mi
      maxAllowed:
        cpu: "8"
        memory: 16Gi

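After two weeks of running with updateMode: "Off", the status.recommendation section reports lowerBound, target, and upperBound CPU and memory estimates for each container. These come from a decaying histogram of observed usage: with the default recommender settings, target tracks roughly the 90th percentile and upperBound roughly the 95th, with upperBound widened further by a confidence margin while history is short. The target and upperBound values are the inputs to your hardware sizing, not the resources.requests values that were set when the workload was first deployed and never revisited.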
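To turn per-container recommendations into a fleet-level number, the quantities have to be parsed and multiplied by the replica count. The following sketch assumes the `status` object has been fetched already (for example with `kubectl get vpa workload-vpa -o json`) and handles only the common quantity suffixes; the function names, the sample recommendation values, and the replica count of 6 are all hypothetical.

```python
# Binary and decimal suffixes used by Kubernetes memory quantities.
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '2' to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory_bytes(quantity: str) -> float:
    """Convert a memory quantity such as '128Mi' or '262144k' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # no suffix: plain bytes


def fleet_demand(vpa_status: dict, replicas: int, bound: str = "target") -> dict:
    """Sum one bound across containers, then scale by replica count."""
    recs = vpa_status["recommendation"]["containerRecommendations"]
    cores = sum(parse_cpu(r[bound]["cpu"]) for r in recs)
    mem = sum(parse_memory_bytes(r[bound]["memory"]) for r in recs)
    return {"cores": cores * replicas, "memory_gib": mem * replicas / 2**30}


# Made-up status for a two-container pod.
status = {"recommendation": {"containerRecommendations": [
    {"containerName": "app",
     "target": {"cpu": "500m", "memory": "1Gi"},
     "upperBound": {"cpu": "2", "memory": "2Gi"}},
    {"containerName": "sidecar",
     "target": {"cpu": "100m", "memory": "128Mi"},
     "upperBound": {"cpu": "250m", "memory": "256Mi"}},
]}}

print("target:    ", fleet_demand(status, replicas=6))
print("upperBound:", fleet_demand(status, replicas=6, bound="upperBound"))
```

Comparing the two results gives a sizing range: the target sum approximates steady-state demand, and the upperBound sum approximates what the fleet needs to absorb peaks.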

For egress specifically, use AWS Cost Explorer grouped by Usage Type with the filter DataTransfer-Out to see exactly where your outbound transfer dollars are going before you commit to a colo transit arrangement.

Pitfalls with their fixes

Repatriation projects fail in predictable ways. These are the ones most worth knowing before you start.

Egress cost during migration. When routing production traffic to the new environment while keeping the cloud warm, you pay cloud egress and colo transit simultaneously. On a data-intensive workload this can add $50,000–$150,000 to the migration cost. Fix: account for the overlap period explicitly in the TCO model. Expect three to six months of double-paying, and factor that into your break-even calculation.

Forgetting the managed-service dependencies. Compute and storage are the visible cost items, but many systems are wired to managed services that don't appear on the EC2 bill: RDS, Aurora, ElastiCache, SQS, Cognito, CloudFront, WAF. An exit from EC2 does not move you off these. Audit every AWS service your system touches before committing to a migration scope.

Under-sizing the ops team. A colo rack needs someone who responds at 2 AM when a drive array fails. If that person does not currently exist on your team, their fully-loaded cost needs to be in the model from the start. A 0.5 FTE allocation at $200K fully-loaded is $100K per year — material on a migration that pencils out at $300K per year in savings.

No hardware refresh plan. Servers depreciate. A three-year depreciation schedule is standard; the capital budget for the next hardware generation should be modelled from day one. Teams that treat the initial capex as a one-time event discover the refresh cost at the worst possible time, usually when the original hardware is failing and the savings case is being reassessed.

Comparing owned cost against cloud list price. If you have Reserved Instances, Savings Plans, Enterprise Discount Programs, or any outstanding credits, your effective cloud rate is materially lower than list price. Model your actual blended rate — export the last three months of effective unit rates from Cost Explorer. Teams that compare hardware quotes against cloud list prices systematically overstate the savings case.

Treating colo as free operations. Colocation eliminates facilities capex but not operational burden. Power monitoring, network redundancy, physical security, and the logistical overhead of shipping replacement hardware to a remote data centre are all real costs. Budget time, not just dollars.
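Several of these pitfalls come down to the same question: how far does the break-even point move once the overlap period and one-off migration costs are included? The sketch below compares cumulative spend on a stay-in-cloud path against a move path, month by month. It is a simplified model under stated assumptions: the function and parameter names are invented, the full cloud bill is assumed to continue for the whole overlap period, and the inputs reuse the worked example's rounded figures plus the low end of the migration egress range quoted above.

```python
from typing import Optional


def break_even_month(capex: float, cloud_monthly: float,
                     owned_monthly_opex: float, overlap_months: int = 0,
                     one_off_migration_cost: float = 0.0,
                     horizon_months: int = 60) -> Optional[int]:
    """First month at which the move path has cost no more than staying."""
    for month in range(1, horizon_months + 1):
        stay = cloud_monthly * month
        move = (capex
                + one_off_migration_cost
                + owned_monthly_opex * month
                # The cloud bill keeps running until the overlap ends.
                + cloud_monthly * min(month, overlap_months))
        if move <= stay:
            return month
    return None  # no break-even inside the horizon


owned_opex = 172_000 / 12  # annual opex from the worked example

naive = break_even_month(270_000, 71_000, owned_opex)
realistic = break_even_month(270_000, 71_000, owned_opex,
                             overlap_months=3,
                             one_off_migration_cost=50_000)
print(f"break-even ignoring overlap: month {naive}")
print(f"break-even with overlap:     month {realistic}")
```

Adding a refresh purchase at the end of the depreciation window, or replacing `cloud_monthly` with the effective blended rate rather than list price, are natural extensions of the same loop.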

The hybrid outcome most organisations reach

Companies that have approached this carefully — 37signals, Stack Overflow, Wikimedia Foundation — have landed in roughly the same place: cloud for the unpredictable, owned or leased for the stable baseline. This is not a compromise. It is the technically correct answer to the fact that workloads within the same organisation have different demand profiles.

A practical hybrid pattern:

Cloud for elasticity-justified workloads. CI/CD pipelines, staging environments, preview deployments, and burst capacity are inherently spiky. Elasticity earns its premium there; fighting it is waste.
Cloud for managed services that would require significant investment to self-operate. Managed Kubernetes control planes, managed databases for lower-volume services, global CDN, and DDoS mitigation are worth buying, not building.
Owned or leased hardware for steady high-volume baseline. Data platforms with flat growth, ML training pipelines with predictable weekly cadence, and bulk object storage where egress economics are punishing.
Bare-metal cloud as a middle path. Providers such as Hetzner, OVH, and Equinix Metal offer physical or near-physical server economics — significantly better instance cost than public cloud — without requiring you to own and maintain the physical layer. This is often the right answer for teams that lack an established colo relationship.
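The split between an owned baseline and cloud burst can be explored numerically before any commitment. The sketch below takes an hourly demand series and a candidate owned capacity, then reports how much work the owned fleet would serve, how much would spill to the cloud, and how well utilised the owned fleet would be. The function name and the demand series are made up for illustration.

```python
def split_demand(hourly_cores: list, owned_capacity: float) -> dict:
    """Split hourly core demand into owned baseline and cloud burst."""
    if owned_capacity <= 0 or not hourly_cores:
        raise ValueError("need positive capacity and at least one datapoint")
    # Core-hours served on owned hardware (capped at capacity each hour).
    owned = sum(min(d, owned_capacity) for d in hourly_cores)
    # Core-hours above capacity, which would spill to elastic cloud capacity.
    burst = sum(max(d - owned_capacity, 0) for d in hourly_cores)
    return {
        "owned_core_hours": owned,
        "burst_core_hours": burst,
        "owned_utilisation": owned / (owned_capacity * len(hourly_cores)),
    }


# Hypothetical six-hour window with one spike.
demand = [40, 42, 45, 41, 120, 43]
split = split_demand(demand, owned_capacity=48)
print(split)
```

Sweeping `owned_capacity` across a range shows the trade-off directly: more owned capacity shrinks the burst bill but lowers utilisation of hardware that has already been paid for.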

Key data points for the repatriation decision

~$2M: annual saving (37signals, 2024)
$10M+: five-year projection (37signals)
29%: cloud spend wasted (Flexera 2026)
3–5 yr: TCO horizon to model (standard practice)

Source: 37signals public reporting (2024); Flexera State of the Cloud 2026

The nuance worth preserving: none of these case studies are an argument against the cloud. They are an argument against assuming the cloud is the correct cost structure for every workload regardless of its demand profile. The cloud is an excellent solution for variable, unpredictable, fast-changing demand. It is an expensive solution for large, stable, predictable baseline load — because you are paying the elasticity premium for elasticity you are not using. Identifying which of your workloads falls into which bucket, and then modelling both options honestly, is not heresy. It is standard FinOps practice, and it is overdue at most organisations.

What to remember

37signals' exit now projects over $10M in five-year savings, driven by stable high-volume load, deep operational experience, and a large, steady storage footprint. Replicate the conditions, not just the move.
Flexera puts cloud waste at 29% in 2026 — right-size in the cloud first. A leaner cloud bill changes the rent-vs-own maths before any hardware is purchased.
Build the TCO over three to five years, including egress during migration, facilities, hardware refresh capex, and the fully-loaded cost of the operational burden — not just the hardware quote against the cloud bill.
Architecture is sometimes the bigger lever: the Prime Video monolith refactor cut infrastructure costs by over 90% without leaving AWS. If data-transfer charges dominate, examine the architecture before the cloud contract.
Migrate in parallel, keep rollback defined and tested, maintain the cloud environment warm for at least 60 days post-cutover, and model the double-paying overlap period explicitly.
The mature outcome is almost always hybrid: cloud for elastic and managed-service workloads, owned or leased for stable predictable baseline. Neither extreme is the answer.