The cloud repatriation debate generates far more heat than light. Every few months a team publishes exit numbers, and two camps form: cloud-is-always-too-expensive and cloud-is-the-only-sane-choice. Both camps are making the same mistake — treating a workload-level cost-structure question as an identity question. The correct frame is narrower and more productive: does this specific workload, at its current scale, have the demand profile that rented infrastructure is designed to serve? If the answer is clearly no, repatriation deserves a serious engineering analysis, not a religious objection. If the answer is yes, the cloud earns its premium and you should invest in making it cheaper rather than escaping it.
The 37signals numbers, three years on
The most-cited cloud-exit story has now accumulated a meaningful track record. 37signals — the company behind Basecamp and HEY email — began their cloud exit in late 2022 with an initial hardware order of approximately $600,000 in Dell servers. By the end of 2023, that hardware had paid for itself entirely. By 2024, they reported their annual cloud bill had dropped from the original $3.2M run-rate to $1.3M — a saving of just under $2M in a single year. Their five-year projection, per DHH's own public accounting, sits above $10M in cumulative savings.
The hardware they built: 20 Dell servers across two data centres, totalling 4,000 vCPUs, 7,680 GB of RAM, and 384 TB of NVMe storage. They have also announced plans to replace AWS S3 with a dual-datacenter Pure Storage configuration — roughly 18 PB of capacity — at a hardware cost roughly equivalent to a single year of their AWS S3 bill.
These numbers are real, public, and reproducible. They are also workload-specific. 37signals runs software products with large, mostly predictable load on teams that have operated their own servers for nearly two decades. The same move by a team without that profile does not produce the same numbers.
Cloud waste is structural, not accidental
Before modelling rent-vs-own, it is worth understanding why cloud bills run high even before you consider workload fit. The Flexera 2025 State of the Cloud Report found that 27% of cloud spend was wasted — unused reservations, idle instances, over-provisioned resources, orphaned storage. The 2026 edition put that figure at 29%, the first year-on-year increase in five years, attributed largely to rapidly expanding AI workloads that teams are still learning to right-size. Separately, 84% of organisations in Flexera's 2025 survey reported struggling to manage cloud spend, and actual cloud budgets were exceeding planned limits by an average of 17%.
The over-provisioning problem is particularly stubborn in containerised environments. Cast.ai's 2024 Kubernetes benchmark found average CPU utilisation running at approximately 13% of provisioned capacity — roughly 87 cents of every dollar spent on compute buying headroom rather than work. Some of that headroom is legitimate (burst capacity, safety margin for latency SLOs), but most is inertia: engineers size for worst-case, clusters do not auto-scale down aggressively, and no one revisits the numbers between planning cycles.
Building the real TCO model
Most rent-vs-own analyses fail because they compare the cloud bill against a hardware quote. A real model needs to account for at least six cost dimensions over a three-to-five year horizon.
| Cost dimension | Cloud | Owned hardware |
|---|---|---|
| Compute | Pay-per-use, elastic, no capex | One-time capex, linear depreciation |
| Storage | Per-GB/month + IOPS + API calls | Capex + maintenance; near-zero marginal cost |
| Egress | Per-GB billed ($0.08–$0.09/GB on AWS) | Near-zero within colo; peering costs vary |
| People (ops, on-call, SRE) | Lower: vendor manages hypervisor layer | Higher: hardware failure, capacity planning |
| Facilities (power, colo, network) | Included in cloud pricing | Separate line item; $50–$200/kW/month typical |
| DR and redundancy | Simple (multi-AZ), expensive | Requires second site; adds meaningful capex |
Worked example — storage-heavy analytics workload:
Assume a data analytics platform storing 500 TB, reading approximately 20 TB per day out to application servers, with steady-state compute of around 200 cores and 2 TB RAM. Volume is flat year-over-year.
Cloud cost estimate (AWS, reserved 1-year pricing):
- S3 storage: 500 TB x $0.023/GB/month = approx. $11,750/month
- Egress: 20 TB/day x 30 days x $0.085/GB = approx. $51,000/month
- EC2 compute (approximately 16 x r6i.4xlarge at 16 vCPUs and 128 GiB each, to cover the 2 TB RAM requirement, reserved): approx. $8,000/month
- Subtotal after typical commitment discounts: approx. $71,000/month, or approx. $850,000/year
Owned hardware estimate (three-year horizon, US colo):
- Storage servers (3 x 200 TB NVMe): approx. $180,000 one-time capex
- Compute cluster (4 x 48-core, 512 GB RAM servers): approx. $90,000 capex
- Colo hosting (2 racks, power, cross-connect): approx. $4,500/month
- Hardware maintenance contract: approx. $18,000/year
- Additional 0.5 FTE SRE time (partial allocation, $200K fully-loaded): approx. $100,000/year
- Year 1 total: $270,000 capex + $172,000 opex = approx. $442,000
- Years 2 and 3: approx. $172,000/year opex only
Three-year totals:
- Cloud: $850,000 x 3 = $2,550,000
- Owned: $270,000 + ($172,000 x 3) = $786,000
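That is roughly $1.76M in savings over three years, driven almost entirely by eliminating egress fees (about $51,000 of the $71,000 monthly cloud bill) and by cheaper storage. Note the assumption: flat, predictable load. If volume were expected to triple over three years, the owned capex would have to grow with it and be committed ahead of demand, while the cloud option scales incrementally. Rerun the model under that growth curve before trusting the result, because the margin can shrink sharply or reverse.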
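The arithmetic above is easy to reproduce, which makes it simpler to swap in your own figures. The following is a minimal sketch using the worked example's numbers; the class, function, and field names are illustrative and not taken from any library.

```python
from dataclasses import dataclass


@dataclass
class OwnedCosts:
    """Owned-hardware inputs. All figures are the worked example's assumptions."""
    capex: float                 # one-time hardware purchase
    colo_per_month: float        # racks, power, cross-connect
    maintenance_per_year: float  # hardware maintenance contract
    people_per_year: float       # additional SRE allocation

    def opex_per_year(self) -> float:
        return self.colo_per_month * 12 + self.maintenance_per_year + self.people_per_year

    def total(self, years: int) -> float:
        return self.capex + self.opex_per_year() * years


def cloud_total(per_year: float, years: int) -> float:
    """Flat-load cloud cost: the same annual bill repeated each year."""
    return per_year * years


owned = OwnedCosts(capex=180_000 + 90_000, colo_per_month=4_500,
                   maintenance_per_year=18_000, people_per_year=100_000)
years = 3
cloud = cloud_total(850_000, years)
savings = cloud - owned.total(years)

print(f"owned opex/year: ${owned.opex_per_year():,.0f}")
print(f"owned {years}-year:    ${owned.total(years):,.0f}")
print(f"cloud {years}-year:    ${cloud:,.0f}")
print(f"savings:         ${savings:,.0f}")
```

Keeping capex and opex as separate inputs makes it straightforward to test a growth scenario: raise `capex` and the cloud annual figure together and see how much of the margin survives.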
Does your workload qualify?
The worked example above fits a specific profile. Before running numbers, check whether your workload shares it.
Cloud-native fit — stay
- Demand is spiky, seasonal, or genuinely unpredictable
- Product is pre-PMF or growing at double-digit monthly rates
- Engineering team under roughly 15 people
- Multi-region presence needed without existing colo relationships
- Managed services (RDS, BigQuery, SageMaker, Kafka) deliver real leverage
Repatriation candidate — model it
- Steady, predictable baseline load; flat growth curve
- Storage- or egress-heavy workload (data platforms, media delivery, ML training)
- Team already operates or has operated infrastructure
- Three-year TCO shows at least 40% savings after people and facilities costs
- Compliance or data-sovereignty constraints that dedicated colo handles cleanly
The practical threshold: a three-year owned cost under 60% of the three-year cloud cost, after accounting for the realistic operational burden. At margins thinner than that, the operational risk and execution cost rarely justify the move.
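The 60% threshold can be expressed as a simple ratio check. This is a sketch only: the function name and the default `max_ratio` are illustrative, and the inputs are the three-year totals from the worked example.

```python
def repatriation_verdict(owned_total: float, cloud_total: float,
                         max_ratio: float = 0.60) -> tuple[float, bool]:
    """Return (owned/cloud cost ratio, whether it clears the threshold)."""
    if cloud_total <= 0:
        raise ValueError("cloud_total must be positive")
    ratio = owned_total / cloud_total
    return ratio, ratio < max_ratio


# Worked example: owned $786,000 vs cloud $2,550,000 over three years.
ratio, worth_modelling = repatriation_verdict(786_000, 2_550_000)
print(f"owned/cloud = {ratio:.0%}, clears threshold: {worth_modelling}")

# A thinner, hypothetical case: owned cost at about 71% of cloud does not clear it.
thin_ratio, thin_ok = repatriation_verdict(1_800_000, 2_550_000)
print(f"owned/cloud = {thin_ratio:.0%}, clears threshold: {thin_ok}")
```

Both totals should already include people and facilities costs; the check is only as honest as the inputs.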
Architecture is sometimes the bigger lever
The 37signals story is about rented vs. owned infrastructure. A 2023 story from Amazon's own Prime Video team makes a different but equally important point: sometimes the cost problem is architectural, not infrastructural.
Prime Video's audio/video monitoring service was originally built as a distributed step-function pipeline — individual Lambda functions processing video frames and passing state through S3 as intermediate storage. At production scale, the cost of moving large media payloads between isolated functions became prohibitive. The team refactored the service into a monolithic process that handled the entire pipeline in-process. Their reported result: infrastructure costs fell by over 90%, while scaling characteristics actually improved — and the service stayed entirely in AWS.
The lesson is not that monoliths beat microservices. It is that the cost structure of your architecture matters as much as the cost structure of your infrastructure. A decomposition that looks sensible at the function-call level can generate massive costs at the data-movement level. If your cloud bill is dominated by data-transfer charges between internal services, the architecture deserves examination before the cloud contract does.
Executing the migration if you decide to proceed
If the TCO model clears your threshold and you have the operational capability, the following sequence avoids the common failure modes.
1. Audit and baseline. Export three months of cloud billing data at the resource level. Tag every instance, bucket, and data-transfer line item to a specific service. The top 10 cost drivers almost always tell you whether repatriation will move the needle.
2. Right-size before you spec hardware. Run the workload in the cloud at its actual steady-state consumption for four to eight weeks with monitoring. Deploy VPA in recommendation mode for Kubernetes workloads. The hardware spec must be based on measured utilisation, not the over-provisioned cloud allocation.
3. Spec hardware to measured load. Size at 1.5x peak measured load (not cloud allocation). Add 20% headroom for the hardware's depreciation window. Buy for three years, not five: the server market moves, and refreshing sooner is cheaper than over-buying now.
4. Build and validate in parallel. Run the owned environment alongside the cloud environment under real production traffic for at least four weeks. Validate throughput, latency percentiles, backup and restore procedures, and hardware failure handling before a single production byte is cut over.
5. Migrate with rollback ready. Route a small percentage of traffic to the owned environment first. Automated rollback to the cloud environment must be defined and tested before any traffic moves. Keep the cloud environment warm and ready to accept full traffic for at least 60 days post-migration.
6. Wind down cloud commitments deliberately. Reserved instances and Savings Plans have termination rules. Egress costs spike during the migration overlap period. Plan the cloud wind-down schedule as carefully as the hardware ramp-up; it is easy to pay double for six months if this is treated as an afterthought.
Source: ClimsTech Engineering
The parallel-run step is where most teams compress time and pay for it. Four weeks is a minimum for stateless services; eight weeks is the right budget for anything processing customer data or running with stateful storage.
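The sizing rule from the hardware-spec step (1.5x peak measured load plus 20% headroom) can be sketched as below. Two assumptions are mine rather than the article's: the headroom is applied multiplicatively on top of the peak multiplier, and the helper names and the 120-core example are hypothetical.

```python
import math


def cores_to_buy(peak_measured_cores: float,
                 peak_multiplier: float = 1.5,
                 lifetime_headroom: float = 0.20) -> int:
    """Cores to provision from measured peak usage.

    Assumption: headroom compounds on the peak multiplier (peak * 1.5 * 1.2).
    """
    raw = peak_measured_cores * peak_multiplier * (1 + lifetime_headroom)
    # Round first so floating-point noise cannot push ceil() up by one.
    return math.ceil(round(raw, 6))


def servers_needed(cores: int, cores_per_server: int) -> int:
    """Whole servers required to supply the given core count."""
    if cores_per_server <= 0:
        raise ValueError("cores_per_server must be positive")
    return math.ceil(cores / cores_per_server)


# Hypothetical measurement: the workload peaked at 120 cores in use.
needed = cores_to_buy(120)
print(f"cores to provision: {needed}")
print(f"48-core servers:    {servers_needed(needed, 48)}")
```

The input should be measured peak usage, not the cloud allocation, which is the whole point of right-sizing before buying.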
Right-sizing tooling patterns
Measuring actual steady-state utilisation before specifying hardware:
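```bash
# Export per-hour CPU utilisation for a specific EC2 instance over 90 days.
# get-metric-statistics returns at most 1,440 datapoints per call and 90 days of
# hourly data is 2,160, so the range is fetched as two 45-day windows.
# Run for every instance in the target workload; pipe output to a file for analysis.
# Uses BSD/macOS date; with GNU date use: date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%SZ
START=$(date -u -v-90d +%Y-%m-%dT%H:%M:%SZ)
MID=$(date -u -v-45d +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
fetch() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-0abc1234567890def \
    --start-time "$1" --end-time "$2" \
    --period 3600 \
    --statistics Average Maximum \
    --output json
}
{ fetch "$START" "$MID"; fetch "$MID" "$END"; } |
  jq -s '[.[].Datapoints[] | {ts: .Timestamp, avg: .Average, max: .Maximum}] | sort_by(.ts)'
```

For Kubernetes workloads, Vertical Pod Autoscaler in recommendation mode gives you the equivalent signal without touching the running configuration: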
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: workload-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-workload
  updatePolicy:
    updateMode: "Off"   # recommend only; never evict or resize pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "100m"
          memory: 128Mi
        maxAllowed:
          cpu: "8"
          memory: 16Gi
```

After two weeks of running with updateMode: "Off", the status.recommendation section reports lowerBound, target, and upperBound CPU and memory estimates for each container. Note that target is not a median: under the recommender's default settings it is derived from a high percentile of observed usage (the 90th for CPU), and upperBound is more conservative still. The target and upperBound values are the inputs to your hardware sizing, not the resources.requests values that were set at cluster creation and never revisited.
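For egress specifically, use AWS Cost Explorer grouped by Usage Type and filtered to usage types containing DataTransfer-Out (for example DataTransfer-Out-Bytes, which carries a region prefix outside us-east-1) to see exactly where your outbound transfer dollars are going before you commit to a colo transit arrangement.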
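Once the hourly CloudWatch export above is saved to a file, it still has to be reduced to a few sizing numbers. The sketch below assumes the `ts`/`avg`/`max` record shape produced by the jq filter; the file name `cpu.json`, the function names, and the sample values are hypothetical, and the nearest-rank percentile is one reasonable choice among several.

```python
import json
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not values:
        raise ValueError("no datapoints")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarise(datapoints: list[dict]) -> dict:
    """Reduce hourly CPU datapoints to the figures used for sizing."""
    avgs = [d["avg"] for d in datapoints]
    return {
        "hours": len(datapoints),
        "p50_avg": percentile(avgs, 50),
        "p95_avg": percentile(avgs, 95),
        "peak_max": max(d["max"] for d in datapoints),
    }


# Illustrative sample. In practice, load the export instead, for example:
#   with open("cpu.json") as f: datapoints = json.load(f)
sample = json.loads("""[
  {"ts": "2025-01-01T00:00:00Z", "avg": 9.0,  "max": 21.0},
  {"ts": "2025-01-01T01:00:00Z", "avg": 12.0, "max": 35.0},
  {"ts": "2025-01-01T02:00:00Z", "avg": 15.0, "max": 62.0},
  {"ts": "2025-01-01T03:00:00Z", "avg": 11.0, "max": 30.0}
]""")
summary = summarise(sample)
print(summary)
```

A large gap between `p95_avg` and `peak_max` usually means short bursts rather than sustained load, which matters when deciding how much of the peak the owned hardware has to absorb.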
Pitfalls with their fixes
Repatriation projects fail in predictable ways. These are the ones most worth knowing before you start.
Egress cost during migration. When routing production traffic to the new environment while keeping the cloud warm, you pay cloud egress and colo transit simultaneously. On a data-intensive workload this can add $50,000–$150,000 to the migration cost. Fix: account for the overlap period explicitly in the TCO model. Typically three to six months of double-paying; factor it into your break-even calculation.
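To see how the overlap period moves the break-even point, a rough month-by-month comparison is enough. The sketch below simplifies by assuming the full cloud bill continues for the whole overlap and that owned opex starts in month one; the function name is illustrative and the inputs are the worked example's figures.

```python
from typing import Optional


def break_even_month(capex: float, cloud_per_month: float,
                     owned_opex_per_month: float,
                     overlap_months: int = 0,
                     horizon_months: int = 60) -> Optional[int]:
    """First month where cumulative migrate cost <= cumulative stay-in-cloud cost.

    Returns None if break-even is not reached within the horizon.
    """
    for month in range(1, horizon_months + 1):
        stay = cloud_per_month * month
        migrate = (capex
                   + owned_opex_per_month * month
                   # Assumption: the full cloud bill is still paid during overlap.
                   + cloud_per_month * min(month, overlap_months))
        if migrate <= stay:
            return month
    return None


capex = 270_000
cloud_monthly = 71_000
owned_monthly = 172_000 / 12

print("no overlap:     ", break_even_month(capex, cloud_monthly, owned_monthly))
print("4-month overlap:", break_even_month(capex, cloud_monthly, owned_monthly,
                                            overlap_months=4))
```

Under these assumptions a four-month overlap roughly doubles the time to break even, which is why the overlap belongs in the model rather than in a contingency line.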
Forgetting the managed-service dependencies. Compute and storage are the visible cost items, but many systems are wired to managed services that don't appear on the EC2 bill: RDS, Aurora, ElastiCache, SQS, Cognito, CloudFront, WAF. An exit from EC2 does not move you off these. Audit every AWS service your system touches before committing to a migration scope.
Under-sizing the ops team. A colo rack needs someone who responds at 2 AM when a drive array fails. If that person does not currently exist on your team, their fully-loaded cost needs to be in the model from the start. A 0.5 FTE allocation at $200K fully-loaded is $100K per year — material on a migration that pencils out at $300K per year in savings.
No hardware refresh plan. Servers depreciate. A three-year depreciation schedule is standard; the capital budget for the next hardware generation should be modelled from day one. Teams that treat the initial capex as a one-time event discover the refresh cost at the worst possible time, usually when the original hardware is failing and the savings case is being reassessed.
Comparing owned cost against cloud list price. If you have Reserved Instances, Savings Plans, Enterprise Discount Programs, or any outstanding credits, your effective cloud rate is materially lower than list price. Model your actual blended rate — export the last three months of effective unit rates from Cost Explorer. Teams that compare hardware quotes against cloud list prices systematically overstate the savings case.
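The effect of comparing against list price is easy to quantify. The figures below are hypothetical and the function names are illustrative; the point is only to show how a blended discount changes the savings case.

```python
def effective_discount(list_cost: float, billed_cost: float) -> float:
    """Fraction saved versus list price for the same usage."""
    if list_cost <= 0:
        raise ValueError("list_cost must be positive")
    return 1 - billed_cost / list_cost


def savings_vs_cloud(owned_total: float, cloud_list_total: float,
                     discount: float) -> float:
    """Savings measured against the effective (discounted) cloud cost."""
    effective_cloud = cloud_list_total * (1 - discount)
    return effective_cloud - owned_total


# Hypothetical: three months of usage worth $300,000 at list, billed at $225,000.
discount = effective_discount(300_000, 225_000)

# Hypothetical three-year comparison: $1,000,000 cloud at list vs $700,000 owned.
naive = savings_vs_cloud(700_000, 1_000_000, discount=0.0)
realistic = savings_vs_cloud(700_000, 1_000_000, discount=discount)

print(f"blended discount:       {discount:.0%}")
print(f"savings vs list price:  ${naive:,.0f}")
print(f"savings vs actual rate: ${realistic:,.0f}")
```

In this illustration a 25% blended discount shrinks the apparent saving from $300,000 to $50,000, enough to change the decision.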
Treating colo as free operations. Colocation eliminates facilities capex but not operational burden. Power monitoring, network redundancy, physical security, and the logistical overhead of shipping replacement hardware to a remote data centre are all real costs. Budget time, not just dollars.
The hybrid outcome most organisations reach
Companies that have approached this carefully — 37signals, Stack Overflow, Wikimedia Foundation — have landed in roughly the same place: cloud for the unpredictable, owned or leased for the stable baseline. This is not a compromise. It is the technically correct answer to the fact that workloads within the same organisation have different demand profiles.
A practical hybrid pattern:
- Cloud for elasticity-justified workloads. CI/CD pipelines, staging environments, preview deployments, and burst capacity are inherently spiky. Elasticity earns its premium there; fighting it is waste.
- Cloud for managed services that would require significant investment to self-operate. Managed Kubernetes control planes, managed databases for lower-volume services, global CDN, and DDoS mitigation are worth buying, not building.
- Owned or leased hardware for steady high-volume baseline. Data platforms with flat growth, ML training pipelines with predictable weekly cadence, and bulk object storage where egress economics are punishing.
- Bare-metal cloud as a middle path. Providers such as Hetzner, OVH, and Equinix Metal offer physical or near-physical server economics — significantly better instance cost than public cloud — without requiring you to own and maintain the physical layer. This is often the right answer for teams that lack an established colo relationship.
- ~$2M annual saving (37signals, 2024)
- $10M+ five-year projection (37signals)
- 29% of cloud spend wasted (Flexera 2026)
- 3–5 yr TCO horizon to model (standard practice)
Source: 37signals public reporting (2024); Flexera State of the Cloud 2026
The nuance worth preserving: none of these case studies are an argument against the cloud. They are an argument against assuming the cloud is the correct cost structure for every workload regardless of its demand profile. The cloud is an excellent solution for variable, unpredictable, fast-changing demand. It is an expensive solution for large, stable, predictable baseline load — because you are paying the elasticity premium for elasticity you are not using. Identifying which of your workloads falls into which bucket, and then modelling both options honestly, is not heresy. It is standard FinOps practice, and it is overdue at most organisations.