Digital media · Cloud architecture & production modernisation

Built for traffic that does not arrive gradually

A cloud-native platform designed to absorb sudden live-event demand, protect the database layer, and give engineers greater control over availability, deployment and recovery.

CricRadio · Named with the client's permission.

AWS · Kubernetes · Redis · MongoDB

120,000+ concurrent-user load-test capacity

42% faster average API response

65% fewer manual production interventions

99.97% peak-event availability

In brief

A real-time sports platform needed to support highly concentrated demand during live matches, where traffic could multiply within minutes around match starts and key moments. The existing environment relied on fixed-capacity compute, shared application resources and direct database reads, so under load latency rose and engineers had to intervene manually. ClimsTech redesigned the platform around containerised services on AWS: Kubernetes for independent scaling and recovery, Redis to absorb repeated reads, and observability across latency, cache, workload health and database pressure.

Working constraints

Highly variable, event-driven traffic
Live production users throughout the migration
Existing MongoDB application dependency
Limited release windows during major events
No complete application rewrite
Mixed stateless and stateful workloads
Small internal operations team
Strict response-time expectations

Verified outcomes

What changed for the business

Area	Before	After
Scaling	Manual server changes	Automated workload scaling
Recovery	Engineer-led restarts	Automatic workload replacement
Database demand	Repeated direct reads	Redis-supported caching
Deployment	Long, manual release process	Controlled rolling deployment
Visibility	Infrastructure-only alerts	Application and platform dashboards

The problem

What was actually going wrong

Live sports products have a distinctive demand profile. Capacity cannot be planned only around averages because the most important customer moments also create the sharpest demand. The platform needed to remain responsive during sudden user surges while avoiding permanent over-provisioning during quieter periods.

What discovery surfaced

1. Multiple services were scaled together despite very different demand patterns.
2. Frequently requested score and session data repeatedly reached the primary database.
3. Application capacity was defined by server size rather than service behaviour.
4. Alerts showed infrastructure pressure but did not identify the user-facing application path.
5. Releases lacked a consistent rollback model.
6. Resource allocation varied between environments.

The engineering

What we built and changed

1. Container platform

Application services were packaged into Docker images and deployed onto Kubernetes, with workloads separated so high-demand APIs could scale independently from background workers. Health probes, resource limits, and rolling deployment strategies were defined as production requirements.

2. Performance and data protection

Redis was introduced as a caching layer for frequently accessed information, reducing repeated MongoDB reads and stabilising response times during demand spikes.
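The caching approach described here is commonly implemented as a cache-aside read path. The sketch below is an illustration under stated assumptions, not the platform's actual code: `TTLCache` is a hypothetical in-memory stand-in so the example runs without a server (a real Redis client would take its place), and `ScoreReader`, the `score:` key prefix, the TTL values and the sample score string are all invented for the example.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis-style cache with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (value, self._clock() + ttl_seconds)


class ScoreReader:
    """Cache-aside reader: try the cache first, fall back to the database."""

    def __init__(
        self,
        cache: TTLCache,
        load_from_db: Callable[[str], str],
        ttl_seconds: float = 5.0,  # assumed value; short TTLs suit fast-changing scores
    ) -> None:
        self.cache = cache
        self.load_from_db = load_from_db
        self.ttl_seconds = ttl_seconds
        self.hits = 0
        self.misses = 0

    def get_score(self, match_id: str) -> str:
        key = f"score:{match_id}"
        cached = self.cache.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        # Cache miss: read from the database once, then populate the cache.
        self.misses += 1
        value = self.load_from_db(match_id)
        self.cache.set(key, value, self.ttl_seconds)
        return value


def demo_cache_aside() -> Tuple[int, int, int]:
    """Serve 1,000 identical reads and count how many reach the fake database."""
    db_calls: List[str] = []

    def fake_db(match_id: str) -> str:
        db_calls.append(match_id)
        return f"{match_id}: 182/4"  # made-up sample payload

    reader = ScoreReader(TTLCache(), fake_db, ttl_seconds=60.0)
    for _ in range(1000):
        reader.get_score("match-42")
    return len(db_calls), reader.hits, reader.misses
```

In this toy run only the first read reaches the database and the remaining reads are served from the cache, which is the effect the section describes. The TTL is the main design lever: shorter values keep scores fresher at the cost of more database reads.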

3. Elastic scaling

Horizontal scaling policies were configured using CPU, memory, and application-level indicators, calibrated through staged load tests rather than default settings.
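To make the scaling rule concrete, the sketch below assumes the policies follow the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), taking the largest proposal when several metrics are configured. The function name, the metric names, the replica bounds and every number in the usage note are illustrative assumptions, not the calibrated values from the engagement.

```python
import math
from typing import Dict, Tuple


def desired_replicas(
    current_replicas: int,
    metrics: Dict[str, Tuple[float, float]],  # name -> (current value, target value)
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,  # ratios this close to 1.0 trigger no change
) -> int:
    """Proportional scaling rule across several metrics, clamped to bounds."""
    proposals = []
    for _name, (current, target) in metrics.items():
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to target: this metric proposes no change.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # The most demanding metric wins, so no signal is under-served.
    wanted = max(proposals) if proposals else current_replicas
    return max(min_replicas, min(max_replicas, wanted))
```

For example, with 10 replicas, CPU at 90% against a 60% target and 400 requests per second per pod against a target of 200, the CPU metric proposes 15 replicas and the request metric proposes 20, so the function returns 20. This is why an application-level indicator can matter: it may ask for capacity before CPU alone would.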

4. Delivery automation

A standard delivery workflow automated container build, validation, registry publication, deployment, and rollback, with environment-specific configuration separated from application images.

5. Production observability

Dashboards were created covering request volume, response time, error rate, cache hit ratio, database utilisation, pod health, and scaling events, with alerts aligned to customer impact rather than infrastructure fluctuation alone.
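One way to read "alerts aligned to customer impact" is that an alert fires on user-facing signals (error rate, tail latency, cache hit ratio) and not on infrastructure load alone. The sketch below illustrates that idea only; `WindowStats`, the alert names and all thresholds are hypothetical and do not come from the platform's monitoring configuration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class WindowStats:
    """Counters for one evaluation window (hypothetical shape)."""
    requests: int
    errors: int
    cache_hits: int
    cache_misses: int
    latencies_ms: Sequence[float]
    cpu_utilisation: float  # 0.0 to 1.0; shown on dashboards, never alerts alone


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def customer_impact_alerts(
    stats: WindowStats,
    max_error_rate: float = 0.01,      # assumed threshold
    max_p95_ms: float = 300.0,         # assumed threshold
    min_cache_hit_ratio: float = 0.8,  # assumed threshold
) -> List[str]:
    """Return the names of user-facing signals that breach their thresholds."""
    alerts: List[str] = []
    if stats.requests and stats.errors / stats.requests > max_error_rate:
        alerts.append("error_rate")
    if stats.latencies_ms and percentile(stats.latencies_ms, 95) > max_p95_ms:
        alerts.append("p95_latency")
    lookups = stats.cache_hits + stats.cache_misses
    if lookups and stats.cache_hits / lookups < min_cache_hit_ratio:
        alerts.append("cache_hit_ratio")
    return alerts
```

With these rules, a window at 95% CPU but with healthy error rate, latency and cache behaviour raises no alert, while a window with degraded user-facing signals does, whatever the CPU reading.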

Before the engagement, senior engineers were needed for scaling, release coordination, and incident recovery. After the transformation, the team had standardised pipelines, shared dashboards, automated recovery, and clearer operational ownership.

The architecture

Before and after

Before

Single load balancer
Fixed application servers
MongoDB
Local logs
Manual scaling and recovery

After

CDN and edge protection
Application load balancer
Kubernetes application platform
API workloads
Background workloads
Redis
MongoDB
Metrics, logs, traces and alerts

Judgement calls

Decisions that shaped the outcome

Why Kubernetes instead of larger virtual machines?

Larger machines would have increased total capacity but would not have allowed application components to scale independently. Kubernetes provided workload-level elasticity, standardised deployment, and automated replacement of unhealthy services.

Why Redis?

The platform served high volumes of repeated information. Caching reduced avoidable demand on MongoDB and improved response consistency during traffic spikes.

Why rolling deployment?

The product needed to release updates without taking the complete platform offline. Rolling deployment allowed new versions to be introduced gradually and reversed more safely.
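The gradual-introduction idea can be shown with a small simulation. This is a simplified model written for illustration, not the orchestrator's actual behaviour or the pipeline's rollback logic: replicas are replaced in batches, each batch is health-checked before the next begins, and any failure restores the previous version everywhere. The function name, the `is_healthy` callback and the version labels are hypothetical.

```python
from typing import Callable, List


def rolling_update(
    replicas: List[str],
    new_version: str,
    is_healthy: Callable[[int, str], bool],  # (replica index, version) -> healthy?
    batch_size: int = 1,
) -> List[str]:
    """Return replica versions after a batched update, or the originals on failure."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    original = list(replicas)
    current = list(replicas)
    for start in range(0, len(current), batch_size):
        batch = range(start, min(start + batch_size, len(current)))
        for index in batch:
            current[index] = new_version
        # Only continue when every replica in this batch passes its health check.
        if not all(is_healthy(index, current[index]) for index in batch):
            return original  # roll back: untouched replicas never left service
    return current
```

Because only one batch changes at a time, the remaining replicas keep serving traffic throughout, and a bad release is caught after affecting at most `batch_size` replicas in this model.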

How it ran

Phase 1 · Weeks 1–2 · Discovery and dependency mapping
Phase 2 · Weeks 3–5 · Kubernetes foundation and network design
Phase 3 · Weeks 6–9 · Containerisation and Redis integration
Phase 4 · Weeks 10–12 · CI/CD and observability
Phase 5 · Weeks 13–15 · Load testing and performance tuning
Phase 6 · Week 16 · Production readiness and handover

What this engagement proves

Scalability was not achieved by adding compute alone. The largest gains came from separating workloads, protecting the database, and aligning scaling decisions with actual service behaviour.