
Multi-tenant SaaS · Logging transformation

Following one request across dozens of services

A central logging platform designed around correlation, structured fields, retention, access and actionable investigation.

ELK · Elasticsearch · Kibana

180 GB of logs processed daily

45 min investigation time (down from 3 h)

36 services covered

84% less time spent on individual servers

In brief

A multi-tenant SaaS platform generated logs across applications, servers and cloud services, and engineers connected to individual machines during incidents. ClimsTech centralised collection, enriched events with structured context and request correlation, created dashboards and alerts, and set retention and access policy — moving investigation from machine-by-machine searching to shared, correlated analysis.

Working constraints

Multiple applications and formats
High daily log volume
Sensitive tenant information
Different retention requirements
Existing server access practices
Need for correlation
Search performance and cost

The problem

What was actually going wrong

A multi-tenant SaaS platform generated logs across applications, servers, and cloud services. Engineers connected to individual machines during incidents and struggled to trace a request across service boundaries.

What discovery surfaced

1. Log formats differed significantly.
2. Timestamps and identifiers were inconsistent.
3. Some logs contained unnecessary sensitive data.
4. Important logs could be overwritten locally.
5. Engineers relied on server access.
6. Retention was not based on operational need.

The engineering

What we built and changed

1. Source inventory

Applications, servers, databases, and cloud services were mapped as log sources.
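A source inventory can be as simple as a typed list recording, for each source, its kind, its log format and whether a shipper already forwards it. The sketch below is a minimal illustration under that assumption; the `LogSource` type, its fields and the example sources are hypothetical, not the client's actual inventory.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogSource:
    """One row of a hypothetical source inventory."""
    name: str
    kind: str        # "application", "server", "database" or "cloud"
    log_format: str  # e.g. "json", "plaintext", "syslog"
    shipped: bool    # True once a shipper forwards this source centrally


def coverage_gaps(inventory: List[LogSource]) -> List[str]:
    """Sources whose logs still exist only on the machine that wrote them."""
    return sorted(s.name for s in inventory if not s.shipped)


def formats_in_use(inventory: List[LogSource]) -> Dict[str, int]:
    """How many sources use each format: a rough measure of parsing work."""
    counts: Dict[str, int] = {}
    for s in inventory:
        counts[s.log_format] = counts.get(s.log_format, 0) + 1
    return counts


# Hypothetical example entries.
inventory = [
    LogSource("billing-api", "application", "json", True),
    LogSource("auth-service", "application", "plaintext", False),
    LogSource("postgres-primary", "database", "plaintext", False),
    LogSource("object-storage", "cloud", "json", True),
]
print(coverage_gaps(inventory))
print(formats_in_use(inventory))
```

Keeping the inventory as data makes the two questions that drive the next step, which sources are not yet collected and how many distinct formats need parsing rules, answerable at any time.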

2. Collection and parsing

Log shippers and processing rules transformed data into consistent fields.
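As an illustration of what "consistent fields" can mean, the sketch below normalises two hypothetical formats (a plain-text layout and a JSON layout with epoch timestamps) into one schema with UTC ISO-8601 timestamps and a single severity vocabulary. The formats, field names and the assumption that plain-text sources log in UTC are invented for the example; in an ELK deployment this logic is typically expressed as Logstash filters or Elasticsearch ingest pipelines rather than application code.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical plain-text layout: "2024-03-01 12:00:05,123 ERROR [billing] message"
PLAIN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<ms>\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)$"
)

# Map the different level spellings onto one severity vocabulary.
SEVERITY = {"DEBUG": "debug", "INFO": "info", "WARN": "warn",
            "WARNING": "warn", "ERR": "error", "ERROR": "error"}


def normalise(line: str) -> dict:
    """Turn one raw line into the common schema, or flag it as unparsed."""
    line = line.strip()
    if line.startswith("{"):
        raw = json.loads(line)
        # Assumed JSON source: epoch seconds in "time", short field names.
        ts = datetime.fromtimestamp(raw["time"], tz=timezone.utc)
        level, service, message = raw["lvl"], raw["svc"], raw["msg"]
    else:
        m = PLAIN.match(line)
        if m is None:
            # Keep the raw text so nothing is silently lost.
            return {"message": line, "parse_error": True}
        # Assumption: plain-text sources log UTC wall-clock time with no offset.
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S").replace(
            microsecond=int(m["ms"]) * 1000, tzinfo=timezone.utc)
        level, service, message = m["level"], m["service"], m["message"]
    return {
        "@timestamp": ts.isoformat(timespec="milliseconds"),
        "severity": SEVERITY.get(level.upper(), level.lower()),
        "service": service,
        "message": message,
    }
```

Flagging unparsed lines instead of dropping them keeps the pipeline honest: a dashboard on `parse_error` shows which sources still need rules.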

3. Correlation

Request, tenant, environment, service, and severity fields were added to events where appropriate.
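Correlation fields are easiest to add where the log line is written, while the service still knows which request and tenant it is serving. The sketch below assumes a Python service using the standard `logging` module: a filter copies hypothetical `request_id` and `tenant_id` context variables, plus fixed service and environment labels, onto every record, and a formatter writes one JSON object per line. All names and values are illustrative.

```python
import contextvars
import io
import json
import logging
from datetime import datetime, timezone

# Hypothetical per-request context, set when a request enters the service.
request_id_var = contextvars.ContextVar("request_id", default="-")
tenant_id_var = contextvars.ContextVar("tenant_id", default="-")

SERVICE = "billing-api"      # assumed service name
ENVIRONMENT = "production"   # assumed environment label


class CorrelationFilter(logging.Filter):
    """Copy correlation context onto every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        record.tenant_id = tenant_id_var.get()
        record.service = SERVICE
        record.environment = ENVIRONMENT
        return True


class JsonFormatter(logging.Formatter):
    """One JSON object per line, so the shipper needs no custom parsing."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "@timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname.lower(),
            "message": record.getMessage(),
            "request_id": record.request_id,
            "tenant_id": record.tenant_id,
            "service": record.service,
            "environment": record.environment,
        })


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("billing")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buf = io.StringIO()
log = build_logger(buf)
request_id_var.set("req-7f3a")
tenant_id_var.set("tenant-42")
log.info("invoice created")
print(buf.getvalue().strip())
```

Context variables follow the current thread or asyncio task, so concurrent requests in one process do not overwrite each other's identifiers.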

4. Retention and access

Each log category received its own retention period and role-based access rules.
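One way to express per-category retention in Elasticsearch is an index lifecycle management (ILM) policy per category, created with `PUT _ilm/policy/<name>`. The sketch below builds such policy bodies in Python. The three categories, their retention periods and the role-to-category map are hypothetical; in a real cluster, access is enforced by Elasticsearch security roles granting read privileges on category-specific index patterns, and `can_search` only models that rule.

```python
# Hypothetical categories and retention periods (days).
RETENTION_DAYS = {"debug": 7, "application": 30, "audit": 365}

# Hypothetical roles and the log categories each may search.
ROLE_ACCESS = {
    "support": {"application"},
    "engineer": {"debug", "application"},
    "security": {"application", "audit"},
}


def ilm_policy(category: str) -> dict:
    """Body for PUT _ilm/policy/<name>: roll indices over daily, delete after retention."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                    }
                },
                "delete": {
                    # With rollover, min_age counts from the moment an index rolls over.
                    "min_age": f"{RETENTION_DAYS[category]}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


def can_search(role: str, category: str) -> bool:
    """Deny by default: unknown roles or categories get no access."""
    return category in ROLE_ACCESS.get(role, set())
```

Deleting whole indices on a schedule is far cheaper than deleting individual documents, which is why retention is usually set per index category rather than per event.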

5. Operational use

Dashboards, search patterns, and alerts supported common investigation scenarios.
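The central investigation scenario, following one request across services, reduces to a filtered search sorted by time. The sketch below builds an Elasticsearch Query DSL body for it, usable as the JSON body of a `_search` request, and renders the hits as a timeline. It assumes `request_id` is mapped as a `keyword` field and that events carry `@timestamp`, `service`, `severity` and `message`; those names are assumptions, not the client's schema.

```python
def request_trace_query(request_id: str, lookback: str = "24h", size: int = 500) -> dict:
    """Search body returning every event for one request, oldest first."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: exact matches, no relevance scoring needed.
                "filter": [
                    {"term": {"request_id": request_id}},
                    {"range": {"@timestamp": {"gte": f"now-{lookback}"}}},
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "asc"}}],
        "_source": ["@timestamp", "service", "severity", "message"],
    }


def timeline(hits: list) -> list:
    """Render search hits as 'timestamp | service | severity | message' lines."""
    lines = []
    for hit in hits:
        src = hit["_source"]
        lines.append(" | ".join(
            [src["@timestamp"], src["service"], src["severity"], src["message"]]))
    return lines
```

Because every service writes the same `request_id`, one query replaces a login to each machine the request touched.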

Incident investigation moved from machine-by-machine searching to shared, structured, and correlated analysis.

The architecture

Before and after

Before

Applications and services — inconsistent log formats
Local server log storage — overwritable
No request correlation identifiers
Per-server access for incident investigation
No structured retention or access policy

After

Applications, servers, databases, cloud services
Log shippers
Processing and enrichment
Elasticsearch
Kibana dashboards
Alerts

Judgement calls

Decisions that shaped the outcome

Why not retain everything indefinitely?

Storage cost grows with every day of retained logs, and larger indices slow search, so retention periods must align with business and operational need.
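A rough sizing formula makes the trade-off concrete: stored volume ≈ daily volume × retention days × (1 + replicas) ÷ compression. The sketch below assumes, purely for illustration, that the 180 GB processed daily is also the indexed size, with one replica and no compression; real index size depends on mappings and compression, so the numbers are indicative only.

```python
def stored_tb(daily_gb: float, retention_days: int, replicas: int = 1,
              compression_ratio: float = 1.0) -> float:
    """Approximate cluster storage in decimal TB for one retention period."""
    primary_gb = daily_gb * retention_days / compression_ratio
    # Each replica stores a full copy of the primary data.
    return primary_gb * (1 + replicas) / 1000


for days in (7, 30, 365):
    print(f"{days:>3} days: {stored_tb(180, days):.1f} TB")
```

Under these assumptions, 30 days of retention needs about 10.8 TB and a full year about 131.4 TB, which is why short-lived categories such as debug output are the first candidates for shorter retention.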

Why prioritise request identifiers?

Correlation enables engineers to follow one transaction through multiple services.
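Request identifiers only help if every service passes the same value on. A minimal propagation sketch follows, assuming an `X-Request-ID` HTTP header (a common convention; the W3C Trace Context `traceparent` header is the standardised alternative): reuse the incoming ID when present, generate one otherwise, and attach it to every downstream call. The header choice and helper functions are illustrative.

```python
import uuid
from typing import Dict, Optional

REQUEST_ID_HEADER = "X-Request-ID"  # assumed convention, not a formal standard


def incoming_request_id(headers: Dict[str, str]) -> str:
    """Reuse the caller's request ID if present; otherwise start a new one."""
    for key, value in headers.items():
        # HTTP header names are case-insensitive.
        if key.lower() == REQUEST_ID_HEADER.lower() and value:
            return value
    return uuid.uuid4().hex


def outgoing_headers(request_id: str, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Headers for a downstream call that carry the same request ID."""
    headers = dict(extra or {})
    headers[REQUEST_ID_HEADER] = request_id
    return headers


# Entry service: no ID arrives, so one is generated and passed on.
rid = incoming_request_id({"Accept": "application/json"})
downstream = outgoing_headers(rid, {"Accept": "application/json"})
# Downstream service: the same ID is recovered and logged with every event.
assert incoming_request_id(downstream) == rid
```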

Why remove secrets and unnecessary personal data?

Logs are operational records, not unrestricted data stores.
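Redaction is most reliable in the collection or processing stage, before events are indexed. The sketch below drops named sensitive fields and masks secrets, bearer tokens and email addresses inside string values. The patterns and field names are hypothetical examples; a real deployment would tune them to its own data and still review samples, since regular expressions miss formats they were not written for.

```python
import re

# Hypothetical field names that should never be indexed at all.
DROP_FIELDS = {"password", "authorization", "card_number"}

# Hypothetical patterns for secrets and personal data inside free text.
REDACTIONS = [
    # key=value or key: value secrets; the key is kept, the value masked.
    (re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\s*[=:]\s*\S+"),
     r"\1=[REDACTED]"),
    # Bearer tokens copied from Authorization headers.
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer [REDACTED]"),
    # Email addresses.
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL]"),
]


def scrub(event: dict) -> dict:
    """Return a copy of the event with sensitive fields dropped and values masked."""
    clean = {}
    for key, value in event.items():
        if key.lower() in DROP_FIELDS:
            continue
        if isinstance(value, str):
            for pattern, replacement in REDACTIONS:
                value = pattern.sub(replacement, value)
        clean[key] = value
    return clean
```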

What this engagement proves

Central logging becomes valuable when data is structured around investigation, not merely collected in one place.

