For years, adopting an observability vendor meant threading their SDK or agent through every service you owned. Switching later cost more than the initial integration — it meant re-instrumenting the entire fleet — so most teams stayed put, even when the bill climbed into six figures per month. OpenTelemetry ends that calculus. It is a vendor-neutral standard for generating, collecting, and routing traces, metrics, and logs, and it has become the industry default not through marketing but through genuine engineering quality. In May 2026, CNCF graduated OpenTelemetry, making it one of the most production-proven open-source projects in the cloud-native ecosystem. Getting it right in production takes more care than the quick-start suggests. The Collector is the real cost-control surface, cardinality is the silent bill multiplier almost nobody warns you about, and tail sampling is the most underused mechanism in most stacks. This post covers all three with enough specifics to act on.
One standard, any backend
OpenTelemetry is best understood as a pipeline with three distinct roles.
The SDK lives inside your service. It generates telemetry (spans for traces, measurements for metrics, log records) using a standardised data model, and emits it over OTLP, the OpenTelemetry Protocol: protobuf-encoded payloads carried over gRPC or HTTP, which every major observability vendor now accepts natively. The SDK is the only part your engineers touch in application code; everything downstream runs outside the application process.
The Collector sits between your services and your backends. It runs as a standalone binary or, on Kubernetes, as a DaemonSet, a sidecar, or a gateway Deployment. It receives OTLP from your services, runs a configurable processing pipeline (batching, filtering, sampling, enriching, redacting), and exports to one or more backends. The critical point: the backend is a configuration decision, not a code change.
The backend is whatever fits your requirements: Jaeger, Grafana Tempo, Prometheus, Mimir, Datadog, Honeycomb, Grafana Cloud, or a self-hosted ClickHouse table. You can route the same signal to two backends in parallel during a migration evaluation, and you can swap vendors without touching a single line of application code.
OTel covers four signal types today. They complement each other rather than compete; the table below summarises what each is best at and what drives its cost:
| Signal | Best suited for | Primary cost driver | Stability (mid-2026) |
|--------|-----------------|---------------------|----------------------|
| Traces | Distributed request flows, latency root-cause analysis, service dependency mapping | Span volume multiplied by attribute count | Stable |
| Metrics | Aggregated performance dashboards, alerting on rates and percentiles | Cardinality (distinct label combinations) | Stable |
| Logs | Unstructured and semi-structured event data, audit trails | Log volume and indexing depth | Stable |
| Profiles | CPU and memory hotspot analysis in production without code changes | Sample rate multiplied by fleet size | Beta |
The architectural advantage OTel has over assembling four separate tools is correlated telemetry. Because log records and metric exemplars can carry the same propagated TraceID and SpanID as the spans they relate to, a log record can link to the trace that caused it, and a metric data point can point at an example span that contributed to it. That correlation is what makes a single OTel pipeline more useful than a patchwork of vendor-specific agents that never quite talk to each other.
The CNCF graduation in May 2026 confirmed what contributor activity had already shown. OTel is the second most active CNCF project by velocity, behind only Kubernetes, with over 12,000 contributors from more than 2,800 companies. In the 12 months preceding graduation, the JavaScript API package was downloaded more than 1.36 billion times and the Python API package surpassed 1.3 billion downloads (CNCF graduation announcement, May 2026). That download volume is the ecosystem flywheel: auto-instrumentation libraries for every major framework, native OTLP exporters baked into every major vendor, and a talent pool that treats OTel as table stakes rather than something to evaluate.
- 12,000+ contributors from 2,800+ companies
- 1.36B JS API downloads in the 12 months to May 2026
- 48% of organisations using OTel in production or in an active POC
- #2 in CNCF project velocity, behind only Kubernetes

Source: CNCF, May 2026; Grafana Labs Observability Survey, 2025
The observability bill is larger than most teams admit
Observability has quietly become a serious infrastructure cost centre. Grafana Labs' 2025 Observability Survey found that observability spend averages 17% of total compute infrastructure spend, with the most common single reported figure at 10% and a meaningful tail of organisations that spend more on watching their systems than on running them. Half of respondents expected to spend more the following year — not primarily because vendor prices were rising (only about a quarter cited that), but because of broader internal adoption driving volume higher.
The gap in OTel usage between organisations with expert observability practices (80%) and early-stage ones (14%) is not coincidental. Mature teams reach for OTel precisely because it gives them a cost-control pipeline that proprietary agents do not. The two biggest bill drivers are cardinality in metrics and volume in traces and logs, and a well-configured Collector addresses both upstream of any paid ingestion endpoint. That is why, for most organisations, the Collector is the most important single component in an OTel deployment.
Inside the Collector: receivers, processors, exporters
A Collector configuration has three sections mapping to the three pipeline stages (receivers, processors, exporters), plus a service section that wires them into named pipelines. Here is a working baseline that covers the essentials:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  # Drop health-check spans before they are batched or exported.
  filter/drop_health:
    error_mode: ignore
    traces:
      span:
        - 'IsMatch(attributes["http.route"], "^/health")'
  resource:
    attributes:
      - key: environment
        value: "production"
        action: upsert
  batch:
    send_batch_size: 8192
    timeout: 200ms

exporters:
  otlphttp/tempo:
    endpoint: "https://tempo.internal:4318"
  # prometheusremotewrite ships in the contrib distribution of the Collector.
  prometheusremotewrite:
    endpoint: "https://prometheus.internal/api/v1/write"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter/drop_health, resource, batch]
      exporters: [otlphttp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]
```

Four things worth calling out, in order of importance:

memory_limiter must be first in every pipeline, without exception. Without it, a traffic spike can push the Collector past its memory limit, get the process OOM-killed, and lose all in-flight telemetry. The processor applies backpressure before that happens: once usage crosses its soft limit (limit_mib minus spike_limit_mib, so 384 MiB here) it refuses new data, and receivers return errors so that well-behaved senders back off and retry.

batch goes after filtering and sampling. Batching cuts the number of export requests and improves compression, which matters for throughput and, on backends that charge per request, for cost. Placing it after the filter and any sampling processors means batches contain only data you intend to keep; the Collector's own guidance puts batch after memory_limiter and any sampling or filtering processors. A send_batch_size of 4,096–8,192 items (spans, data points, or log records) is a reasonable starting point at moderate traffic. Larger batches reduce export overhead but increase memory footprint and the latency added by the processing stage.
Filter before export, not after. The filter processor drops data before it reaches exporters — meaning you never pay to ingest it. Health-check endpoints, Kubernetes readiness probes, and liveness checks are almost always worthless in traces. Filter them at the Collector, not inside the vendor's pipeline where ingestion cost has already been incurred.
resource attaches metadata once for the entire fleet. Adding environment, cluster, or region attributes here gives every service whose telemetry passes through this Collector consistent tagging without application code changes. Inconsistent labelling across services is one of the most common reasons dashboards and multi-service alerts break silently in production. Fix it in one place, not fifty.
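To make the filter and resource semantics concrete, here is a plain-Go sketch of what those two steps do to each span. The Span type, the `^/health` pattern (an approximation of the filter/drop_health rule), and the function names are hypothetical illustrations, not the Collector's internal API; in practice both processors are configured in YAML.

```go
package main

import (
	"fmt"
	"regexp"
)

// Span is a hypothetical, minimal stand-in for a span: its own attributes
// plus the resource attributes of the service that emitted it.
type Span struct {
	Attributes map[string]string
	Resource   map[string]string
}

// healthRoute matches any http.route starting with /health
// (e.g. /health, /healthz, /health/ready).
var healthRoute = regexp.MustCompile(`^/health`)

// dropHealthChecks keeps only spans worth exporting. Anything dropped here
// never reaches an exporter, so it is never ingested or billed.
func dropHealthChecks(spans []Span) []Span {
	kept := make([]Span, 0, len(spans))
	for _, s := range spans {
		if healthRoute.MatchString(s.Attributes["http.route"]) {
			continue
		}
		kept = append(kept, s)
	}
	return kept
}

// applyAction reproduces three write actions the resource processor accepts:
// insert (only if absent), update (only if present), upsert (always).
func applyAction(attrs map[string]string, key, value, action string) {
	_, exists := attrs[key]
	switch action {
	case "insert":
		if !exists {
			attrs[key] = value
		}
	case "update":
		if exists {
			attrs[key] = value
		}
	case "upsert":
		attrs[key] = value
	}
}

func main() {
	spans := []Span{
		{Attributes: map[string]string{"http.route": "/healthz"}, Resource: map[string]string{}},
		{Attributes: map[string]string{"http.route": "/api/v1/orders"}, Resource: map[string]string{"environment": "staging"}},
	}
	kept := dropHealthChecks(spans)
	for _, s := range kept {
		// Maps are references, so this updates the span's resource in place.
		applyAction(s.Resource, "environment", "production", "upsert")
	}
	fmt.Println(len(kept), kept[0].Resource["environment"]) // 1 production
}
```

Upsert is the safe default for fleet-wide tagging: it fills the attribute where a service forgot it and overrides it where a service set it inconsistently.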
On Kubernetes, three deployment topologies exist. A DaemonSet Collector runs one instance per node and receives telemetry from pods on the same node, typically addressed via the node's host IP: low network overhead, good for log and metric collection. A gateway deployment is a centralised Collector fleet behind a Kubernetes Service: the natural home for tail sampling, provided a trace-ID-aware routing tier delivers all spans of a trace to the same instance. A sidecar Collector runs one per pod and receives over localhost: maximum isolation, but high resource overhead at scale. The practical default is DaemonSet for metrics and logs, gateway for traces with tail sampling enabled.
Cardinality: the silent bill multiplier
Cardinality is the number of distinct time series your metrics backend must track at the same time. Every label multiplies the series count of each metric it is attached to by its number of distinct values. A low-cardinality label like http.request.method with four values in use multiplies it by at most four, which is completely manageable. A high-cardinality label like user.id with one million distinct users multiplies it by up to a million.
Here is the worked math. Suppose you have an http.server.request.duration histogram with three attributes: http.request.method (4 values), http.response.status_code (12 values), and http.route (50 parameterised routes). That produces at most 4 × 12 × 50 = 2,400 unique label combinations, which is manageable. Now add a user.id label drawn from 200,000 active users. The ceiling jumps from 2,400 to 2,400 × 200,000 = 480 million possible time series for that one metric. Because a Prometheus-style histogram stores a separate series for every bucket plus _sum and _count, the real storage cost is a further multiple of whatever fraction of those combinations actually occurs. A backend tracking even a small fraction of that will degrade under query load, cost far more, and may fall over entirely. This is not a hypothetical failure mode: it happens routinely when developers add request IDs, session tokens, or un-parameterised URL paths to metric attributes without understanding the downstream consequence.
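The multiplication above fits in a few lines of Go. This sketch computes only the worst-case ceiling (the product of each label's distinct-value count); real traffic usually produces fewer series because not every combination occurs, and histogram buckets multiply the result further.

```go
package main

import "fmt"

// seriesCeiling returns the worst-case number of distinct label combinations
// for one metric: the product of each label's distinct-value count.
func seriesCeiling(distinctValues []int) int64 {
	total := int64(1)
	for _, n := range distinctValues {
		total *= int64(n)
	}
	return total
}

func main() {
	base := []int{4, 12, 50} // method, status code, route
	fmt.Println(seriesCeiling(base))                  // 2400
	fmt.Println(seriesCeiling(append(base, 200_000))) // 480000000 (adds user.id)
}
```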
The Collector's transform processor strips or hashes high-cardinality attributes before they reach the metrics exporter, with no application code changes required:
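```yaml
processors:
  transform/strip_high_cardinality:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(attributes, "user.id")
          - delete_key(attributes, "request.id")
          - delete_key(attributes, "session.id")
          - delete_key(attributes, "http.url")
```

The processor only takes effect once it is added to the metrics pipeline's processors list, after memory_limiter. The OTTL (OpenTelemetry Transformation Language) statements inside metric_statements run on each data point before export. Running this config on a staging Collector and comparing time series counts before and after can show a 70–90% reduction for services with any user-facing traffic. One caveat: once an attribute is deleted, data points that differed only by that attribute share a label set and need re-aggregating, for example with the metricstransform processor or by dropping the attribute in an SDK-side View before aggregation happens; otherwise the backend receives several conflicting values for what is now one series. The economic impact of the reduction is direct: many metrics backends price on active time series count, not on query volume.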
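The effect of those delete_key statements on series count can be simulated before touching a real Collector. This Go sketch counts distinct label sets before and after removing keys; the data points are synthetic (100 hypothetical users, each hitting two methods), and it only counts series, it does not perform the re-aggregation mentioned above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey renders an attribute set canonically (sorted keys) so data
// points with identical labels map to the same series.
func seriesKey(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var sb strings.Builder
	for _, k := range keys {
		sb.WriteString(k + "=" + attrs[k] + ";")
	}
	return sb.String()
}

// countSeries counts distinct series after deleting dropKeys from each data
// point's attributes. The input maps are copied, never mutated.
func countSeries(points []map[string]string, dropKeys []string) int {
	seen := make(map[string]struct{})
	for _, p := range points {
		trimmed := make(map[string]string, len(p))
		for k, v := range p {
			trimmed[k] = v
		}
		for _, k := range dropKeys {
			delete(trimmed, k)
		}
		seen[seriesKey(trimmed)] = struct{}{}
	}
	return len(seen)
}

// syntheticPoints builds made-up data points: every user hits both methods.
func syntheticPoints(users int) []map[string]string {
	var pts []map[string]string
	for u := 0; u < users; u++ {
		for _, m := range []string{"GET", "POST"} {
			pts = append(pts, map[string]string{
				"http.request.method": m,
				"user.id":             fmt.Sprintf("u%d", u),
			})
		}
	}
	return pts
}

func main() {
	pts := syntheticPoints(100)
	fmt.Println(countSeries(pts, nil))                 // 200
	fmt.Println(countSeries(pts, []string{"user.id"})) // 2
}
```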
Tail sampling: keep the signal, drop the noise
Most traces from a healthy production service carry no useful diagnostic signal at full fidelity. At 50,000 requests per minute with a 0.5% error rate, you generate 50,000 traces per minute, 49,750 of which are successful requests, most of them structurally identical and at normal latency. Each of those costs as much to ingest as one of the 250 error traces, yet together they add almost nothing diagnostically beyond what a 1% random sample of them would show.
The naive approach is head sampling: flip a coin when the trace starts and propagate the keep/drop decision downstream in the sampled flag of the W3C traceparent header. Head sampling is stateless and adds zero Collector memory overhead, but it is blind to outcomes. A request that starts normally and fails 800 milliseconds later is kept or dropped based on a coin flip made when the request arrived, so at a 1% rate you discard 99% of your error traces along with everything else. You lose exactly the traces you most want to keep.
Tail sampling inverts the decision. The Collector buffers all spans of a trace and makes the keep/drop decision only once the trace has had time to complete, based on its actual outcome. You then define policies such as: keep 100% of error traces, keep 100% of traces above a latency threshold, and sample everything else aggressively.
The worked math: service at 50,000 requests per minute, 0.5% error rate, with a 300ms latency threshold at the p99 boundary:
- Errors: 250 requests per minute — policy keeps 100%
- Slow traces above 300ms: approximately 500 requests per minute — policy keeps 100%
- All other traces: roughly 49,250 per minute — probabilistic policy samples at 1%, keeping about 493
Result: approximately 1,243 traces per minute forwarded to the backend instead of 50,000 — a 97.5% volume reduction, with near-complete retention of every error trace and every latency outlier.
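The same arithmetic as a small Go function, assuming (as the worked example does) that the error and slow populations do not overlap. It returns 1,242.5 before the article's rounding of the baseline to 493 traces.

```go
package main

import "fmt"

// forwardedPerMinute applies the three policies to a traffic mix: keep all
// errors, keep all slow traces, sample the remainder at baselineRate.
// Assumes errors and slow traces are disjoint sets.
func forwardedPerMinute(total, errors, slow, baselineRate float64) float64 {
	rest := total - errors - slow
	return errors + slow + rest*baselineRate
}

func main() {
	kept := forwardedPerMinute(50_000, 250, 500, 0.01)
	fmt.Printf("%.1f traces/min kept, %.1f%% reduction\n", kept, 100*(1-kept/50_000))
	// 1242.5 traces/min kept, 97.5% reduction
}
```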
```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    # A trace is kept if any policy decides to sample it.
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow-traces
        type: latency
        latency:
          threshold_ms: 300
      - name: probabilistic-baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 1
```

Two operational requirements. First, decision_wait must exceed the duration of nearly all your traces. If a distributed transaction spans 8 seconds end-to-end, a decision_wait of 5 seconds evaluates policies on incomplete traces and produces wrong decisions. Set it above the 99th percentile of your actual trace duration, with meaningful headroom. Second, num_traces bounds the in-memory buffer: at 100,000 in-flight traces with typical span sizes and attribute density, plan for 1–2 GB of Collector memory.
For horizontally scaled Collector deployments, the load-balancing exporter routes all spans for a given trace ID to the same Collector instance. Tail sampling requires all spans belonging to a trace to be co-located in one process; without consistent routing, policies evaluate on incomplete traces and the sampling decisions are wrong:
```yaml
exporters:
  loadbalancing:
    routing_key: traceID   # the default; shown for clarity
    protocol:
      otlp:
        timeout: 1s
        tls:
          insecure: true   # assumes plaintext in-cluster traffic to the sampling tier
    resolver:
      dns:
        hostname: otel-collector-headless.observability.svc.cluster.local
        port: 4317
```

Tail sampling is the single highest-leverage Collector feature for cost control. It is also usually the last thing teams configure, which is one reason so many observability bills are higher than they need to be.
Auto-instrumentation vs. manual spans: when each fits
OTel supports two instrumentation strategies that are meant to be combined, not chosen between.
Auto-instrumentation works by attaching a language-specific agent (a Java agent JAR, the Python opentelemetry-instrument launcher, a Node.js --require loader) that patches standard I/O libraries at process startup. For Java, the entire instrumentation is a JVM flag plus a few configuration properties:
```bash
java -javaagent:/opt/otel/opentelemetry-javaagent.jar \
  -Dotel.service.name=orders-service \
  -Dotel.exporter.otlp.endpoint=http://otel-collector:4318 \
  -jar app.jar
```

Since version 2.0 the Java agent exports over http/protobuf by default, so the endpoint points at the Collector's HTTP port, 4318; to keep using gRPC on 4317, add -Dotel.exporter.otlp.protocol=grpc. Zero application code changes. The agent automatically instruments JDBC, HTTP clients, gRPC, Kafka, Redis, and most major application frameworks. For existing services where touching application code is disruptive, or for migrations where breadth of coverage matters more than depth, auto-instrumentation is the right starting point.
The limitation is that auto-instrumentation produces generic technical spans — HTTP POST /api/v1/orders — with no business context. It cannot tell you that a slow request belongs to an enterprise-tier customer, that a failed operation was the third automatic retry, or that this particular code path was triggered by a batch job rather than an interactive user session. Generic infrastructure telemetry is useful for debugging database timeouts. It is not useful for understanding which customer segments are affected by a degradation.
Manual instrumentation adds that context. This is where OTel's clean API earns its keep:
```go
package orders

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

// tracer must be a package-level var: short declarations (:=) are only
// legal inside functions.
var tracer = otel.Tracer("orders-service")

// Order is a placeholder for the service's own order type.
type Order struct {
	CustomerTier  string
	Items         []string
	IsRetry       bool
	TriggerSource string
}

// fulfil stands in for the real fulfilment logic.
func fulfil(ctx context.Context, order Order) error { return nil }

// ProcessOrder wraps fulfilment in a span that carries business context.
func ProcessOrder(ctx context.Context, order Order) error {
	ctx, span := tracer.Start(ctx, "orders.process",
		trace.WithAttributes(
			attribute.String("customer.tier", order.CustomerTier),
			attribute.Int64("order.item_count", int64(len(order.Items))),
			attribute.Bool("order.is_retry", order.IsRetry),
			attribute.String("order.trigger_source", order.TriggerSource),
		),
	)
	defer span.End()

	if err := fulfil(ctx, order); err != nil {
		span.RecordError(err)                    // attach the error as a span event
		span.SetStatus(codes.Error, err.Error()) // mark the span as failed
		return err
	}
	return nil
}
```

The attributes on that span (customer.tier, order.item_count, order.is_retry, order.trigger_source) are what transform trace data from infrastructure telemetry into something product and engineering teams can act on. A trace showing 2-second latency is interesting. A trace showing it affects only enterprise-tier orders with more than 50 items and always originates from the batch API path is actionable.
Auto-instrumentation only
- Zero code changes — agent or loader attachment only
- Full I/O coverage: HTTP, DB, messaging, gRPC automatically
- Spans describe infrastructure events and call boundaries
- Cannot answer: which customers? which code path? which retry attempt?
Auto + manual instrumentation
- Auto-instrumentation for complete I/O baseline
- Manual spans at business logic boundaries
- Custom attributes: customer tier, feature flags, retry state, trigger source
- Traces answer business questions, not just infrastructure ones
The right sequencing is to get auto-instrumentation working across the fleet first, then identify the five to ten business-critical code paths where context attributes would change how you respond to an incident, and add manual spans there. Do not try to manually instrument everything upfront — the marginal value of each additional manual span diminishes quickly once the high-value paths are covered.
A migration path that avoids the big bang
The most common adoption mistake is treating OTel as a wholesale replacement that has to happen in a single project. It does not. The Collector's multi-export capability means old and new instrumentation can coexist throughout the migration, existing backends keep working from day one, and the migration happens incrementally alongside normal engineering work.
- 01. Deploy a Collector in front of existing backends. Stand up a Collector that receives OTLP and exports directly to your current backend unchanged. No application changes yet. This validates that the Collector works in your environment and gives you pipeline flexibility immediately: cost controls can be configured before a single service is migrated.
- 02. Instrument all new services with the OTel SDK by default. From this point, every new service uses OTel auto-instrumentation by default and sends OTLP through the Collector. Existing services continue with proprietary agents, and the fleet migrates naturally as the share of new services grows.
- 03. Configure cost controls in the Collector. Add memory_limiter, batch, filter, tail_sampling, and transform processors. Strip high-cardinality attributes from metrics and run tail sampling on traces. Measure the observability bill before and after; the reduction in ingestion costs from this step can pay for the entire migration effort.
- 04. Migrate existing services as you touch them for other reasons. Swap proprietary agents for OTel auto-instrumentation during routine feature work, dependency upgrades, or refactors, with no dedicated migration sprint. At a typical engineering velocity, most fleets reach near-full OTel coverage within 3 to 6 months; services nobody touches in that window will need a deliberate final pass.
- 05. Evaluate and swap backends via Collector config alone. Once the fleet is fully on OTel, changing backends is a Collector configuration change. Run two backends in parallel during an evaluation period, with the Collector fanning out to both, and cut over when confident, with zero re-instrumentation.
Source: ClimsTech Engineering practice
Three things that trip up migrations in practice:
Context propagation breaks at uninstrumented boundaries. If a service in the middle of a call chain uses a proprietary agent that does not propagate the W3C traceparent header, the trace splits into disconnected fragments at that boundary. Map the full service call graph before migrating and identify propagation gaps. In most cases, bridging a gap is a one-line change per service, enabling the W3C Trace Context propagator in that service's agent or SDK configuration, but finding the gap mid-incident is expensive.
Kubernetes attribute enrichment needs matching RBAC. The k8sattributes processor enriches spans and metrics with pod name, namespace, deployment, and node metadata by querying the Kubernetes API. Its ServiceAccount needs get, list, and watch on pods and namespaces, plus replicasets in the apps API group to resolve deployment names, and nodes if you extract node metadata. Missing RBAC permissions produce silent enrichment failures that are easy to miss until you notice inconsistent dimension coverage in dashboards.
The Collector is stateful during tail sampling. The tail sampling buffer lives in memory. A Collector pod restart drops all buffered in-flight traces. Apply a PodDisruptionBudget to prevent simultaneous restarts, and set maxUnavailable: 1 on rolling updates. If you run trace-based SLOs, a brief gap in coverage during a deploy will be visible — set stakeholder expectations accordingly.