Jaeger insights
In Jaeger v2 (built on the OpenTelemetry Collector), when the otelcol_receiver_accepted_spans metric increases but otelcol_exporter_sent_spans does not keep pace, spans are being lost in the processing pipeline between reception and storage.
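The gap between those two counters can be turned into a loss ratio. A minimal sketch, assuming you have already sampled both metrics as per-second rates (the helper name and the example rates are hypothetical):

```python
# Hypothetical helper: given per-second rates sampled from the collector's
# otelcol_receiver_accepted_spans and otelcol_exporter_sent_spans counters,
# estimate the fraction of spans lost inside the pipeline.
def pipeline_loss_ratio(accepted_rate: float, sent_rate: float) -> float:
    """Return the fraction of accepted spans that never reached the exporter."""
    if accepted_rate <= 0:
        return 0.0
    return max(0.0, (accepted_rate - sent_rate) / accepted_rate)

if __name__ == "__main__":
    # 1200 spans/s accepted but only 900 spans/s exported -> 25% pipeline loss.
    print(f"loss: {pipeline_loss_ratio(1200, 900):.0%}")
```

A sustained non-zero ratio here is the signal described above: the receiver is healthy, but the pipeline is shedding spans before export.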
Incorrect sampling configuration (especially the default 1-in-1000 probabilistic rate in legacy Jaeger client SDKs) causes most traces to be dropped at the client, creating observability gaps that appear as missing spans in the backend.
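The effect of a 0.001 rate is easy to underestimate. A sketch of the head-sampling decision, here in trace-ID-ratio style (legacy Jaeger clients draw a random number instead, with the same statistical effect):

```python
import random

# Sketch of probabilistic head sampling: a trace is kept only if its 64-bit
# trace ID falls below param * 2**64. With the legacy 0.001 default, roughly
# 999 of every 1000 traces never leave the process.
def should_sample(trace_id: int, param: float) -> bool:
    return trace_id < int(param * 2**64)

random.seed(42)
ids = [random.getrandbits(64) for _ in range(100_000)]
kept = sum(should_sample(t, 0.001) for t in ids)
print(f"kept {kept} of {len(ids)} traces")  # on the order of 100
```

When a backend looks like it is "losing" spans, ruling out client-side sampling first is usually cheaper than debugging the pipeline.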
When the Jaeger collector's internal queue exceeds 70-80% capacity, it is close to overflow; once the queue fills, new spans are dropped outright, resulting in incomplete traces and data loss.
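That 70-80% band can be enforced as an alert threshold. A minimal sketch, assuming you can read the queue size and capacity from the collector's metrics (function and threshold names are hypothetical):

```python
# Hypothetical alert check: compare current queue size against capacity and
# warn before the queue overflows and spans start being dropped.
def queue_status(queue_size: int, queue_capacity: int,
                 warn_at: float = 0.7, critical_at: float = 0.8) -> str:
    utilization = queue_size / queue_capacity
    if utilization >= critical_at:
        return "critical"
    if utilization >= warn_at:
        return "warning"
    return "ok"

print(queue_status(1500, 2000))  # 75% full -> warning
```

Alerting at "warning" leaves headroom to scale collectors or storage before any span is actually lost.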
Slow storage write operations block collector workers, causing span reception to slow and queues to back up, ultimately leading to dropped traces.
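The mechanism is ordinary back-pressure on a bounded queue. A toy model (not Jaeger code) showing how a slow writer turns into dropped spans:

```python
import queue

# Toy model of the collector's bounded span queue: spans arrive faster than
# the storage writer drains them, so the queue fills and later spans are
# rejected, mirroring the back-pressure and drops described above.
span_queue = queue.Queue(maxsize=5)
dropped = 0

for span_id in range(20):              # a burst of 20 incoming spans
    if span_id % 4 == 0 and not span_queue.empty():
        span_queue.get_nowait()        # slow writer: drains 1 span per 4 arrivals
    try:
        span_queue.put_nowait(span_id)
    except queue.Full:
        dropped += 1                   # queue full: span is lost

print(f"dropped {dropped} of 20 spans")
```

The fix is on the consumer side (faster storage, more writers), not a bigger queue: a larger buffer only delays the same drops.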
High P95/P99 query latencies (>5 seconds) make Jaeger UI unusable during incident troubleshooting, typically caused by slow storage reads, overloaded shards, or inefficient trace queries.
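Detecting that regime only needs a percentile check over recent query durations. A sketch using the standard library (thresholds and names are illustrative):

```python
import statistics

# Compute P95/P99 from a sample of query durations (in seconds) and flag
# the UI-unusable regime described above. statistics.quantiles with n=100
# yields percentile cut points; index 94 is P95, index 98 is P99.
def slow_query_alert(durations_s, threshold_s: float = 5.0) -> bool:
    cuts = statistics.quantiles(durations_s, n=100)
    p95, p99 = cuts[94], cuts[98]
    return p95 > threshold_s or p99 > threshold_s

durations = [0.2] * 90 + [6.0] * 10     # 10% of queries take 6 s
print(slow_query_alert(durations))
```

Tail percentiles matter here because the median stays healthy while the slowest queries, the ones an on-call engineer actually runs during an incident, time out.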
Distributed agent architectures require trace correlation across multiple context windows and parallel execution paths. Without proper instrumentation, teams lose visibility into subagent activities, making root cause analysis impossible when investigations fail.
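The correlation itself comes down to propagating one trace ID into every parallel path. A deliberately simplified sketch (all names hypothetical; real systems would use a W3C traceparent header or an OpenTelemetry propagator):

```python
import uuid
from dataclasses import dataclass

# Sketch of manual trace-context propagation: every subagent task carries
# the parent's trace_id with its own span_id, so all parallel execution
# paths correlate into a single trace in Jaeger.
@dataclass
class SpanContext:
    trace_id: str
    span_id: str

def start_root() -> SpanContext:
    return SpanContext(uuid.uuid4().hex, uuid.uuid4().hex[:16])

def start_child(parent: SpanContext) -> SpanContext:
    # Same trace_id, fresh span_id: this is the entire correlation trick.
    return SpanContext(parent.trace_id, uuid.uuid4().hex[:16])

root = start_root()
subagents = [start_child(root) for _ in range(3)]
print([s.trace_id == root.trace_id for s in subagents])
```

If a subagent is launched without its parent's context, its spans land in Jaeger as an orphaned trace, which is exactly the visibility gap described above.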
Network namespace issues prevent spans from reaching Jaeger collectors, manifesting as zero spans received despite applications generating traces. Common in Docker/Kubernetes deployments.
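A quick TCP probe from inside the affected pod or container distinguishes a networking problem from an instrumentation bug. A hedged sketch targeting the OTLP/gRPC port (4317 by default in Jaeger v2):

```python
import socket

# Reachability probe for the collector endpoint: a refused or timed-out
# connect from inside the container points at a network-namespace, DNS, or
# Service routing problem rather than broken instrumentation.
def collector_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

print(collector_reachable("localhost", 4317))
```

Run from the application's own network namespace, since that is where the failure lives; the same probe from the host machine will often succeed and mislead.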