Cassandra insights
Open Source, Versions: [current], 30 metrics
Slow storage write operations block collector workers, causing span reception to slow and queues to back up, ultimately leading to dropped traces.
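The backpressure chain described above (slow writes, then a full queue, then dropped spans) can be illustrated with a small deterministic simulation. This is only a sketch: the function name `simulate_collector`, its parameters, and the fixed per-tick rates are assumptions made for illustration and do not correspond to any Jaeger or Cassandra API.

```python
def simulate_collector(arrivals_per_tick, writes_per_tick, queue_capacity, ticks):
    """Toy model of a bounded collector queue (hypothetical, not Jaeger code).

    Each tick, `arrivals_per_tick` spans arrive, anything beyond
    `queue_capacity` is dropped, and storage then drains up to
    `writes_per_tick` spans. Returns (spans_still_queued, spans_dropped).
    """
    queued = 0
    dropped = 0
    for _ in range(ticks):
        queued += arrivals_per_tick
        if queued > queue_capacity:
            # Queue is full: the overflow is lost, as with dropped traces.
            dropped += queued - queue_capacity
            queued = queue_capacity
        # Storage writes drain the queue at a fixed rate.
        queued -= min(queued, writes_per_tick)
    return queued, dropped


healthy = simulate_collector(100, 100, 500, 10)
slow_storage = simulate_collector(100, 40, 200, 10)
print(healthy, slow_storage)
```

When the write rate matches the arrival rate, the queue stays empty. When writes are slower than arrivals, the queue fills within a few ticks and every tick after that drops the difference.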
High P95/P99 query latencies (over 5 seconds) make the Jaeger UI unusable during incident troubleshooting. Typical causes are slow storage reads, overloaded shards, or inefficient trace queries.
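As a rough illustration of the threshold above, the following sketch computes P95 and P99 from a list of query latencies and reports which of them exceed 5 seconds. The helper names `percentile` and `latency_breaches` are hypothetical, and the nearest-rank method is one assumed choice of percentile definition; a metrics backend may interpolate differently.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_breaches(latencies_s, threshold_s=5.0):
    """Return {percentile: value} for P95/P99 values above threshold_s."""
    breaches = {}
    for pct in (95, 99):
        value = percentile(latencies_s, pct)
        if value > threshold_s:
            breaches[pct] = value
    return breaches


# 98 fast queries and 2 slow ones: only the P99 crosses the threshold.
samples = [0.2] * 98 + [6.0] * 2
print(latency_breaches(samples))
```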
Frequent or long garbage collection pauses surface as query timeouts and elevated cassandra_read_timeouts / cassandra_write_timeouts, but the root cause is JVM memory pressure, not database overload. GC pauses longer than 500 ms translate into client-side timeouts.
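One way to tell GC-induced timeouts apart from genuine overload is to check whether timeouts cluster inside long pause windows. The sketch below is a simple heuristic under stated assumptions: the `GcPause` record, the function name, and the `slack_s` parameter are hypothetical, and pause data is assumed to have been extracted already (for example from GC logs) into start times and durations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GcPause:
    start_s: float      # pause start, in seconds on a shared clock
    duration_ms: float  # pause length in milliseconds


def timeouts_during_long_pauses(pauses, timeout_times_s,
                                min_pause_ms=500.0, slack_s=0.0):
    """Return the timeout timestamps that fall inside a long GC pause.

    A pause counts as long when it exceeds min_pause_ms. slack_s widens
    each pause window at the end, since a client may report the timeout
    slightly after the pause finishes.
    """
    long_pauses = [p for p in pauses if p.duration_ms > min_pause_ms]
    hits = []
    for t in timeout_times_s:
        for p in long_pauses:
            end_s = p.start_s + p.duration_ms / 1000.0 + slack_s
            if p.start_s <= t <= end_s:
                hits.append(t)
                break  # count each timeout once
    return hits


pauses = [GcPause(start_s=100.0, duration_ms=800.0),
          GcPause(start_s=200.0, duration_ms=120.0)]
timeouts = [100.5, 150.0, 200.05]
print(timeouts_during_long_pauses(pauses, timeouts))
```

If most timeouts land inside long pause windows, heap sizing and GC tuning are a more likely fix than adding database capacity.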
Grafana instances using Cassandra as a backend can experience silent performance degradation when JMX metrics (compaction tasks, heap usage, GC pauses) are not collected. Without JMX visibility, operators miss early warnings of heap exhaustion, compaction backlog, or GC thrashing that manifest as sudden dashboard failures.