Compaction Backlog Causing Read Latency Cascade

critical

Resource Contention

Updated Jan 10, 2026

When pending compaction tasks accumulate faster than they complete, each read has to consult more SSTables, which drives p99 latency upward and eventually causes timeout exceptions. Because nothing fails outright at first, this storage debt builds silently and compounds over time.

Sources

Apache Cassandra Monitoring: Tools, Challenges & Best Practices (last9.io)

Apache Cassandra Troubleshooting Guide (www.developerindian.com)

Cassandra troubleshooting guide - Site24x7 (www.site24x7.com)

Apache Cassandra Monitoring with OpenTelemetry [including dashboards and alerts] | SigNoz (signoz.io)

Cassandra Monitoring: Metrics, Troubleshooting, and Observability with CubeAPM - CubeAPM (cubeapm.com)

Technologies:

Cassandra: The root cause of this issue originates in Cassandra

How to detect:

Watch for cassandra_compaction_tasks_pending climbing continuously without corresponding drops, combined with a rising SSTable count (cassandra_live_ss_table) and increasing read latency percentiles (cassandra_client_request_read_time_50p and cassandra_client_request_read_time_99p). Confirm with an increase in cassandra_client_request_error that shows ReadTimeoutException patterns.
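The detection rule above can be sketched as a small check over a window of metric samples. This is a minimal illustration, not a production alert: the Sample type, the is_rising helper, and the 20% growth threshold are all assumptions made for the example, and the samples are presumed to have already been pulled from whatever monitoring backend exports the metrics named above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One scrape of the three signals (hypothetical container type)."""
    pending_compactions: int  # cassandra_compaction_tasks_pending
    live_sstables: int        # cassandra_live_ss_table
    read_p99: float           # cassandra_client_request_read_time_99p


def is_rising(values: Sequence[float], min_growth: float = 0.2) -> bool:
    """True if the series never drops and ends at least min_growth above its start.

    min_growth=0.2 (20%) is an illustrative threshold, not a recommendation.
    """
    if len(values) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    start, end = values[0], values[-1]
    grew = end > start and (start == 0 or (end - start) / start >= min_growth)
    return never_drops and grew


def backlog_cascade(samples: Sequence[Sample]) -> bool:
    """Flag the pattern only when all three signals rise together."""
    return (
        is_rising([s.pending_compactions for s in samples])
        and is_rising([s.live_sstables for s in samples])
        and is_rising([s.read_p99 for s in samples])
    )


if __name__ == "__main__":
    window = [Sample(10, 8, 20.0), Sample(25, 12, 35.0), Sample(60, 20, 90.0)]
    print(backlog_cascade(window))
```

Requiring all three series to rise together is a deliberate choice in this sketch: a pending-task spike that drains on its own is normal compaction behaviour, whereas a backlog that never drops while SSTable count and p99 climb matches the cascade described here.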

Recommended action:

Immediately check the compaction thread-pool backlog with nodetool tpstats and disk I/O utilization with iostat. If compaction threads are saturated and the disks still have headroom, temporarily increase concurrent_compactors in cassandra.yaml. Review the compaction strategy choice (SizeTiered vs Leveled) based on the read/write ratio. Scale storage or add nodes if writes consistently outpace compaction capacity.
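Before and after changing concurrent_compactors, it can help to record the backlog on the node itself. The sketch below is one possible way to do that, and it uses nodetool compactionstats (a command not mentioned above) because its output begins with a "pending tasks: N" line. The function names are hypothetical, the exact output layout can differ between Cassandra versions, and nodetool is assumed to be on the PATH of the node where this runs.

```python
import re
import subprocess
from typing import Optional

# Matches the summary line printed by `nodetool compactionstats`,
# e.g. "pending tasks: 42". Layout may vary by Cassandra version.
PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)


def parse_pending_tasks(output: str) -> Optional[int]:
    """Extract the pending compaction count, or None if the line is absent."""
    match = PENDING_RE.search(output)
    return int(match.group(1)) if match else None


def read_pending_tasks() -> Optional[int]:
    """Run nodetool locally and parse its output (assumes nodetool is on PATH)."""
    result = subprocess.run(
        ["nodetool", "compactionstats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pending_tasks(result.stdout)


if __name__ == "__main__":
    sample_output = "pending tasks: 42\n- ks.tbl: 42\n"  # illustrative text only
    print(parse_pending_tasks(sample_output))
```

Comparing the value a few minutes apart shows whether a configuration change is actually draining the backlog or whether writes still outpace compaction.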