Cassandra

Compaction Backlog Causing Read Latency Cascade

critical
Resource Contention
Updated Jan 10, 2026

When pending compaction tasks accumulate faster than they complete, each read must consult more SSTables, driving p99 latency upward and eventually causing read timeouts. The debt is silent and it compounds: every memtable flush adds another SSTable to the backlog, and the slower reads compete with compaction for the same disk I/O.

How to detect:

Watch for cassandra_compaction_tasks_pending climbing continuously without corresponding drops, together with a rising live SSTable count (cassandra_live_ss_table), which means more SSTables consulted per read, and increasing read latency percentiles (cassandra_client_request_read_time_50p/99p). Confirm with a rise in cassandra_client_request_error that matches ReadTimeoutException patterns.
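The combined condition above can be sketched as a small check over recent metric samples. This is a minimal illustration, not a tuned alert rule: the Sample type, the six-sample window, and the 1.5x p99 growth factor are all assumptions made for the example, and the values are expected to come from whatever system already scrapes the metrics named above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One scrape of the three signals (hypothetical container type)."""
    pending_tasks: int   # cassandra_compaction_tasks_pending
    live_sstables: int   # cassandra_live_ss_table
    read_p99_ms: float   # p99 read latency, in milliseconds


def is_backlog_building(samples: Sequence[Sample],
                        window: int = 6,
                        p99_growth: float = 1.5) -> bool:
    """Return True when all three signals point to a compaction backlog.

    window and p99_growth are illustrative defaults, not recommendations.
    """
    if len(samples) < window:
        return False  # not enough history to judge a trend
    recent = samples[-window:]

    # Pending tasks climb without any corresponding drop inside the window.
    pending = [s.pending_tasks for s in recent]
    never_drops = all(b >= a for a, b in zip(pending, pending[1:]))
    climbing = never_drops and pending[-1] > pending[0]

    # Live SSTable count is higher at the end of the window than at the start.
    sstables_rising = recent[-1].live_sstables > recent[0].live_sstables

    # p99 read latency grew by at least the configured factor.
    first_p99 = recent[0].read_p99_ms
    p99_rising = first_p99 > 0 and recent[-1].read_p99_ms >= p99_growth * first_p99

    return climbing and sstables_rising and p99_rising
```

Requiring all three signals together keeps the check from firing on a single large compaction, where pending tasks spike briefly but then drain and read latency does not follow.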

Recommended action:

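Immediately check compaction progress with nodetool compactionstats, the CompactionExecutor thread pool with nodetool tpstats, and disk I/O utilization with iostat. If compaction threads are saturated and the disks still have headroom, temporarily increase concurrent_compactors in cassandra.yaml; if compaction is throttled, the throughput limit can also be raised at runtime with nodetool setcompactionthroughput. If the disks are already saturated, adding compactors only increases contention. Review the compaction strategy (SizeTiered vs Leveled) against the workload's read/write ratio. Scale storage or add nodes if writes consistently outpace compaction capacity.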
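One way to quantify the backlog on a single node during triage is nodetool compactionstats, whose output begins with a "pending tasks:" line. The sketch below parses that line so the value can be logged or compared over time. It assumes nodetool is on the PATH of the node where it runs and that the output keeps this line format, which may vary between Cassandra versions; the function names and the sample text are illustrative, not a real capture.

```python
import re
import subprocess
from typing import Optional

# Matches the summary line printed at the top of `nodetool compactionstats`.
_PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)


def parse_pending_tasks(output: str) -> Optional[int]:
    """Extract the pending task count, or None if the line is absent."""
    match = _PENDING_RE.search(output)
    return int(match.group(1)) if match else None


def read_pending_tasks() -> Optional[int]:
    """Run nodetool locally and parse its output (assumes nodetool on PATH)."""
    result = subprocess.run(
        ["nodetool", "compactionstats"],
        capture_output=True,
        text=True,
        check=True,  # raise if nodetool exits with a non-zero status
    )
    return parse_pending_tasks(result.stdout)


# Illustrative text only; the keyspace and table names are hypothetical.
SAMPLE_OUTPUT = """pending tasks: 42
- shop.orders: 42
"""

if __name__ == "__main__":
    print(parse_pending_tasks(SAMPLE_OUTPUT))
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing against saved output without a running node.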