Compaction Backlog Causing Read Latency Cascade
Severity: critical
When pending compaction tasks accumulate faster than they complete, each read has to consult more SSTables. That drives p99 read latency upward and eventually produces timeout exceptions. The backlog is a silent storage debt: nothing fails at first, and the cost compounds the longer it goes unaddressed.
Watch for cassandra_compaction_tasks_pending climbing steadily without corresponding drops, together with a rising live SSTable count (cassandra_live_ss_table), which means more SSTables consulted per read, and increasing read latency percentiles (cassandra_client_request_read_time_50p and cassandra_client_request_read_time_99p). Confirm with a rise in cassandra_client_request_error that matches ReadTimeoutException patterns.
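The detection rule above can be sketched as a small check over recent metric samples. This is a minimal illustration, not a production alert: the function names, the 80% "mostly rising" threshold, and the assumption that samples arrive as evenly spaced lists are all hypothetical choices made for the example.

```python
from typing import Sequence


def is_climbing(samples: Sequence[float], min_rise_fraction: float = 0.8) -> bool:
    """Return True if the series ends higher than it starts and most
    consecutive steps are non-decreasing (i.e. no corresponding drops).

    min_rise_fraction is a hypothetical tuning knob, not a Cassandra setting.
    """
    if len(samples) < 3:
        return False  # too few points to call it a trend
    steps = [b - a for a, b in zip(samples, samples[1:])]
    rising = sum(1 for s in steps if s >= 0)
    return samples[-1] > samples[0] and rising / len(steps) >= min_rise_fraction


def backlog_cascade_suspected(
    pending_tasks: Sequence[float],
    live_sstables: Sequence[float],
    read_p99: Sequence[float],
) -> bool:
    """Flag the cascade only when all three signals climb together."""
    return all(is_climbing(s) for s in (pending_tasks, live_sstables, read_p99))
```

Requiring all three series to climb together keeps a single large but healthy compaction, where pending tasks spike and then drain, from triggering the check.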
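Immediately check the compaction thread pool with nodetool tpstats (the CompactionExecutor row) and disk I/O headroom with iostat. If compaction threads are saturated and the disks still have headroom, temporarily increase concurrent_compactors in cassandra.yaml; if the disks are already saturated, adding compactors will not help. Review the compaction strategy (SizeTiered vs Leveled) against the table's read/write ratio. Scale storage or add nodes if writes consistently outpace compaction capacity.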
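As a rough aid for the first step, the sketch below pulls the CompactionExecutor row out of nodetool tpstats output and tests whether the compactors look saturated. It assumes the common column layout (pool name, Active, Pending, ...), which may differ between Cassandra versions; the function names and the saturation rule (all compactors busy with work still queued) are illustrative assumptions.

```python
from typing import Dict


def compaction_executor_stats(tpstats_output: str) -> Dict[str, int]:
    """Extract Active and Pending counts for CompactionExecutor.

    Assumes each pool row is: name, active, pending, completed, ...
    (an assumption about the tpstats layout; verify on your version).
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "CompactionExecutor":
            return {"active": int(fields[1]), "pending": int(fields[2])}
    raise ValueError("CompactionExecutor row not found in tpstats output")


def compactors_saturated(stats: Dict[str, int], concurrent_compactors: int) -> bool:
    """Hypothetical rule: every compactor is busy and tasks are still waiting."""
    return stats["active"] >= concurrent_compactors and stats["pending"] > 0
```

In practice the text passed in would be the captured output of nodetool tpstats, and concurrent_compactors would be the value currently configured in cassandra.yaml. A saturated result is only a reason to raise that setting if iostat also shows spare disk capacity.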