JVM GC Pauses Masquerading as Database Slowness
critical
Frequent or long garbage collection (GC) pauses surface as query timeouts and elevated cassandra_read_timeouts / cassandra_write_timeouts, but the root cause is JVM memory pressure, not database overload. GC pauses longer than 500ms translate into client-side timeouts.
Monitor JVM GC metrics for major GC events longer than 500ms or young-generation GCs firing every second, and check whether they coincide with spikes in cassandra_read_timeouts and cassandra_write_timeouts. A further sign is heap usage (jvm.memory.heap.used) that stays at 80-90% without the sawtooth drops that indicate successful GC.
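The checks above can be sketched in a few lines of Python. This is a minimal illustration, not part of any monitoring product: the GcEvent record, the function names, the 5-second correlation tolerance, and the 10% minimum drop used to recognize a sawtooth are all assumptions made for the example. Only the 500ms pause threshold and the 80-90% heap band come from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GcEvent:
    """Hypothetical record for one GC event pulled from a metrics store."""
    timestamp_s: float  # seconds since an arbitrary epoch
    pause_ms: float     # stop-the-world pause duration
    major: bool         # True for old-gen / full collections


def long_major_pauses(events: Sequence[GcEvent],
                      threshold_ms: float = 500.0) -> List[GcEvent]:
    """Major GC events whose pause exceeds the threshold."""
    return [e for e in events if e.major and e.pause_ms > threshold_ms]


def young_gc_rate(events: Sequence[GcEvent], window_s: float) -> float:
    """Young-generation collections per second over the observed window."""
    young = sum(1 for e in events if not e.major)
    return young / window_s


def heap_stuck_high(used_fraction: Sequence[float],
                    low: float = 0.80,
                    min_drop: float = 0.10) -> bool:
    """True when every sample is at or above `low` and no consecutive pair
    of samples shows a drop of at least `min_drop` (no sawtooth)."""
    if len(used_fraction) < 2:
        return False
    all_high = all(s >= low for s in used_fraction)
    biggest_drop = max(a - b for a, b in zip(used_fraction, used_fraction[1:]))
    return all_high and biggest_drop < min_drop


def pauses_near_timeouts(pauses: Sequence[GcEvent],
                         timeout_spike_times_s: Sequence[float],
                         tolerance_s: float = 5.0) -> List[GcEvent]:
    """Pauses that fall within `tolerance_s` of a timeout spike."""
    return [p for p in pauses
            if any(abs(p.timestamp_s - t) <= tolerance_s
                   for t in timeout_spike_times_s)]
```

A node is a likely match for this pattern when long_major_pauses or a young_gc_rate near 1.0 lines up with timeout spikes via pauses_near_timeouts, and heap_stuck_high returns True for the same period.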
Tune the JVM heap size (MAX_HEAP_SIZE and HEAP_NEWSIZE in cassandra-env.sh). Review GC logs to identify allocation patterns. If heap usage stays at 80-90% after tuning, reduce memtable sizes, reduce key/row cache sizes, or add nodes to lower the per-node data volume. Consider switching garbage collectors or upgrading to a newer JVM version with better GC algorithms.
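As a starting point for the GC log review, the sketch below summarizes pause lines per collection type. It assumes the JVM writes JDK 9+ unified logging output (for example via -Xlog:gc), where a pause line ends in a heap transition and a duration such as "512M->128M(1024M) 12.500ms". Older JVMs with -XX:+PrintGCDetails use a different format and would need a different pattern. The function names and the summary fields are illustrative.

```python
import re
from typing import Dict, Iterable, List

# Assumed line shape (JDK 9+ unified logging):
# [1.234s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 12.500ms
PAUSE_RE = re.compile(
    r"GC\((?P<id>\d+)\)\s+Pause\s+(?P<kind>Young|Full)\b.*?"
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<total>\d+)M\)\s+"
    r"(?P<ms>\d+(?:\.\d+)?)ms"
)


def parse_pauses(lines: Iterable[str]) -> List[dict]:
    """Extract Young and Full pause events; other log lines are skipped."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m is None:
            continue
        pauses.append({
            "kind": m.group("kind"),
            "before_mb": int(m.group("before")),
            "after_mb": int(m.group("after")),
            "total_mb": int(m.group("total")),
            "pause_ms": float(m.group("ms")),
        })
    return pauses


def summarize(pauses: Iterable[dict]) -> Dict[str, dict]:
    """Per collection kind: event count, longest pause, and the highest
    heap occupancy (used / capacity) observed right after a collection."""
    summary: Dict[str, dict] = {}
    for p in pauses:
        s = summary.setdefault(
            p["kind"],
            {"count": 0, "max_pause_ms": 0.0, "max_occupancy_after": 0.0},
        )
        s["count"] += 1
        s["max_pause_ms"] = max(s["max_pause_ms"], p["pause_ms"])
        s["max_occupancy_after"] = max(
            s["max_occupancy_after"], p["after_mb"] / p["total_mb"]
        )
    return summary
```

If occupancy after Full collections stays in the 80-90% band, the live data set itself is filling the heap, which points toward smaller memtables or caches, or more nodes, rather than collector tuning alone.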