JVM Heap Pressure Manifesting as Write Timeouts
Severity: critical. When heap usage stays above 80-90% without recovering after GC, or when GC pauses exceed 500 ms, Cassandra cannot process requests during the stop-the-world pauses. The application sees WriteTimeoutException even though disk and network are healthy.
Monitor Cassandra heap memory usage for periods above 85% that last more than 5 minutes with no corresponding drop after GC. Correlate these with a rising cassandra_write_timeouts count, and check whether GC pause times (via the JMX metric jvm.gc.collections.elapsed) exceed 500 ms. Confirm with cassandra_client_request_error showing WriteTimeoutException or OverloadedException.
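The correlation described above can be sketched as a small check over a series of metric samples. This is a minimal illustration, not a definitive alert rule: the Sample type, its field names, and the function name are hypothetical, and it assumes the heap, GC pause, and write timeout values have already been collected from your monitoring system at regular intervals.

```python
from dataclasses import dataclass
from typing import Sequence

HEAP_THRESHOLD = 0.85        # fraction of max heap (85%, from the text)
SUSTAINED_SECONDS = 5 * 60   # "more than 5 minutes"
GC_PAUSE_LIMIT_MS = 500      # GC pause threshold from the text


@dataclass
class Sample:
    """One scrape of the relevant metrics (hypothetical shape)."""
    ts: float                  # sample time, in seconds
    heap_used_fraction: float  # heap used / heap max, 0.0 to 1.0
    max_gc_pause_ms: float     # longest GC pause seen in this interval
    write_timeouts_total: int  # monotonically increasing counter


def heap_pressure_suspected(samples: Sequence[Sample]) -> bool:
    """True when the trailing samples show all three signals together:
    heap above threshold for more than 5 minutes with no drop,
    at least one GC pause over 500 ms, and write timeouts increasing."""
    # Walk backwards to find where the current above-threshold run began.
    run_start = None
    for s in reversed(samples):
        if s.heap_used_fraction < HEAP_THRESHOLD:
            break  # heap recovered here, so the run starts after this point
        run_start = s
    if run_start is None:
        return False

    latest = samples[-1]
    if latest.ts - run_start.ts <= SUSTAINED_SECONDS:
        return False  # not sustained long enough yet

    window = [s for s in samples if s.ts >= run_start.ts]
    long_pause = any(s.max_gc_pause_ms > GC_PAUSE_LIMIT_MS for s in window)
    timeouts_rising = latest.write_timeouts_total > run_start.write_timeouts_total
    return long_pause and timeouts_rising
```

Requiring all three signals at once keeps the check from firing on a high but healthy heap, where GC still reclaims memory and writes still succeed.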
Review the JVM heap size in jvm.options first, making sure -Xms and -Xmx are set appropriately (typically 8-16 GB, and never more than 50% of system RAM). Analyze the GC logs to determine whether the pressure is in the young or the old generation, then tune the garbage collector accordingly (G1GC is recommended). If memtables are oversized, reduce memtable_heap_space_in_mb in cassandra.yaml. If write throughput consistently exceeds what a single node can handle, consider scaling horizontally by adding nodes.
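The heap sizing guidance above reduces to simple arithmetic, sketched here. The function names are hypothetical, the 16 GB cap is taken from the upper end of the typical range in the text, and setting -Xms equal to -Xmx is an assumption based on common practice, not something this entry states. Treat the result as a starting point to validate against GC logs.

```python
def suggest_heap_gb(system_ram_gb: float, cap_gb: float = 16.0) -> int:
    """Largest whole-GB heap that is at most 50% of RAM and at most cap_gb."""
    if system_ram_gb <= 0:
        raise ValueError("system_ram_gb must be positive")
    return int(min(system_ram_gb * 0.5, cap_gb))


def jvm_heap_flags(system_ram_gb: float) -> list[str]:
    """Build the -Xms/-Xmx lines for jvm.options.

    Assumption: initial and maximum heap are set to the same value,
    which avoids heap resizing at runtime.
    """
    heap = suggest_heap_gb(system_ram_gb)
    if heap < 1:
        raise ValueError("not enough RAM for a whole-GB heap")
    return [f"-Xms{heap}G", f"-Xmx{heap}G"]


if __name__ == "__main__":
    for ram in (24, 32, 64):
        print(ram, "GB RAM:", " ".join(jvm_heap_flags(ram)))
```

On a host with less than 16 GB of RAM the 50% rule yields a heap below the typical 8-16 GB range, which is itself a sign that the node may be undersized for the workload.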