JVM Heap Pressure Cascading Failure
critical
When JVM heap usage stays above 85% for extended periods, garbage collection pauses increase dramatically, leading to node unresponsiveness, cluster state propagation failures, and potential split-brain scenarios.
jvm.memory.heap.used / jvm.memory.heap.max ratio sustained above 0.85 for 15+ minutes, combined with increasing jvm.gc.collections.elapsed and rising elasticsearch.cluster.pending_tasks
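The detection condition above can be sketched as a small check over a series of metric samples. This is a minimal illustration, not a definitive alert rule: the `Sample` type and the `heap_pressure_alert` function are hypothetical names, and the sketch assumes that `jvm.gc.collections.elapsed` is a cumulative counter in milliseconds, so "increasing" is read as a rising rate of GC time between the first and second half of the sustained window.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One scrape of the metrics named in the condition (hypothetical type)."""
    ts: float           # sample time, seconds
    heap_used: float    # jvm.memory.heap.used, bytes
    heap_max: float     # jvm.memory.heap.max, bytes
    gc_elapsed: float   # jvm.gc.collections.elapsed, assumed cumulative ms
    pending_tasks: int  # elasticsearch.cluster.pending_tasks


def heap_pressure_alert(
    samples: Sequence[Sample],
    ratio_threshold: float = 0.85,
    window_s: float = 15 * 60,
) -> bool:
    """Return True when the most recent run of samples stays above the heap
    ratio threshold for at least window_s seconds, while the GC time rate and
    the pending task count both rise across that run."""
    ordered = sorted(samples, key=lambda s: s.ts)

    # Collect the trailing run of samples whose heap ratio exceeds the threshold.
    run = []
    for s in reversed(ordered):
        if s.heap_used / s.heap_max <= ratio_threshold:
            break
        run.append(s)
    run.reverse()

    # Need at least three points to compare two halves of the run.
    if len(run) < 3 or run[-1].ts - run[0].ts < window_s:
        return False

    first, mid, last = run[0], run[len(run) // 2], run[-1]
    early_span, late_span = mid.ts - first.ts, last.ts - mid.ts
    if early_span <= 0 or late_span <= 0:
        return False

    # GC time per second in each half; a higher late rate means GC is worsening.
    early_rate = (mid.gc_elapsed - first.gc_elapsed) / early_span
    late_rate = (last.gc_elapsed - mid.gc_elapsed) / late_span

    return late_rate > early_rate and last.pending_tasks > first.pending_tasks
```

Comparing the two halves of the window is only one way to decide that GC time is "increasing"; a production rule would more likely rely on the rate functions of the monitoring system itself.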
Immediate: Add nodes to distribute heap load and reduce per-node memory pressure. Short-term: Identify the largest heap consumers through heap dump analysis, checking the field data cache (elasticsearch.node.cache.memory.usage), indexing buffers (elasticsearch.indexing_pressure.memory.limit), and the query cache. Medium-term: Review field data cache settings (indices.fielddata.cache.size), use doc values for aggregations instead of field data, and size the JVM heap at no more than 50% of system RAM, keeping it below roughly 32GB so that compressed object pointers stay enabled and the remaining RAM is left for the OS file system cache. Monitor jvm.gc.collections.count to track how frequently GC cycles run.
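The heap sizing guidance above (about half of system RAM, capped near 32GB) can be sketched as a small calculation. The function names and the 31 GiB ceiling are assumptions made for illustration: the exact compressed object pointer threshold varies by JVM and platform, so the ceiling is deliberately conservative and should be verified on the target system. `-Xms` and `-Xmx` are the standard JVM flags for initial and maximum heap size.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumption: 31 GiB is a conservative ceiling below the ~32GB compressed
# object pointer limit; the real threshold depends on the JVM and platform.
HEAP_CEILING_BYTES = 31 * GIB


def recommended_heap_bytes(
    system_ram_bytes: int,
    ram_fraction: float = 0.5,
    ceiling_bytes: int = HEAP_CEILING_BYTES,
) -> int:
    """Return a heap size of ram_fraction of system RAM, capped at ceiling_bytes."""
    if system_ram_bytes <= 0:
        raise ValueError("system_ram_bytes must be positive")
    return min(int(system_ram_bytes * ram_fraction), ceiling_bytes)


def jvm_heap_flags(system_ram_bytes: int) -> list[str]:
    """Return matching -Xms/-Xmx flags so the heap is not resized at runtime."""
    heap_mib = recommended_heap_bytes(system_ram_bytes) // MIB
    return [f"-Xms{heap_mib}m", f"-Xmx{heap_mib}m"]
```

For example, `jvm_heap_flags(16 * GIB)` yields flags for an 8 GiB heap, while a 128 GiB host is capped at the ceiling and the rest of its RAM stays available to the file system cache.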