JVM Heap Pressure Cascading Failure

Severity: critical

Category: Resource Contention. Updated Feb 6, 2026

When JVM heap usage stays above 85% for extended periods, garbage collection runs more often, reclaims less memory per cycle, and long stop-the-world pauses become more likely. The result is node unresponsiveness, cluster state propagation failures, and potential split-brain scenarios.


Technologies: Elasticsearch

Symptoms of this issue are visible in Elasticsearch metrics and logs, including:

elasticsearch.cluster.pending_tasks

How to detect:

The jvm.memory.heap.used / jvm.memory.heap.max ratio stays above 0.85 for 15 minutes or longer, combined with a rising rate of growth in jvm.gc.collections.elapsed (cumulative time spent in GC) and a rising elasticsearch.cluster.pending_tasks count.
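The detection rule can be expressed as a small check over a window of metric samples. This is a minimal sketch rather than a production alerting rule. It assumes a hypothetical `Sample` record per scrape for a single node, treats jvm.gc.collections.elapsed as a cumulative counter in milliseconds (so "increasing" is read as the GC-time rate in the second half of the window exceeding the rate in the first half), and reads "rising pending tasks" as the last sample exceeding the first.

```python
from dataclasses import dataclass
from typing import Sequence

HEAP_RATIO_THRESHOLD = 0.85   # from the detection rule
SUSTAIN_SECONDS = 15 * 60     # "15+ minutes"


@dataclass
class Sample:
    """One scrape for a single node (hypothetical shape, not an OTel type)."""
    ts: float             # Unix timestamp, seconds
    heap_used: int        # jvm.memory.heap.used, bytes
    heap_max: int         # jvm.memory.heap.max, bytes
    gc_elapsed_ms: float  # jvm.gc.collections.elapsed, cumulative ms
    pending_tasks: int    # elasticsearch.cluster.pending_tasks


def _gc_rate(a: Sample, b: Sample) -> float:
    """Milliseconds of GC per second of wall-clock time between two samples."""
    return (b.gc_elapsed_ms - a.gc_elapsed_ms) / (b.ts - a.ts)


def heap_pressure_alert(samples: Sequence[Sample]) -> bool:
    """Return True when every condition of the detection rule holds."""
    if len(samples) < 3:
        return False
    s = sorted(samples, key=lambda x: x.ts)
    # Rates below divide by time deltas, so timestamps must be distinct.
    if any(b.ts <= a.ts for a, b in zip(s, s[1:])):
        raise ValueError("sample timestamps must be strictly increasing")
    # Not enough history to call the condition "sustained".
    if s[-1].ts - s[0].ts < SUSTAIN_SECONDS:
        return False
    # Every sample in the window must be strictly above the heap threshold.
    if any(x.heap_used / x.heap_max <= HEAP_RATIO_THRESHOLD for x in s):
        return False
    # Compare the GC-time rate in the second half against the first half.
    mid = len(s) // 2
    gc_rising = _gc_rate(s[mid], s[-1]) > _gc_rate(s[0], s[mid])
    tasks_rising = s[-1].pending_tasks > s[0].pending_tasks
    return gc_rising and tasks_rising
```

In practice the same logic would usually live in the alerting backend's query language; the function form only makes the three conditions and their combination explicit.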

Recommended action:

Immediate: Add nodes to distribute heap load and reduce per-node memory pressure.

Short-term: Identify the heap consumers, using heap dump analysis where needed. Check cache memory for the field data and query caches (elasticsearch.node.cache.memory.usage) and indexing pressure (elasticsearch.indexing_pressure.memory.limit).

Medium-term: Review field data cache settings (indices.fielddata.cache.size), and run aggregations on fields backed by doc values (for example keyword fields) instead of on field data loaded for text fields. Set the JVM heap to no more than 50% of system RAM and below the compressed-oops threshold (about 26 GB is safe on most systems; the commonly quoted hard limit is 32 GB), leaving the rest for the OS file system cache. Monitor jvm.gc.collections.count to track how often GC cycles run.
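The heap-sizing guidance reduces to a small calculation: take half of system RAM, then cap it at the compressed-oops threshold. The sketch below assumes a conservative ceiling of 26 GiB (exposed as a parameter, since some systems allow up to about 30 GiB) and emits equal -Xms/-Xmx values, because Elastic recommends setting both to the same size. The helper names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed conservative compressed-oops ceiling; adjust per system.
DEFAULT_OOPS_CEILING = 26 * GIB


def max_heap_bytes(total_ram_bytes: int, oops_ceiling_bytes: int = DEFAULT_OOPS_CEILING) -> int:
    """No more than 50% of RAM, and never above the compressed-oops ceiling."""
    return min(total_ram_bytes // 2, oops_ceiling_bytes)


def jvm_heap_flags(total_ram_bytes: int, oops_ceiling_bytes: int = DEFAULT_OOPS_CEILING) -> str:
    """Equal -Xms/-Xmx flags in MiB, suitable for a jvm.options-style file."""
    mib = max_heap_bytes(total_ram_bytes, oops_ceiling_bytes) // MIB
    return f"-Xms{mib}m -Xmx{mib}m"
```

For example, a 64 GiB host is capped at a 26 GiB heap, leaving the remaining 38 GiB for the OS file system cache, while a 16 GiB host gets an 8 GiB heap.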