Hot Thread Query Starvation

warning

Resource Contention

Updated Feb 6, 2026

Expensive queries or indexing operations monopolize thread pool workers, so cheap requests wait in the queue behind them, sometimes indefinitely. This shows up as stuck tasks in _cat/tasks, where operations that normally take milliseconds take minutes, while the affected thread pools show 100% utilization.

Sources

How to Monitor Elasticsearch Cluster Health with the OpenTelemetry ... (oneuptime.com)

Profile search requests | Reference (elastic.co)

How to Debug an Unresponsive Elasticsearch Cluster (www.moesif.com)

Top 10 Elasticsearch Metrics to Monitor Performance (sematext.com)

Technologies:

Elasticsearch

Symptoms of this issue are visible in Elasticsearch metrics and logs:

elasticsearch.thread_pool.flush.completed.count

elasticsearch.thread_pool.index.completed.count

elasticsearch.thread_pool.bulk.threads.count

elasticsearch.thread_pool.fetch_shard_started.queue

elasticsearch.thread_pool.listener.rejected.count
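As a rough illustration of how thread pool saturation might be checked outside a metrics pipeline, the sketch below flags pools whose workers are all busy while requests are queued. It assumes rows shaped like the JSON output of _cat/thread_pool?format=json&h=node_name,name,active,size,queue,rejected, with values arriving as strings; the sample rows and the saturated_pools helper are hypothetical, not output from a real cluster.

```python
from typing import Iterable


def saturated_pools(rows: Iterable[dict], min_queue: int = 1) -> list:
    """Return 'node/pool' labels whose workers are all busy and whose queue is non-empty."""
    flagged = []
    for row in rows:
        size = str(row.get("size") or "")
        if not size.isdigit() or int(size) == 0:
            # Pools without a fixed size (e.g. scaling pools) are skipped here.
            continue
        active = int(row["active"])
        queue = int(row["queue"])
        if active >= int(size) and queue >= min_queue:
            flagged.append(f'{row["node_name"]}/{row["name"]}')
    return flagged


# Hand-written sample rows (assumption: not taken from a real cluster).
SAMPLE_ROWS = [
    {"node_name": "node-1", "name": "search", "active": "13", "size": "13", "queue": "412", "rejected": "7"},
    {"node_name": "node-1", "name": "write", "active": "2", "size": "8", "queue": "0", "rejected": "0"},
    {"node_name": "node-1", "name": "generic", "active": "1", "size": "", "queue": "0", "rejected": "0"},
]

if __name__ == "__main__":
    print(saturated_pools(SAMPLE_ROWS))
```

A single saturated reading is not conclusive on its own; the pattern described here is saturation that persists across repeated samples.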

How to detect:

Call _nodes/hot_threads?snapshots=1000 and look for code paths that appear in 50% or more of the snapshots on threads with high CPU%. Cross-reference the result with _cat/tasks?detailed, which shows operations that have been stuck for minutes or hours, and with the elasticsearch.thread_pool.*.queue metrics, which show sustained saturation.
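The 50% rule above can be approximated by scanning the plain-text hot threads report. This is a minimal sketch that assumes the report contains lines of the form "N% ... cpu usage by thread '...'" followed by "X/Y snapshots sharing following Z elements"; the exact layout can vary between Elasticsearch versions. The SAMPLE_REPORT text is hand-written for illustration, and dominant_paths and HotPath are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed line shapes; verify against the output of your own cluster version.
THREAD_RE = re.compile(r"^\s*(?P<cpu>\d+(?:\.\d+)?)%.*cpu usage by thread '(?P<thread>[^']+)'")
GROUP_RE = re.compile(r"^\s*(?P<hits>\d+)/(?P<total>\d+) snapshots sharing following \d+ elements")


@dataclass
class HotPath:
    thread: str
    cpu_percent: float
    share: float      # fraction of snapshots that share this stack
    top_frame: str    # first stack frame listed under the group


def dominant_paths(report: str, min_share: float = 0.5, min_cpu: float = 50.0) -> list:
    """Return stack groups seen in at least min_share of snapshots on busy threads."""
    found = []
    thread, cpu = None, 0.0
    lines = report.splitlines()
    for i, line in enumerate(lines):
        m = THREAD_RE.match(line)
        if m:
            thread, cpu = m["thread"], float(m["cpu"])
            continue
        g = GROUP_RE.match(line)
        if g and thread is not None and int(g["total"]) > 0:
            share = int(g["hits"]) / int(g["total"])
            if share >= min_share and cpu >= min_cpu:
                top = lines[i + 1].strip() if i + 1 < len(lines) else ""
                found.append(HotPath(thread, cpu, share, top))
    return found


# Hand-written sample (assumption: illustrative only, not real cluster output).
SAMPLE_REPORT = """\
::: {node-1}{abc}
   Hot threads at 2026-02-06T10:00:00Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   96.2% (481ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#4]'
     870/1000 snapshots sharing following 12 elements
       org.apache.lucene.search.BooleanScorer.score(BooleanScorer.java)
     130/1000 snapshots sharing following 3 elements
       java.lang.Thread.run(Thread.java)

   4.1% (20ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][write][T#1]'
     1000/1000 snapshots sharing following 2 elements
       java.lang.Thread.sleep(Native Method)
"""

if __name__ == "__main__":
    for path in dominant_paths(SAMPLE_REPORT):
        print(path.thread, path.cpu_percent, path.share, path.top_frame)
```

The write thread in the sample is ignored even though one stack fills all of its snapshots, because its CPU share is low; both conditions have to hold for a path to count as dominant.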

Recommended action:

Identify expensive queries via the hot threads API and optimize them: reduce aggregation depth, move clauses that do not need scoring into filter context, and limit result size. Send an X-Opaque-Id header with each request so that expensive tasks can be traced back to the client that issued them. If refresh operations dominate (for example, elasticsearch_index_refresh_time_seconds keeps climbing), raise the index.refresh_interval setting above its 1s default. Consider adding dedicated coordinating-only nodes to isolate query coordination from data node work.
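Two of the actions above can be sketched as plain HTTP requests: tagging a search with X-Opaque-Id, and lengthening the refresh interval through the index settings API (index.refresh_interval is the index setting that controls how often refreshes run). The sketch only builds the requests and does not send them. The base URL, index name, field name, and the "checkout-service" identifier are all assumptions made for illustration.

```python
import json
import urllib.request


def tagged_search(base_url: str, index: str, body: dict, opaque_id: str) -> urllib.request.Request:
    """Build a search request that carries an X-Opaque-Id header for tracing."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-Opaque-Id": opaque_id},
    )


def refresh_interval_update(base_url: str, index: str, interval: str = "30s") -> urllib.request.Request:
    """Build a settings update that raises index.refresh_interval (default 1s)."""
    payload = {"index": {"refresh_interval": interval}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Filter context skips scoring and a small "size" bounds the result set.
# The "status" field is hypothetical.
CHEAP_QUERY = {
    "size": 10,
    "query": {"bool": {"filter": [{"term": {"status": "active"}}]}},
}

if __name__ == "__main__":
    search = tagged_search("http://localhost:9200", "orders", CHEAP_QUERY, "checkout-service")
    update = refresh_interval_update("http://localhost:9200", "orders", "30s")
    print(search.get_method(), search.full_url, search.header_items())
    print(update.get_method(), update.full_url, update.data.decode("utf-8"))
    # To actually send a request: urllib.request.urlopen(search)
```

The 30s interval is only an example value; a longer interval trades search freshness for less refresh work, so the right number depends on how quickly new documents need to become searchable.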