Elasticsearch

Hot Thread Query Starvation

warning
Resource Contention
Updated Feb 6, 2026

Expensive queries or indexing operations monopolize thread pool workers, so cheap requests wait in the queue behind them. It manifests as stuck tasks in _cat/tasks, with operations that normally take milliseconds taking minutes while the affected thread pools show 100% utilization.

How to detect:

Call GET _nodes/hot_threads?snapshots=1000 and look for stack traces that appear in 50% or more of the snapshots on threads with high CPU usage. Cross-reference with GET _cat/tasks?detailed, looking for operations that have been running for minutes or hours, and with the elasticsearch.thread_pool.*.queue metrics, looking for sustained queue saturation.
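
The 50% check can be automated against the plain-text hot threads report. The following is a minimal sketch, not an official parser: it assumes the report contains thread header lines ending in "cpu usage by thread '<name>'" and stack group lines of the form "N/M snapshots sharing following K elements", which is the layout recent Elasticsearch versions print but which may differ in yours. The function names, the unauthenticated base URL, and the threads=5 value are illustrative assumptions.

```python
import re
import urllib.request

# Thread header, e.g.
#   "92.0% (460ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#3]'"
THREAD_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%.*cpu usage by thread '([^']+)'")
# Stack group, e.g. "8/10 snapshots sharing following 12 elements"
GROUP_RE = re.compile(r"^\s*(\d+)/(\d+) snapshots sharing following \d+ elements")


def fetch_hot_threads(base_url, snapshots=10):
    """Fetch the plain-text report (assumes an unsecured cluster; add auth as needed)."""
    url = f"{base_url}/_nodes/hot_threads?snapshots={snapshots}&threads=5"
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8")


def dominant_frames(report, min_share=0.5):
    """Return (thread, usage_percent, snapshot_share, top_frame) for every stack
    group that appears in at least min_share of the snapshots."""
    results = []
    thread, usage = None, 0.0
    pending = None  # share of a qualifying group still waiting for its first frame
    for line in report.splitlines():
        m = THREAD_RE.match(line)
        if m:
            usage, thread = float(m.group(1)), m.group(2)
            pending = None
            continue
        m = GROUP_RE.match(line)
        if m and thread is not None:
            share = int(m.group(1)) / int(m.group(2))
            pending = share if share >= min_share else None
            continue
        if pending is not None and line.strip():
            # First non-empty line after the group header is the top stack frame.
            results.append((thread, usage, pending, line.strip()))
            pending = None
    return results
```

A possible use is dominant_frames(fetch_hot_threads("http://localhost:9200", snapshots=1000)), then sorting the result by usage percentage. Only the top frame of each group is kept, which is usually enough to tell a search or aggregation path from a refresh or merge path.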

Recommended action:

Identify expensive queries via the hot threads API and optimize them (reduce aggregation depth, move clauses that do not need scoring into filter context, limit result size). Set the X-Opaque-Id header on client requests so that slow tasks can be traced back to their source. If refresh operations dominate (for example, elasticsearch_index_refresh_time_seconds keeps climbing), raise the index.refresh_interval setting above its 1s default. Consider adding dedicated coordinating-only nodes to isolate query coordination from data node work.
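
Two of these actions can be sketched in code: tagging requests with X-Opaque-Id and grouping slow tasks by that tag, and raising the refresh interval through the index.refresh_interval index setting. This is a hedged sketch rather than a recommended client. It assumes the JSON shape returned by GET _tasks?detailed=true (nodes, then tasks, each with running_time_in_nanos and an optional headers object); the function names, the 60-second threshold, the "30s" interval, and the base URL are illustrative assumptions. The request builders only construct requests and do not send them.

```python
import json
import urllib.request
from collections import defaultdict


def tagged_search_request(base_url, index, query_body, opaque_id):
    """Build (but do not send) a search request tagged with X-Opaque-Id."""
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Opaque-Id": opaque_id},
        method="POST",
    )


def slow_tasks_by_source(tasks_response, min_seconds=60.0):
    """Group long-running tasks from a parsed `GET _tasks?detailed=true`
    response by the X-Opaque-Id of the request that started them."""
    grouped = defaultdict(list)
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            if seconds >= min_seconds:
                source = task.get("headers", {}).get("X-Opaque-Id", "<untagged>")
                grouped[source].append((task_id, task.get("action"), seconds))
    return dict(grouped)


def refresh_interval_request(base_url, index, interval="30s"):
    """Build a PUT {index}/_settings request that sets index.refresh_interval."""
    body = {"index": {"refresh_interval": interval}}
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing a built request to urllib.request.urlopen would send it. Tasks that show up under "<untagged>" suggest clients that are not yet setting the header, which is worth fixing before the next incident.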