Hot Thread Query Starvation
Warning: Expensive queries or indexing operations monopolize thread pool workers, causing benign requests to queue indefinitely. This manifests as stuck tasks in _cat/tasks, with operations that normally take milliseconds taking minutes while thread pools show 100% utilization.
Use _nodes/hot_threads?snapshots=1000 to identify code paths that appear in at least 50% of snapshots with high CPU usage. Cross-reference with _cat/tasks?detailed, looking for operations stuck for minutes or hours, and with the elasticsearch.thread_pool.*.queue metrics, looking for sustained saturation.
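The 50% check above can be sketched as a small parser over the plain-text hot threads response. This is a minimal sketch, not a definitive tool: the function name dominant_frames, the thresholds, and the sample text are illustrative assumptions, and the sample is hand-written (including the hypothetical org.example class) rather than captured from a real cluster. The exact header layout varies between Elasticsearch versions, so the patterns only rely on the "cpu usage by thread" and "snapshots sharing following" phrases.

```python
import re

# Thread header line, assumed to look like:
#   92.4% (462ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#3]'
THREAD_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%.*cpu usage by thread '([^']+)'")
# Stack group line, assumed to look like:
#   8/10 snapshots sharing following 30 elements
SHARE_RE = re.compile(r"^\s*(\d+)/(\d+) snapshots sharing following \d+ elements")


def dominant_frames(hot_threads_text, min_share=0.5, min_cpu=50.0):
    """Return (thread, cpu_percent, snapshot_share, top_frame) for hot stack groups."""
    results = []
    thread, cpu = None, 0.0
    lines = hot_threads_text.splitlines()
    for i, line in enumerate(lines):
        header = THREAD_RE.match(line)
        if header:
            cpu, thread = float(header.group(1)), header.group(2)
            continue
        group = SHARE_RE.match(line)
        if group and thread is not None:
            share = int(group.group(1)) / int(group.group(2))
            # The first stack frame is the line right after the group line.
            top = lines[i + 1].strip() if i + 1 < len(lines) else ""
            if share >= min_share and cpu >= min_cpu:
                results.append((thread, cpu, share, top))
    return results


# Hand-written illustrative input, NOT real cluster output.
SAMPLE = """\
::: {node-1}
   92.4% (462ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#3]'
     8/10 snapshots sharing following 2 elements
       org.example.HypotheticalAggregator.collect(HypotheticalAggregator.java:1)
       java.lang.Thread.run(Thread.java:1)

    3.0% (15ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][write][T#1]'
     2/10 snapshots sharing following 1 elements
       java.lang.Thread.run(Thread.java:1)
"""

if __name__ == "__main__":
    for thread, cpu, share, top in dominant_frames(SAMPLE):
        print(f"{thread}: cpu={cpu}% share={share:.0%} top={top}")
```

Only the search thread passes both thresholds in the sample. Against a live cluster, the input would be the response body of GET _nodes/hot_threads, and the surviving top frames are the candidates to match against long-running entries in _cat/tasks?detailed.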
Identify expensive queries via the hot threads API and optimize them: reduce aggregation depth, use filter context instead of query context where scoring is not needed, and limit result size. Set the X-Opaque-Id header on client requests to trace request sources. If refresh operations dominate (for example, a high elasticsearch_index_refresh_time_seconds), increase the index.refresh_interval setting beyond its default of 1s. Consider adding dedicated coordinating-only nodes to isolate query coordination from data node work.
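The refresh interval change and the X-Opaque-Id tagging above can be sketched together as one request. This is a sketch under stated assumptions: the cluster URL, the index name, the 30s value, and the opaque ID string are placeholders chosen for illustration, and authentication and TLS are left out. The function only builds the request, so nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def build_refresh_interval_request(index, interval="30s",
                                   opaque_id="runbook-refresh-tuning"):
    """Build a PUT <index>/_settings request that raises index.refresh_interval.

    The X-Opaque-Id header tags the request so it can be traced back to
    its source; the ID value used here is a made-up example.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Opaque-Id": opaque_id,
        },
    )


if __name__ == "__main__":
    req = build_refresh_interval_request("my-index")  # hypothetical index name
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply the change against a reachable cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

A longer interval trades search freshness for fewer refreshes, so the value should be chosen per index rather than applied cluster-wide.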