Trino

Cluster intermittently enters idle state after initial query execution

critical
performanceUpdated Apr 21, 2025
Technologies:
How to detect:

After initial queries execute successfully, the Trino cluster enters an idle state where no queries process despite resources being available. CPU utilization drops below 30% and memory below 40%. The cluster spontaneously recovers after 2-90 minutes, but may return to idle state. GC is not the cause (no full GC, all cycles under 110ms).

Recommended action:

Monitor coordinator logs for query scheduling or dispatch errors. Check query queue depth and coordinator resource utilization. Review network connectivity between coordinator and workers. Examine query.remote-task.max-error-duration (1m) to see if tasks are prematurely failing. Verify metastore connectivity and response times. Check if retry-policy=QUERY and query-retry-attempts=2 are causing retry loops.