Trino

PageTransportTimeoutException occurs without worker failure or obvious resource exhaustion

warning
Connection Management
Updated Dec 25, 2024
How to detect:

Queries fail with PageTransportTimeoutException (Total timeout 10000 ms elapsed) when the coordinator cannot fetch results from a worker node within the timeout. Repeated fetch failures accumulate (e.g., 7 failures over 62.26s) even though the worker is running, its CPU load is low (under 30%), and its GC pauses are brief (e.g., 424ms). The issue may correlate with data skew, where a disproportionate share of the data is routed to a single worker.
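The symptoms above can be checked mechanically when triaging failed queries. The following is a minimal sketch, assuming the failure text and basic worker metrics are already available as plain values. The function names are hypothetical and not part of Trino. The 30% CPU limit mirrors the figure quoted above, while the 1000 ms GC limit is an assumed cutoff for a "brief" pause.

```python
def has_timeout_signature(error_text: str) -> bool:
    """True if the failure text names the exception described above."""
    return "PageTransportTimeoutException" in error_text


def worker_looks_healthy(cpu_load_pct: float,
                         longest_gc_pause_ms: float,
                         cpu_limit_pct: float = 30.0,    # from the symptom description
                         gc_limit_ms: float = 1000.0) -> bool:  # assumed cutoff
    """True if the worker shows no obvious CPU or GC pressure."""
    return cpu_load_pct < cpu_limit_pct and longest_gc_pause_ms < gc_limit_ms


def matches_this_issue(error_text: str,
                       cpu_load_pct: float,
                       longest_gc_pause_ms: float) -> bool:
    """True if the timeout occurred although the worker looked healthy."""
    return (has_timeout_signature(error_text)
            and worker_looks_healthy(cpu_load_pct, longest_gc_pause_ms))
```

A failure that matches is a candidate for the skew and connectivity checks described here. One that does not match more likely points to a failed or overloaded worker.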

Recommended action:

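Check for data skew by examining the query plan and the distribution of rows across partitions and workers. If skew is present, add partitioning to the query so that less data is routed to any single worker. Consider increasing the HTTP client timeout used for exchanges if your deployment exposes it (in Trino this is typically `exchange.http-client.request-timeout`, whose 10s default corresponds to the 10000 ms in the error message). Monitor worker network connectivity and HTTP response times, and check the worker logs for EofException messages around the time of the failure. Because the error is often transient, retrying the query is a reasonable first step.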
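Two of these actions, checking for skew and retrying, lend themselves to a short illustration. The sketch below assumes per-worker row counts have already been collected (for example, read off the query plan statistics) and that the query is run through a caller-supplied function. `skew_ratio`, `run_with_retry`, and their parameters are hypothetical helpers, not Trino or client-library APIs, and the default delays are arbitrary.

```python
import time
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def skew_ratio(rows_per_worker: Mapping[str, int]) -> float:
    """Ratio of the busiest worker's row count to the mean across workers.

    A value near 1.0 means an even spread; larger values mean one worker
    receives a disproportionate share of the data.
    """
    counts = list(rows_per_worker.values())
    if not counts or sum(counts) == 0:
        return 1.0
    return max(counts) / (sum(counts) / len(counts))


def run_with_retry(run_query: Callable[[], T],
                   attempts: int = 3,
                   base_delay_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry run_query with exponential backoff on the transient timeout only."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            # Assumption: the client surfaces the exception name in the message.
            transient = "PageTransportTimeoutException" in str(exc)
            if not transient or attempt == attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying only on the matching error keeps genuine query failures visible. If the skew ratio is high, retries alone are unlikely to help, and repartitioning is the more durable fix.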