Trino

PAGE_TRANSPORT_TIMEOUT causes query failures after 1 minute

critical
Connection Management
Updated Nov 17, 2024 (via Exa)
How to detect:

Queries that run longer than about 1 minute fail with a PAGE_TRANSPORT_TIMEOUT error when processing large datasets (>30GB). The cause is that the default exchange.http-client.max-content-length is too small for data transfers between the coordinator and the workers. The error message reports repeated failures (e.g., '104 failures, failure duration 60.02s') and appears most often on queries with ORDER BY operations.

Recommended action:

Increase exchange.http-client.max-content-length to 4GB; this is the key setting. Also increase query.remote-task.max-error-duration to 10m, query.remote-task.max-request-size to 10GB, and query.max-memory-per-node to 6GB. Optionally tune exchange.http-client.request-timeout (30s to 60s) and exchange.http-client.idle-timeout (60s to 2m). Apply every change to the config files on both the coordinator and the workers, since a setting changed on only one side does not take effect for the whole exchange.
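
As a rough illustration, the settings above could be written into the Trino config file as follows. This is a sketch, not a verified configuration: the file location (etc/config.properties) is assumed from a standard Trino installation, the property names and values are taken as stated in this entry, and the two optional timeouts use the upper end of the suggested ranges. Check each property against the documentation for your Trino version before applying, because an unrecognized property can prevent the server from starting.

```properties
# etc/config.properties (assumed location; apply on the coordinator AND every worker)

# Key setting named in this entry: raise the exchange HTTP client content limit.
exchange.http-client.max-content-length=4GB

# Tolerate longer runs of remote-task errors before failing the query.
query.remote-task.max-error-duration=10m

# Property name as given in this entry; verify it exists in your Trino version.
query.remote-task.max-request-size=10GB

# Per-node memory limit for a single query; must fit within the JVM heap.
query.max-memory-per-node=6GB

# Optional tuning (entry suggests 30s to 60s and 60s to 2m respectively).
exchange.http-client.request-timeout=60s
exchange.http-client.idle-timeout=2m
```

Restart the coordinator and all workers after editing so that every node picks up the same values.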