Presto

Spot Instance Interruption Causing Query Failures

warning
reliabilityUpdated Feb 23, 2026

Presto queries fail unexpectedly due to worker nodes being terminated by cloud provider spot instance interruptions, detected faster when spot interruption metadata is available.

How to detect:

Monitor presto_execution_failed_queries_one_minute_rate spikes correlated with worker node losses. Check query info logs for spot instance interruption warnings. Track presto_memory_nodes decreasing suddenly. Spot interruptions appear in query traces with specific warning messages indicating node IP and interruption time.

Recommended action:

Implement automatic query retry logic for queries affected by spot interruptions. Use a mix of spot and on-demand instances for workers to balance cost and reliability. Enable query checkpointing if available. Monitor spot interruption notices and proactively drain nodes before termination when possible.