Request timeout configuration prevents long-running inference
warning · configuration · Updated Mar 7, 2026
Technologies:
How to detect:
TimeoutMiddleware enforces the traffic.timeout limit on all requests. Long-running model inference or batch processing that exceeds this threshold is cut off with a 503 Service Unavailable error, even when the underlying operation is healthy.
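The failure mode can be illustrated with a minimal sketch. The handler name, the timeout value, and the use of asyncio.wait_for are assumptions for demonstration; they stand in for whatever mechanism TimeoutMiddleware actually uses, but the observable behavior is the same: a healthy-but-slow call is cancelled and surfaced as a 503.

```python
import asyncio

TIMEOUT_SECONDS = 2.0  # stand-in for the traffic.timeout setting


async def slow_inference():
    # Simulates a healthy model call that simply takes longer
    # than the configured timeout.
    await asyncio.sleep(5)
    return {"status": 200}


async def handle_request(handler):
    # Mimics the middleware: cancel the handler and return 503
    # once the configured limit elapses.
    try:
        return await asyncio.wait_for(handler(), timeout=TIMEOUT_SECONDS)
    except asyncio.TimeoutError:
        return {"status": 503, "error": "Service Unavailable"}


result = asyncio.run(handle_request(slow_inference))
print(result["status"])  # 503: the slow call was cut off, not failed
```

Note that the client cannot distinguish this timeout-induced 503 from a genuine outage, which is why the detection step focuses on comparing request durations against the configured limit.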
Recommended action:
Set traffic.timeout above your longest expected inference time. Monitor the request-duration distribution to identify p99 latency, and size the timeout with headroom above it. For operations with highly variable duration, consider streaming responses or splitting the work into asynchronous jobs.