Webserver Crashing on Timeout During DAG Load
critical | reliability | Updated Apr 3, 2019
The webserver repeatedly crashes and restarts, returning 503 errors, when DAG loading exceeds the configured timeout threshold. The UI becomes inaccessible even though the scheduler and workers continue operating.
Technologies: Apache Airflow
How to detect:
Monitor for repeated webserver restarts, HTTP 503 responses, and webserver memory usage. Check whether the webserver logs show timeout errors during the DAG parsing or initialization phase.
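As a rough illustration of the log check described above, the following sketch counts timeout and 503 indicators in a list of log lines. The regular expressions and the sample lines are assumptions made for this example, not the exact messages your webserver emits; adjust them to match your own logs.

```python
import re
from collections import Counter

# Assumed patterns; tune these to the messages your webserver actually logs.
INDICATORS = {
    "timeout": re.compile(r"timed?[ -]?out", re.IGNORECASE),
    "http_503": re.compile(r"\b503\b"),
}


def count_indicators(lines):
    """Count how many lines match each indicator pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in INDICATORS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample_lines = [
    "[CRITICAL] WORKER TIMEOUT (pid:42)",
    '"GET /admin/ HTTP/1.1" 503 0',
    "INFO - Filling up the DagBag",
    "Request timed out while loading DAGs",
]
counts = count_indicators(sample_lines)
```

A steadily growing count across successive log windows, together with repeated restarts, would point toward the timeout condition described here rather than a one-off failure.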
Recommended action:
Increase the web_server_master_timeout and web_server_worker_timeout settings; allocate more CPU and memory to the webserver (minimum 5 AU / 0.5 CPU / 1.88 GiB); move heavy API calls and database queries out of the DAG top level and into operators, so that they run at task execution time rather than every time the DAG file is parsed; and optimize DAG parsing performance.
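The advice about moving heavy calls into operators can be sketched as follows. This is a minimal example under stated assumptions: fetch_table_names is a hypothetical stand-in for a slow API call or database query, the DAG and task names are invented, and the import paths assume Airflow 2.x (older releases use airflow.operators.python_operator). The import is guarded so the sketch also runs where Airflow is not installed.

```python
from datetime import datetime

# Counter used only to show when the expensive call actually runs.
CALLS = {"count": 0}


def fetch_table_names():
    """Hypothetical stand-in for a slow API call or database query."""
    CALLS["count"] += 1
    return ["orders", "customers"]


# Anti-pattern (left commented out): at module top level this would run on
# every parse of the DAG file, including each load by the webserver.
# TABLES = fetch_table_names()


def process_tables():
    """Task callable: the slow call runs only when the task executes."""
    tables = fetch_table_names()
    return len(tables)


try:
    # Assumes Airflow 2.x import paths.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = None  # lets the sketch run without Airflow installed

if DAG is not None:
    dag = DAG(dag_id="example_deferred_query", start_date=datetime(2019, 1, 1))
    PythonOperator(
        task_id="process_tables",
        python_callable=process_tables,
        dag=dag,
    )
```

Because the expensive call lives inside process_tables, parsing the file only defines functions and builds the DAG object, which keeps load time well below the webserver timeouts.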