Luigi worker fails to connect to central scheduler causing task cascade failure
critical | Connection Management | Updated Oct 10, 2017 (via Exa)
Technologies:
How to detect:
The worker cannot establish a connection to the central scheduler at localhost:8082 and reports "Max retries exceeded" once its retry budget is used up (by default, 3 attempts with a 30s wait between them). A "Connection refused" error indicates that the scheduler is unavailable or is not listening on the expected port. As a result, all downstream tasks fail.
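The "connection refused" condition can be confirmed independently of Luigi by probing the scheduler's TCP port. The following is a minimal sketch using only the Python standard library; the function name `scheduler_reachable` and the 5s probe timeout are illustrative assumptions, and the host and port are the ones named above.

```python
import socket


def scheduler_reachable(host="localhost", port=8082, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only checks that something
    is listening, not that the listener is actually a Luigi scheduler.
    """
    try:
        # create_connection raises OSError (for example
        # ConnectionRefusedError) when nothing is listening or the
        # host cannot be reached within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if scheduler_reachable():
        print("scheduler port is accepting connections")
    else:
        print("scheduler port is not reachable")
```

A False result here points at the scheduler process or the port configuration rather than at the worker's retry settings.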
Recommended action:
In the [core] section of luigi.cfg, increase the RPC connection timeout from the default 10s to 60s, the number of retry attempts from 3 to 10, and the wait between retries from 30s to 60s. Verify that the central scheduler is running and reachable on the configured port. For single-worker scenarios, the --local-scheduler flag is an alternative that avoids the central scheduler entirely. Consider adding process monitoring so that the scheduler is restarted automatically if it crashes.
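As a rough illustration of the recommended settings, the sketch below renders a [core] section with Python's standard `configparser`. The key names `rpc_connect_timeout`, `rpc_retry_attempts`, and `rpc_retry_wait` are assumed to be the underscore spellings used in recent Luigi documentation (older releases may spell them with hyphens), so check them against the installed version. The helper name `render_core_section` is hypothetical.

```python
import configparser
import io

# Values recommended in the text. The key names are an assumption:
# underscore spellings as in recent Luigi documentation.
RECOMMENDED_CORE = {
    "rpc_connect_timeout": "60",  # seconds; default 10
    "rpc_retry_attempts": "10",   # default 3
    "rpc_retry_wait": "60",       # seconds between retries; default 30
}


def render_core_section(settings):
    """Return the text of a luigi.cfg-style [core] section."""
    parser = configparser.ConfigParser()
    parser["core"] = settings
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the fragment so it can be merged into an existing luigi.cfg.
    print(render_core_section(RECOMMENDED_CORE))
```

The output is meant to be merged by hand into an existing luigi.cfg rather than written over it, since that file usually carries other sections as well.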