Luigi

Worker disconnection causes parent task to complete prematurely with pending children

warning
availability
Updated Apr 21, 2016 (via Exa)
Technologies: Luigi
How to detect:

When a parent task requires multiple child tasks, the scheduler marks the parent as finished as soon as the first child completes, leaving the remaining children pending. This happens when the worker is disconnected from the scheduler, either because no heartbeat reaches the scheduler within the disconnect timeout (60 seconds by default) or because sending a heartbeat fails.
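The disconnect condition can be pictured with a toy model of heartbeat-based liveness tracking. This is a simplified sketch, not Luigi's scheduler code: the class ToyScheduler and its methods are hypothetical, and only the 60-second default comes from the text above. Each heartbeat refreshes a worker's last-seen time, and any worker silent for longer than the delay is dropped.

```python
from dataclasses import dataclass, field

DEFAULT_DISCONNECT_DELAY = 60.0  # seconds, matching the default described above


@dataclass
class ToyScheduler:
    """Hypothetical model of heartbeat liveness tracking (not Luigi's implementation)."""
    disconnect_delay: float = DEFAULT_DISCONNECT_DELAY
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, worker_id: str, now: float) -> None:
        # A successful ping refreshes the worker's last-seen timestamp.
        self.last_seen[worker_id] = now

    def prune(self, now: float) -> list:
        # Drop every worker that has been silent for longer than the delay.
        stale = [w for w, t in self.last_seen.items() if now - t > self.disconnect_delay]
        for w in stale:
            del self.last_seen[w]
        return stale

    def active_workers(self) -> list:
        return sorted(self.last_seen)


if __name__ == "__main__":
    sched = ToyScheduler()
    sched.heartbeat("w1", now=0.0)
    sched.heartbeat("w1", now=30.0)  # pings stop arriving after t=30
    print(sched.prune(now=89.0))     # 59 s of silence: still connected
    print(sched.prune(now=91.0))     # 61 s of silence: w1 is dropped
```

Raising disconnect_delay in this model (for example ToyScheduler(disconnect_delay=300.0)) keeps the same worker registered at t=91, which is the intuition behind the first recommended action.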

Recommended action:

Raise worker-disconnect-delay in the [scheduler] section of the configuration above its default of 60 seconds, or run with multiple workers (--workers 2 or higher) so the scheduler connection stays alive. Monitor the luigi.worker.active and luigi.scheduler.pending_tasks metrics to detect worker disconnections.
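Both recommendations can be checked with a small standard-library helper. This is a sketch under assumptions: the value 300 is illustrative only, and the helper reads the key in either the dashed spelling used above or the underscore spelling (worker_disconnect_delay) that newer Luigi releases document. The metric names come from the text, but how they are collected is not covered here, and the "active workers dropped while tasks are pending" rule is a hypothetical heuristic, not a Luigi feature.

```python
import configparser

DEFAULT_DISCONNECT_DELAY = 60.0  # seconds

# Illustrative luigi.cfg contents; 300 is an example value, not a recommendation.
LUIGI_CFG = """
[scheduler]
worker_disconnect_delay = 300
"""


def disconnect_delay(cfg_text: str) -> float:
    """Return the configured disconnect delay, accepting dashed or underscored keys."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_section("scheduler"):
        for key in ("worker_disconnect_delay", "worker-disconnect-delay"):
            if parser.has_option("scheduler", key):
                return parser.getfloat("scheduler", key)
    return DEFAULT_DISCONNECT_DELAY


def worker_dropped(prev_active: int, active: int, pending_tasks: int) -> bool:
    """Hypothetical alert rule on luigi.worker.active / luigi.scheduler.pending_tasks.

    Flags a drop in the active-worker count while tasks are still pending.
    """
    return active < prev_active and pending_tasks > 0


if __name__ == "__main__":
    delay = disconnect_delay(LUIGI_CFG)
    print(f"disconnect delay: {delay} s (above default: {delay > DEFAULT_DISCONNECT_DELAY})")
    print("possible disconnect:", worker_dropped(prev_active=2, active=1, pending_tasks=5))
```

Falling back to the 60-second default when the key is absent mirrors the behavior described above; a deployment with its own metrics pipeline would replace the hard-coded sample values with real readings.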