Dead workers block concurrency slots indefinitely

warning

Resource Contention · Updated Nov 27, 2024 (via Exa)

Sources

heartbeat for runners for better execution in environments with autoscaling · Issue #16142 · PrefectHQ/prefect · github.com

Technologies:

Prefect

How to detect:

When a worker dies without notifying the server, its runs keep holding their concurrency slots. The server refuses to schedule new work until each dead worker's run is manually set to Failed. Per-deployment and global concurrency limits make this worse, because every orphaned run still counts against the limit.
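The condition described above can be sketched as a staleness check. This is a minimal illustration, not Prefect's API: the RunSnapshot record, its last_heartbeat field, and the find_stale_slot_holders helper are all hypothetical names, and it assumes some per-run "last seen" timestamp is available from your own monitoring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunSnapshot:
    """Hypothetical record of a run as seen by a monitoring job."""
    run_id: str
    state: str                 # e.g. "Running", "Pending", "Failed"
    last_heartbeat: datetime   # assumed: last time the worker reported in


def find_stale_slot_holders(runs, stale_after, now):
    """Return IDs of Running runs whose last heartbeat is older than stale_after."""
    return [
        r.run_id
        for r in runs
        if r.state == "Running" and now - r.last_heartbeat > stale_after
    ]


if __name__ == "__main__":
    now = datetime(2024, 11, 27, 12, 0, tzinfo=timezone.utc)
    runs = [
        RunSnapshot("run-a", "Running", now - timedelta(minutes=30)),
        RunSnapshot("run-b", "Running", now - timedelta(seconds=20)),
    ]
    # Runs silent for more than five minutes are treated as likely orphaned.
    print(find_stale_slot_holders(runs, timedelta(minutes=5), now))
```

The five-minute threshold is an arbitrary choice for the example; a suitable value depends on how often workers are expected to report.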

Recommended action:

Do not use per-deployment or global concurrency limits unless runner heartbeat detection is enabled. Implement timeout automations that forcibly fail stuck runs and free their concurrency slots. Monitor work queue depth for blocking conditions, such as a queue that keeps growing while no new runs start.
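A timeout-based cleanup along these lines could look like the sketch below. It is an illustration under stated assumptions, not Prefect's automation feature: ActiveRun and fail_overdue_runs are hypothetical names, and fail_run is a caller-supplied callback assumed to set the run to Failed on the server (for example through whatever client or API call your setup provides).

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, NamedTuple


class ActiveRun(NamedTuple):
    """Hypothetical view of a run that currently holds a concurrency slot."""
    run_id: str
    started_at: datetime


def fail_overdue_runs(
    active: Iterable[ActiveRun],
    max_runtime: timedelta,
    now: datetime,
    fail_run: Callable[[str], None],
) -> List[str]:
    """Call fail_run for every run older than max_runtime; return their IDs."""
    failed = []
    for run in active:
        if now - run.started_at > max_runtime:
            fail_run(run.run_id)  # assumed to mark the run Failed server-side
            failed.append(run.run_id)
    return failed


if __name__ == "__main__":
    now = datetime(2024, 11, 27, 12, 0, tzinfo=timezone.utc)
    active = [
        ActiveRun("run-a", now - timedelta(hours=3)),
        ActiveRun("run-b", now - timedelta(minutes=5)),
    ]
    released = []  # stand-in for the real "set to Failed" call
    fail_overdue_runs(active, timedelta(hours=1), now, released.append)
    print(released)
```

Passing the failure action in as a callback keeps the timeout logic testable without a server; the one-hour limit is only an example and should exceed the longest legitimate run.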