Prefect, Kubernetes

Flow runs stuck in PENDING state after worker crash

critical
availability
Updated Feb 5, 2025 (via Exa)
How to detect:

When a Kubernetes worker exits unexpectedly after it has marked a flow run as PENDING but before the corresponding Kubernetes Job has been submitted, the flow run stays in the PENDING state indefinitely. This is a race condition: marking the flow run as PENDING and submitting the Job to Kubernetes are two separate, non-atomic steps, so a crash between them leaves a PENDING flow run with no Job behind it.

Recommended action:

Monitor for flow runs that stay in the PENDING state longer than the expected duration. Stuck flow runs do not recover on their own and must be reset manually. Prevention: set PREFECT_CLIENT_RETRY_EXTRA_CODES so the client retries additional HTTP status codes, which reduces worker crashes. Long-term fix: the worker would need to store submission state locally, or job submission would need to be made transactional.
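
The monitoring step above can be sketched as a small age check on PENDING flow runs. This is a minimal sketch, not Prefect's own API: FlowRunSnapshot and find_stuck_pending are hypothetical names, and the 10-minute threshold is an assumed value. Fetching the flow runs (for example through the Prefect client or REST API) is left out, and the sketch assumes each record carries the time the run entered its current state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FlowRunSnapshot:
    """Hypothetical, minimal view of a flow run (not a Prefect type)."""
    flow_run_id: str
    state_type: str             # e.g. "PENDING", "RUNNING"
    state_timestamp: datetime   # when the run entered its current state (UTC)


def find_stuck_pending(runs, max_pending, now=None):
    """Return the runs that have been PENDING for longer than max_pending."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        run
        for run in runs
        if run.state_type == "PENDING"
        and now - run.state_timestamp > max_pending
    ]


if __name__ == "__main__":
    # Fixed reference time and made-up runs, purely for illustration.
    now = datetime(2025, 2, 5, 12, 0, tzinfo=timezone.utc)
    runs = [
        FlowRunSnapshot("run-a", "PENDING", now - timedelta(minutes=30)),
        FlowRunSnapshot("run-b", "PENDING", now - timedelta(minutes=2)),
        FlowRunSnapshot("run-c", "RUNNING", now - timedelta(hours=1)),
    ]
    # Assumed threshold: anything PENDING for more than 10 minutes is suspect.
    for run in find_stuck_pending(runs, timedelta(minutes=10), now=now):
        print(f"stuck in PENDING: {run.flow_run_id}")
```

The threshold should sit comfortably above the normal time between a run being marked PENDING and its Job starting, otherwise healthy runs will be flagged. Runs that are flagged still need the manual reset described above.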