Prefect, Kubernetes

Kubernetes pods remain Pending and never start after 60 seconds

critical
availability
Updated Nov 18, 2022 (via Exa)
How to detect:

Kubernetes pods created for Prefect flow runs stay in Pending status for 60 seconds, after which the flow run fails with a 'Pod never started' error. This usually means the cluster could not schedule the pod, or the pod could not pull its image in time.

Recommended action:

Investigate cluster capacity and scheduling. Check node availability and resources with kubectl get nodes, and confirm the cluster has enough free capacity for the flow run pod. Verify that the resource requests/limits in the Base Job Manifest are not higher than any node can satisfy. Run kubectl describe pod on the Pending pod to check for image pull errors, and review the pod events for the specific scheduling failure reason.
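
As a rough illustration of what to look for in the pod's status, the following Python sketch extracts the likely reasons a pod is stuck in Pending from the JSON that kubectl get pod -o json returns. The function name pending_reasons and the sample pod (including its message text) are hypothetical and only for illustration; the status fields it reads (phase, conditions, containerStatuses) are standard parts of the Kubernetes Pod status.

```python
def pending_reasons(pod: dict) -> list[str]:
    """Collect human-readable reasons a pod may be stuck in Pending.

    `pod` is a parsed Pod object, e.g. the result of json.loads() on the
    output of `kubectl get pod <name> -o json`.
    """
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return []

    reasons = []

    # An unschedulable pod carries a PodScheduled condition with status "False".
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            reason = cond.get("reason", "Unknown")
            message = cond.get("message", "")
            reasons.append(f"{reason}: {message}" if message else reason)

    # A scheduled pod can still be Pending while a container waits,
    # for example on an image pull (ErrImagePull, ImagePullBackOff).
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            reasons.append(f"{cs.get('name', '?')}: {waiting.get('reason', 'Waiting')}")

    return reasons


# Hypothetical sample: a pod that no node has enough CPU to run.
sample_pod = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/3 nodes are available: 3 Insufficient cpu.",
            }
        ],
    }
}

for line in pending_reasons(sample_pod):
    print(line)
```

An Unschedulable reason points toward capacity or resource requests, while a waiting reason such as ImagePullBackOff points toward the image name, registry credentials, or network access. The pod events shown by kubectl describe pod remain the most complete source of detail.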