Prefect Metric

prefect.flow_run.crash

Flow runs crashing unexpectedly
Dimensions: None
Available on: Native (1)
Interface Metrics (1)
Native
Total number of flow runs that crashed unexpectedly
Dimensions: None

Technical Annotations (64)

Configuration Parameters (9)
PREFECT_CLIENT_RETRY_EXTRA_CODES (recommended: 500,421)
Enables retry logic for HTTP 500 responses to prevent worker crashes
concurrency_limit (recommended: 5-100 depending on capacity)
Per-deployment cap prevents runaway concurrent executions
for_each (recommended: ['prefect.resource.id'] or ['client_id'])
Deduplicates runs per resource or client during floods
cluster-autoscaler.kubernetes.io/safe-to-evict (recommended: false)
Prevents pod eviction during node scale-down
PREFECT_API_URL (recommended: https://api.prefect.cloud/api/accounts/...)
Must point to the Prefect Cloud API with the correct account and workspace IDs
PREFECT_API_KEY
Must be a valid API key (pnu_ prefix for users, pnb_ for service accounts)
Base Job Manifest
Custom Kubernetes job configuration that may specify resource requests/limits
pool_size
SQLAlchemy connection pool size per Prefect process; limits base connections
max_overflow
SQLAlchemy maximum additional connections beyond pool_size; caps connection bursts
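The client-side parameters above are ordinary environment variables read by the Prefect client. A minimal bootstrap sketch, with placeholder account/workspace IDs and an illustrative key:

```python
import os

# Illustrative values only; substitute real account/workspace IDs and key.
os.environ["PREFECT_API_URL"] = (
    "https://api.prefect.cloud/api/accounts/<account-id>/workspaces/<workspace-id>"
)
os.environ["PREFECT_API_KEY"] = "pnu_example"  # pnu_ for users, pnb_ for service accounts

# Retry transient HTTP 500/421 responses instead of letting the worker crash:
os.environ["PREFECT_CLIENT_RETRY_EXTRA_CODES"] = "500,421"
```

Verify the effective settings afterwards with `prefect config view`.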
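The safe-to-evict annotation belongs in the pod template of the work pool's base job manifest. A sketch of the relevant metadata fragment (represented here as a Python dict; placement inside the base job template is an assumption about your pool configuration):

```python
# Fragment of a Kubernetes pod template as it might appear inside a
# Prefect base job manifest; the annotation is the only point of interest.
pod_metadata = {
    "annotations": {
        # Tells the cluster autoscaler this pod must not be evicted
        # during node scale-down, so running flows are not killed.
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
    }
}
```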
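Because pool_size and max_overflow bound connections per process, total demand can be budgeted against the Postgres connection limit before exhaustion errors appear. A back-of-envelope check with illustrative numbers (not Prefect defaults):

```python
# Illustrative numbers, not defaults.
pool_size = 5       # base SQLAlchemy connections per Prefect process
max_overflow = 10   # burst connections allowed beyond pool_size
processes = 4       # server replicas + workers sharing one Postgres

peak = processes * (pool_size + max_overflow)

max_connections = 100  # Postgres default max_connections
reserved = 3           # Postgres default superuser_reserved_connections
available = max_connections - reserved

print(peak, available)  # 60 97
assert peak <= available, "reduce pool sizes or put pgbouncer in front"
```

If the assertion fails, either shrink the per-process pools or front Postgres with pgbouncer, as the references below suggest.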
Error Signatures (11)
500 Internal Server Error (http status)
prefect.exceptions.PrefectHTTPStatusError (exception)
unhandled errors in a TaskGroup (exception)
TypeError: 'MockValSer' object cannot be converted to 'SchemaSerializer' (exception)
ExceptionGroup: unhandled errors in a TaskGroup (exception)
Pod never started (log pattern)
Pod has status 'Pending' (log pattern)
remaining connection slots are reserved for non-replication superuser connections (log pattern)
asyncpg.exceptions.TooManyConnectionsError (exception)
connection-limit reached error (exception)
httpx crash (exception)
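Several of these signatures (HTTP 500, PrefectHTTPStatusError) are transient, which is why PREFECT_CLIENT_RETRY_EXTRA_CODES helps. The underlying retry pattern can be sketched with a stand-in error type; all names below are hypothetical, not Prefect APIs:

```python
import time

class TransientAPIError(Exception):
    """Stand-in for an HTTP 500-style error such as PrefectHTTPStatusError."""

def submit_with_backoff(submit, attempts=3, base_delay=0.01):
    """Retry a flaky submission with exponential backoff instead of crashing."""
    for attempt in range(attempts):
        try:
            return submit()
        except TransientAPIError:
            if attempt == attempts - 1:
                raise  # give up only after the final attempt
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky_submit():
    # Simulates an API that fails twice with a 500-style error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientAPIError("500 Internal Server Error")
    return "flow-run-id"

print(submit_with_backoff(flaky_submit))  # flow-run-id
```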
CLI Commands (6)
prefect work-pool set-concurrency-limit my-pool 5 (remediation)
prefect work-queue set-concurrency-limit my-queue 5 --pool my-pool (remediation)
prefect config view (diagnostic)
prefect cloud workspace ls (diagnostic)
kubectl describe pod (diagnostic)
kubectl get nodes (diagnostic)
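The for_each parameter listed under Configuration Parameters belongs to an Automations event trigger. A sketch of such a trigger payload, shown as a Python dict; the event name and field names are assumptions based on Prefect Cloud's Automations API, and the values are illustrative:

```python
# Hypothetical trigger payload: react to crash events, but evaluate the
# threshold once per flow-run resource so an alert flood is deduplicated.
trigger = {
    "type": "event",
    "posture": "Reactive",
    "expect": ["prefect.flow-run.Crashed"],  # assumed event name
    "for_each": ["prefect.resource.id"],     # one evaluation per resource
    "threshold": 1,
    "within": 60,                            # seconds
}
```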
Technical References (38)
ETL Task3 (component)
lakefs_test_latest_file.csv (file path)
read_deployment (component)
_submit_run (component)
_check_flow_run (component)
concurrency_limit (component)
for_each (component)
backpressure (concept)
MockValSer (component)
SchemaSerializer (component)
_mark_flow_run_as_cancelled (component)
set_flow_run_state (component)
runner (component)
ghost runs (concept)
Kueue (component)
SIGTERM (concept)
SIGKILL (concept)
kubernetes-job (component)
Pending (concept)
zombie flow run (concept)
terminal state (concept)
Crashed (component)
Lazarus Service (component)
Failed state (concept)
Cloud hooks (component)
agent (component)
worker (component)
flow run (concept)
on_failure (concept)
asyncpg (component)
SQLAlchemy (component)
pgbouncer (component)
prefect-server (component)
prefect-worker (component)
run_deployment (component)
httpx (component)
state handlers (concept)
Automations API (component)
Related Insights (16)
Pipeline task fails when required data column is missing from input file (critical)
Worker exits unexpectedly when API server returns HTTP 500 (critical)
Missing pipeline metrics delay data quality issue detection (warning)
Alert flooding kills workers without concurrency limits (critical)
Pydantic MockValSer serialization error prevents container flow startup (critical)
Ghost runs persist when runners die without server notification (critical)
Kubernetes autoscaler evicts running Prefect jobs during scale-down (warning)
Incorrect API configuration causes flow run failures (critical)
Empty flow logs indicate infrastructure startup failure (critical)
Kubernetes pods remain Pending and never start after 60 seconds (critical)
Zombie flow runs prevent terminal state due to infrastructure or network failures (critical)
Distressed flows fail permanently after 3 Lazarus retry cycles (critical)
Flow run submission failure sets incorrect Failed state instead of Crashed (warning)
PostgreSQL connection pool exhaustion causes Prefect server errors and worker crashes (critical)
Connection limit reached when orchestrating parallel sub-flows via run_deployment (critical)
False crash notifications triggered during resource provisioning delay (warning)