dagster.daemon.healthy

Daemon health status
Dimensions: None

Technical Annotations (44)

Configuration Parameters (6)
minimum_interval_seconds (recommended: 120 or higher)
Sensor polling interval; higher values reduce GIL contention but may miss events
readinessProbe.enabled (recommended: false)
Temporary workaround to prevent crashloops; disables health checking
readinessProbe.timeoutSeconds (recommended: 10)
Timeout for health check command execution
readinessProbe.failureThreshold (recommended: 15)
Allows ~300 seconds before the pod is marked unready
DAGSTER_RUN_QUEUE_PAGE_SIZE (recommended: >100 for large queues)
The default of 100 processes the queue too slowly when hundreds or thousands of runs are queued (1.11.3+)
dagsterDaemon.heartbeatTolerance
Decreasing this value to trigger an auto-restart does not resolve daemon hangs
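The ~300-second figure for readinessProbe.failureThreshold can be reproduced with simple arithmetic, since Kubernetes marks a pod unready after roughly failureThreshold consecutive failed probes spaced periodSeconds apart. A minimal sketch, assuming a periodSeconds of 20 (not stated on this page; chosen only because it yields ~300 seconds with a threshold of 15) and a hypothetical helper name:

```python
def seconds_until_unready(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time before a pod is marked unready.

    Ignores per-probe timeouts and scheduling jitter, so treat the
    result as a rough estimate rather than an exact bound.
    """
    return failure_threshold * period_seconds


# period_seconds=20 is an assumption, not a value taken from this page.
window = seconds_until_unready(failure_threshold=15, period_seconds=20)
print(f"pod tolerates about {window} seconds of failed probes")
```

If your deployment uses a different periodSeconds, the tolerated window scales proportionally.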
Error Signatures (8)
StatusCode.DEADLINE_EXCEEDED (error code)
Deadline Exceeded (log pattern)
DagsterUserCodeUnreachableError (exception)
Readiness probe failed: command "dagster api grpc-health-check -p 3030" timed out (log pattern)
User code server request timed out due to taking longer than 2 seconds to complete (log pattern)
DagsterExecutionInterruptedError (exception)
SIGINT (error code)
SIGTERM (error code)
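The signatures above can be matched against daemon or pod logs to narrow down which failure mode is occurring. The following is a rough sketch using only the standard library; the category names and the `classify` helper are hypothetical labels introduced for illustration, not part of Dagster:

```python
import re
from typing import List

# Category names are illustrative; the patterns come from the signatures listed above.
SIGNATURES = {
    "grpc_deadline": re.compile(r"StatusCode\.DEADLINE_EXCEEDED|Deadline Exceeded"),
    "user_code_unreachable": re.compile(r"DagsterUserCodeUnreachableError"),
    "readiness_probe_timeout": re.compile(r"Readiness probe failed: .*timed out"),
    "user_code_request_timeout": re.compile(r"User code server request timed out"),
    "execution_interrupted": re.compile(r"DagsterExecutionInterruptedError"),
    "os_signal": re.compile(r"\bSIG(?:INT|TERM)\b"),
}


def classify(line: str) -> List[str]:
    """Return the names of all signatures found in a single log line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]


sample = 'Readiness probe failed: command "dagster api grpc-health-check -p 3030" timed out'
print(classify(sample))
```

A line can match more than one category, so the helper returns a list rather than a single label.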
CLI Commands (11)
dagster api grpc-health-check -p 3030 (diagnostic)
time dagster api grpc-health-check -p 3030 (diagnostic)
py-spy top --pid <dagster-process-id> (diagnostic)
docker compose stop dagster-daemon (remediation)
docker compose start dagster-daemon (remediation)
SELECT asset_key, dagster_event_type, timestamp FROM event_logs WHERE asset_key = '["my_group", "my_layer", "removed_asset"]' ORDER BY timestamp DESC; (diagnostic)
SELECT asset_key, last_materialization_timestamp, wipe_timestamp FROM asset_keys WHERE asset_key = '["my_group", "my_layer", "removed_asset"]'; (diagnostic)
dagster dev -w workspace.yaml (diagnostic)
kill -9 (remediation)
dagster-daemon run (diagnostic)
dagster api grpc-health-check -p 3010 (diagnostic)
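To see how close the health check runs to the probe timeout, the `time dagster api grpc-health-check -p 3030` command can be wrapped in a small script. This is a sketch only: `time_command` is a hypothetical helper, and the 10-second default simply mirrors the recommended readinessProbe.timeoutSeconds.

```python
import subprocess
import time
from typing import Optional, Sequence, Tuple


def time_command(cmd: Sequence[str], timeout_seconds: float = 10.0) -> Tuple[Optional[int], float]:
    """Run cmd and return (exit code, elapsed seconds).

    The exit code is None when the command exceeds timeout_seconds,
    which corresponds to a probe that would have timed out.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(list(cmd), capture_output=True, timeout=timeout_seconds)
        code: Optional[int] = completed.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start


# Example call (requires the dagster CLI on PATH and a gRPC server on port 3030):
# code, elapsed = time_command(["dagster", "api", "grpc-health-check", "-p", "3030"])
# print(f"exit={code} elapsed={elapsed:.2f}s")
```

Running this repeatedly while sensors are active may help show whether slow responses line up with sensor ticks, which is the GIL-contention pattern described under Related Insights.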
Technical References (19)
Python GIL (concept)
gRPC server (component)
run_status_sensor (component)
code location (component)
ThreadPoolExecutor (component)
freshness daemon (component)
wipeAssets (component)
event_logs (component)
asset_keys (component)
FRESHNESS_STATE_CHANGE (concept)
Run Queue Coordinator Daemon (component)
Queued Run Coordinator (component)
pg_log (file path)
pg_stat_activity (component)
DockerRunLauncher (component)
DAGSTER_HOME (file path)
workspace.yml (file path)
code location pod (component)
grpc server (component)
Related Insights (6)
Sensors and schedules block gRPC health checks via GIL contention (critical)
Removed assets permanently reappear in catalog after wipeAssets call (warning)
Run Queue Daemon hangs with stopped heartbeats and queued jobs accumulate (critical)
Dagster daemon heartbeat stops without error logs (critical)
Dagster daemon terminated by OS signal due to resource exhaustion (critical)
Code location pod loses gRPC connection with sustained readiness failure (warning)