Snapshot Recovery Delays Cluster Startup
info | reliability | Updated Feb 23, 2026
Snapshot recovery operations (tracked by snapshot_recovery_running) block shard activation while a pod restarts. This extends downtime and reduces cluster availability during deployments.
How to detect:
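Monitor snapshot_recovery_running while pods initialize. Measure the time from kube_pod_start_time until kube_pod_status_ready reports the pod as ready, and compare that duration against the cluster's typical startup time. A pod that stays unready well beyond the typical time while snapshot_recovery_running is active points to snapshot recovery as the cause of the delay.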
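A minimal sketch of that comparison in Python, assuming the per-pod timestamps have already been pulled from kube-state-metrics and the snapshot_recovery_running series into a record. The PodStartup type, the flag_slow_startups function, and the 2x-median threshold are hypothetical, illustrative choices, not part of any existing tool.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class PodStartup:
    # Hypothetical record assembled from scraped metrics (assumption):
    #   start_ts      ~ value of kube_pod_start_time (unix seconds)
    #   ready_ts      ~ when kube_pod_status_ready{condition="true"} first reported 1
    #   recovery_seen ~ whether snapshot_recovery_running was active during startup
    pod: str
    start_ts: float
    ready_ts: float
    recovery_seen: bool

    @property
    def startup_seconds(self) -> float:
        # Time from pod start to readiness.
        return self.ready_ts - self.start_ts


def flag_slow_startups(
    records: List[PodStartup],
    factor: float = 2.0,
    baseline_seconds: Optional[float] = None,
) -> List[str]:
    """Return pods that were recovering a snapshot and took more than
    `factor` times the baseline startup duration to become ready.

    If no baseline is given, the median startup time of `records` is used.
    """
    if not records:
        return []
    if baseline_seconds is None:
        baseline_seconds = median(r.startup_seconds for r in records)
    threshold = factor * baseline_seconds
    return [
        r.pod
        for r in records
        if r.recovery_seen and r.startup_seconds > threshold
    ]


# Illustrative data, not real measurements.
records = [
    PodStartup("db-0", 1000.0, 1030.0, False),  # 30 s, no recovery
    PodStartup("db-1", 1000.0, 1035.0, False),  # 35 s, no recovery
    PodStartup("db-2", 1000.0, 1200.0, True),   # 200 s, recovering a snapshot
    PodStartup("db-3", 1000.0, 1040.0, True),   # 40 s, recovering a snapshot
]

print(flag_slow_startups(records))  # prints ['db-2']
```

The median is used as the default baseline so that one slow recovery does not inflate the "typical" startup time; passing baseline_seconds lets you compare against a known-good figure instead, for example one measured from restarts without snapshot recovery.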
Recommended action:
Reduce snapshot size by lowering wal_segments_ahead or by implementing incremental snapshots. To maintain availability while recoveries run, consider pre-warming replacement pods or increasing the replica count.