Snapshot Operations Causing Latency Spikes
info · latency · Updated Jun 4, 2025
Periodic etcd snapshots can cause temporary latency spikes, especially on HDDs, because writing the snapshot to disk competes for I/O with the write-ahead log (WAL) fsyncs that every write request depends on.
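The "periodic" trigger can be modeled in a few lines. This is a simplified sketch, assuming a snapshot is taken once the number of Raft entries applied since the last snapshot exceeds the --snapshot-count threshold. The function name and the small numbers are illustrative only; this is not etcd's own code.

```python
from __future__ import annotations


def snapshot_indexes(total_applied: int, snapshot_count: int) -> list[int]:
    """Return the applied indexes at which a snapshot would be taken.

    Simplified model (assumption): a snapshot fires once the number of
    entries applied since the last snapshot exceeds snapshot_count.
    """
    snapshots: list[int] = []
    last_snap = 0
    for applied in range(1, total_applied + 1):
        if applied - last_snap > snapshot_count:
            snapshots.append(applied)
            last_snap = applied  # the next snapshot is counted from here
    return snapshots


print(snapshot_indexes(35, 10))  # [11, 22, 33]
```

Each index in the result marks a burst of disk writes, which is where WAL fsyncs on the same device are most likely to slow down.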
How to detect:
Monitor etcd_snap_fsync_time_seconds and etcd_snap_database_save_count_time_seconds_datadog histograms for latency spikes. Correlate snapshot timing with observed API server latency increases.
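The correlation step can be sketched as a simple time-window match. This assumes you have already exported per-interval values (for example a p99) of the snapshot fsync histogram and of API server request latency as (timestamp, seconds) pairs; the function names, thresholds, window, and sample data are hypothetical, and nothing here queries Datadog or Prometheus.

```python
from __future__ import annotations

from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, value_in_seconds)


def spikes(samples: Sequence[Sample], threshold: float) -> list[float]:
    """Timestamps whose value exceeds the threshold."""
    return [ts for ts, value in samples if value > threshold]


def correlated_latency_spikes(
    snapshot_fsync: Sequence[Sample],
    api_latency: Sequence[Sample],
    fsync_threshold: float,
    latency_threshold: float,
    window: float,
) -> list[float]:
    """API latency spikes that start within `window` seconds after a snapshot fsync spike."""
    snap_times = spikes(snapshot_fsync, fsync_threshold)
    return [
        ts
        for ts in spikes(api_latency, latency_threshold)
        if any(0 <= ts - snap_ts <= window for snap_ts in snap_times)
    ]


# Hypothetical per-minute samples.
fsync = [(0, 0.01), (60, 0.9), (120, 0.02), (180, 1.2)]
api = [(5, 0.05), (62, 0.8), (100, 0.7), (185, 1.1)]
print(correlated_latency_spikes(fsync, api, 0.5, 0.5, 10))  # [62, 185]
```

Here two of the three API latency spikes follow a snapshot fsync spike within 10 seconds; the spike at 100 does not, which points to some other cause.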
Recommended action:
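Use SSDs instead of HDDs to minimize snapshot impact. Tune the snapshot threshold (--snapshot-count, the number of committed entries that triggers a snapshot) to balance snapshot frequency against memory use and recovery time: a higher value snapshots less often, but keeps more Raft entries in memory and leaves more WAL entries to replay after a restart. If snapshot timing is predictable, schedule latency-sensitive workloads outside snapshot windows.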
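The trade-off behind --snapshot-count is simple arithmetic at a steady write rate: the interval between snapshots is roughly the threshold divided by the rate of applied entries. A minimal sketch, assuming one Raft entry per write and a constant rate; the function names and example values are hypothetical, and real clusters (including Kubernetes lease and event traffic) rarely have a perfectly steady rate.

```python
from __future__ import annotations


def snapshot_interval_seconds(snapshot_count: int, writes_per_second: float) -> float:
    """Approximate seconds between snapshots at a steady write rate."""
    if writes_per_second <= 0:
        raise ValueError("writes_per_second must be positive")
    # About snapshot_count entries must accumulate between two snapshots.
    return snapshot_count / writes_per_second


def next_snapshot_windows(
    last_snapshot_ts: float, snapshot_count: int, writes_per_second: float, n: int = 3
) -> list[float]:
    """Estimated start times of the next n snapshots (only useful if the rate is stable)."""
    interval = snapshot_interval_seconds(snapshot_count, writes_per_second)
    return [last_snapshot_ts + interval * k for k in range(1, n + 1)]


print(snapshot_interval_seconds(100_000, 500))          # 200.0
print(next_snapshot_windows(1000.0, 100_000, 500, 3))   # [1200.0, 1400.0, 1600.0]
```

Dividing the threshold by ten at the same rate would mean a snapshot roughly every 20 seconds instead of every 200, so each snapshot is smaller but I/O contention recurs far more often.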