etcd

Network Latency Between etcd Peers Degrading Consensus

warning
latency
Updated Feb 9, 2026

High network round-trip time (RTT) between etcd cluster members (above 50 ms) delays Raft consensus operations: proposals time out, heartbeats arrive late or are missed, and followers may trigger unnecessary leader elections.

How to detect:

Monitor the etcd_network_peer_round_trip_time_seconds histogram. The P99 should stay below 50 ms, which is half of etcd's default 100 ms heartbeat interval. Also check the etcd logs for 'rafthttp: failed to dial' errors and 'request timed out' messages.
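As a rough illustration of the P99 check, the sketch below estimates a quantile from cumulative histogram buckets using linear interpolation inside the matching bucket, similar in spirit to how Prometheus-style histograms are usually evaluated. The function names, the bucket bounds, and the sample counts are all made up for illustration; they are not taken from a real cluster or from etcd's code.

```python
import math

# Threshold from this card: P99 peer RTT should stay below 50 ms.
P99_THRESHOLD_SECONDS = 0.050


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound_seconds, count) buckets.

    `buckets` must be sorted by upper bound and end with a math.inf bucket,
    as in a Prometheus-style histogram. Hypothetical helper for illustration.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate inside the +Inf bucket; fall back to
                # the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


def peer_rtt_is_healthy(buckets, threshold=P99_THRESHOLD_SECONDS):
    """Return True when the estimated P99 is below the threshold."""
    return histogram_quantile(0.99, buckets) < threshold


if __name__ == "__main__":
    # Made-up cumulative bucket counts for one peer.
    sample = [(0.0128, 900), (0.0256, 980), (0.0512, 995),
              (0.1024, 1000), (math.inf, 1000)]
    p99 = histogram_quantile(0.99, sample)
    print(f"estimated P99 = {p99 * 1000:.1f} ms, healthy = {peer_rtt_is_healthy(sample)}")
```

The estimate is only as precise as the bucket boundaries, so a value close to 50 ms is worth confirming with a direct RTT measurement between the peers.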

Recommended action:

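Verify connectivity and measure RTT between control plane nodes, for example with ping. Check for network partitions, firewall rules blocking port 2380 (peer traffic) or 2379 (client traffic), and bandwidth saturation. If high latency is unavoidable, consider raising the --heartbeat-interval and --election-timeout parameters (defaults: 100 ms and 1000 ms); etcd's tuning guidance is to set the heartbeat interval close to the peer RTT and the election timeout to at least 10 times the RTT. Where possible, deploy etcd members in the same region or datacenter.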
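If the timing parameters do need to be raised, the arithmetic can be sketched as below. This is a minimal sketch, not an official formula: it assumes the heartbeat interval is set to roughly the highest average peer RTT, that the election timeout keeps the default 10:1 ratio to the heartbeat interval, and that values never drop below etcd's defaults (100 ms and 1000 ms). The 50000 ms ceiling reflects the documented upper limit for the election timeout. The function name suggest_raft_timing is hypothetical.

```python
import math

DEFAULT_HEARTBEAT_MS = 100      # etcd default for --heartbeat-interval
DEFAULT_ELECTION_MS = 1000      # etcd default for --election-timeout
MAX_ELECTION_MS = 50000         # documented upper limit for the election timeout


def suggest_raft_timing(max_avg_rtt_ms):
    """Return (heartbeat_ms, election_ms) for a measured peer RTT.

    Assumptions (illustrative, not an etcd API):
      - heartbeat interval ~ 1x the highest average peer RTT
      - election timeout = 10x the heartbeat interval
      - never go below the etcd defaults
    """
    if max_avg_rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    heartbeat_ms = max(DEFAULT_HEARTBEAT_MS, math.ceil(max_avg_rtt_ms))
    election_ms = max(DEFAULT_ELECTION_MS, 10 * heartbeat_ms)
    if election_ms > MAX_ELECTION_MS:
        raise ValueError("RTT too high: election timeout would exceed 50000 ms")
    return heartbeat_ms, election_ms


def as_flags(heartbeat_ms, election_ms):
    """Format the values as etcd command-line flags (both in milliseconds)."""
    return f"--heartbeat-interval={heartbeat_ms} --election-timeout={election_ms}"


if __name__ == "__main__":
    # Example: peers in different regions with ~150 ms average RTT.
    print(as_flags(*suggest_raft_timing(150)))
```

Every member of the cluster should run with the same values, and larger timeouts mean a failed leader takes longer to detect, so raising them is a trade-off rather than a fix for the underlying latency.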