Network Latency Between etcd Peers Degrading Consensus

warning

latency

Updated Feb 9, 2026

High network round-trip time (RTT) between etcd cluster members (above 50 ms) delays Raft consensus operations, causing proposal timeouts, missed heartbeats, and potentially unnecessary leader elections.

Sources

How to Troubleshoot etcd Issues in Kubernetes (oneuptime.com)

Simple troubleshooting guide for etcd performance issues (knowledge.broadcom.com)

How to Monitor etcd Latency and Disk IO for Cluster Health (oneuptime.com)

Key metrics for monitoring etcd - Datadog (www.datadoghq.com)

Technologies:

etcd

Symptoms of this issue are visible in etcd metrics and logs.

How to detect:

Monitor the etcd_network_peer_round_trip_time_seconds histogram. P99 should stay below 50 ms, which is half of etcd's default 100 ms heartbeat interval. Check the etcd logs for 'rafthttp: failed to dial' errors and 'request timed out' messages.
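As a rough illustration of the P99 check, the sketch below estimates a quantile from cumulative histogram buckets using linear interpolation, in the spirit of PromQL's histogram_quantile. The function name, the sample bucket counts, and the 50 ms threshold constant are assumptions made for this example; real bucket values would come from scraping etcd's /metrics endpoint for one peer.

```python
import math

P99_THRESHOLD_SECONDS = 0.050  # the 50 ms guideline from this card


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets follow the Prometheus convention: counts are cumulative and
    the last bucket has an upper bound of +Inf.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets, key=lambda b: b[0])
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative (made-up) cumulative counts for a single peer.
sample_buckets = [
    (0.0128, 900),
    (0.0256, 980),
    (0.0512, 995),
    (0.1024, 1000),
    (math.inf, 1000),
]

p99 = histogram_quantile(0.99, sample_buckets)
status = "ok" if p99 < P99_THRESHOLD_SECONDS else "degraded"
print(f"peer RTT p99 = {p99 * 1000:.1f} ms ({status})")
```

With these sample counts the estimate lands between the 25.6 ms and 51.2 ms bucket bounds. Because the result is interpolated, it is only as precise as the bucket boundaries allow, so a value near 50 ms is worth confirming with a direct measurement.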

Recommended action:

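Verify network connectivity and round-trip time between control plane nodes using ping. Check for network partitions, firewall rules blocking port 2380 (peer traffic) or 2379 (client traffic), and bandwidth saturation. If high latency is unavoidable, consider increasing the heartbeat-interval and election-timeout parameters: etcd's tuning guidance suggests a heartbeat interval close to the RTT between members and an election timeout of at least 10 times that RTT. Where possible, deploy etcd members in the same region or datacenter.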
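To make the tuning advice concrete, here is a minimal sketch that derives candidate heartbeat-interval and election-timeout values from a measured peer RTT. It assumes etcd's documented defaults (100 ms heartbeat, 1000 ms election timeout) as lower bounds, a heartbeat roughly equal to the RTT, an election timeout of 10 times the heartbeat, and etcd's 50000 ms upper limit on the election timeout. The function names and the policy of never going below the defaults are choices made for this example, not an official formula.

```python
import math

DEFAULT_HEARTBEAT_MS = 100        # etcd default --heartbeat-interval
DEFAULT_ELECTION_MS = 1000        # etcd default --election-timeout
MAX_ELECTION_TIMEOUT_MS = 50_000  # etcd's documented upper limit


def suggest_raft_timing(peer_rtt_ms, heartbeat_factor=1.0, election_factor=10):
    """Return (heartbeat_ms, election_ms) candidates for a given peer RTT.

    heartbeat_factor: multiple of RTT used for the heartbeat (assumed 1.0).
    election_factor: multiple of the heartbeat used for the election timeout.
    """
    if peer_rtt_ms <= 0:
        raise ValueError("peer_rtt_ms must be positive")
    # Never suggest values below etcd's defaults (an assumption of this sketch).
    heartbeat = max(DEFAULT_HEARTBEAT_MS, math.ceil(peer_rtt_ms * heartbeat_factor))
    election = max(DEFAULT_ELECTION_MS, heartbeat * election_factor)
    # Clamp to the maximum election timeout etcd accepts.
    election = min(election, MAX_ELECTION_TIMEOUT_MS)
    return heartbeat, election


def as_etcd_flags(heartbeat_ms, election_ms):
    """Format the values as etcd command-line flags (both in milliseconds)."""
    return f"--heartbeat-interval={heartbeat_ms} --election-timeout={election_ms}"


# Example: members spread across regions with a measured RTT of about 150 ms.
heartbeat_ms, election_ms = suggest_raft_timing(150)
print(as_etcd_flags(heartbeat_ms, election_ms))
```

Any values chosen this way should be applied identically on every member of the cluster. Larger timeouts also lengthen the time needed to detect a genuinely failed leader, so treat the output as a starting point to validate under load.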