Apache ZooKeeper

Network Interface Errors Cause Intermittent Disconnects

warning
Connection Management
Updated May 20, 2022

NIC misconfigurations, bad firmware, or hardware issues can cause packet loss and high latency under load, leading to client disconnects and poor ZooKeeper performance even when other metrics appear normal.

How to detect:

Monitor the network interface error counters reported by ifconfig (RX/TX errors, drops, overruns), or by ip -s link on systems without ifconfig, and correlate them with ZooKeeper client disconnect events. Alert when errored or dropped packets exceed 0.1% of total packets over the sampling interval, or when the counters spike suddenly during high-load periods.
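The 0.1% check can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes a Linux host, reads the same kernel counters that ifconfig displays from /proc/net/dev, and treats the fifo column as the closest analogue of ifconfig's "overruns". The function names and the ERROR_RATE_THRESHOLD constant are hypothetical, and because some drivers may already fold overruns into the errs counter, the result is best read as an upper estimate.

```python
from typing import Dict

# Column order of the 16 counters that follow "iface:" in /proc/net/dev.
FIELDS = (
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop", "rx_fifo", "rx_frame",
    "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop", "tx_fifo", "tx_colls",
    "tx_carrier", "tx_compressed",
)

# 0.1% of packets, the alert level suggested in the text (hypothetical name).
ERROR_RATE_THRESHOLD = 0.001

# Counters treated as "bad" packets; fifo stands in for overruns (assumption).
BAD_KEYS = ("rx_errs", "rx_drop", "rx_fifo", "tx_errs", "tx_drop", "tx_fifo")


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the contents of /proc/net/dev into {interface: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) != len(FIELDS):
            continue  # unexpected layout, skip rather than guess
        stats[name.strip()] = dict(zip(FIELDS, map(int, values)))
    return stats


def error_rate(prev: Dict[str, int], curr: Dict[str, int]) -> float:
    """Bad packets divided by total packets between two samples of one interface."""
    bad = sum(curr[k] - prev[k] for k in BAD_KEYS)
    packets = (curr["rx_packets"] - prev["rx_packets"]) + (
        curr["tx_packets"] - prev["tx_packets"]
    )
    if packets <= 0 or bad < 0:
        return 0.0  # idle interface or counter reset: nothing meaningful to report
    return bad / packets


def exceeds_threshold(prev: Dict[str, int], curr: Dict[str, int]) -> bool:
    """True when the interval's error rate is above the 0.1% alert level."""
    return error_rate(prev, curr) > ERROR_RATE_THRESHOLD
```

To use it, read /proc/net/dev twice, some seconds apart, parse each snapshot, and pass the two dictionaries for the same interface to exceeds_threshold. Working on deltas rather than absolute counters keeps old errors from before the sampling window out of the rate.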

Recommended action:

Use ifconfig and ethtool to examine NIC statistics and configuration (ethtool -S lists per-NIC counters, and ethtool -i reports the driver and firmware versions). Test network throughput between ensemble members with iperf. Verify that NIC drivers and firmware are up to date. Check for duplex mismatches and speed negotiation problems. Reconfigure or replace faulty NICs.
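The duplex and speed check can be automated along these lines. This is a sketch under stated assumptions: it parses the "Key: value" lines that ethtool prints for an interface (Speed, Duplex, Link detected), and the function names, the eth0 interface name, and the expected_speed default of 1000Mb/s are hypothetical values to adapt to the actual hardware.

```python
import subprocess
from typing import Dict, List


def read_ethtool(iface: str) -> str:
    """Run `ethtool <iface>` and return its text output (may require root)."""
    result = subprocess.run(
        ["ethtool", iface], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_ethtool(text: str) -> Dict[str, str]:
    """Collect the 'Key: value' lines of ethtool output into a dictionary."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # continuation lines of multi-line fields have no key
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def find_link_problems(
    settings: Dict[str, str], expected_speed: str = "1000Mb/s"
) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    problems: List[str] = []
    duplex = settings.get("Duplex", "unknown")
    if duplex != "Full":
        problems.append(f"duplex is {duplex}, expected Full")
    speed = settings.get("Speed", "unknown")
    if speed != expected_speed:
        problems.append(f"speed is {speed}, expected {expected_speed}")
    if settings.get("Link detected", "unknown") != "yes":
        problems.append("link not detected")
    return problems
```

A typical call would be find_link_problems(parse_ethtool(read_ethtool("eth0"))), run on each ensemble member. Keeping the parsing separate from the subprocess call makes it possible to check captured ethtool output from hosts where the script itself cannot run.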