Apache Kafka

ZooKeeper Session Expiration Causing Broker Instability

critical
reliability
Updated Mar 2, 2026

When a broker's ZooKeeper session expires, the broker re-registers, which can trigger controller changes, partition leadership changes, and temporary unavailability. Frequent expirations usually indicate ZooKeeper or network issues.

How to detect:

Alert when kafka.zookeeper.expire_rate > 0 or kafka.server.SessionExpireListener.ZooKeeperExpiresPerSec.OneMinuteRate > 0. To confirm impact, cross-reference with kafka.zookeeper.disconnect_rate and kafka.replication.leader_election_rate: expirations that coincide with disconnects and leader elections are affecting the cluster.
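
A minimal sketch of that detection logic in Python, assuming the three rates have already been scraped into plain numbers. The `Sample` class, the `classify` function, and the status labels are hypothetical names for illustration; only the metric names in the comments come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of the three rates (hypothetical container)."""
    expire_rate: float           # kafka.zookeeper.expire_rate
    disconnect_rate: float       # kafka.zookeeper.disconnect_rate
    leader_election_rate: float  # kafka.replication.leader_election_rate


def classify(sample: Sample) -> str:
    """Map a sample to an illustrative status label."""
    if sample.expire_rate <= 0:
        # No session expirations observed in this window.
        return "ok"
    if sample.leader_election_rate > 0:
        # Expirations coincide with leader elections: impact confirmed.
        return "expiring_with_impact"
    if sample.disconnect_rate > 0:
        # Expirations coincide with disconnects but no elections yet.
        return "expiring_with_disconnects"
    return "expiring"
```

For example, `classify(Sample(0.2, 0.1, 0.5))` returns `"expiring_with_impact"`, which is the case that most clearly warrants the actions below.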

Recommended action:

1. Check ZooKeeper health: Verify ZooKeeper ensemble is healthy and has quorum.
2. Review network connectivity: Check network stability between brokers and ZooKeeper.
3. Tune ZooKeeper timeouts: Increase zookeeper.session.timeout.ms if network latency is high.
4. Monitor ZooKeeper load: Ensure ZooKeeper is not overloaded.
5. Check for GC pauses: Long JVM GC pauses can cause session timeouts.
6. Migrate to KRaft: Eliminate ZooKeeper dependency with KRaft mode.
7. Dedicated ZooKeeper nodes: Ensure ZooKeeper runs on dedicated hardware.
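
Steps 1 and 4 can be partly automated with ZooKeeper's `mntr` four-letter command, which returns tab-separated server statistics. The sketch below is one possible check, not a definitive health probe: the function names and the latency and queue thresholds are illustrative assumptions, and on ZooKeeper 3.5+ `mntr` must be allowed through `4lw.commands.whitelist`.

```python
import socket


def query_mntr(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Send the 'mntr' four-letter command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> dict:
    """Parse 'key<TAB>value' lines; lines without a tab are ignored."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def health_findings(stats: dict,
                    max_avg_latency_ms: float = 50.0,  # assumed threshold
                    max_outstanding: int = 10) -> list:  # assumed threshold
    """Return a list of human-readable problems; empty means none found."""
    findings = []
    state = stats.get("zk_server_state")
    if state not in ("leader", "follower", "observer", "standalone"):
        # A server that is not serving requests reports no state at all.
        findings.append(f"unexpected server state: {state!r}")
    try:
        if float(stats.get("zk_avg_latency", "0")) > max_avg_latency_ms:
            findings.append("average request latency above threshold")
        if float(stats.get("zk_outstanding_requests", "0")) > max_outstanding:
            findings.append("outstanding request queue above threshold")
    except ValueError:
        findings.append("could not parse numeric mntr fields")
    return findings
```

Usage would look like `health_findings(parse_mntr(query_mntr("zk1.example.internal")))`, where the hostname is a placeholder. Running it against every ensemble member and confirming that exactly one reports `leader` gives a rough quorum check.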