Heartbeat thread stability masks application thread processing failures

warning

performance

Updated Mar 5, 2026 (via Exa)

Sources

The Rebalance Spiral: Debugging Cooperative Sticky Assigner Livelocks in Kafka Consumer Groups (azguards.com)

Technologies:

Confluent Platform (subject)

Apache Kafka (the root cause of this issue originates in Apache Kafka)

How to detect:

Since KIP-62, the Kafka consumer runs two threads: a heartbeat thread that sends keepalives to the group coordinator, and the application thread that calls poll() and processes records. When the application thread stalls but the JVM keeps running, the heartbeat thread continues to operate normally, so the coordinator sees a stable member even though processing has halted. The stall surfaces only when the gap between poll() calls exceeds max.poll.interval.ms: the consumer then leaves the group and a rebalance follows. A low heartbeat-response-time-max combined with a high rebalance-rate therefore points to an application-thread problem, not a network failure or crash.
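The detection rule can be written as a small classifier. This is a minimal sketch, not a Kafka API: the class StallDiagnosis, its Verdict values, and the two threshold constants (100 ms and 10 rebalances per hour) are illustrative assumptions, and the two inputs are assumed to have been sampled already from the consumer's metrics. Treating a high heartbeat response time with frequent rebalances as a network or broker problem is an inference from the rule, not something the source states directly.

```java
// Hypothetical helper: classifies a consumer from two already-sampled metric values.
final class StallDiagnosis {
    enum Verdict { HEALTHY, APPLICATION_THREAD_STALL, NETWORK_OR_BROKER }

    // Illustrative thresholds (assumptions); tune them for your workload.
    static final double HEARTBEAT_MAX_MS = 100.0;
    static final double REBALANCES_PER_HOUR = 10.0;

    static Verdict classify(double heartbeatResponseTimeMaxMs, double rebalancesPerHour) {
        // Few rebalances: nothing to diagnose.
        if (rebalancesPerHour <= REBALANCES_PER_HOUR) {
            return Verdict.HEALTHY;
        }
        // Frequent rebalances with fast heartbeats: the heartbeat thread is fine,
        // so suspect the application thread that calls poll().
        if (heartbeatResponseTimeMaxMs < HEARTBEAT_MAX_MS) {
            return Verdict.APPLICATION_THREAD_STALL;
        }
        // Frequent rebalances with slow heartbeats: look at the network or broker first.
        return Verdict.NETWORK_OR_BROKER;
    }

    public static void main(String[] args) {
        System.out.println(classify(12.0, 25.0)); // prints APPLICATION_THREAD_STALL
    }
}
```

Keeping the rule in one pure function makes it easy to reuse from an alerting job or a health endpoint, whichever way the metric values are collected.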

Recommended action:

Monitor heartbeat-response-time-max alongside rebalance-rate. If heartbeat-response-time-max is low (<100 ms) but rebalance-rate is high (>10/hr), diagnose the application thread: check for GC pauses, slow database queries, or heavy computation in the poll() loop. Do not increase session.timeout.ms to fix this; that setting governs detection of network failures and crashes through missed heartbeats. Instead, increase max.poll.interval.ms or offload processing to worker threads.
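One way to choose a value for max.poll.interval.ms is to size it from the worst-case time needed to process a full batch. The sketch below is an assumption-laden illustration: the class ConsumerTuning, the method overrides, and the 2x safety factor are hypothetical, and max.poll.records is a real consumer setting that the text above does not mention but that bounds how many records each poll() returns. session.timeout.ms is deliberately left untouched.

```java
import java.util.Properties;

// Hypothetical helper: derives consumer overrides from a measured per-record cost.
final class ConsumerTuning {
    // Kafka's default for max.poll.interval.ms is 5 minutes.
    static final long DEFAULT_MAX_POLL_INTERVAL_MS = 300_000L;

    static Properties overrides(long worstCaseMsPerRecord, int maxPollRecords) {
        // Worst-case time between two poll() calls for a full batch.
        long batchMs = Math.multiplyExact(worstCaseMsPerRecord, (long) maxPollRecords);
        // Assumed 2x headroom; never drop below the default, and stay within
        // int range because the setting is an integer number of milliseconds.
        long intervalMs = Math.max(DEFAULT_MAX_POLL_INTERVAL_MS, Math.multiplyExact(2L, batchMs));
        intervalMs = Math.min(intervalMs, Integer.MAX_VALUE);

        Properties props = new Properties();
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        props.setProperty("max.poll.interval.ms", Long.toString(intervalMs));
        // session.timeout.ms is intentionally not set here.
        return props;
    }

    public static void main(String[] args) {
        // 2 s per record and 500 records per poll: 1,000,000 ms per batch, doubled.
        System.out.println(overrides(2_000L, 500).getProperty("max.poll.interval.ms")); // prints 2000000
    }
}
```

If the computed interval becomes uncomfortably large, lowering maxPollRecords shortens each batch, and moving the work to worker threads keeps the application thread free to call poll() on time.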