Heartbeat thread stability masks application thread processing failures
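Warning: Since KIP-62, the Kafka consumer runs two threads: a background heartbeat thread that sends keepalives to the group coordinator, and the application thread that calls poll() and processes records. If the application thread stalls while the JVM keeps running, the heartbeat thread continues to operate normally, so the coordinator treats the member as healthy even though processing has halted. The stall is only detected once the gap between poll() calls exceeds max.poll.interval.ms, at which point the consumer leaves the group and a rebalance follows. A low heartbeat-response-time-max combined with a high rebalance-rate therefore points to an application thread problem, not a network failure or crash.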
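The masking effect can be reproduced without Kafka at all. The following is a toy model, not the consumer's actual implementation: a scheduled background task stands in for the heartbeat thread, and a sleep on the calling thread stands in for a stalled processing loop. The class name, method name, and timings are all hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: no Kafka classes are used and every name here is hypothetical.
final class HeartbeatMaskDemo {

    /**
     * Starts a background "heartbeat" that fires every heartbeatIntervalMs, then
     * blocks the calling ("application") thread for stallMs. Returns how many
     * heartbeats were sent while the application thread was stalled.
     */
    static int heartbeatsDuringStall(long heartbeatIntervalMs, long stallMs) {
        AtomicInteger heartbeats = new AtomicInteger();
        ScheduledExecutorService heartbeatThread =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "toy-heartbeat");
                    t.setDaemon(true); // do not keep the JVM alive for the demo
                    return t;
                });
        try {
            heartbeatThread.scheduleAtFixedRate(
                    heartbeats::incrementAndGet, 0, heartbeatIntervalMs, TimeUnit.MILLISECONDS);
            // Stand-in for a slow query or heavy computation between poll() calls.
            Thread.sleep(stallMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        } finally {
            heartbeatThread.shutdownNow();
        }
        return heartbeats.get();
    }

    public static void main(String[] args) {
        int sent = heartbeatsDuringStall(20, 300);
        System.out.println("heartbeats sent while the application thread was stalled: " + sent);
    }
}
```

From the outside, the steady stream of heartbeats looks like a healthy member; nothing in it reveals that no records are being processed.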
Monitor heartbeat-response-time-max alongside rebalance-rate. If heartbeat-response-time-max is low (<100 ms) but rebalance-rate is high (>10/hr), diagnose the application thread: check for GC pauses, slow database queries, or heavy computation in the poll() loop. Do not increase session.timeout.ms to fix this; that setting controls how quickly missing heartbeats (network failures or crashes) are detected. Instead, increase max.poll.interval.ms or offload processing to worker threads.
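The rule of thumb above can be written as a small check, for example inside an alerting job. This is a minimal sketch: the class, enum, and method names are hypothetical, the thresholds are the ones quoted in the text rather than Kafka defaults, and the branch for a slow heartbeat is an assumption that goes beyond what the text states. How the two metric values are collected (JMX, a metrics reporter, and so on) is left out.

```java
// Hypothetical helper: encodes the diagnostic rule from the text, nothing more.
final class RebalanceDiagnosis {

    enum Verdict {
        HEALTHY,                  // rebalances are not frequent enough to worry about
        APPLICATION_THREAD_STALL, // heartbeats are fast, yet the group keeps rebalancing
        CHECK_NETWORK_OR_BROKER   // assumption: slow heartbeats point elsewhere
    }

    // Rule-of-thumb thresholds taken from the text, not from Kafka itself.
    static final double HEARTBEAT_MAX_MS_THRESHOLD = 100.0;
    static final double REBALANCES_PER_HOUR_THRESHOLD = 10.0;

    static Verdict classify(double heartbeatResponseTimeMaxMs, double rebalancesPerHour) {
        if (rebalancesPerHour <= REBALANCES_PER_HOUR_THRESHOLD) {
            return Verdict.HEALTHY;
        }
        if (heartbeatResponseTimeMaxMs < HEARTBEAT_MAX_MS_THRESHOLD) {
            // Remedy per the text: raise max.poll.interval.ms or offload processing
            // to worker threads; leave session.timeout.ms alone.
            return Verdict.APPLICATION_THREAD_STALL;
        }
        return Verdict.CHECK_NETWORK_OR_BROKER;
    }

    public static void main(String[] args) {
        System.out.println(classify(40.0, 25.0));
    }
}
```

Keeping the thresholds as named constants makes it easy to tune them per cluster; the values from the text are starting points, not universal limits.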