Missing monitoring of ISR shrinks and consumer lag delays incident response
warning | performance | Updated Mar 13, 2025 (via Exa)
How to detect:
When key metrics such as ISR shrinks, under-replicated partitions, broker and consumer lag, and network throughput are not monitored, performance degradation and failures go unnoticed and incident response is delayed.
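As a rough illustration of what consumer lag measures, the sketch below computes per-partition lag as the log end offset minus the group's committed offset. The function name `consumer_lag` and the dictionary inputs are hypothetical; in practice the offsets would come from a Kafka client or an exporter. Treating a partition with no committed offset as starting from 0 is a simplifying assumption.

```python
from typing import Dict, Tuple

# Hypothetical input shape: (topic, partition) -> offset
Offsets = Dict[Tuple[str, int], int]


def consumer_lag(log_end: Offsets, committed: Offsets) -> Offsets:
    """Return per-partition lag: log end offset minus committed offset."""
    lag: Offsets = {}
    for tp, end_offset in log_end.items():
        # Assumption: a partition with no committed offset counts from 0.
        committed_offset = committed.get(tp, 0)
        # Clamp at 0 so a stale end-offset reading never yields negative lag.
        lag[tp] = max(end_offset - committed_offset, 0)
    return lag


if __name__ == "__main__":
    end = {("orders", 0): 100, ("orders", 1): 50}
    done = {("orders", 0): 90}
    for tp, value in sorted(consumer_lag(end, done).items()):
        print(tp, value)
```

A lag value that keeps growing over several samples is usually a stronger signal than a single high reading, so alerting on the trend may be preferable to alerting on one snapshot.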
Recommended action:
Alert on consumer lag, under-replicated partitions, ISR shrinks, and broker downtime. Use Prometheus with Grafana, or Confluent Control Center, to track network throughput and broker health.
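The alert conditions above can be sketched as a small rule check over a metrics snapshot. This is a minimal illustration, not a replacement for Prometheus alert rules: the `ClusterSnapshot` type, its field names, the alert labels, and the default lag threshold are all hypothetical and would need to be mapped to whatever your exporter actually exposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSnapshot:
    """Hypothetical point-in-time view of the metrics named above."""
    under_replicated_partitions: int
    isr_shrinks_per_sec: float
    max_consumer_lag: int
    live_brokers: int
    expected_brokers: int


def evaluate_alerts(snap: ClusterSnapshot, max_lag_threshold: int = 10_000) -> List[str]:
    """Return the names of the alert conditions that currently fire."""
    alerts: List[str] = []
    # Any under-replicated partition means reduced fault tolerance.
    if snap.under_replicated_partitions > 0:
        alerts.append("under_replicated_partitions")
    # A nonzero shrink rate means replicas are falling out of the ISR.
    if snap.isr_shrinks_per_sec > 0:
        alerts.append("isr_shrinks")
    # The threshold is an assumed value; tune it per consumer group.
    if snap.max_consumer_lag > max_lag_threshold:
        alerts.append("consumer_lag")
    # Fewer live brokers than expected indicates broker downtime.
    if snap.live_brokers < snap.expected_brokers:
        alerts.append("broker_down")
    return alerts


if __name__ == "__main__":
    snapshot = ClusterSnapshot(2, 0.5, 25_000, 2, 3)
    print(evaluate_alerts(snapshot))
```

In a real deployment these checks would more likely live in the monitoring system itself, with a hold period on each condition so that a brief ISR shrink during a rolling restart does not page anyone.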