Confluent Platform, Prometheus, Grafana

Missing monitoring of ISR shrinks and consumer lag delays incident response

warning
performance
Updated Mar 13, 2025 (via Exa)
How to detect:

Check whether dashboards and alerts cover the key Kafka metrics: ISR shrinks, broker and consumer lag, network throughput, and under-replicated partitions. If any of these is unmonitored, incident response is delayed, and performance degradation or outright failures can go unnoticed.

Recommended action:

Set alerts on consumer lag, under-replicated partitions, ISR shrinks, and broker downtime. Use Prometheus with Grafana, or Confluent Control Center, to track network throughput and overall broker health.
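
The four alert conditions named above can be sketched as a small evaluation function. This is an illustrative sketch only, not a Prometheus or Confluent API: the `BrokerSnapshot` fields, the `evaluate` and `consumer_lag` helpers, and the 10,000-message lag threshold are all assumptions chosen for the example. The two broker fields mirror the Kafka JMX metrics `UnderReplicatedPartitions` and `IsrShrinksPerSec` on `kafka.server:type=ReplicaManager`; the names they take in Prometheus depend on how the exporter is configured.

```python
from dataclasses import dataclass


@dataclass
class BrokerSnapshot:
    """One scrape of broker-level metrics (field names are hypothetical)."""
    broker_id: int
    up: bool                          # whether the scrape succeeded
    under_replicated_partitions: int  # mirrors ReplicaManager UnderReplicatedPartitions
    isr_shrinks_per_sec: float        # mirrors ReplicaManager IsrShrinksPerSec


def consumer_lag(log_end_offsets, committed_offsets):
    """Total lag for one consumer group: sum of (log end - committed) per partition.

    Simplifying assumption: a partition with no committed offset is treated
    as if the group had committed offset 0.
    """
    return sum(
        max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    )


def evaluate(brokers, group_lags, max_lag=10_000):
    """Return alert messages for the four conditions; max_lag is an assumed threshold."""
    alerts = []
    for b in brokers:
        if not b.up:
            alerts.append(f"broker {b.broker_id}: down")
            continue  # skip metric checks when the scrape itself failed
        if b.under_replicated_partitions > 0:
            alerts.append(
                f"broker {b.broker_id}: "
                f"{b.under_replicated_partitions} under-replicated partitions"
            )
        if b.isr_shrinks_per_sec > 0:
            alerts.append(
                f"broker {b.broker_id}: ISR shrinking ({b.isr_shrinks_per_sec}/s)"
            )
    for group, lag in sorted(group_lags.items()):
        if lag > max_lag:
            alerts.append(f"group {group}: lag {lag} exceeds {max_lag}")
    return alerts


if __name__ == "__main__":
    brokers = [
        BrokerSnapshot(1, True, 0, 0.0),
        BrokerSnapshot(2, True, 3, 0.5),
        BrokerSnapshot(3, False, 0, 0.0),
    ]
    lags = {"orders": consumer_lag({0: 100, 1: 250}, {0: 90, 1: 200})}
    for message in evaluate(brokers, lags):
        print(message)
```

In a real deployment these checks would normally live in the monitoring system itself (for example as Prometheus alerting rules or Control Center triggers) rather than in application code. A suitable lag threshold depends on each consumer group's throughput, and brief ISR shrinks during rolling restarts may be better handled by requiring the condition to persist for a few minutes before alerting.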