Per-partition monitoring generates excessive cardinality
warningperformanceUpdated Dec 19, 2025(via Exa)
Technologies:
How to detect:
Monitoring lag for every partition generates excessive time series that overwhelm storage. Example: 100 topics × 10 partitions × 5 groups × 3 metrics = 15,000 time series. At scale: 50,000 partitions × 3 metrics = 150,000 time series, producing 864 million data points daily, causing 10GB/day (300GB/month) storage impact.
Recommended action:
Track max lag per consumer group instead of per partition. Drill into partition-level metrics only when actively troubleshooting. Use P4 priority (on-demand only) for per-partition metrics.