etcd Database Size Approaching Limit
critical | reliability | Updated Nov 10, 2024
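An etcd database size (etcd_mvcc_db_total_size_in_use_in_bytes) above 2 GB indicates storage pressure on the control plane, which can slow the API server and destabilize the cluster. 2 GB is also etcd's default backend storage quota (--quota-backend-bytes); once the database reaches the quota, etcd raises a NOSPACE alarm and rejects further writes.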
How to detect:
Alert when etcd_mvcc_db_total_size_in_use_in_bytes exceeds 2 GB. This etcd metric reports the database space logically in use, so a sustained high value indicates storage pressure on the Kubernetes control plane.
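The check above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a monitoring rule: it assumes the metrics are available as Prometheus text-format output, it reads "2 GB" as 2 GiB, and the function names (db_in_use_bytes, should_alert) and the sample values are hypothetical.

```python
import re

METRIC = "etcd_mvcc_db_total_size_in_use_in_bytes"
# Assumption: the 2 GB threshold from the alert text is interpreted as 2 GiB.
THRESHOLD_BYTES = 2 * 1024**3

# Matches "name{optional labels} value"; a simplification that does not
# handle a "}" character inside a label value.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def db_in_use_bytes(metrics_text):
    """Return every sample value for METRIC found in Prometheus text output."""
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == METRIC:
            values.append(float(match.group(3)))
    return values


def should_alert(metrics_text, threshold=THRESHOLD_BYTES):
    """True if any etcd member reports more in-use bytes than the threshold."""
    return any(value > threshold for value in db_in_use_bytes(metrics_text))


# Hypothetical scrape output, for illustration only.
sample = """\
# TYPE etcd_mvcc_db_total_size_in_use_in_bytes gauge
etcd_mvcc_db_total_size_in_use_in_bytes 2.4e+09
"""

if __name__ == "__main__":
    print(should_alert(sample))
```

In practice the same comparison would normally live in the monitoring system's alert rule; the sketch only makes the threshold logic explicit.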
Recommended action:
Reduce event retention (for example, by lowering the kube-apiserver --event-ttl value), clean up unused ConfigMaps and Secrets, and reduce how often controllers issue watch and list requests. If the cluster runs an older version with known etcd issues, consider upgrading.
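As one way to approach the ConfigMap cleanup, the sketch below lists ConfigMaps that no Pod references through a volume, projected volume, envFrom, or env entry. It assumes the inputs are the "items" lists from `kubectl get pods -A -o json` and `kubectl get configmaps -A -o json`; the function names are hypothetical. The result is only a list of candidates: ConfigMaps that controllers read directly through the API will look unreferenced here, so review each one before deleting it.

```python
def referenced_configmaps(pods):
    """Collect (namespace, name) for every ConfigMap referenced by the Pods."""
    refs = set()
    for pod in pods:
        ns = pod.get("metadata", {}).get("namespace", "default")
        spec = pod.get("spec", {})
        for vol in spec.get("volumes", []):
            # Plain ConfigMap volume
            if "configMap" in vol:
                refs.add((ns, vol["configMap"]["name"]))
            # ConfigMap used as a source of a projected volume
            for src in vol.get("projected", {}).get("sources", []):
                if "configMap" in src:
                    refs.add((ns, src["configMap"]["name"]))
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        for container in containers:
            for env_from in container.get("envFrom", []):
                if "configMapRef" in env_from:
                    refs.add((ns, env_from["configMapRef"]["name"]))
            for env in container.get("env", []):
                key_ref = env.get("valueFrom", {}).get("configMapKeyRef")
                if key_ref:
                    refs.add((ns, key_ref["name"]))
    return refs


def unreferenced_configmaps(configmaps, pods):
    """Return sorted (namespace, name) pairs of ConfigMaps no Pod references."""
    used = referenced_configmaps(pods)
    candidates = []
    for cm in configmaps:
        meta = cm["metadata"]
        key = (meta.get("namespace", "default"), meta["name"])
        if key not in used:
            candidates.append(key)
    return sorted(candidates)


# Hypothetical objects, trimmed to the fields the sketch reads.
example_pods = [{
    "metadata": {"namespace": "prod", "name": "web-0"},
    "spec": {
        "volumes": [{"name": "cfg", "configMap": {"name": "app-config"}}],
        "containers": [{
            "name": "web",
            "envFrom": [{"configMapRef": {"name": "app-env"}}],
        }],
    },
}]
example_configmaps = [
    {"metadata": {"namespace": "prod", "name": "app-config"}},
    {"metadata": {"namespace": "prod", "name": "app-env"}},
    {"metadata": {"namespace": "prod", "name": "old-config"}},
]

if __name__ == "__main__":
    print(unreferenced_configmaps(example_configmaps, example_pods))
```

The same traversal could be adapted for Secrets by checking the corresponding secret, secretRef, and secretKeyRef fields.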