Azure AKS · etcd

etcd Database Size Approaching Limit

critical
reliability

Updated Nov 10, 2024

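An etcd database size (etcd_mvcc_db_total_size_in_use_in_bytes) above 2 GB indicates control plane stress, which can slow the API server and destabilize the cluster. For context, etcd's default backend quota (--quota-backend-bytes) is 2 GiB. When the database reaches its quota, etcd raises a NOSPACE alarm and accepts only reads and deletes until space is reclaimed.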

How to detect:

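Alert when etcd_mvcc_db_total_size_in_use_in_bytes exceeds 2 GB. This metric is exposed by etcd as part of the Kubernetes control plane. It reports the space logically in use in the etcd backend database, so it directly indicates storage pressure.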
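A minimal Python sketch of this detection rule follows. It assumes the metric is scraped into a Prometheus-compatible backend that serves the standard `/api/v1/query` HTTP API. The helper names, the base URL you pass in, and the reading of "2 GB" as 2 GiB (2,147,483,648 bytes) are assumptions. The sketch does not handle authentication, so add whatever your endpoint requires.

```python
import json
import urllib.parse
import urllib.request

# Assumption: "2 GB" is read as 2 GiB, matching etcd's default backend quota.
THRESHOLD_BYTES = 2 * 1024**3
METRIC = "etcd_mvcc_db_total_size_in_use_in_bytes"


def over_threshold(response: dict, threshold: float = THRESHOLD_BYTES) -> list[tuple[str, float]]:
    """Return (instance, bytes) pairs from a Prometheus instant-query response that exceed threshold."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    hits = []
    for sample in response["data"]["result"]:
        # Instant-vector samples carry their value as [timestamp, "value as string"].
        value = float(sample["value"][1])
        if value > threshold:
            instance = sample["metric"].get("instance", "<unknown>")
            hits.append((instance, value))
    return hits


def query_prometheus(base_url: str, promql: str) -> dict:
    """Run an instant query against a Prometheus-compatible /api/v1/query endpoint (no auth)."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, `over_threshold(query_prometheus("http://localhost:9090", METRIC))` lists the etcd members above the threshold, where the URL is a placeholder. In a Prometheus alerting rule, the same check reduces to the expression `etcd_mvcc_db_total_size_in_use_in_bytes > 2147483648`. A script like this is mainly useful for ad-hoc checks.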

Recommended action:

Reduce event retention, clean up unused ConfigMaps and Secrets, and reduce the frequency of watch/list calls from controllers. Consider upgrading the cluster if it runs an older version with known etcd issues.
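One rough way to find cleanup candidates among ConfigMaps and Secrets is to compare existing objects against the names that current pods reference. The sketch below uses hypothetical helper names and works on the `items` lists from `kubectl get pods -A -o json`, `kubectl get configmaps -A -o json`, and `kubectl get secrets -A -o json`.

The sketch only sees references from pod specs. Objects used by workloads scaled to zero, CronJobs between runs, Ingress TLS, service accounts, Helm release storage, or operators will show up as unreferenced. Treat the output as a review list, not a deletion list.

```python
def _add_ref(target, namespace, ref, key="name"):
    """Add (namespace, name) to target if ref is a reference object with a non-empty name."""
    name = (ref or {}).get(key)
    if name:
        target.add((namespace, name))


def referenced_names(pods):
    """Return (configmaps, secrets): sets of (namespace, name) referenced by the given pods."""
    configmaps, secrets = set(), set()
    for pod in pods:
        ns = pod["metadata"].get("namespace", "default")
        spec = pod.get("spec", {})
        for vol in spec.get("volumes", []):
            _add_ref(configmaps, ns, vol.get("configMap"))
            # Secret volumes name their source with "secretName", not "name".
            _add_ref(secrets, ns, vol.get("secret"), key="secretName")
            # Projected volumes (e.g. service account token volumes) list several sources.
            for src in vol.get("projected", {}).get("sources", []):
                _add_ref(configmaps, ns, src.get("configMap"))
                _add_ref(secrets, ns, src.get("secret"))
        for ref in spec.get("imagePullSecrets", []):
            _add_ref(secrets, ns, ref)
        # Ephemeral containers are not inspected in this sketch.
        for container in spec.get("containers", []) + spec.get("initContainers", []):
            for env in container.get("env", []):
                value_from = env.get("valueFrom", {})
                _add_ref(configmaps, ns, value_from.get("configMapKeyRef"))
                _add_ref(secrets, ns, value_from.get("secretKeyRef"))
            for env_from in container.get("envFrom", []):
                _add_ref(configmaps, ns, env_from.get("configMapRef"))
                _add_ref(secrets, ns, env_from.get("secretRef"))
    return configmaps, secrets


def unreferenced(objects, referenced):
    """Return sorted (namespace, name) pairs for objects that no pod references."""
    return sorted(
        (obj["metadata"].get("namespace", "default"), obj["metadata"]["name"])
        for obj in objects
        if (obj["metadata"].get("namespace", "default"), obj["metadata"]["name"]) not in referenced
    )
```

For example, `cms, secs = referenced_names(pods)` followed by `unreferenced(configmaps, cms)` lists ConfigMaps that no current pod mounts or reads.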