KVStore Quorum Loss Cascade
critical
When cilium_kvstore_quorum_errors_datadog increments, Cilium agents have lost quorum with the backing KVStore (etcd/consul). This blocks policy propagation and service discovery updates, and it can cause cluster-wide connectivity failures because agents cannot sync state.
Alert on any increase in cilium_kvstore_quorum_errors_datadog. Check cilium_kvstore_initial_sync_completed to verify that agents have completed their initial sync with the KVStore. A high cilium_kvstore_events_queue_seconds_datadog value indicates that events are backing up because the KVStore is unavailable.
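The three signals above can be combined into a single check. The following is a minimal sketch only: it assumes the metric values have already been fetched from the monitoring backend, and the function names and the 5-second queue threshold are hypothetical placeholders, not values taken from Cilium or Datadog.

```python
from typing import List, Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Sum the increase across consecutive counter samples.

    A drop between two samples is treated as a counter reset (for example
    an agent restart), so the whole new value counts as fresh increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def assess_kvstore(
    quorum_error_samples: Sequence[float],
    initial_sync_completed: bool,
    events_queue_seconds: float,
    queue_threshold_s: float = 5.0,  # hypothetical threshold, tune per cluster
) -> List[str]:
    """Return human-readable findings; an empty list means no signal fired."""
    findings = []
    if counter_increase(quorum_error_samples) > 0:
        findings.append("quorum errors increased")
    if not initial_sync_completed:
        findings.append("initial sync not completed")
    if events_queue_seconds > queue_threshold_s:
        findings.append("event queue backing up")
    return findings
```

Comparing consecutive samples rather than testing for a non-zero value avoids re-alerting on an old error count that has stopped growing.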
Verify etcd cluster health and network connectivity from the agents to the etcd endpoints. Check 'cilium status' for the KVStore connection status. Review etcd logs for consensus failures or repeated leader elections. Ensure etcd has sufficient resources and is not partitioned (split-brain). Consider adding etcd monitoring with alerts on leader elections and member health.
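When checking member health, it helps to know how many members must be reachable. etcd uses Raft, which needs a majority of members, floor(N/2) + 1, to make progress. The sketch below applies that rule to a map of endpoint health. The function names and endpoint labels are hypothetical, and the map is assumed to be filled in by hand or from a per-endpoint health check such as `etcdctl endpoint health`.

```python
from typing import Mapping


def quorum_size(members: int) -> int:
    """Smallest majority of a Raft cluster with the given member count."""
    if members < 1:
        raise ValueError("cluster must have at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


def has_quorum(health: Mapping[str, bool]) -> bool:
    """True if the healthy endpoints still form a majority of all members."""
    healthy = sum(1 for ok in health.values() if ok)
    return healthy >= quorum_size(len(health))


if __name__ == "__main__":
    # Hypothetical three-member cluster with one unreachable member.
    sample = {"etcd-0": True, "etcd-1": True, "etcd-2": False}
    print("quorum held:", has_quorum(sample))
    print("further failures tolerated:",
          sum(sample.values()) - quorum_size(len(sample)))
```

Note that a four-member cluster tolerates only one failure, the same as a three-member cluster, which is why odd member counts are the usual recommendation.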