Cilium, Kubernetes

Operator CES Sync Error Cascade

critical
reliability

Updated Feb 23, 2026

cilium_operator_ces_sync_errors indicates that the Cilium operator is failing to synchronize CiliumEndpointSlice (CES) resources. These failures break endpoint aggregation, so the operator cannot keep global service state up to date, which can leave service load balancing incomplete across the cluster.

How to detect:

Monitor cilium_operator_ces_sync_errors for increments; any increase means at least one sync attempt failed. Check cilium_operator_ces_queueing_delay_seconds_datadog to see whether sync processing is lagging. A high value of cilium_operator_count_ceps_per_ces_datadog may indicate scalability issues with large endpoint sets.
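
As a rough illustration of "monitoring for increments", the sketch below computes how much a counter such as cilium_operator_ces_sync_errors grew across a series of scraped values, treating any drop as a counter reset (for example after an operator restart). The function names and the min_increase threshold are hypothetical and not part of Cilium or any monitoring product; the samples are assumed to be already fetched and ordered by time.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Return the total increase across time-ordered counter samples.

    A sample lower than its predecessor is treated as a counter reset,
    so the post-reset value is counted as new growth from zero.
    """
    increase = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter went backwards: assume the process restarted.
            increase += curr
    return increase


def sync_errors_detected(samples: Sequence[float], min_increase: float = 1.0) -> bool:
    """Flag the window if the counter grew by at least min_increase (hypothetical threshold)."""
    return counter_increase(samples) >= min_increase


if __name__ == "__main__":
    window = [5.0, 5.0, 7.0]  # made-up scrape values for illustration
    print(counter_increase(window), sync_errors_detected(window))
```

Handling resets explicitly matters here because a restart of the operator would otherwise look like a negative change and hide errors that occurred after it came back.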

Recommended action:

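Review the Cilium operator logs for the specific sync failure reasons. Verify that the installed CRD versions are compatible with the operator version. Check whether the operator is short on CPU or memory. If queueing delay is high, give the operator more capacity: raise its resource limits and consider increasing operator replicas (the operator uses leader election, so extra replicas mainly help failover rather than sync throughput). If CES size is the bottleneck, consider breaking up large services into smaller endpoint groups.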
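
The recommendations above can be read as a small decision procedure. The following sketch encodes that reading; the CesSignals type, the triage function, and both thresholds are hypothetical values chosen for illustration, and would need tuning for a real cluster.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; not defaults of Cilium or any monitoring tool.
MAX_QUEUEING_DELAY_SECONDS = 5.0
MAX_CEPS_PER_CES = 100.0


@dataclass
class CesSignals:
    sync_error_increase: float     # growth of the sync error counter in the window
    queueing_delay_seconds: float  # observed CES queueing delay
    ceps_per_ces: float            # observed CiliumEndpoints per CES


def triage(signals: CesSignals) -> List[str]:
    """Map observed signals to the recommended actions, in order."""
    if signals.sync_error_increase <= 0:
        return []  # no new sync errors, nothing to do

    # Baseline checks apply whenever sync errors are increasing.
    actions = [
        "review operator logs for sync failure reasons",
        "verify CRD versions are compatible with the operator version",
        "check operator CPU and memory resources",
    ]
    if signals.queueing_delay_seconds > MAX_QUEUEING_DELAY_SECONDS:
        actions.append("add operator capacity (resources or replicas)")
    if signals.ceps_per_ces > MAX_CEPS_PER_CES:
        actions.append("break up large services into smaller endpoint groups")
    return actions


if __name__ == "__main__":
    for step in triage(CesSignals(3.0, 12.0, 40.0)):
        print("-", step)
```

Keeping the baseline checks unconditional reflects that log review and version verification are cheap and apply to any sync failure, while the capacity and CES-size actions only make sense when the corresponding signal is elevated.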