IstioKubernetes

Istio Control Plane Config Push Stall

critical
reliabilityUpdated Jan 7, 2026

When the istiod control plane experiences high load or lock contention, configuration updates to Envoy proxies are delayed by minutes, causing traffic to continue routing to terminated pods and resulting in 503 errors.

How to detect:

Monitor istio_pilot_xds_push_time and istio_pilot_proxy_convergence_time metrics. If push times exceed 30 seconds or proxy convergence exceeds 60 seconds during deployments, configuration propagation is stalled. Check for STALE or NOT SYNCED proxies in istioctl proxy-status output. Look for istio_pilot_xds_push_timeout_failures increasing.

Recommended action:

Investigate istiod CPU and memory usage with istio_go_memstats_heap_allocated_size and istio_process_cpu_seconds metrics. Scale istiod horizontally or increase resource limits. Use Sidecar resources to limit configuration scope per namespace, reducing the amount of config each proxy receives. Check for excessive Kubernetes events causing controller queue buildup.