When the istiod control plane experiences high load or lock contention, configuration updates to Envoy proxies are delayed by minutes, causing traffic to continue routing to terminated pods and resulting in 503 errors.
Envoy sidecar proxies consume 2GB+ of memory per pod, causing OOMKills and degraded service performance. This occurs when Istio pushes very large configurations to sidecars in big clusters or when routing rules are poorly scoped.
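A common mitigation is to scope each sidecar's configuration with an Istio `Sidecar` resource, so istiod only pushes routes for the namespaces a workload actually calls instead of the whole mesh. A minimal sketch (the `frontend` namespace name is illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: frontend   # applies to all workloads in this namespace
spec:
  egress:
  - hosts:
    - "./*"             # services in the same namespace only
    - "istio-system/*"  # plus control-plane services
```

Narrowing the egress hosts shrinks the cluster/route tables Envoy must hold in memory, which directly reduces per-sidecar footprint.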
Expired certificates, or certificates that fail to rotate, cause widespread service-to-service authentication failures, resulting in complete traffic loss between services when mTLS is in STRICT mode.
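While certificate issuance is being repaired, mTLS can be temporarily relaxed from STRICT to PERMISSIVE so plaintext traffic is accepted alongside mTLS and services keep communicating. A hedged sketch; placing the resource in `istio-system` makes it mesh-wide, so in practice scope it to the affected namespace if you can:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # istio-system scope = mesh-wide
spec:
  mtls:
    mode: PERMISSIVE        # accept both mTLS and plaintext during recovery
```

Certificate state on a given sidecar can be inspected with `istioctl proxy-config secret <pod>` to confirm whether the workload cert has expired or was never rotated.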
Istio proxy CPU overhead reaches 40%+ of total cluster CPU during high-traffic periods, causing throttling and increased latency. Each request traversing Envoy's routing logic adds 50-60ms of p95 latency.
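When throttling is the immediate problem, per-workload proxy resources can be adjusted via pod-template annotations rather than mesh-wide defaults. A sketch using the standard `sidecar.istio.io` annotations (the values are illustrative, not recommendations):

```yaml
# Fragment of a Deployment's pod template
template:
  metadata:
    annotations:
      sidecar.istio.io/proxyCPU: "200m"        # proxy CPU request
      sidecar.istio.io/proxyCPULimit: "1000m"  # proxy CPU limit
      sidecar.istio.io/proxyMemory: "256Mi"    # proxy memory request
```

Raising the limit on hot-path workloads avoids CFS throttling of the sidecar, which is often a larger latency contributor than the routing logic itself.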
Incorrectly configured VirtualService routing rules cause traffic to be silently dropped or routed to the wrong destination. Common causes include case-sensitive host or header mismatches, subsets referenced without a matching DestinationRule, and conflicting route priorities.
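The most frequent variant is a VirtualService routing to a subset that no DestinationRule defines: Envoy then has no cluster for the route and the traffic is dropped. A minimal consistent pair (service and subset names are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews            # must match the service host exactly (case-sensitive)
  http:
  - route:
    - destination:
        host: reviews
        subset: v2     # must be defined in a DestinationRule
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v2
    labels:
      version: v2      # pods must actually carry this label
```

If the DestinationRule or the `version: v2` pod label is missing, the route exists but resolves to an empty cluster, which surfaces as silent drops or 503s rather than a configuration error.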
The Galley component fails to discover pod endpoints for services, preventing Istio from routing traffic correctly. This manifests as istio_galley_endpoint_no_pod errors and results in 503 upstream connection failures.
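A typical root cause of endpoint-discovery gaps is a Service selector that matches no pod labels, leaving the Service with zero endpoints. A sketch of the correspondence to verify (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: reviews
spec:
  selector:
    app: reviews       # must exactly match the pod template's labels
  ports:
  - name: http         # older Istio releases also require protocol-prefixed port names
    port: 9080
```

`kubectl get endpoints reviews` showing an empty endpoint list confirms the selector/label mismatch before digging into the mesh layer.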
Mixer adapter configuration errors prevent telemetry collection and policy enforcement from working correctly, causing gaps in metrics and allowing policy violations to go undetected.
The istiod controller queue builds up to 20K+ events during high pod churn, causing minutes of delay before endpoint updates are processed. This results in traffic being sent to terminated pods long after they've been deleted.
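Push batching on istiod can be tuned so that bursts of endpoint events during churn are coalesced into fewer pushes instead of queuing individually. The relevant knobs are the `PILOT_DEBOUNCE_*` and `PILOT_PUSH_THROTTLE` environment variables on the istiod deployment (the values below are Istio's defaults, shown for illustration):

```yaml
# Fragment of the istiod Deployment's container spec
env:
- name: PILOT_DEBOUNCE_AFTER
  value: "100ms"   # wait this long after the last event before pushing
- name: PILOT_DEBOUNCE_MAX
  value: "10s"     # never delay a push longer than this
- name: PILOT_PUSH_THROTTLE
  value: "100"     # maximum concurrent pushes
```

Raising `PILOT_DEBOUNCE_AFTER` during heavy churn trades per-event push latency for far fewer total pushes, which drains the event queue faster; `pilot_proxy_convergence_time` is the metric to watch while tuning.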