Service Mesh CPU Tax Under Load
Warning: Istio proxy (Envoy sidecar) CPU overhead can reach 40% or more of total cluster CPU during high-traffic periods, causing CPU throttling and increased latency. Each request that passes through Envoy routing logic adds 50-60 ms to p95 latency.
Monitor istio_mesh_agent_process_cpu_seconds and compare it with application container CPU usage over the same time window. If sidecar CPU usage exceeds 50% of application CPU, or if istio_pilot_go_gc_time_seconds shows excessive GC time, the mesh is adding significant overhead. Also check whether request latency increases correlate with spikes in istio_go_goroutines.
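The 50% rule above can be sketched as a small check. This is a minimal illustration, not a monitoring integration: the CpuSample type, the workload names, and the sample values are hypothetical, and it assumes sidecar and application CPU have already been fetched from your metrics backend as average cores over the same window.

```python
from dataclasses import dataclass

# Threshold taken from the rule above: sidecar CPU above 50% of application CPU.
SIDECAR_TO_APP_CPU_THRESHOLD = 0.5


@dataclass
class CpuSample:
    """Average CPU usage in cores for one workload (hypothetical structure)."""
    workload: str
    sidecar_cores: float
    app_cores: float


def sidecar_ratio(sample: CpuSample) -> float:
    """Return sidecar CPU as a fraction of application CPU."""
    if sample.app_cores <= 0:
        # An idle application with a busy sidecar is treated as unbounded overhead.
        return float("inf") if sample.sidecar_cores > 0 else 0.0
    return sample.sidecar_cores / sample.app_cores


def flag_heavy_sidecars(samples, threshold=SIDECAR_TO_APP_CPU_THRESHOLD):
    """Return the workloads whose sidecar-to-application CPU ratio exceeds the threshold."""
    return [s.workload for s in samples if sidecar_ratio(s) > threshold]


# Illustrative values only, not measurements.
samples = [
    CpuSample("checkout", sidecar_cores=0.45, app_cores=0.60),
    CpuSample("catalog", sidecar_cores=0.10, app_cores=0.80),
    CpuSample("idle-job", sidecar_cores=0.02, app_cores=0.0),
]
flagged = flag_heavy_sidecars(samples)
print(flagged)
```

Comparing ratios per workload rather than cluster-wide totals makes it easier to see which services pay the highest proxy cost relative to the work they do.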
Review whether the service mesh features in use justify the CPU overhead. Disable unused features such as detailed telemetry, and lower the tracing sampling rate. Consider Istio ambient mode, which handles L4 traffic without per-pod sidecars. Scale up node resources or simplify routing rules to reduce Envoy processing time. For high-throughput services, evaluate whether direct service-to-service communication that bypasses the mesh is more appropriate.
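As one illustration of lowering the tracing sampling rate, the sketch below generates a mesh-wide Istio Telemetry resource as JSON. It assumes the Telemetry API is available in your Istio version (the apiVersion shown is an assumption; older releases serve telemetry.istio.io/v1alpha1 instead), that istio-system is the root namespace, and that tracing is configured through this API rather than through mesh config. The helper name and the resource name mesh-default are hypothetical.

```python
import json


def tracing_sampling_manifest(percentage, namespace="istio-system", name="mesh-default"):
    """Build a Telemetry resource that sets the trace sampling percentage.

    The namespace and name defaults are assumptions; adjust them to your mesh.
    """
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    return {
        # Assumed API version; may be telemetry.istio.io/v1alpha1 on older Istio.
        "apiVersion": "telemetry.istio.io/v1",
        "kind": "Telemetry",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"tracing": [{"randomSamplingPercentage": percentage}]},
    }


# Sample 1% of requests instead of a higher rate (illustrative value).
manifest = tracing_sampling_manifest(1.0)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be reviewed and then applied with kubectl apply -f, which accepts JSON as well as YAML. Measure sidecar CPU before and after the change to confirm that the reduction is worth the loss of trace coverage.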