Ingress Controller Performance Under High Traffic
Capacity Planning (warning)
Determine if NGINX Ingress Controller is the bottleneck for application latency under high load.
Prompt: “We're seeing increased response times during peak traffic and I think the NGINX Ingress Controller might be overwhelmed — how do I tell if we need to scale up ingress replicas or tune the configuration?”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When diagnosing NGINX Ingress Controller performance under high traffic, start by checking if the ingress pods themselves are resource-saturated, then verify traffic is distributed evenly across replicas. Before scaling up ingress, rule out backend saturation showing up as queue buildup, and check for network errors or bandwidth saturation that might indicate configuration issues rather than capacity problems.
1. Check ingress controller resource saturation
First, look at `kubernetes_cpu_usage` and `kubernetes_memory_usage` for your NGINX ingress controller pods and compare them to `kubernetes_cpu_limits` and `kubernetes_memory_limits`. If CPU is consistently above 80% of limits during peak traffic or memory is approaching limits, you've found your bottleneck and need to either increase resource limits or scale horizontally. If resources are well below limits (say, 40-60% CPU), the problem is likely elsewhere—don't just throw more replicas at it.
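If saturation confirms the ingress controller itself is the bottleneck, horizontal scaling can be automated rather than done by hand. A minimal sketch, assuming an ingress-nginx install with a Deployment named `ingress-nginx-controller` in the `ingress-nginx` namespace (adjust names to your environment):

```yaml
# HPA that adds ingress replicas when average CPU crosses 80%.
# Note: HPA utilization is measured against resource *requests*, not limits,
# so set requests deliberately before relying on this.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```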
2. Verify traffic distribution across ingress pods
Check your Ingress resource definitions for wildcard host configurations (e.g., `host: '*'`) and compare `kubernetes_network_rx_size` across ingress pods to see if one pod is handling disproportionate traffic. The `single-ingress-container-overload-during-traffic-spikes` insight shows this is a common misconfiguration where all traffic hits a single container despite having multiple replicas, creating a single point of failure. If you see one pod receiving 80%+ of traffic while others are idle, you need to fix your Ingress rules to use specific hosts rather than wildcards.
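The fix for the wildcard misconfiguration is to declare explicit hosts in each Ingress rule. A sketch of the corrected shape, with hostname and service names as placeholder examples:

```yaml
# Explicit host rule instead of `host: '*'`, so the controller can route
# and balance traffic per service rather than funneling everything to
# one backend. Hostname and service names here are illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com   # specific host, not a wildcard
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```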
3. Distinguish ingress saturation from backend saturation
This is critical and often misdiagnosed: check your backend application pods' `kubernetes_cpu_usage` alongside ingress metrics. If backend pods show moderate CPU (40-70%) while response times degrade, you're likely hitting backend concurrency limits from event loop blocking, not ingress capacity issues. The `load-balancer-queue-buildup-from-backend-concurrency-limits` insight explains how NGINX queues requests when backends appear busy despite having CPU headroom. In this case, scaling ingress won't help—you need to address backend concurrency or add backend replicas instead.
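When the diagnosis points at backend concurrency rather than ingress capacity, the remedy is more backend workers or replicas. A hypothetical FastAPI backend Deployment fragment (image, names, and worker count are assumptions for illustration):

```yaml
# Scale the backend, not the ingress: more replicas plus more uvicorn
# workers per pod reduce the chance a blocked event loop stalls requests
# and causes upstream queue buildup in NGINX.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 4            # add backend replicas instead of ingress replicas
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
    spec:
      containers:
        - name: api
          image: example/api:latest   # placeholder image
          args: ["uvicorn", "app:app", "--host", "0.0.0.0", "--workers", "4"]
```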
4. Check network bandwidth and error rates
Look at `kubernetes_network_rx_size` and `kubernetes_network_transaction_size` to see if you're hitting network bandwidth limits (typically 1-10 Gbps depending on your setup), and check `kubernetes_network_errors` during peak periods. High error rates (>1% of connections) when you're not at bandwidth limits often indicate configuration issues like insufficient worker connections, low keepalive settings, or timeout mismatches. These are tuning problems, not capacity problems—scaling won't fix them.
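These tuning knobs live in the ingress-nginx ConfigMap. A sketch of the relevant keys, assuming the default ConfigMap name from a standard install; the values are starting points to load-test, not recommendations:

```yaml
# ingress-nginx ConfigMap keys for connection-related errors.
# Name/namespace depend on how the controller was installed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  max-worker-connections: "32768"        # raise if workers exhaust connections
  keep-alive: "75"                       # client keepalive timeout (seconds)
  keep-alive-requests: "1000"            # requests per client keepalive conn
  upstream-keepalive-connections: "512"  # idle keepalive conns to backends
  proxy-read-timeout: "60"               # align with backend response times
```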
5. Rule out pod churn as the root cause
Check if latency spikes and `kubernetes_network_errors` correlate with deployment events or pod restarts rather than traffic volume itself. The `network-error-rate-spike-during-pod-churn` insight shows connection failures often spike when pod IPs change during rollouts, as traffic continues hitting terminating pods. If your errors happen during deployments regardless of traffic level, you need proper preStop hooks with 10-30 second delays and connection draining configured, not more ingress capacity.
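A connection-draining setup can be sketched as a pod template fragment. The image name is a placeholder, the 15-second sleep is one example from the 10-30 second range above, and the hook assumes a `sleep` binary exists in the container image:

```yaml
# Delay SIGTERM so endpoint updates propagate before the pod stops
# accepting traffic, instead of dropping in-flight connections.
spec:
  terminationGracePeriodSeconds: 45   # must exceed the preStop delay
  containers:
    - name: app
      image: example/app:latest       # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "15"]
```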
Related Insights
Single Ingress Container Overload During Traffic Spikes
Severity: critical
Configuring an Ingress with a wildcard host directs all traffic to one container, overwhelming it during spikes and potentially taking down the entire cluster. The controller's load-balancing capabilities go unused, creating a single point of failure.
Load Balancer Queue Buildup from Backend Concurrency Limits
Severity: critical
When backend FastAPI workers experience event loop blocking, NGINX ingress accumulates requests in upstream queues as backends appear busy despite having available CPU capacity. This results in increased queue wait times and eventually gateway timeouts for clients, creating a cascading latency problem.
Network Error Rate Spike During Pod Churn
Severity: warning
Connection failures and network errors spike during rolling deployments or node failures when pod IPs change faster than service mesh or load balancer updates can propagate. Downstream services continue sending traffic to terminating pods or stale endpoints.