Worker Connections Exhausted During Traffic Spike

Severity: Critical · Category: Capacity Planning

NGINX worker processes hit connection limits, causing dropped connections and 503 errors during high traffic.

Prompt: My NGINX server is showing 'worker_connections are not enough' errors and dropping connections. How do I determine if I need to increase worker_connections or add more worker processes based on my current traffic patterns?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When diagnosing worker_connections exhaustion in NGINX, start by confirming the limit is actually being hit by comparing current connections to your configured limits. Then determine where connections are stuck (reading, writing, or keep-alive), whether the bottleneck is at NGINX or upstream, and finally decide whether to increase worker_connections, add worker processes, or tune upstream capacity based on CPU utilization and traffic patterns.

1. Confirm the worker_connections limit is being reached
Check `nginx_backend_current` against your configured worker_connections × worker_processes limit. If current connections are at or near this ceiling during traffic spikes, you've confirmed the issue. Also verify the gap between `nginx_backend_accepted` and `nginx_backend_handled` — if accepted exceeds handled, NGINX is actively rejecting connections due to resource exhaustion. This is your smoking gun.
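The ceiling check above can be sketched as a small helper. This is a minimal sketch, assuming the metric values have already been fetched from your monitoring backend; the function name and the 90% headroom threshold are illustrative, not official guidance.

```python
def connection_ceiling_check(current, worker_connections, worker_processes,
                             accepted, handled, headroom=0.9):
    """Flag worker_connections exhaustion from basic NGINX counters.

    current          -- nginx_backend_current (active connections)
    accepted/handled -- cumulative counters; accepted > handled means
                        NGINX rejected connections due to resource limits.
    """
    ceiling = worker_connections * worker_processes
    near_ceiling = current >= headroom * ceiling
    dropped = accepted - handled  # the accepted-vs-handled gap: the smoking gun
    return {
        "ceiling": ceiling,
        "near_ceiling": near_ceiling,
        "dropped_connections": dropped,
        "exhausted": near_ceiling or dropped > 0,
    }
```

If `exhausted` comes back true from either signal, proceed to the state breakdown in step 2 rather than immediately raising limits.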
2. Identify where connections are getting stuck
Break down current connections by state using `nginx_net_reading`, `nginx_net_writing`, and `nginx_net_waiting`. If `nginx_net_waiting` is consuming most connections, keep-alive settings are too aggressive for your traffic volume — consider lowering keepalive_timeout. If `nginx_net_writing` dominates, slow responses or large payloads are tying up worker connections. This tells you whether to tune timeouts or increase capacity.
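A sketch of the state breakdown, assuming the three gauge values are already in hand; the mapping from dominant state to tuning lever follows the reasoning above, and the advice strings are illustrative.

```python
def dominant_connection_state(reading, writing, waiting):
    """Classify where connections are piling up and suggest a tuning lever.

    Inputs map to nginx_net_reading / nginx_net_writing / nginx_net_waiting.
    """
    total = reading + writing + waiting
    if total == 0:
        return ("idle", "no tuning needed")
    states = {"reading": reading, "writing": writing, "waiting": waiting}
    state = max(states, key=states.get)  # state consuming the most connections
    advice = {
        "waiting": "lower keepalive_timeout (keep-alive is holding connections)",
        "writing": "investigate slow responses or large payloads tying up workers",
        "reading": "check for slow clients; consider tightening client timeouts",
    }
    return (state, advice[state])
```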
3. Check whether upstream saturation is the root cause
Compare `nginx_server_zone_processing` to `nginx_upstream_peers_active` during the spike. If both are elevated and requests are queuing, your backend workers are saturated, not just NGINX. The `request-queue-buildup-indicates-worker-exhaustion` pattern applies here — moderate CPU but rising tail latency and 502/504 errors indicate worker thread starvation at the upstream. If this is the case, increasing NGINX worker_connections won't help until you scale the backend.
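The upstream-saturation test can be expressed as a heuristic like the following. A minimal sketch: the 90% peer utilization, 2x tail-latency, and 1% error-rate thresholds are assumptions chosen to illustrate the `request-queue-buildup-indicates-worker-exhaustion` pattern, not calibrated values.

```python
def upstream_saturated(processing, upstream_active, upstream_capacity,
                       p99_latency_ms, baseline_p99_ms, err_502_504_rate):
    """Heuristic: are backend workers saturated, rather than NGINX itself?

    processing       -- nginx_server_zone_processing (requests in flight)
    upstream_active  -- nginx_upstream_peers_active
    """
    queueing = processing > upstream_active            # requests waiting on peers
    peers_maxed = upstream_active >= 0.9 * upstream_capacity
    tail_latency_rising = p99_latency_ms > 2 * baseline_p99_ms
    return peers_maxed and (queueing or tail_latency_rising
                            or err_502_504_rate > 0.01)
```

When this returns true, raising worker_connections only moves the queue; scale the backend first.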
4. Analyze error patterns and connection drops
Look at `nginx_server_zone_responses_5xx` for 503 errors that correlate with the exhaustion window. Also check `nginx_server_zone_discarded` for requests completed without responses — these indicate connections dropped or timed out. High 5xx during connection exhaustion confirms user impact, while discarded requests point to connection table overflow. A sudden spike in both during peak traffic confirms capacity constraints.
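One way to sketch the correlation check: compare each series' incident window against its own baseline and require both 5xx and discarded counts to spike together. The baseline-from-first-half approach and the 3x spike factor are illustrative assumptions.

```python
def error_spike_correlates(err_5xx_series, discarded_series, spike_factor=3):
    """True if 5xx responses and discarded requests spike together.

    Each series is a list of per-interval samples spanning the incident;
    the first half of each series is treated as the baseline.
    """
    def spiked(series):
        half = len(series) // 2
        baseline = max(sum(series[:half]) / half, 1)  # avoid divide-by-zero
        peak = max(series[half:])
        return peak >= spike_factor * baseline
    return spiked(err_5xx_series) and spiked(discarded_series)
```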
5. Correlate with traffic spike magnitude
Check `nginx_net_request_per_s` during the incident to quantify the traffic spike. Calculate your connections-per-request ratio to understand connection efficiency. If the request rate doubled but connections exhausted, your worker_connections setting is too conservative for realistic traffic variance. This also helps distinguish a true traffic surge from a connection leak or slow-response issue.
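The connections-per-request ratio can be computed as below. A minimal sketch, assuming you have current and baseline readings for both active connections and request rate; the 2x-over-baseline cutoff for flagging a leak is an assumption.

```python
def connection_efficiency(current_connections, requests_per_s,
                          baseline_connections, baseline_rps):
    """Compare connections-per-request now versus a healthy baseline.

    A ratio well above baseline means connections are lingering longer
    per request, suggesting a leak or slowness rather than pure traffic.
    """
    now = current_connections / requests_per_s
    then = baseline_connections / baseline_rps
    return {
        "conn_per_rps": now,
        "baseline_conn_per_rps": then,
        "likely_leak_or_slowness": now > 2 * then,
    }
```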
6. Decide: increase worker_connections vs. add worker_processes
If CPU utilization during the spike was <60% and connections were evenly distributed across workers, simply increase worker_connections (default is 512, try 2048 or 4096). If CPU was >70% or you're already at 10K+ connections per worker, add more worker_processes instead (typically set to number of CPU cores). Don't forget to raise worker_rlimit_nofile to match. If upstream saturation is the root cause per step 3, scale backend workers first before touching NGINX settings.
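The decision rules above can be condensed into one function. A sketch that mirrors the playbook's rules of thumb (CPU under 60% vs. over 70%, 10K+ connections per worker); the function name and the "borderline" branch are illustrative additions.

```python
def scaling_recommendation(cpu_util_pct, worker_processes,
                           current_connections, upstream_is_saturated=False):
    """Choose a scaling action from spike-time CPU and connection counts."""
    if upstream_is_saturated:
        # Per step 3: NGINX tuning won't help until the backend scales.
        return "scale backend workers first"
    per_worker = current_connections / worker_processes
    if cpu_util_pct < 60 and per_worker < 10_000:
        return "increase worker_connections (and worker_rlimit_nofile)"
    if cpu_util_pct > 70 or per_worker >= 10_000:
        return "add worker_processes (typically one per CPU core)"
    return "borderline: profile further before changing either"
```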

Monitoring Interfaces

NGINX Datadog
NGINX OpenTelemetry