Keepalive Connection Pooling to Upstream Servers

Proactive Health

Configuring upstream keepalive connections to reduce TCP handshake overhead and prevent port exhaustion when connecting to backends.

Prompt: I'm seeing a lot of TIME_WAIT sockets and my NGINX server is running out of ephemeral ports when connecting to backends. How should I configure upstream keepalive connections to reuse connections and avoid port exhaustion?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When investigating upstream port exhaustion in NGINX, start by confirming high connection churn rates and checking whether keepalive is actually configured. Then verify the HTTP/1.1 protocol requirements are met, tune the keepalive pool size to match your traffic volume, and finally check for connection pool saturation or backend health issues that prevent connection reuse.

1. Confirm connection churn and lack of keepalive reuse
Check `nginx-net-backend-opened-per-s` to see how rapidly new connections are being established to backends. If this rate is high (hundreds or thousands per second) while `nginx-upstream-keepalive` shows zero or very low idle connections, you're creating a new TCP connection for every request instead of reusing them. Cross-reference with `nginx-upstream-peers-requested` to calculate the ratio—ideally you want 10-100+ requests per new connection opened.
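As a quick spot check alongside those metrics, you can count TIME_WAIT sockets toward a backend and compute the reuse ratio by hand. This is a sketch: the backend port is a placeholder, and the ratio numbers are sample readings to substitute with your own.

```shell
# Count sockets in TIME_WAIT toward a hypothetical backend on port 8080
# (requires iproute2's ss; skipped if unavailable).
if command -v ss >/dev/null 2>&1; then
    ss -tan state time-wait '( dport = :8080 )' | tail -n +2 | wc -l
fi

# Reuse ratio from the two metrics above (sample readings -- substitute
# your own): requests served per new connection opened. A ratio near 1
# means every request opened a fresh connection, i.e. no reuse at all.
awk 'BEGIN { requests = 12000; opened = 11800; printf "%.1f requests/connection\n", requests / opened }'
```

A healthy keepalive setup pushes this ratio well into double digits; a ratio hovering near 1 confirms the churn the metrics suggest.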
2. Verify upstream keepalive is configured
The most common cause of this issue is simply missing the keepalive directive in your upstream block. Check your NGINX config for `keepalive N` within the upstream block (e.g., `keepalive 32`). If it's absent, that's your root cause—NGINX defaults to closing connections after each request. As noted in the `nginx-keepalive-upstream-configuration` insight, without this directive, every request incurs full TCP handshake overhead.
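A minimal upstream block with the directive in place might look like the following sketch; the upstream name and server addresses are placeholders.

```nginx
upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;

    # Keep up to 32 idle connections to this group cached per worker
    # process. This caps *idle* connections only; it does not limit the
    # total number of connections NGINX may open under load.
    keepalive 32;
}
```

Note that `keepalive` alone is not sufficient; the proxy settings in the next step are also required for connections to actually be reused.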
3. Ensure HTTP/1.1 and proper headers are set
Even with keepalive configured, NGINX speaks HTTP/1.0 to proxied backends by default and makes no attempt to keep those connections open. Verify that your location blocks contain `proxy_http_version 1.1;` and `proxy_set_header Connection "";`. The empty value stops NGINX from forwarding a client's `Connection: close` header to the backend. Without both settings, backends will close the connection after each response regardless of your keepalive directive.
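A sketch of the required proxy settings; the location path and the `app_backend` upstream (assumed to be defined elsewhere) are hypothetical.

```nginx
location /api/ {
    proxy_pass http://app_backend;

    # Keepalive to upstreams requires HTTP/1.1 ...
    proxy_http_version 1.1;

    # ... and clearing the Connection header, so a client's
    # "Connection: close" is not forwarded to the backend.
    proxy_set_header Connection "";
}
```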
4. Size the keepalive pool for your traffic volume
Compare `nginx-upstream-peers-active` (concurrent active connections) to your configured keepalive pool size. If active connections consistently exceed your keepalive value (e.g., 50 concurrent connections but keepalive 16), you're forcing connection creation during traffic spikes. For high-throughput APIs, start with keepalive 64-128. Monitor `nginx-net-backend-opened-per-s`—it should drop significantly after increasing the pool size to match your concurrency needs.
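A sizing sketch, assuming a hypothetical backend that peaks around 100 concurrent active connections:

```nginx
upstream app_backend {
    server 10.0.0.11:8080;

    # Sized above the observed peak concurrency (~100 active peers in
    # this assumed scenario) so spikes reuse cached connections instead
    # of opening new ones.
    keepalive 128;

    # Optionally raise how many requests one connection may serve before
    # NGINX recycles it (supported in the upstream context since 1.15.3).
    keepalive_requests 1000;
}
```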
5. Check for connection pool saturation blocking workers
Watch `nginx-upstream-peers-active` approaching backend capacity limits—if your PHP-FPM has pm.max_children=12 but NGINX tries to open 50 concurrent connections, you'll exhaust the backend pool. The `upstream-connection-pool-saturation-blocks-nginx-workers` insight warns this causes connection failures and forces reconnection attempts. Check `nginx-upstream-peers-fails` for rising failure counts, which indicate backends rejecting connections. Set per-upstream limits matching backend capacity to fail fast rather than saturating pools.
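For the PHP-FPM scenario above, a per-upstream cap can be sketched with the `max_conns` server parameter (available in open-source NGINX since 1.11.5); the socket path is a placeholder.

```nginx
upstream php_backend {
    # Cap concurrent connections to match PHP-FPM's pm.max_children=12,
    # so excess requests fail fast instead of saturating the backend pool.
    server unix:/run/php-fpm.sock max_conns=12;

    # Keep a modest idle cache below the cap.
    keepalive 8;
}
```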
6. Tune keepalive timeout for your workload pattern
If `nginx-upstream-keepalive` shows connections being maintained but `nginx-net-backend-opened-per-s` is still elevated during traffic valleys, your keepalive_timeout may be too low, causing premature closure. The `keepalive-misconfiguration-exhausts-connections` insight recommends 2-5s for API endpoints with short requests, 5-15s for static content. Too high wastes resources; too low defeats the purpose. Monitor `nginx-upstream-peers-response-time` to ensure timeout exceeds typical response latency by 2-3x.
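For an API workload in the recommended 2-5s range, the idle timeout can be set directly in the upstream block (supported since NGINX 1.15.3; the default is 60s). The upstream name and address are placeholders.

```nginx
upstream api_backend {
    server 10.0.0.11:8080;
    keepalive 64;

    # Close cached connections after 5s of idleness -- a middle ground
    # for short API requests; static content may warrant 5-15s.
    keepalive_timeout 5s;
}
```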
7. Investigate backend health if connection reuse remains low
If keepalive is properly configured but `nginx-upstream-keepalive` stays near zero while `nginx-upstream-peers-fails` increases, backends may be actively closing connections or timing out. Check `nginx-net-backend-dropped-per-s` for abnormal drops and correlate with backend logs. Backends that return `Connection: close` headers, have aggressive timeout settings, or are restarting frequently will prevent NGINX from maintaining the keepalive pool, forcing constant reconnection regardless of your NGINX configuration.
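One quick way to see whether a backend is tearing connections down itself is to inspect its response headers. This sketch shows the check (the backend address is hypothetical), followed by an offline illustration against a captured response that you can run anywhere.

```shell
# Live form (the backend address is a placeholder):
#   curl -sI http://10.0.0.11:8080/health | grep -i '^connection:'

# Offline illustration against a captured response: a nonzero match
# count means the backend is explicitly refusing persistent connections.
printf 'HTTP/1.1 200 OK\r\nConnection: close\r\n\r\n' | grep -ci '^connection: close'
```

If the backend answers with `Connection: close`, fix its own keepalive settings; no amount of NGINX tuning can keep a connection the peer insists on closing.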

Monitoring Interfaces

NGINX Datadog
NGINX OpenTelemetry