Memory Leak Diagnosis in NGINX Workers
Critical: Incident Response
Diagnosing and resolving memory leaks in NGINX worker processes that cause gradual memory growth and eventual performance degradation.
Prompt: “My NGINX worker processes are showing steadily increasing memory usage over time, eventually leading to OOM kills. How do I determine if this is a configuration issue, an application memory leak, or an actual NGINX bug?”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When diagnosing NGINX worker memory leaks, start by establishing whether all workers or only some are affected to differentiate config issues from request-specific leaks. Then check for OOM kills and shared memory zone exhaustion, which are the most common NGINX-specific causes. Finally, rule out upstream application leaks and configuration issues like unbounded caching or request bodies before suspecting an actual NGINX bug.
1. Establish the memory growth pattern across workers
Track individual worker RSS over time using `ps aux | grep 'nginx: worker'` or your monitoring stack. If all workers grow at the same rate, you're looking at a configuration issue affecting all traffic (shared memory, cache, or config-driven buffering). If only some workers grow while others stay stable, specific requests or connections are likely triggering the leak, pointing toward either an NGINX bug or an upstream application issue.
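The sampling above can be sketched in shell; the log path is a hypothetical example, and `ps` reports worker processes as `nginx: worker process`:

```shell
# Print PID and RSS (KB) for each NGINX worker.
ps aux | awk '/nginx: worker/ && !/awk/ {print $2, $6}'

# Sample every 60 seconds into a log (hypothetical path) for later comparison:
# while true; do date +%s; ps aux | awk '/nginx: worker/ && !/awk/ {print $2, $6}'; sleep 60; done \
#   >> /var/log/nginx-worker-rss.log
```

Plotting RSS per PID over time makes uniform growth (config-wide) versus divergent growth (request-specific) obvious at a glance.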
2. Check for OOM kills and worker respawning
Monitor the `nginx-processes-respawned` metric to see if workers are being abnormally terminated and restarted. If this counter is increasing, the OS is killing workers due to memory exhaustion. Cross-reference with `dmesg` or `/var/log/syslog` for "Out of memory: Kill process" messages to confirm OOM kills. Frequent respawning confirms a real memory leak, not just high but stable memory usage.
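Confirming OOM kills from the kernel and system logs might look like this sketch:

```shell
# Kernel OOM-killer messages name the victim process:
dmesg -T | grep -i 'out of memory'

# On syslog hosts, check whether nginx workers were the victims:
grep -i 'out of memory: kill' /var/log/syslog | grep -i nginx

# On systemd hosts: journalctl -k | grep -i 'out of memory'
```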
3. Inspect shared memory zone exhaustion
Check `nginx-slab-slot-fails`: if this counter is incrementing, you have shared memory exhaustion in zones such as `limit_req`, `limit_conn`, `ssl_session_cache`, or custom zones. Compare `nginx-slab-pages-used` against `nginx-slab-pages-free` to get the utilization percentage. This is one of the most common NGINX-specific memory issues and means you need to increase zone sizes or reduce the data being stored (e.g., shorter SSL session timeouts, smaller key sizes).
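If you run NGINX Plus with the REST API enabled, the slab zones can be inspected directly; the endpoint and API version below are assumptions to adjust for your setup, and the utilization arithmetic works from any used/free page counts:

```shell
# Dump raw slab zone stats (NGINX Plus API; adjust host and version):
# curl -s http://127.0.0.1/api/9/slabs

# Utilization percentage from used/free page counts (example values):
used=460; free=52
awk -v u="$used" -v f="$free" 'BEGIN {printf "zone utilization: %.0f%%\n", 100*u/(u+f)}'
```

Sustained utilization near 100% alongside incrementing slot fails is the signature of an undersized zone.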
4. Verify proxy cache isn't growing unbounded
Monitor `nginx-cache-size` and compare it to `nginx-cache-max-size`. If cache size is consistently at or near max, the cache manager may be struggling to evict old entries fast enough, causing memory pressure. If `nginx-cache-max-size` is not set or cache size keeps growing, you have unbounded cache growth. Also check that `proxy_cache_path` has `inactive` and `max_size` properly configured.
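A quick check of on-disk cache footprint against the configured limits might look like this; the cache path is a hypothetical example taken from `proxy_cache_path`:

```shell
# Actual cache footprint on disk (use the path from your proxy_cache_path):
du -sh /var/cache/nginx/proxy

# Confirm max_size and inactive are both configured:
grep -Rh 'proxy_cache_path' /etc/nginx/ | grep -o 'max_size=[^ ;]*\|inactive=[^ ;]*'
```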
5. Test for unbounded request body memory exhaustion
The `unbounded-upload-memory-exhaustion` insight shows that large uploads without size limits can exhaust memory. Verify `client_max_body_size` is set to a reasonable value (not overly large). Test by sending large POST/PUT requests while monitoring worker memory: if memory spikes and doesn't release after request completion, the upstream application may be buffering entire request bodies in memory rather than streaming them.
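One way to run that probe is to generate a large body and POST it while watching worker RSS in another terminal; the upload URL is hypothetical:

```shell
# Verify a sane limit is configured:
grep -R 'client_max_body_size' /etc/nginx/

# Generate a 512 MB test body and POST it (hypothetical endpoint);
# monitor worker RSS in another terminal while this runs:
dd if=/dev/zero of=/tmp/big.bin bs=1M count=512
curl -s -o /dev/null -w '%{http_code}\n' \
  --data-binary @/tmp/big.bin http://localhost/upload
```

If the limit is working, NGINX should reject the request with a 413 rather than buffer it.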
6. Differentiate NGINX leaks from upstream application leaks
The `gunicorn-multiple-sites-memory-exhaustion` and `gunicorn-memory-swapping-causes-504-timeouts` insights highlight that upstream workers (Gunicorn, uWSGI, etc.) often leak memory, not NGINX itself. Monitor upstream process memory separately using `ps aux | grep gunicorn` or equivalent. If upstream workers are growing but NGINX workers are stable, the leak is in your application code, not NGINX. Check for swap usage with `free -m`: if swap exceeds 20-30% of total RAM, you have a capacity problem.
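Comparing upstream and NGINX footprints, plus swap pressure, can be scripted roughly as:

```shell
# Total RSS (KB) of upstream workers vs NGINX workers:
ps aux | awk '/[g]unicorn/ {s += $6} END {print "gunicorn RSS KB:", s+0}'
ps aux | awk '/nginx: worker/ && !/awk/ {s += $6} END {print "nginx RSS KB:", s+0}'

# Swap utilization, to check against the 20-30% rule of thumb above:
free -m | awk '/^Swap:/ && $2 > 0 {printf "swap used: %.0f%%\n", 100*$3/$2}'
```

The bracketed `[g]unicorn` pattern is a common trick to keep the `awk` process itself out of its own match.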
7. Test gzip compression buffer exhaustion
Per the `nginx-gzip-large-response-buffer-exhaustion` insight, gzip compression of large responses can exhaust buffers and cause memory spikes. Temporarily disable gzip in your config (`gzip off;`) and reload NGINX, then monitor if memory growth stops or slows. If it does, tune `gzip_buffers`, reduce `gzip_min_length`, or selectively disable gzip for large responses rather than globally enabling it.
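Toggling gzip for this A/B memory test might be done as follows; the config path is a common default, so adjust it for your layout:

```shell
# Back up the config, flip gzip off, validate, and reload:
sudo sed -i.bak 's/gzip on;/gzip off;/' /etc/nginx/nginx.conf
sudo nginx -t && sudo nginx -s reload

# Restore afterwards:
# sudo mv /etc/nginx/nginx.conf.bak /etc/nginx/nginx.conf && sudo nginx -s reload
```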
Related Insights
- Unbounded upload size exhausts memory in Starlette applications (warning)
- Nginx gzip compression with large responses may exhaust buffers causing CPU spike (warning)
- Hosting multiple sites on 1GB RAM server causes memory exhaustion (critical)
- Memory swapping causes slow response times and Nginx 504 timeouts (critical)