NGINX

HTTP Error Rate Spikes Require Multi-Layer Analysis

critical
reliability
Updated Feb 5, 2026

Increases in nginx_server_zone_responses_4xx or nginx_server_zone_responses_5xx must be broken down by status code before acting on them: client errors (4xx), NGINX-side connectivity or capacity problems (502/503), and upstream failures (504, backend 5xx) call for different responses. The same metric can indicate completely different root causes depending on the code distribution.

How to detect:

Alert when nginx_server_zone_responses_4xx or nginx_server_zone_responses_5xx rates exceed baseline thresholds. Differentiate: 502 suggests upstream connection failures (check nginx_upstream_peers_fails); 503 indicates server overload (check nginx_server_zone_processing vs capacity); 504 indicates timeout (check nginx_upstream_peers_response_time); 4xx may indicate client issues or upstream validation failures.
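
The differentiation above can be sketched as a small triage helper. This is a minimal illustration, not part of NGINX or any exporter: it assumes per-status counts for a time window have already been extracted (for example from access logs, since the zone metrics are aggregated per class), and the names FOLLOW_UP and triage are hypothetical.

```python
from collections import Counter

# Follow-up checks per status code, taken from the guidance above.
FOLLOW_UP = {
    502: "upstream connection failures: check nginx_upstream_peers_fails",
    503: "server overload: check nginx_server_zone_processing vs capacity",
    504: "upstream timeout: check nginx_upstream_peers_response_time",
}


def triage(status_counts):
    """Return (status, count, hint) tuples for error codes, most frequent first.

    status_counts maps an HTTP status code to its request count in the window
    (assumed input format, e.g. {200: 9000, 502: 120}).
    """
    findings = []
    for status, count in Counter(status_counts).most_common():
        if count <= 0 or status < 400:
            continue  # ignore successful and redirect responses
        if status in FOLLOW_UP:
            hint = FOLLOW_UP[status]
        elif status < 500:
            hint = "4xx: client issue or upstream validation failure"
        else:
            hint = "other 5xx: inspect backend application errors"
        findings.append((status, count, hint))
    return findings


if __name__ == "__main__":
    for status, count, hint in triage({200: 9000, 404: 40, 502: 120, 504: 15}):
        print(status, count, hint)
```

Sorting by count puts the dominant code first, which is usually the one that explains the spike.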

Recommended action:

Analyze access and error logs to determine the specific HTTP codes. For 502: investigate nginx_upstream_peers_unavail and backend connectivity. For 503: check worker capacity (nginx_server_zone_processing). For 504: review nginx_upstream_peers_response_time and the proxy timeout directives (proxy_connect_timeout, proxy_read_timeout, proxy_send_timeout). For 4xx: correlate with application deployment events using nginx_generation or nginx_load_timestamp. Implement per-code alerting thresholds rather than aggregate error rates.
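
Per-code alerting, as opposed to a single aggregate error rate, might look like the following sketch. The threshold values, the default limits, and the function name breached_codes are illustrative assumptions and should be replaced with values derived from your own baseline.

```python
# Hypothetical per-code limits, as a fraction of all requests in the window.
THRESHOLDS = {502: 0.01, 503: 0.01, 504: 0.005}
DEFAULT_4XX = 0.05  # assumed fallback for any 4xx code without its own limit
DEFAULT_5XX = 0.01  # assumed fallback for any 5xx code without its own limit


def breached_codes(status_counts, thresholds=THRESHOLDS):
    """Return {status: rate} for every error code whose rate exceeds its limit."""
    total = sum(status_counts.values())
    if total == 0:
        return {}  # no traffic in the window, nothing to evaluate
    breached = {}
    for status, count in status_counts.items():
        if status < 400:
            continue
        fallback = DEFAULT_4XX if status < 500 else DEFAULT_5XX
        limit = thresholds.get(status, fallback)
        rate = count / total
        if rate > limit:
            breached[status] = rate
    return breached


if __name__ == "__main__":
    window = {200: 9710, 404: 100, 502: 150, 504: 40}
    print(breached_codes(window))
```

In this sample window only 502 breaches its limit, so the alert can point straight at upstream connectivity instead of reporting a generic 5xx increase.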