Response Time Degradation Without Resource Saturation
Warning: When the p95/p99 percentiles of nginx_upstream_peers_response_time increase while nginx_upstream_peers_active stays low and backend resources (CPU, memory, database) appear healthy, the cause is often a configuration inefficiency: missing or undersized upstream keepalives, excessive DNS lookups, or poorly tuned caching.
Alert when nginx_upstream_peers_response_time_histogram shows p95 > 250 ms or p99 > 500 ms while nginx_upstream_peers_active is below 50% of max_backend. Then narrow down the cause: a high nginx_cache_bypass_responses count points to cache inefficiency, and a non-zero nginx_resolver_responses_timedout points to DNS problems.
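The alert condition above can be sketched as a small predicate. This is a minimal illustration, not a rule for any particular alerting system: the UpstreamSample type, the function names, and the 0.2 bypass-ratio cutoff for "high" are hypothetical, and the sample is assumed to already hold percentiles computed from the histogram.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpstreamSample:
    """One evaluation window of metrics (hypothetical container type)."""
    p95_ms: float              # p95 of upstream response time, in milliseconds
    p99_ms: float              # p99 of upstream response time, in milliseconds
    active: int                # nginx_upstream_peers_active
    max_backend: int           # configured backend connection capacity
    cache_bypass_ratio: float  # bypass responses / all cache-eligible responses
    resolver_timedout: int     # nginx_resolver_responses_timedout in the window


def should_alert(s: UpstreamSample,
                 p95_limit_ms: float = 250.0,
                 p99_limit_ms: float = 500.0,
                 active_fraction: float = 0.5) -> bool:
    """True when latency is degraded but the upstream is not saturated."""
    slow = s.p95_ms > p95_limit_ms or s.p99_ms > p99_limit_ms
    unsaturated = s.max_backend > 0 and s.active < active_fraction * s.max_backend
    return slow and unsaturated


def likely_causes(s: UpstreamSample, bypass_ratio_limit: float = 0.2) -> List[str]:
    """Map the confirmation signals to candidate causes.

    bypass_ratio_limit is an assumed cutoff for "high"; tune it per workload.
    """
    causes = []
    if s.cache_bypass_ratio > bypass_ratio_limit:
        causes.append("cache inefficiency")
    if s.resolver_timedout > 0:
        causes.append("DNS issues")
    return causes
```

Keeping the saturation check separate from the cause checks mirrors the two-step reading of the text: first decide whether the alert fires, then look at the cache and resolver signals to decide where to investigate.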
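Reduce DNS overhead first: use IP addresses instead of hostnames in the configuration where practical. HostnameLookups is an Apache httpd directive, not an nginx one, so disable it only if Apache is also in the request path; in nginx itself, runtime lookups go through the resolver directive, for example when proxy_pass contains a variable. Next, review the caching strategy: compare nginx_cache_hit_responses against nginx_cache_miss_responses, tune the cache size limit (reported as nginx_cache_max_size and set with the max_size parameter of proxy_cache_path), and implement cache warming for frequently accessed content. If proxying to backends, enable upstream keepalive connections with the keepalive directive in the upstream block; for HTTP backends the nginx documentation also calls for proxy_http_version 1.1 and an empty Connection header (proxy_set_header Connection "").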
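The hit/miss comparison mentioned above can be reduced to a single ratio. The sketch below assumes the two metrics are cumulative counters sampled at two points in time; the function name and the dictionary keys "hit" and "miss" are hypothetical.

```python
from typing import Mapping, Optional


def cache_hit_ratio(prev: Mapping[str, int],
                    curr: Mapping[str, int]) -> Optional[float]:
    """Hit ratio over the interval between two counter samples.

    prev and curr hold cumulative values of nginx_cache_hit_responses ("hit")
    and nginx_cache_miss_responses ("miss"). Returns None when the interval
    had no traffic or a counter went backwards (e.g. after a restart).
    """
    hits = curr["hit"] - prev["hit"]
    misses = curr["miss"] - prev["miss"]
    if hits < 0 or misses < 0:
        return None  # counter reset; this interval cannot be trusted
    total = hits + misses
    if total == 0:
        return None  # no cacheable traffic in the interval
    return hits / total


if __name__ == "__main__":
    before = {"hit": 100, "miss": 100}
    after = {"hit": 400, "miss": 200}
    print(cache_hit_ratio(before, after))  # 300 hits of 400 responses -> 0.75
```

Working on deltas rather than raw totals keeps the ratio sensitive to recent behaviour, which matters when checking whether a change to the cache size or a warming job actually helped.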