Slow Response Time and Latency Diagnosis

Incident Response

Diagnosing whether slow response times originate from NGINX configuration, network issues, or backend application performance.

Prompt: My application is experiencing slow response times and I need to figure out if the bottleneck is in NGINX itself or in my backend services. How do I use request_time vs upstream_response_time to isolate where the latency is coming from?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When diagnosing slow response times with NGINX, the critical first step is comparing upstream response time to total request time to isolate whether the bottleneck is in NGINX or the backend. Then check whether the latency affects all requests uniformly or only tail percentiles, which reveals whether you're dealing with capacity saturation, configuration issues, or intermittent backend problems. Finally, investigate connection pooling and backend-specific issues like event loop blocking.

1. Compare upstream response time to identify the bottleneck location
Start by comparing `nginx-upstream-peers-response-time` (time waiting for backend) to your total request time logged in NGINX ($request_time). If upstream response time accounts for >80% of total request time, the backend is your bottleneck. If upstream time is low but total time is high, the issue is in NGINX's handling—network, SSL termination, client connection speed, or NGINX configuration. Also check `nginx-upstream-peers-header-time`—if it's significantly lower than response time, the backend processes quickly but sends large responses slowly.
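The timing variables referenced here are standard NGINX log variables; a `log_format` along these lines (the format name and log path are placeholders) captures everything step 1 needs:

```nginx
log_format timing '$remote_addr [$time_local] "$request" $status '
                  'rt=$request_time uct=$upstream_connect_time '
                  'uht=$upstream_header_time urt=$upstream_response_time';

access_log /var/log/nginx/timing.log timing;
```

With this in place, comparing `rt` and `urt` on any slow request line shows immediately whether the wait was in the backend (`urt` close to `rt`) or in NGINX, TLS, or client handling (`urt` much smaller than `rt`).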
2. Check if latency is uniform or concentrated in tail percentiles
Look at `nginx-upstream-peers-response-time-histogram` to see the distribution—compare `nginx-upstream-peers-response-time-histogram-median` to p95/p99 percentiles. If median is acceptable (say <100ms) but p95 >250ms or p99 >500ms, you have tail latency issues rather than systemic slowness. Per the insight on response-time-degradation-without-resource-saturation, this pattern with low `nginx-upstream-peers-active` often indicates configuration inefficiencies like poor keepalive settings, excessive DNS lookups, or cache misses rather than true capacity problems.
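The median-versus-tail comparison can be sketched as follows; the thresholds and the nearest-rank percentile method are illustrative, not prescribed by the playbook:

```python
import statistics

def latency_profile(samples):
    """Summarize upstream response times (seconds): median vs. tail percentiles."""
    s = sorted(samples)

    def pct(p):
        # Nearest-rank percentile over the sorted samples
        idx = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
        return s[idx]

    return {"median": statistics.median(s), "p95": pct(95), "p99": pct(99)}

# Mostly fast requests with a slow tail: median healthy, p95/p99 elevated
times = [0.05] * 90 + [0.3] * 8 + [0.9] * 2
profile = latency_profile(times)

# The step's rule of thumb: median < 100ms but p95 > 250ms => tail latency issue
tail_latency_issue = profile["median"] < 0.1 and profile["p95"] > 0.25
```

A result like this one (acceptable median, elevated p95/p99) points at configuration inefficiencies or intermittent backend stalls rather than systemic slowness.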
3. Verify if backend capacity is saturated or artificially limited
Check `nginx-upstream-peers-active` against your backend's maximum connection limit and compare request rate (`nginx-upstream-peers-requested` delta) to response rate (`nginx-upstream-peers-responses` delta)—they should match closely. If active connections are low (<50% of backend max) but latency is high, you likely have event-loop-blocking-causes-serial-request-processing: your async backend is making blocking I/O calls that cause serial-like request handling despite async infrastructure. Also check `nginx-server-zone-processing`—if it's growing, requests are queuing in NGINX waiting for upstream capacity.
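A rough sketch of this step's decision logic, with the metric-to-argument mapping and the 5% tolerance as assumptions:

```python
def diagnose_capacity(active, backend_max, req_delta, resp_delta):
    """Classify upstream behavior from connection and rate counters.

    Assumed mapping to the playbook's metrics:
      active     -> nginx-upstream-peers-active
      req_delta  -> delta of nginx-upstream-peers-requested
      resp_delta -> delta of nginx-upstream-peers-responses
    """
    backlog = req_delta - resp_delta       # requests outpacing responses
    utilization = active / backend_max
    if backlog > 0.05 * req_delta:
        return "queueing: responses lag requests, upstream saturated"
    if utilization < 0.5:
        return "low utilization with high latency: suspect event loop blocking"
    return "utilization and throughput look consistent"

# Low active connections, request/response rates matching: points at blocking I/O
verdict = diagnose_capacity(active=40, backend_max=200,
                            req_delta=1000, resp_delta=998)
```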
4. Distinguish between backend processing delay and data transfer issues
Compare `nginx-upstream-peers-header-time` to `nginx-upstream-peers-response-time`. If header time is high, your backend is slow to start processing requests (database queries, authentication, business logic). If header time is acceptable but total response time is much higher, the issue is transferring the response body—either the responses are very large, network bandwidth is constrained, or there's compression overhead. This distinction tells you whether to optimize backend logic or adjust response sizes and buffering.
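The header-time-versus-response-time split can be expressed as a small classifier; the function and its tie-breaking rule are a sketch, not part of the playbook:

```python
def classify_phase(header_time, response_time):
    """Split upstream latency into time-to-first-byte vs. body transfer.

    header_time   ~ nginx-upstream-peers-header-time (until headers arrive)
    response_time ~ nginx-upstream-peers-response-time (full body received)
    """
    transfer_time = response_time - header_time
    if header_time >= transfer_time:
        # Backend is slow to start responding: queries, auth, business logic
        return ("processing", header_time)
    # Body delivery dominates: large responses, bandwidth, compression overhead
    return ("transfer", transfer_time)

# Headers in 30ms but 850ms total: the body transfer is the problem
phase, seconds = classify_phase(header_time=0.03, response_time=0.85)
```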
5. Investigate connection pooling and keepalive efficiency
Check `nginx-net-writing` (connections waiting on upstream or writing responses) alongside `nginx-upstream-peers-active`. If net-writing is high relative to active upstream connections, NGINX is spending significant time writing responses back to slow clients or waiting on backends. Review your keepalive settings (keepalive directive in upstream block)—insufficient keepalive connections force NGINX to repeatedly establish new connections to backends, adding latency. The response-time-degradation-without-resource-saturation insight specifically calls out suboptimal keepalives as a common cause of tail latency.
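A minimal upstream keepalive configuration looks like this; the upstream name, addresses, and pool size are placeholders, while the three directives (`keepalive`, `proxy_http_version`, clearing the `Connection` header) are the standard NGINX requirements for connection reuse:

```nginx
upstream backend {
    server 10.0.0.10:8080;
    server 10.0.0.11:8080;
    keepalive 32;                        # idle upstream connections kept per worker
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear "close" so connections are reused
    }
}
```

Without `proxy_http_version 1.1` and the cleared `Connection` header, the `keepalive` directive has no effect and NGINX opens a fresh backend connection per request.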
6. Profile backend application if upstream is confirmed as the bottleneck
If steps above confirm high `nginx-upstream-peers-response-time` is driving the problem, instrument your backend application. For async frameworks (FastAPI, Node.js), measure event loop lag to detect blocking operations—synchronous database calls, CPU-heavy JSON processing, or blocking SDK calls that violate async patterns. The elevated-request-duration-degradation insight suggests correlating backend latency with database query performance, external API calls, and worker capacity. Use application profiling tools (py-spy for Python, clinic.js for Node) to identify hot code paths.
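Event loop lag can be measured with a small asyncio probe; this sketch's interval, sample count, and simulated blocking call are illustrative:

```python
import asyncio
import time

async def monitor_loop_lag(interval, samples):
    """Return the worst observed event loop lag: how much later than
    `interval` each timed sleep actually woke up."""
    worst = 0.0
    for _ in range(samples):
        start = time.monotonic()
        await asyncio.sleep(interval)
        worst = max(worst, time.monotonic() - start - interval)
    return worst

async def main():
    monitor = asyncio.create_task(monitor_loop_lag(interval=0.01, samples=20))
    # Simulate a handler that makes a blocking call inside async code:
    # time.sleep() stalls the entire event loop, delaying every other task.
    for _ in range(5):
        time.sleep(0.05)      # the anti-pattern being detected
        await asyncio.sleep(0.01)
    return await monitor

worst_lag = asyncio.run(main())
```

A healthy loop shows lag near zero; here the 50ms blocking sleeps push the probe's worst-case lag to roughly that magnitude, which is exactly the signature step 6 tells you to look for before reaching for py-spy or clinic.js.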


Related Insights

Response Time Degradation Without Resource Saturation (warning)
When nginx_upstream_peers_response_time p95/p99 percentiles increase but nginx_upstream_peers_active stays low and backend resources (CPU, memory, database) appear healthy, the issue often lies in configuration inefficiencies: suboptimal keepalives, excessive DNS lookups, or poorly tuned caching.
Express handling non-application tasks reduces specialized capacity (warning)
Elevated request duration indicates performance degradation (warning)
Latency Distribution Drift Under Sustained Load (warning)
Services with hidden event loop blocking show subtle latency drift over time, where median response times remain acceptable but p95/p99 percentiles creep upward over weeks. This gradual degradation becomes acute during traffic spikes, revealing that the service is operating with marginal concurrency headroom.
Event Loop Blocking Causes Serial Request Processing (critical)
When NGINX proxies to async application servers (FastAPI, Node.js) but those backends make blocking I/O calls, the event loop stalls, causing serial-like request processing despite async infrastructure. Symptoms include flat throughput curves and rising tail latency even when CPU is moderate.


Monitoring Interfaces

NGINX Datadog
NGINX OpenTelemetry