Load Balancing Algorithm Selection and Upstream Health

Proactive Health

Choosing the right load balancing algorithm (round-robin, least_conn, ip_hash) and configuring health checks for optimal traffic distribution.

Prompt: My NGINX load balancer is using round-robin but some backend servers are getting overloaded while others are idle. How do I choose between least_conn, ip_hash, and weighted round-robin based on my application characteristics and connection patterns?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When round-robin causes uneven load despite equal connection distribution, the root cause is usually long-lived connections (HTTP/2, WebSockets, gRPC) whose per-connection request loads vary wildly, so connection count alone is misleading. Start by comparing active connections to total request distribution, then check whether backends have different performance characteristics or whether your application requires session affinity before choosing between least_conn, ip_hash, or weighted balancing.

1. Compare active connections vs request distribution across backends
First, look at `nginx-upstream-peers-active` and `nginx-upstream-peers-requested` across all upstreams. If active connections are roughly equal (say 50, 48, 52 per backend) but total requests are wildly different (100K vs 500K vs 50K), you've identified the problem: round-robin distributes connections evenly, but with long-lived HTTP/2, WebSocket, or gRPC connections, individual connections carry vastly different request loads. This is the most common cause of 'uneven' round-robin and points you toward least_conn as the solution.
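As a minimal sketch of how to get at these per-peer counters, the NGINX Plus API module can be exposed on a loopback port (the upstream name `backend` and port are illustrative; open-source NGINX does not expose per-peer request counts natively):

```nginx
# Sketch: expose the NGINX Plus API read-only on loopback so per-peer
# counters (active connections, total requests) can be compared.
server {
    listen 127.0.0.1:8080;

    location /api {
        api write=off;   # read-only API; NGINX Plus only
    }
}
# Open-source NGINX offers only server-wide totals via stub_status,
# so per-peer data there typically comes from an exporter or log analysis.
```

Querying `/api/<version>/http/upstreams/backend` then returns `active` and `requests` for each peer, which is exactly the comparison this step calls for.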
2. Check if backends have different performance characteristics
Compare `nginx-upstream-peers-response-time` across backends—if some are consistently >2x slower than others, you have capacity imbalance. Also check `nginx-upstream-peers-fails` and the ratio of `nginx-upstream-peers-responses-5xx` to `nginx-upstream-peers-responses-2xx`. Backends showing elevated failures or slow response times are under-provisioned or degraded, which makes them poor candidates for equal distribution. In this case, either use weighted round-robin with `nginx-upstream-peers-weight` favoring the faster backends, or switch to least_conn to automatically route to less-loaded servers.
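A weighted round-robin sketch for the capacity-imbalance case might look like this (hostnames and weight values are illustrative, not measured):

```nginx
# Sketch: weighted round-robin favoring the faster backends.
# With weights 5:2:1, app1 receives roughly 5 of every 8 connections.
upstream backend {
    server app1.internal:8080 weight=5;  # fastest, well-provisioned backend
    server app2.internal:8080 weight=2;
    server app3.internal:8080 weight=1;  # slower or degraded backend
}
```

Weights should track measured capacity (response time under load), not instance size on paper, and be revisited after any backend change.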
3. Review backend health check status for degradation signals
Check `nginx-upstream-peers-health-checks-fails` and `nginx-upstream-peers-health-checks-unhealthy` on the 'overloaded' backends. The insight on `health-check-failures-indicate-upstream-degradation` shows that health check failures often precede visible user errors—if these backends are intermittently failing health checks or showing elevated `nginx-upstream-peers-unavail` counts, they're hitting resource limits (CPU, memory, event loop saturation). This confirms they can't handle equal load and need either capacity adjustments or least_conn to adaptively reduce their traffic.
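A hedged sketch of the health-checking knobs involved (hostnames and thresholds are illustrative; passive checks work in open-source NGINX, active `health_check` requires NGINX Plus):

```nginx
# Sketch: passive health checks -- after 3 failed attempts within 30s,
# the peer is considered unavailable for the next 30s.
upstream backend {
    server app1.internal:8080 max_fails=3 fail_timeout=30s;
    server app2.internal:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # Active probing on an interval; NGINX Plus only.
        health_check interval=5s fails=3 passes=2;
    }
}
```

Intermittent health-check failures under these settings are the early degradation signal described above; tightening `interval`/`fails` trades faster detection against flapping.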
4. Identify if connections are long-lived and causing request imbalance
If `nginx-upstream-peers-active` stays persistently high (connections lasting minutes or hours) and correlates with uneven request distribution, you have long-lived connection patterns—common with HTTP/2 multiplexing, gRPC, or WebSocket workloads. The `load-balancer-queue-buildup-from-backend-concurrency-limits` insight describes how backends can appear busy despite available CPU when event loops are saturated with requests over long-lived connections. For these workloads, least_conn is essential because it routes new connections to backends with fewer active connections, automatically balancing the load that round-robin cannot see.
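A minimal least_conn sketch for this long-lived-connection case (hostnames and the keepalive size are illustrative assumptions):

```nginx
# Sketch: least_conn sends each new connection to the peer with the
# fewest active connections -- the load round-robin cannot see.
upstream backend {
    least_conn;
    server app1.internal:8080;
    server app2.internal:8080;
    keepalive 32;                        # pool of idle upstream connections
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # strip "close" so connections persist
    }
}
```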
5. Determine if your application requires session affinity
Check if your application uses stateful sessions or persistent WebSocket connections. The insights on `load-balancer-no-sticky-sessions` and `session-affinity-not-configured` show that without session persistence, Socket.io clients oscillate between backends causing handshake resets, and session-based apps break when requests hit different servers. If session affinity is required, use ip_hash (for client IP-based stickiness) or configure cookie-based sticky sessions—do not use least_conn for stateful workloads, as it will break session continuity. Session affinity overrides performance-based routing.
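The two affinity options can be sketched side by side (hostnames are illustrative; the cookie name `srv_id` is a placeholder):

```nginx
# Option A: ip_hash -- the same client IP always maps to the same peer.
# Open-source NGINX; coarse-grained behind shared NATs or corporate proxies.
upstream backend_iphash {
    ip_hash;
    server app1.internal:8080;
    server app2.internal:8080;
}

# Option B: cookie-based sticky sessions (NGINX Plus only) --
# per-client affinity that survives client IP changes.
upstream backend_sticky {
    server app1.internal:8080;
    server app2.internal:8080;
    sticky cookie srv_id expires=1h httponly;
}
```

ip_hash is the simplest choice on open-source NGINX; cookie-based stickiness is preferable when many clients share an egress IP.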
6. Select the appropriate algorithm based on your findings
Based on the diagnostic results: use least_conn for long-lived connections (HTTP/2, gRPC) or when backends have varying response times; use ip_hash or cookie-based sticky sessions when session affinity is required (WebSockets, stateful sessions); use weighted round-robin with `nginx-upstream-peers-weight` when backends have different capacities but connections are short-lived. For most modern applications with HTTP/2 and connection pooling, least_conn outperforms round-robin because it adapts to actual backend load, not just connection count. Monitor `nginx-upstream-peers-active` post-change to verify more even distribution.
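For the common end state described above (long-lived connections plus unequal backend capacity), the pieces combine in one upstream block; hostnames, weights, and thresholds below are illustrative:

```nginx
# Sketch: least_conn for adaptive routing, weights for capacity skew,
# passive health parameters to shed degraded peers.
upstream backend {
    least_conn;
    server app1.internal:8080 weight=3 max_fails=3 fail_timeout=30s;
    server app2.internal:8080 weight=1 max_fails=3 fail_timeout=30s;
}
```

After the change, the verification step is the same metric comparison as step 1: per-peer active connections and request totals should converge.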


Related Insights

Health Check Failures Indicate Upstream Degradation
warning
Upstream backend health check failures (nginx_stream_upstream_peers_health_checks_fails, nginx_stream_upstream_peers_health_checks_unhealthy) provide early warning of backend degradation before user-facing errors occur. These often precede increases in nginx_upstream_peers_fails and nginx_upstream_peers_downtime.
Load Balancer Queue Buildup from Backend Concurrency Limits
critical
When backend FastAPI workers experience event loop blocking, NGINX ingress accumulates requests in upstream queues as backends appear busy despite having available CPU capacity. This results in increased queue wait times and eventually gateway timeouts for clients, creating a cascading latency problem.
Load balancer without sticky sessions causes repeated handshake resets
critical
Session requests routed to wrong worker break user sessions
warning


Monitoring Interfaces

NGINX Datadog
NGINX OpenTelemetry