Health Check Failures Indicate Upstream Degradation

warning

reliability

Updated Mar 16, 2017

Upstream backend health check failures (nginx_stream_upstream_peers_health_checks_fails, nginx_stream_upstream_peers_health_checks_unhealthy) provide early warning of backend degradation before user-facing errors occur. These often precede increases in nginx_upstream_peers_fails and nginx_upstream_peers_downtime.

Sources

Monitoring Apache web server performance | Datadog (www.datadoghq.com)

Technologies:

NGINX: Symptoms of this issue are visible in NGINX metrics and logs

How to detect:

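Alert when nginx_stream_upstream_peers_health_checks_fails increases or nginx_stream_upstream_peers_health_checks_unhealthy stays above 0 for a sustained period. Correlate with nginx_upstream_peers_downtime and nginx_upstream_peers_downstart to identify flapping upstreams. Also watch nginx_stream_upstream_peers_health_checks_last_passed: in the NGINX Plus API this field reports whether the most recent health check passed, so a peer that keeps reporting a failed last check is failing continuously rather than intermittently.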
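The alert condition above can be sketched as a small evaluation function. This is a minimal illustration, not a monitoring product's rule syntax: the function names, the `min_consecutive` parameter, and the assumption that `fails` arrives as a cumulative counter while `unhealthy` arrives as a per-interval value are all hypothetical choices made for the example.

```python
from typing import Sequence


def fails_increased(fails_counter: Sequence[int]) -> bool:
    """Return True if the cumulative fails counter rose between any two
    consecutive samples. A drop (e.g. counter reset after a restart) is
    not treated as an increase."""
    return any(b > a for a, b in zip(fails_counter, fails_counter[1:]))


def sustained_unhealthy(unhealthy: Sequence[int], min_consecutive: int) -> bool:
    """Return True if the unhealthy value was above 0 for at least
    `min_consecutive` samples in a row."""
    run = 0
    for value in unhealthy:
        run = run + 1 if value > 0 else 0
        if run >= min_consecutive:
            return True
    return False


def should_alert(
    fails_counter: Sequence[int],
    unhealthy: Sequence[int],
    min_consecutive: int = 3,  # hypothetical threshold; tune to the scrape interval
) -> bool:
    """Combine both conditions: a rising fails counter, or a sustained
    run of unhealthy > 0."""
    return fails_increased(fails_counter) or sustained_unhealthy(
        unhealthy, min_consecutive
    )
```

Requiring several consecutive samples before firing on the unhealthy condition is one way to avoid paging on a single failed probe; the right window depends on the health check interval and how quickly the backend is expected to recover.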

Recommended action:

Investigate failing upstream servers for resource exhaustion, application errors, or network issues. Review health check configuration: ensure check intervals, timeouts, and failure thresholds match backend SLAs. Monitor nginx_upstream_peers_backup to confirm backup servers activate during primary failures. Consider implementing circuit breakers or rate limiting on degraded upstreams rather than full removal.
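As an illustration of the circuit breaker idea mentioned above, the following is a minimal, generic sketch. It is not an NGINX feature or configuration: the `CircuitBreaker` class, its parameters, and the default thresholds are hypothetical, and it assumes the breaker lives in whatever client or proxy layer sends requests to the degraded upstream.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop sending traffic after repeated failures, then allow a trial
    request once a cool-down period has elapsed (hypothetical sketch)."""

    def __init__(
        self,
        failure_threshold: int = 3,    # assumed default, tune per backend
        reset_timeout: float = 30.0,   # seconds to wait before a trial request
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        # Closed circuit: traffic flows normally.
        if self._opened_at is None:
            return True
        # Open circuit: allow a trial request only after the cool-down.
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self) -> None:
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        # Open (or re-open) the circuit once the threshold is reached.
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller would check `allow_request()` before each upstream call and report the outcome with `record_success()` or `record_failure()`. Compared with removing the server outright, this keeps probing the backend at a low rate so it can rejoin as soon as it recovers.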