Health Check Failures Indicate Upstream Degradation
Warning: Upstream backend health check failures (nginx_stream_upstream_peers_health_checks_fails, nginx_stream_upstream_peers_health_checks_unhealthy) provide early warning of backend degradation before user-facing errors occur. These often precede increases in nginx_upstream_peers_fails and nginx_upstream_peers_downtime.
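Alert when nginx_stream_upstream_peers_health_checks_fails keeps increasing or nginx_stream_upstream_peers_health_checks_unhealthy stays above 0 for a sustained period, rather than on a single failed check. Correlate with nginx_upstream_peers_downtime and nginx_upstream_peers_downstart to identify flapping upstreams: a downstart value that changes repeatedly means the peer keeps going down and recovering. Also watch nginx_stream_upstream_peers_health_checks_last_passed, which reports whether the most recent check passed (a pass/fail flag, not a timestamp).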
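The alert conditions above can be sketched in a few lines of Python. This is only an illustration of the logic, not a monitoring-system rule: the `Sample` type, the function names, and the 300-second default window are assumptions made for the example, and the samples are assumed to have been scraped already and sorted by time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One scrape of a single upstream peer (hypothetical container type)."""
    ts: float         # scrape time, in seconds
    fails: float      # nginx_stream_upstream_peers_health_checks_fails
    unhealthy: float  # nginx_stream_upstream_peers_health_checks_unhealthy


def should_alert(samples: Sequence[Sample], sustain_s: float = 300.0) -> bool:
    """Return True if fails rose within the window, or unhealthy stayed > 0
    for at least sustain_s seconds. Expects samples sorted by ts ascending."""
    if len(samples) < 2:
        return False
    latest = samples[-1]
    window = [s for s in samples if latest.ts - s.ts <= sustain_s]
    # Only call the unhealthy state "sustained" if we have a full window of data.
    has_full_window = latest.ts - samples[0].ts >= sustain_s
    fails_increased = window[-1].fails > window[0].fails
    unhealthy_sustained = has_full_window and all(s.unhealthy > 0 for s in window)
    return fails_increased or unhealthy_sustained


def count_down_transitions(downstart_values: Sequence[float]) -> int:
    """Count how often nginx_upstream_peers_downstart changed between scrapes.
    Each change is treated as one new down transition (a flapping signal)."""
    return sum(
        1 for prev, cur in zip(downstart_values, downstart_values[1:]) if cur != prev
    )
```

Requiring a full window before treating the unhealthy state as sustained avoids firing on the first scrape after a restart; a high transition count over a short period is what the text above calls a flapping upstream.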
Investigate failing upstream servers for resource exhaustion, application errors, or network issues. Review the health check configuration: ensure check intervals, timeouts, and failure thresholds match backend SLAs. Use nginx_upstream_peers_backup to identify which peers are configured as backups, and confirm that they take traffic when primaries fail. Consider circuit breakers or rate limiting on degraded upstreams rather than removing them entirely.
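For readers unfamiliar with the circuit breaker pattern mentioned above, a minimal generic sketch follows. It is not an NGINX feature or API: the `CircuitBreaker` class, its parameters, and the default thresholds are hypothetical, and it would live in whatever client or proxy layer calls the degraded upstream.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Generic circuit breaker sketch (hypothetical, not part of NGINX)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a request may be sent to the upstream right now."""
        if self._opened_at is None:
            return True
        # After the reset timeout, let trial requests through (half-open state).
        return self._clock() - self._opened_at >= self._reset_timeout_s

    def record_success(self) -> None:
        """A request succeeded: close the circuit and clear the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """A request failed: open (or re-open) the circuit once the threshold is hit."""
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

A caller checks `allow()` before each request and reports the outcome with `record_success()` or `record_failure()`. Unlike removing the upstream outright, the breaker periodically retries it, so a recovered backend returns to service without manual intervention.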