HTTP 5xx error rate exceeds 1% threshold causing user-facing failures
critical · availability · Updated Feb 20, 2026
Technologies:
How to detect:
User-facing error rate (HTTP 5xx responses) exceeds 1% for at least 5 minutes, indicating service degradation affecting users.
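The detection rule combines a ratio threshold (1%) with a duration (5 minutes). The sketch below shows one way to read that rule. It assumes the input is a time-ordered list of per-interval request counts, and the function name and sample format are illustrative rather than part of any monitoring system's API.

```python
from typing import Iterable, Tuple

THRESHOLD = 0.01       # 1% error rate, taken from the detection rule
SUSTAIN_SECONDS = 300  # 5 minutes, taken from the detection rule

# Hypothetical sample format: (timestamp_seconds, total_requests, errors_5xx),
# where the counts cover one scrape interval each.
Sample = Tuple[int, int, int]


def breaches_threshold(
    samples: Iterable[Sample],
    threshold: float = THRESHOLD,
    sustain_seconds: int = SUSTAIN_SECONDS,
) -> bool:
    """Return True if the 5xx ratio stays above threshold for sustain_seconds."""
    breach_start = None
    for ts, total, errors in samples:
        # Treat intervals with no traffic as healthy to avoid dividing by zero.
        ratio = errors / total if total > 0 else 0.0
        if ratio > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= sustain_seconds:
                return True
        else:
            # Any healthy interval resets the 5-minute clock.
            breach_start = None
    return False
```

A single healthy interval resets the timer in this sketch, so a brief dip below 1% delays the alert. Whether that matches the intended behavior depends on how the alert is actually evaluated.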
Recommended action:
1. Check which endpoints are returning errors using: sum by (endpoint) (rate(http_requests_total{status=~"5.."}[5m]))
2. Check recent deployments in the last 2 hours.
3. Check downstream service health.
4. Check database connectivity and query latency.
5. Act on the cause:
   - Bad deployment: roll back.
   - Downstream service: check that service's status.
   - Database issues: check the connection pool and query performance.
6. Escalate to the service owner if not resolved within 30 minutes.
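The query in step 1 returns absolute 5xx rates, so a high-traffic endpoint can dominate the result even when a low-traffic endpoint is failing far more often in relative terms. As a complementary view, the sketch below ranks endpoints by error ratio. It assumes per-endpoint totals and 5xx counts for the same window have already been fetched, and the function name and endpoint names are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_endpoints_by_error_ratio(
    counts: Dict[str, Tuple[int, int]],
) -> List[Tuple[str, float]]:
    """Rank endpoints by 5xx ratio, highest first.

    counts maps endpoint -> (total_requests, errors_5xx) over one window.
    """
    ratios = []
    for endpoint, (total, errors) in counts.items():
        if total == 0:
            # No traffic in the window, so there is no meaningful ratio.
            continue
        ratios.append((endpoint, errors / total))
    return sorted(ratios, key=lambda item: item[1], reverse=True)


# Hypothetical counts for illustration only.
example = {
    "/checkout": (200, 30),
    "/search": (10000, 50),
    "/health": (0, 0),
}
ranking = rank_endpoints_by_error_ratio(example)
```

With these made-up numbers, `/checkout` ranks first even though `/search` produces more 5xx responses in absolute terms.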