Retry storms overwhelm recovering backend services
criticalavailabilityUpdated Jan 23, 2026(via Exa)
Technologies:
How to detect:
When multiple clients retry simultaneously during service recovery, amplified request volume prevents the backend from recovering and can cause cascading failures.
Recommended action:
Reduce retry attempts to 2, increase initialInterval to 500ms minimum, apply circuit breaker middleware after retry middleware, enable rate limiting before retry middleware, and rely on Traefik's built-in jitter for retry timing randomization.