Prometheus, GitLab CI, Grafana

Change Failure Rate Increase from Inadequate Pre-Production Testing

critical
reliability
Updated Dec 10, 2025

Deployments to production fail or require rollback at a higher rate than expected, indicating gaps in pre-production testing coverage, environment parity issues, or insufficient deployment validation.

How to detect:

Monitor deployment outcomes and track deployments that fail or are rolled back within 24 hours of release. Calculate Change Failure Rate (CFR) as (failed deployments + rollbacks) / total deployments over a fixed reporting window, counting a deployment only once if it both failed and was rolled back. Alert when CFR exceeds the team's threshold (e.g., >15% for mature teams). Correlate CFR increases with post-deployment error rates and incident creation timestamps.
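
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Deployment` record, its field names, and the helper functions are hypothetical, and it assumes deployment and rollback timestamps have already been exported from the CI system. A deployment that both failed and was rolled back is counted once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Deployment:
    # Hypothetical record shape; adapt to whatever your pipeline exports.
    deployed_at: datetime
    failed: bool = False
    rolled_back_at: Optional[datetime] = None


def is_change_failure(d: Deployment, window: timedelta = timedelta(hours=24)) -> bool:
    """A deployment counts as a failure if it failed outright
    or was rolled back within the window after deployment."""
    if d.failed:
        return True
    if d.rolled_back_at is None:
        return False
    return d.rolled_back_at - d.deployed_at <= window


def change_failure_rate(deployments: Sequence[Deployment],
                        window: timedelta = timedelta(hours=24)) -> float:
    """CFR = change failures / total deployments (0.0 when there are none)."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if is_change_failure(d, window))
    return failures / len(deployments)


def should_alert(deployments: Sequence[Deployment], threshold: float = 0.15) -> bool:
    """True when CFR is strictly above the team threshold (15% by default)."""
    return change_failure_rate(deployments) > threshold
```

The 24-hour window and the 15% threshold are taken from the guidance above; both are parameters so a team can tune them.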

Recommended action:

Implement deployment validation gates in the GitLab CI pipeline, backed by synthetic monitoring or external observability checks. Add smoke tests that run immediately after each deployment. Improve staging environment parity with production. Adopt progressive deployment strategies (for example, canary or staged rollouts) with automatic rollback when error rates increase. Review incidents tagged to deployments to identify common failure patterns, and add tests that cover them.
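
One way to combine post-deployment smoke tests with an error-rate rollback rule is sketched below. All names (`run_smoke_checks`, `should_roll_back`, `deployment_gate`) and the default thresholds are assumptions made for illustration. The error rates are assumed to come from your monitoring system (for example, a Prometheus query), which is not shown here.

```python
from typing import Callable, Dict, List


def run_smoke_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each named check and return the names of those that failed.
    A check that raises is treated as a failure."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def should_roll_back(baseline_error_rate: float,
                     canary_error_rate: float,
                     max_ratio: float = 2.0,
                     min_absolute: float = 0.01) -> bool:
    """Roll back when the new version's error rate is both non-trivial
    in absolute terms and more than max_ratio times the baseline."""
    if canary_error_rate < min_absolute:
        return False  # too few errors to act on
    if baseline_error_rate == 0:
        return True  # any meaningful error rate is a regression
    return canary_error_rate / baseline_error_rate > max_ratio


def deployment_gate(checks: Dict[str, Callable[[], bool]],
                    baseline_error_rate: float,
                    canary_error_rate: float) -> str:
    """Return "rollback" if any smoke check fails or the error rate
    regresses, otherwise "promote"."""
    if run_smoke_checks(checks):
        return "rollback"
    if should_roll_back(baseline_error_rate, canary_error_rate):
        return "rollback"
    return "promote"
```

A pipeline job could call `deployment_gate` after each rollout stage and trigger the rollback job when it returns "rollback". The absolute floor (`min_absolute`) is a design choice to avoid rolling back on noise when traffic is low.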