Celery

Monitoring system failure creates blind spot in operations

critical
availability
Updated Dec 17, 2025 (via Exa)
How to detect:

The monitoring system itself (Kanchi) becomes unavailable, creating a blind spot in which Celery task failures go undetected. The detection signal is the monitoring system's health check endpoint failing to respond within 5 seconds.
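The detection rule can be sketched as a small probe. This is a minimal illustration, not Kanchi's own tooling: the function name `monitoring_is_healthy`, the injectable `opener` parameter, and any URL passed to it are assumptions made for the example. Only the 5-second threshold comes from the rule above.

```python
import http.client
import urllib.request

HEALTH_TIMEOUT_SECONDS = 5.0  # threshold taken from the detection rule


def monitoring_is_healthy(url, timeout=HEALTH_TIMEOUT_SECONDS,
                          opener=urllib.request.urlopen):
    """Return True only if the endpoint answers with HTTP 2xx in time.

    `opener` is a hypothetical injection point so the probe can be tested
    without a network; by default it is urllib's urlopen. Note that urlopen
    applies the timeout to each blocking socket operation, so this is an
    approximation of a strict end-to-end deadline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers timeouts, refused connections, DNS failures and
        # urllib's URLError/HTTPError (raised for 4xx/5xx responses).
        return False
```

Any failure mode (timeout, refused connection, non-2xx status) is treated as unhealthy, since each one leaves the same blind spot. The health endpoint path depends on the deployment and is not specified here.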

Recommended action:

Monitor the monitoring system through a separate channel. Create a cron job or external health check that queries the monitoring system's health endpoint, and alert via an independent channel (SMS, PagerDuty) if the check fails. Deploy the monitoring system in high-availability mode with replicas: 2. Set up database backups with a documented RTO of 15 minutes and RPO of 24 hours.
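The cron-driven check with independent alerting might look like the sketch below. It is an illustration under stated assumptions: `run_watchdog`, its `probe` and `alert` callables, and the `attempts` retry count are hypothetical names and values, not part of Kanchi or Celery. The retry count is added only to avoid paging on a single dropped request.

```python
import logging

logger = logging.getLogger("monitor_watchdog")


def run_watchdog(probe, alert, attempts=3):
    """Probe the monitoring system and alert out-of-band on failure.

    probe:    callable returning True when the health endpoint answered
              within its deadline.
    alert:    callable taking a message string. It should NOT route through
              the monitoring system being checked (for example, an SMS or
              PagerDuty client supplied by the caller).
    attempts: assumed retry count; the source does not specify one.

    Returns a process exit code: 0 if healthy, 1 if an alert was sent.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            logger.info("monitoring healthy on attempt %d", attempt)
            return 0
        logger.warning("health check failed (attempt %d/%d)", attempt, attempts)
    # Every attempt failed: notify through the independent channel.
    alert(f"Monitoring system failed {attempts} consecutive health checks")
    return 1
```

A cron job or external scheduler could call `run_watchdog` with a real probe and alert client and pass the return value to `sys.exit`, so the scheduler also records the failure. Running it from a host outside the monitoring deployment keeps the check independent of the system it watches.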