MinIO

Silent Healing Backlog

critical
reliability
Updated May 2, 2025

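MinIO automatically heals corrupted or degraded objects in the background, but healing can fall behind during periods of high load or after drive failures. Because a growing backlog raises no visible alert, degraded objects can remain under-protected for long periods, putting data durability at risk.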

How to detect:

Monitor healing metrics for signs of a growing backlog: a widening divergence between objects_scanned and objects_healed, a healing queue depth that keeps increasing over time, or healing_active_workers pinned at maximum capacity while objects_remaining_to_heal continues to grow.
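As a rough illustration, the check below evaluates two of the signals above (rising queue depth, and saturated workers while the remaining count grows) over a window of metric samples. It is a minimal sketch, not MinIO's own logic: the HealSample fields are hypothetical names mirroring the text, max_workers is assumed to be known from your configuration, and real metric names vary by MinIO version.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealSample:
    # One scrape of healing metrics. Field names are hypothetical and
    # mirror the metrics named in the text; map them to your deployment.
    ts: float               # Unix timestamp, seconds
    queue_depth: int        # items waiting in the heal queue
    active_workers: int     # healing_active_workers
    max_workers: int        # configured worker ceiling (assumed known)
    remaining_to_heal: int  # objects_remaining_to_heal


def queue_growing(samples: List[HealSample]) -> bool:
    """True if queue depth never shrinks across the window and ends higher."""
    depths = [s.queue_depth for s in samples]
    non_decreasing = all(b >= a for a, b in zip(depths, depths[1:]))
    return len(depths) >= 2 and non_decreasing and depths[-1] > depths[0]


def saturated_and_growing(samples: List[HealSample]) -> bool:
    """True if all workers are busy in every sample and the remaining count rose."""
    if len(samples) < 2:
        return False
    saturated = all(s.active_workers >= s.max_workers for s in samples)
    return saturated and samples[-1].remaining_to_heal > samples[0].remaining_to_heal


def backlog_signals(samples: List[HealSample]) -> List[str]:
    """Return the names of the backlog symptoms present in the window."""
    ordered = sorted(samples, key=lambda s: s.ts)
    signals = []
    if queue_growing(ordered):
        signals.append("queue_growing")
    if saturated_and_growing(ordered):
        signals.append("workers_saturated")
    return signals


# Example window: three scrapes ten minutes apart, all 8 workers busy.
window = [
    HealSample(ts=0, queue_depth=100, active_workers=8, max_workers=8, remaining_to_heal=5000),
    HealSample(ts=600, queue_depth=180, active_workers=8, max_workers=8, remaining_to_heal=6200),
    HealSample(ts=1200, queue_depth=260, active_workers=8, max_workers=8, remaining_to_heal=7400),
]
print(backlog_signals(window))  # ['queue_growing', 'workers_saturated']
```

The strict non-decreasing test is deliberately simple; on noisy production data you would likely compare averages or a fitted slope over the window instead.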

Recommended action:

Alert when the healing backlog exceeds a threshold (e.g., more than 10,000 objects pending for over an hour). Investigate drive health with dperf or fio to identify slow or failing drives. Check inter-node network bandwidth, since healing in a distributed deployment must read the surviving shards from other nodes. If CPU and network headroom exist, consider increasing the healing worker pool. Use 'mc admin heal' to manually trigger healing on critical buckets.
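The "10,000 objects pending for more than an hour" rule is a sustained-threshold alert, similar in spirit to the for: duration in a Prometheus alerting rule. The sketch below evaluates it over (timestamp, pending_objects) samples; the function name, its parameters, and the assumption that a single dip below the threshold resets the clock are illustrative choices, not part of MinIO.

```python
from typing import List, Optional, Tuple

THRESHOLD_OBJECTS = 10_000  # example threshold from the text
SUSTAIN_SECONDS = 3600      # "pending for over an hour"


def should_alert(samples: List[Tuple[float, int]],
                 threshold: int = THRESHOLD_OBJECTS,
                 sustain: float = SUSTAIN_SECONDS) -> bool:
    """Hypothetical helper: True if the pending count has stayed above
    `threshold` for longer than `sustain` seconds up to the latest sample.

    samples: (unix_ts_seconds, pending_objects) pairs, in any order.
    """
    breach_start: Optional[float] = None
    for ts, pending in sorted(samples):
        if pending > threshold:
            if breach_start is None:
                breach_start = ts  # backlog just crossed the threshold
        else:
            breach_start = None    # assumption: any dip resets the clock
    if breach_start is None:
        return False
    latest_ts = max(ts for ts, _ in samples)
    return latest_ts - breach_start > sustain


print(should_alert([(0, 12_000), (1800, 15_000), (3700, 18_000)]))  # True
print(should_alert([(0, 12_000), (1800, 9_000), (3700, 18_000)]))   # False
```

Resetting on any dip keeps the rule conservative against brief spikes; a backlog that oscillates around the threshold may warrant a separate rate-of-growth alert.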