MinIO

Drive Performance Outlier Detection

critical
Resource Contention
Updated May 2, 2025

In distributed MinIO deployments, a single slow or failing drive can bottleneck write operations, because each erasure-coded write is spread across the drives of an erasure set. Standard monitoring may not surface per-drive latency, so the affected drive can go unnoticed.

How to detect:

Use dperf or fio to baseline the throughput of each individual drive. Alert when any single drive shows more than 30% lower throughput than its peers, or when its p99 latency exceeds 2x the cluster median. Also watch the MinIO logs for drives with rising error or timeout counts.
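
The two thresholds above can be sketched in a few lines of Python. This is only an illustration, not MinIO tooling: the function name find_outlier_drives, the input dictionaries, and the sample numbers are hypothetical, and "peers" is interpreted here as the median throughput across all measured drives. The measurements themselves are assumed to come from your own dperf or fio runs.

```python
from statistics import median


def find_outlier_drives(throughput_mbps, p99_latency_ms,
                        max_throughput_drop=0.30, max_latency_factor=2.0):
    """Return {drive: [reasons]} for drives breaching either threshold.

    Assumption: "peers" means the median of all measured drives.
    """
    tput_median = median(throughput_mbps.values())
    latency_median = median(p99_latency_ms.values())
    outliers = {}
    for drive in sorted(set(throughput_mbps) | set(p99_latency_ms)):
        reasons = []
        tput = throughput_mbps.get(drive)
        # More than 30% below the peer median throughput.
        if tput is not None and tput < (1 - max_throughput_drop) * tput_median:
            reasons.append("throughput")
        latency = p99_latency_ms.get(drive)
        # p99 latency above 2x the cluster median.
        if latency is not None and latency > max_latency_factor * latency_median:
            reasons.append("latency")
        if reasons:
            outliers[drive] = reasons
    return outliers


# Hypothetical measurements (MB/s and milliseconds) for four drives.
sample_throughput = {"/mnt/drive1": 500, "/mnt/drive2": 510,
                     "/mnt/drive3": 495, "/mnt/drive4": 300}
sample_p99 = {"/mnt/drive1": 4.0, "/mnt/drive2": 4.2,
              "/mnt/drive3": 3.9, "/mnt/drive4": 12.0}

print(find_outlier_drives(sample_throughput, sample_p99))
# prints {'/mnt/drive4': ['throughput', 'latency']}
```

Comparing against the median rather than the mean keeps one very slow drive from dragging the baseline down and masking itself.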

Recommended action:

Run 'mc admin speedtest drive' (exposed as 'mc support perf drive' in recent mc releases) or use dperf to identify slow drives. Check drive health with smartctl. Verify that each drive is properly connected (check the NUMA/PCIe topology with hwloc-ls). Replace failing drives immediately. Use 'mc support diag' to collect detailed drive diagnostics for analysis by MinIO support.
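
As a rough illustration of the smartctl health check, the sketch below inspects a report that has already been produced with smartctl's JSON output option and parsed into a dictionary. The function name smart_warnings and the choice of watched attribute IDs are assumptions made for this example, and the field names (smart_status, ata_smart_attributes) reflect the JSON layout smartctl emits for ATA drives; NVMe drives report health in a different section and are not covered here.

```python
import json

# Illustrative choice of ATA attributes often checked on suspect drives:
# 5 = reallocated sectors, 197 = pending sectors, 198 = offline uncorrectable.
WATCHED_ATTRIBUTE_IDS = {5, 197, 198}


def smart_warnings(report):
    """Return warning strings for one parsed smartctl JSON report (ATA only)."""
    warnings = []
    if report.get("smart_status", {}).get("passed") is False:
        warnings.append("overall SMART health check failed")
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        raw = attr.get("raw", {}).get("value", 0)
        if attr.get("id") in WATCHED_ATTRIBUTE_IDS and raw > 0:
            warnings.append(f"{attr.get('name', attr['id'])} raw value is {raw}")
    return warnings


# Hypothetical, heavily trimmed report for a single drive.
sample_report = json.loads("""
{
  "smart_status": {"passed": true},
  "ata_smart_attributes": {"table": [
    {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
    {"id": 9, "name": "Power_On_Hours", "raw": {"value": 21000}}
  ]}
}
""")

print(smart_warnings(sample_report))
# prints ['Reallocated_Sector_Ct raw value is 8']
```

A drive can pass the overall health check while still accumulating reallocated or pending sectors, which is why the sketch looks at both.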