Ceph

Unbalanced OSD Utilization Creates Hot Spots

warning
Resource Contention
Updated Jan 7, 2026

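When data is distributed unevenly across OSDs (high variance in ceph_osd_pct_used), the fuller OSDs become hot spots that handle a disproportionate share of the load while others remain underutilized. This reduces overall cluster performance, and the busy OSDs fill up faster than the rest. Because the nearfull and full thresholds apply per OSD, the fullest OSD also limits how much of the cluster's capacity is usable.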

How to detect:

Run `ceph osd df` and compare the %USE and VAR columns across OSDs. The summary line at the end of the output reports MIN/MAX VAR and STDDEV, so the standard deviation can be read directly rather than computed by hand. Alert when STDDEV is high or when the gap between the most and least utilized OSDs exceeds 20 percentage points. Monitor the distribution of ceph_osd_pct_used across all OSDs.
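The check above can be sketched in Python. This is a minimal example, not an official tool: it assumes the JSON form of the command (`ceph osd df --format json`) returns a "nodes" list whose entries carry "name" and "utilization" (percent used) fields. The function names and the 20-point default threshold are illustrative.

```python
import json
import statistics
import subprocess


def utilization_stats(osd_df: dict) -> dict:
    """Summarize per-OSD utilization from parsed `ceph osd df --format json` output."""
    # Assumption: each entry under "nodes" has "name" and "utilization" (percent used).
    used = {n["name"]: float(n["utilization"]) for n in osd_df.get("nodes", [])}
    if not used:
        raise ValueError("no OSDs found in input")
    values = list(used.values())
    return {
        "min_osd": min(used, key=used.get),
        "max_osd": max(used, key=used.get),
        # Gap between most and least utilized OSD, in percentage points.
        "spread": max(values) - min(values),
        # Population standard deviation of utilization across OSDs.
        "stddev": statistics.pstdev(values),
    }


def is_unbalanced(stats: dict, max_spread: float = 20.0) -> bool:
    """Return True when the utilization gap exceeds the alert threshold."""
    return stats["spread"] > max_spread


def fetch_osd_df() -> dict:
    """Run the ceph CLI on a node with admin credentials and parse its JSON output."""
    result = subprocess.run(
        ["ceph", "osd", "df", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

On a cluster node, `is_unbalanced(utilization_stats(fetch_osd_df()))` would give a yes/no answer. OSDs that are marked out may report 0% utilization and inflate the spread, so filtering them first may be appropriate.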

Recommended action:

Review the CRUSH map (for example with `ceph osd crush tree`) for imbalanced weights or topology issues. Enable the Ceph balancer module so that PGs are redistributed automatically. Consider reweighting individual OSDs manually to reflect their actual capacity or performance. Ensure that similar devices carry similar weights in the CRUSH hierarchy.
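As a rough sketch of the balancer step, the following Python helper lists the `ceph` CLI commands typically used to turn the balancer on and prints them by default instead of running them. The helper names and the dry-run default are assumptions made for illustration. The upmap mode requires all clients to be Luminous or newer, which is why the compatibility command is included for that mode only.

```python
import subprocess


def balancer_commands(mode: str = "upmap") -> list[list[str]]:
    """Build the ceph CLI invocations for enabling the balancer in the given mode."""
    # Only the two long-standing modes are handled in this sketch.
    if mode not in ("upmap", "crush-compat"):
        raise ValueError(f"unsupported balancer mode: {mode}")
    cmds = [["ceph", "balancer", "status"]]  # inspect current state first
    if mode == "upmap":
        # upmap needs every client to understand pg-upmap entries.
        cmds.append(["ceph", "osd", "set-require-min-compat-client", "luminous"])
    cmds.append(["ceph", "balancer", "mode", mode])
    cmds.append(["ceph", "balancer", "on"])
    return cmds


def apply(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for argv in cmds:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling `apply(balancer_commands())` prints the planned commands for review. Passing `dry_run=False` executes them and should only be done on a node with admin credentials after checking the output of `ceph balancer status`.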