Unbalanced OSD Utilization Creates Hot Spots
Warning: When data is distributed unevenly across OSDs (high variance in ceph_osd_pct_used), the fuller OSDs become hot spots that handle a disproportionate share of the load while others remain underutilized. This reduces overall cluster performance and makes the busy OSDs fill up sooner.
Run `ceph osd df` to check utilization variance across OSDs. Read the standard deviation from the summary at the end of the output (the MIN/MAX VAR and STDDEV line). Alert when the variance is high or when the gap between the most and least utilized OSDs exceeds 20 percentage points. Monitor the distribution of ceph_osd_pct_used across all OSDs.
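The check described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: it assumes that `ceph osd df --format json` returns a top-level `nodes` list whose entries carry `name` and `utilization` (percent used) fields, and the helper names `utilization_spread`, `is_unbalanced`, and `fetch_osd_df` are hypothetical. The standard deviation here is a plain population standard deviation and may not match the STDDEV value that Ceph itself reports.

```python
import json
import statistics
import subprocess


def utilization_spread(osd_df: dict) -> dict:
    """Summarize per-OSD utilization from parsed `ceph osd df` JSON output."""
    # Assumed layout: {"nodes": [{"name": "osd.0", "utilization": 41.2, ...}, ...]}
    used = {node["name"]: float(node["utilization"]) for node in osd_df["nodes"]}
    if not used:
        raise ValueError("no OSDs found in input")
    highest = max(used, key=used.get)
    lowest = min(used, key=used.get)
    return {
        "highest": highest,
        "lowest": lowest,
        # Gap between fullest and emptiest OSD, in percentage points.
        "spread": used[highest] - used[lowest],
        # Population standard deviation of the utilization percentages.
        "stddev": statistics.pstdev(used.values()),
    }


def is_unbalanced(osd_df: dict, max_spread: float = 20.0) -> bool:
    """Return True when the highest/lowest gap exceeds the threshold."""
    return utilization_spread(osd_df)["spread"] > max_spread


def fetch_osd_df() -> dict:
    """Run the ceph CLI and parse its JSON output (needs cluster access)."""
    result = subprocess.run(
        ["ceph", "osd", "df", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

On a host with cluster access, `is_unbalanced(fetch_osd_df())` would apply the 20-point threshold; the analysis functions take an already parsed dictionary so they can also be run against saved output.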
Review the CRUSH map for imbalanced weight distribution or topology issues. Enable the Ceph balancer module to redistribute PGs automatically. Consider reweighting individual OSDs manually based on their actual capacity and performance. Ensure that similar devices have similar weights in the CRUSH hierarchy.
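Enabling the balancer can be sketched as a short, dry-run-by-default Python helper. The `ceph balancer mode`, `ceph balancer on`, and `ceph balancer status` subcommands are part of the Ceph CLI; the function names `balancer_commands` and `apply_commands` are hypothetical, and the choice of `upmap` as the default mode is an assumption (it requires all clients to be Luminous or newer, and some releases offer additional modes not listed here).

```python
import subprocess

# Modes accepted by this sketch; newer Ceph releases may support others.
SUPPORTED_MODES = ("upmap", "crush-compat")


def balancer_commands(mode: str = "upmap") -> list[list[str]]:
    """Build the ceph CLI invocations that turn the balancer on."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported balancer mode: {mode}")
    return [
        ["ceph", "balancer", "mode", mode],  # choose the optimization mode
        ["ceph", "balancer", "on"],          # enable automatic balancing
        ["ceph", "balancer", "status"],      # confirm the resulting state
    ]


def apply_commands(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling `apply_commands(balancer_commands())` only prints the three commands, which allows them to be reviewed before running with `dry_run=False` on a host that has admin access to the cluster.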