Ceph

High Recovery Operations Compete with Client I/O

Resource Contention · Updated Jan 7, 2026

When recovery or backfill runs, for example after an OSD fails or is reweighted and data is rebalanced, the resulting data movement competes with client I/O for disk and network resources. Sustained high recovery rates indicate ongoing data movement that can noticeably increase client-facing latency and reduce throughput.

How to detect:

Monitor `ceph_recovery_objects_per_sec`, `ceph_recovery_keys_per_sec`, and `ceph_recovery_size_per_sec` for elevated values. Check for PGs in recovery or backfill states such as `recovering`, `recovery_wait`, `backfilling`, or `backfill_wait`. Correlate periods of high recovery activity with increases in client-facing latency (`ceph_apply_time_ms`, `ceph_commit_time_ms`).

Recommended action:

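During planned maintenance, prevent unnecessary data movement by setting the cluster flags before taking OSDs offline (`ceph osd set noout` and `ceph osd set norebalance`), and clear them afterwards with `ceph osd unset noout` and `ceph osd unset norebalance`. These flags stop down OSDs from being marked out and stop rebalancing; to pause backfill and recovery outright, use `ceph osd set nobackfill` and `ceph osd set norecover`, keeping in mind that degraded data stays unrepaired while they are set. Tune `osd_max_backfills` (default 1) and `osd_recovery_max_active` (historically 3; newer releases default to 0, which defers to `osd_recovery_max_active_hdd` = 3 or `osd_recovery_max_active_ssd` = 10) to balance recovery speed against client impact. On releases that use the mClock scheduler (the default since Quincy), recovery limits are governed by `osd_mclock_profile` (for example `high_client_ops`), and changes to these two options are ignored unless `osd_mclock_override_recovery_settings` is set to true. Consider scheduling heavy recovery operations during low-traffic periods.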
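The maintenance workflow can be sketched in Python as a context manager that sets `noout` and `norebalance` on entry and always clears them on exit. This is a minimal sketch, assuming the `ceph` CLI is on `PATH` with admin credentials and that `ceph config set osd <option> <value>` (the centralized configuration command) is how throttles are applied on your release; `run_ceph`, `osd_flags`, and `throttle_commands` are hypothetical helper names.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence

# A runner receives the ceph arguments (without the leading "ceph").
Runner = Callable[[List[str]], None]


def run_ceph(args: List[str]) -> None:
    """Run `ceph <args...>`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def osd_flags(flags: Sequence[str] = ("noout", "norebalance"),
              run: Runner = run_ceph) -> Iterator[None]:
    """Set cluster-wide OSD flags for a maintenance window, then unset them."""
    applied: List[str] = []
    try:
        for flag in flags:
            run(["osd", "set", flag])
            applied.append(flag)
        yield
    finally:
        # Unset in reverse order even if the maintenance step raised,
        # so normal data movement resumes once the window closes.
        for flag in reversed(applied):
            run(["osd", "unset", flag])


def throttle_commands(max_backfills: int = 1,
                      recovery_max_active: int = 1) -> List[List[str]]:
    """Build `ceph config set` commands that cap backfill/recovery per OSD.

    Assumption: under the mClock scheduler these values are ignored unless
    osd_mclock_override_recovery_settings is true.
    """
    return [
        ["config", "set", "osd", "osd_max_backfills", str(max_backfills)],
        ["config", "set", "osd", "osd_recovery_max_active", str(recovery_max_active)],
    ]


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    recorded: List[List[str]] = []
    with osd_flags(run=recorded.append):
        pass  # e.g. reboot an OSD host here
    for cmd in recorded + throttle_commands():
        print("ceph", " ".join(cmd))
```

Passing `run=recorded.append` gives a dry run that only records commands; the default runner executes them. Unsetting inside `finally` is the design choice that matters: a forgotten `noout` flag keeps down OSDs from being marked out, so their data is not re-replicated after a real failure.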