Ceph

Slow OSD Operations Signal Performance Degradation

warning
latency
Updated Jan 7, 2026

OSDs report a slow operation when a request takes longer than `osd_op_complaint_time` (30 seconds by default). Slow operations usually point to disk I/O bottlenecks, journal issues, or network problems that degrade cluster performance. This is one of the most common Ceph performance issues.

How to detect:

Monitor for slow operation warnings in cluster health output or logs. Check `ceph osd perf` output for elevated commit/apply latencies. Use `ceph health detail` to identify OSDs with slow ops, and `ceph daemon osd.X dump_historic_slow_ops` (run on the host of that OSD, with X replaced by the OSD ID) to analyze patterns.
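The latency check can be scripted. The following is a minimal sketch, not a definitive implementation: it assumes `ceph osd perf -f json` returns a list named `osd_perf_infos` (nested under `osdstats` on some releases, at the top level on others), and the 100 ms threshold and the function names are illustrative choices, not Ceph defaults.

```python
import json
import subprocess


def fetch_osd_perf():
    """Run `ceph osd perf -f json` and return the parsed output."""
    out = subprocess.run(
        ["ceph", "osd", "perf", "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def high_latency_osds(perf, threshold_ms=100):
    """Return (osd_id, commit_ms, apply_ms) for OSDs at or above threshold_ms.

    threshold_ms is a hypothetical cutoff; pick one that fits your hardware.
    """
    # Assumption: the list may sit under "osdstats" or at the top level,
    # depending on the Ceph release.
    infos = perf.get("osdstats", perf).get("osd_perf_infos", [])
    flagged = []
    for info in infos:
        stats = info.get("perf_stats", {})
        commit = stats.get("commit_latency_ms", 0)
        apply_ = stats.get("apply_latency_ms", 0)
        if max(commit, apply_) >= threshold_ms:
            flagged.append((info["id"], commit, apply_))
    # Worst offenders first.
    return sorted(flagged, key=lambda row: max(row[1], row[2]), reverse=True)
```

On a node with a working `ceph` CLI, `high_latency_osds(fetch_osd_perf())` would list the OSDs worth inspecting first.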

Recommended action:

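Identify the operation type from the slow op description: `osd_op` typically points to disk I/O, `osd_repop` to network or replica issues. For disk issues, check disk latency and journal device health. For replica issues, verify network connectivity between OSDs. Consider tuning `osd_op_threads` (on releases that use the sharded op queue, the related settings are `osd_op_num_shards` and `osd_op_num_threads_per_shard`) or investigating hardware problems on the specific OSDs involved.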
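The split by operation type can be approximated by counting description prefixes in the `dump_historic_slow_ops` output. This is a rough sketch under stated assumptions: the op list is expected under an `Ops` (or `ops`) key with a `description` string per entry, which may differ between releases, and `classify_slow_ops` is a hypothetical helper name.

```python
from collections import Counter


def classify_slow_ops(dump):
    """Count slow ops by type, based on the prefix of each op description."""
    # Assumption: the list key is "Ops" or "ops" depending on the command
    # and release; an absent key is treated as no slow ops.
    ops = dump.get("Ops") or dump.get("ops") or []
    counts = Counter()
    for op in ops:
        desc = op.get("description", "")
        # Check osd_repop first for clarity; the two prefixes do not overlap.
        if desc.startswith("osd_repop"):
            counts["osd_repop"] += 1
        elif desc.startswith("osd_op"):
            counts["osd_op"] += 1
        else:
            counts["other"] += 1
    return counts
```

A count dominated by `osd_repop` suggests looking at the network and peer OSDs first, while mostly `osd_op` entries suggest starting with the local disk. The input would be the parsed JSON from `ceph daemon osd.X dump_historic_slow_ops`.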