Slow OSD Operations Signal Performance Degradation

warning

latency

Updated Jan 7, 2026

Slow operation warnings from OSDs (requests that take longer than 30 seconds by default, as set by `osd_op_complaint_time`) indicate disk I/O bottlenecks, journal issues, or network problems that degrade cluster performance. This is one of the most common Ceph performance issues.

Sources

How to Troubleshoot Ceph Performance Issues - OneUptime (oneuptime.com)

Troubleshooting Guide, Red Hat Ceph Storage 3 | Red Hat Customer Portal (access.redhat.com)

Chapter 4. Troubleshooting Ceph Monitors - Red Hat Documentation (docs.redhat.com)

Technologies:

Ceph

Symptoms of this issue are visible in Ceph metrics and logs.

How to detect:

Watch for slow operation warnings in the cluster health output and in the OSD logs. Check the output of `ceph osd perf` for elevated commit and apply latencies. Use `ceph health detail` to identify which OSDs have slow ops, then run `ceph daemon osd.X dump_historic_slow_ops` on the host of each affected OSD (replace X with the OSD id) to look for patterns in the recorded operations.
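The latency check can be scripted. The following is a minimal Python sketch, not an official tool. It assumes that `ceph osd perf --format json` returns a list named `osd_perf_infos` whose entries carry `commit_latency_ms` and `apply_latency_ms`, and that this list may sit either at the top level or under `osdstats` depending on the release. Verify that layout against your own cluster's output. The function names and the 100 ms threshold are hypothetical choices for illustration.

```python
import json
import subprocess


def parse_osd_perf(report: dict) -> dict:
    """Map OSD id -> (commit_latency_ms, apply_latency_ms)."""
    # Assumption: some releases nest the list under "osdstats",
    # others expose "osd_perf_infos" at the top level.
    infos = report.get("osdstats", report).get("osd_perf_infos", [])
    return {
        info["id"]: (
            info["perf_stats"]["commit_latency_ms"],
            info["perf_stats"]["apply_latency_ms"],
        )
        for info in infos
    }


def slow_osds(report: dict, threshold_ms: float = 100.0) -> list:
    """Return sorted ids of OSDs whose commit or apply latency exceeds threshold_ms.

    The 100 ms default is an illustrative value, not a Ceph default.
    """
    return sorted(
        osd_id
        for osd_id, (commit_ms, apply_ms) in parse_osd_perf(report).items()
        if max(commit_ms, apply_ms) > threshold_ms
    )


def fetch_osd_perf() -> dict:
    """Run `ceph osd perf` with JSON output. Needs a reachable cluster and keyring."""
    result = subprocess.run(
        ["ceph", "osd", "perf", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hand-written sample data so the sketch runs without a cluster.
    sample = {
        "osd_perf_infos": [
            {"id": 0, "perf_stats": {"commit_latency_ms": 3, "apply_latency_ms": 4}},
            {"id": 7, "perf_stats": {"commit_latency_ms": 250, "apply_latency_ms": 12}},
        ]
    }
    print(slow_osds(sample))
```

On a live cluster, `slow_osds(fetch_osd_perf())` would give the ids to inspect further with `ceph daemon osd.X dump_historic_slow_ops`.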

Recommended action:

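Identify the operation type first: `osd_op` points to disk I/O on the reporting OSD, while `osd_repop` points to network or replica issues. For disk issues, check disk latency and the health of the journal device. For replica issues, verify network connectivity between the OSDs involved. If the hardware and network look healthy, consider tuning the OSD op thread settings (`osd_op_threads` on older releases; later releases replace it with `osd_op_num_shards` and `osd_op_num_threads_per_shard`), or investigate the specific OSD for hardware problems.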
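The triage step above can be sketched in Python. This is a rough illustration under stated assumptions: it expects the JSON printed by `ceph daemon osd.X dump_historic_slow_ops` to hold a list of operations under `ops` (or `Ops` on some older releases), each with a `description` string that begins with the operation type, such as `osd_op(` or `osd_repop(`. The function names are hypothetical, and the mapping from operation type to likely cause is only the heuristic described in this section.

```python
from collections import Counter


def op_type(description: str) -> str:
    """Return the text before the first '(' in an op description.

    Example: 'osd_op(client.4240.0:8 ...)' yields 'osd_op'.
    """
    return description.split("(", 1)[0].strip() or "unknown"


def summarize_slow_ops(dump: dict) -> Counter:
    """Count slow ops by type from a parsed dump_historic_slow_ops result."""
    # Assumption: the op list is keyed "ops" or, on some older releases, "Ops".
    ops = dump.get("ops") or dump.get("Ops") or []
    return Counter(op_type(op.get("description", "")) for op in ops)


def likely_cause(counts: Counter) -> str:
    """Apply the heuristic: osd_op suggests disk, osd_repop suggests replica/network."""
    disk = counts.get("osd_op", 0)
    replica = counts.get("osd_repop", 0)
    if disk == 0 and replica == 0:
        return "inconclusive: no osd_op or osd_repop entries"
    if replica > disk:
        return "replica: check network connectivity between OSDs"
    return "disk: check disk latency and journal device health"


if __name__ == "__main__":
    # Hand-written sample; real descriptions are longer.
    sample = {
        "ops": [
            {"description": "osd_op(client.1 write)"},
            {"description": "osd_repop(client.1 write)"},
            {"description": "osd_repop(client.2 write)"},
        ]
    }
    counts = summarize_slow_ops(sample)
    print(dict(counts))
    print(likely_cause(counts))
```

A count dominated by one type is only a starting point. Confirm the suspicion with disk latency figures or network tests before changing any tuning options.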