Monitor Store Growth Causes Query Delays

warning

Resource Contention

Updated Jun 4, 2024

When the monitor store (a RocksDB database) grows too large (the default warning threshold is 15 GB), monitor queries slow down, which can cause client timeouts and delay leader elections. If the /var partition fills completely, the monitors terminate.

Sources

Troubleshooting Guide, Red Hat Ceph Storage 3 | Red Hat Customer Portal (access.redhat.com)

Chapter 4. Troubleshooting Ceph Monitors - Red Hat Documentation (docs.redhat.com)

Chapter 4. Troubleshooting Ceph Monitors (docs.redhat.com)

Technologies:

Ceph

Symptoms of this issue are visible in Ceph metrics and logs.

How to detect:

Check `ceph health detail` for 'store is getting too big' warnings. Inspect the monitor store size on each monitor host at /var/lib/ceph/mon/<cluster>-<hostname>/store.db (the cluster name defaults to `ceph`). Alert when the store exceeds 15 GB or when /var partition usage exceeds 80%.
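The two thresholds above can be checked with a small script run on each monitor host. The following is a minimal sketch, not an official Ceph tool: the function names `dir_size_bytes` and `check_mon_store` are hypothetical, the store path must be supplied by the operator, and the 15 GB figure is interpreted here as 15 GiB.

```python
import os
import shutil

# Thresholds taken from the detection guidance above (assumed values, adjust as needed).
STORE_WARN_BYTES = 15 * 1024**3   # 15 GiB store-size warning
VAR_USAGE_WARN = 0.80             # 80% partition usage


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The store may remove files while we walk; ignore those.
                pass
    return total


def check_mon_store(store_path, partition="/var",
                    store_warn=STORE_WARN_BYTES, usage_warn=VAR_USAGE_WARN):
    """Return a list of alert strings; the list is empty when both checks pass."""
    alerts = []

    size = dir_size_bytes(store_path)
    if size > store_warn:
        alerts.append(f"store {store_path} is {size} bytes (limit {store_warn})")

    usage = shutil.disk_usage(partition)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    if used_fraction > usage_warn:
        alerts.append(
            f"partition {partition} is {used_fraction:.0%} full (limit {usage_warn:.0%})"
        )
    return alerts
```

Calling `check_mon_store` with the store.db path of the local monitor returns any alerts as plain strings, which can be forwarded to whatever alerting system is in use. Note that `shutil.disk_usage` may report slightly different usage than `df` on file systems with reserved blocks.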

Recommended action:

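Compact the store instead of removing files: never manually delete monitor data. Use `ceph tell mon.<id> compact` while the monitor is running, or `ceph-monstore-tool <store-path> compact` while the monitor daemon is stopped. If the store size is legitimate for the cluster, raise the `mon_data_size_warn` threshold, which controls the 15 GB store-size warning (`mon_data_avail_warn` controls the separate low free-space warning). If store corruption occurs ('Corruption: error in middle of record'), follow the monitor recovery procedures or replace the monitor.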
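The decision described above can be sketched as a small triage helper. This is an illustrative assumption about how one might automate the choice: `recommend_action` and its return values are hypothetical names, and the function only matches the two message fragments quoted in this document against text the caller has already collected (for example, the output of `ceph health detail` and the monitor log).

```python
# Message fragments quoted in this document.
TOO_BIG = "store is getting too big"
CORRUPTION = "Corruption: error in middle of record"


def recommend_action(health_detail, mon_log=""):
    """Map observed text to an action label (hypothetical labels).

    Corruption takes priority because compaction does not repair a
    damaged store; that case needs recovery or monitor replacement.
    """
    if CORRUPTION in mon_log:
        return "recover"
    if TOO_BIG in health_detail:
        return "compact"
    return "none"
```

Checking for corruption first is a deliberate design choice: a corrupted store can also be large, and compacting it would not address the underlying problem.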