Ceph

Monitor Store Growth Causes Query Delays

warning
Resource Contention
Updated Jun 4, 2024

When the monitor store (a RocksDB database) grows too large (the default warning threshold is 15 GB), monitor queries slow down, which can cause client timeouts and delay leader elections. If the /var partition fills completely, the monitors terminate.

How to detect:

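Check `ceph health detail` for 'store is getting too big' warnings. Inspect the monitor store size at /var/lib/ceph/mon/<cluster>-<id>/store.db (on a default non-containerized deployment this is usually /var/lib/ceph/mon/ceph-<hostname>/store.db). Alert when the store exceeds 15 GB or when /var partition usage exceeds 80%.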
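The two alert conditions above can be sketched as a small check. This is a minimal illustration, not an official Ceph tool: the function names (`dir_size_bytes`, `evaluate`, `check_monitor_store`) and the alert labels are hypothetical, the 15 GB figure is treated as 15 GiB, and the store path must be supplied by the caller because it depends on the deployment.

```python
import os
import shutil

# Assumed thresholds, mirroring the values stated in the text above.
STORE_WARN_BYTES = 15 * 1024**3   # 15 GiB store size
PARTITION_WARN_FRACTION = 0.80    # 80% of the partition used


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Files can vanish mid-walk while the database compacts.
                continue
    return total


def evaluate(store_bytes, used_fraction):
    """Return the (hypothetical) alert labels for the given measurements."""
    alerts = []
    if store_bytes > STORE_WARN_BYTES:
        alerts.append("store_too_big")
    if used_fraction > PARTITION_WARN_FRACTION:
        alerts.append("partition_nearly_full")
    return alerts


def check_monitor_store(store_path):
    """Measure the store directory and its partition, then apply thresholds."""
    usage = shutil.disk_usage(store_path)
    used_fraction = usage.used / usage.total
    return evaluate(dir_size_bytes(store_path), used_fraction)
```

Calling `check_monitor_store` with the store.db path of a monitor returns an empty list when both measurements are within limits. Note that `used / total` can differ slightly from the percentage reported by `df`, which accounts for reserved blocks.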

Recommended action:

Use `ceph-monstore-tool` to compact the store; never delete monitor data manually. If the store size is legitimate, raise the mon_data_size_warn threshold (mon_data_avail_warn governs the free-space percentage warning, not the store size warning). If store corruption occurs ('Corruption: error in middle of record'), follow the monitor recovery procedures or replace the monitor.
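The decision described above can be sketched as a simple triage over health or log lines. This is only an illustration: the function name `recommend` and the returned action labels are hypothetical, and the matching relies solely on the two message fragments quoted in this article. Corruption is checked first because it calls for recovery rather than compaction.

```python
# Message fragments quoted in the text above.
STORE_BIG_MARKER = "store is getting too big"
CORRUPTION_MARKER = "Corruption: error in middle of record"


def recommend(lines):
    """Map health or log lines to a (hypothetical) action label."""
    text = "\n".join(lines)
    if CORRUPTION_MARKER in text:
        # Corruption takes priority over size warnings.
        return "recover_or_replace_monitor"
    if STORE_BIG_MARKER in text:
        return "compact_store"
    return "no_action"
```

The input could be the lines of `ceph health detail` output or of a monitor log; the function only reports which of the documented actions applies and performs no changes itself.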