Chroma

Database Size Bloat Without Vacuuming

warning
storage
Updated Feb 18, 2025

Chroma's SQLite database accumulates space left behind by deleted and updated records and does not reclaim it automatically. Versions before v0.5.6 have no automatic pruning. From v0.5.6 onward, automatic pruning is enabled once a manual vacuum has been run, but legacy databases and high-churn workloads still need periodic manual vacuuming to reclaim disk space.

How to detect:

The database directory grows significantly beyond the expected data volume. The growth rate exceeds the rate of new data ingestion, which indicates that space from deleted records is accumulating. Disk usage is 2-5x larger than the active data size. On v0.5.6+, check whether the initial vacuum has been run, because auto-pruning is only enabled after it.
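The bloat check above can be sketched in a few lines of Python. This is a minimal illustration, not part of Chroma: the helper names (`directory_size_bytes`, `expected_size_bytes`, `bloat_ratio`) are hypothetical, vectors are assumed to be stored as 4-byte floats, and the metadata overhead is a figure you supply yourself.

```python
import os

BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as 32-bit floats


def directory_size_bytes(path: str) -> int:
    """Sum the apparent size of every regular file under `path`.

    Note: `du -sh` reports allocated disk blocks, so its figure can
    differ slightly from this apparent-size total.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def expected_size_bytes(num_vectors: int, dimensions: int,
                        metadata_overhead_bytes: int = 0) -> int:
    """Estimate: (num_vectors * dimensions * 4 bytes) + metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 + metadata_overhead_bytes


def bloat_ratio(actual_bytes: int, expected_bytes: int) -> float:
    """Return actual_size / expected_size."""
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    return actual_bytes / expected_bytes
```

For example, 1,000 vectors of 384 dimensions give an expected size of 1,536,000 bytes before metadata; a persist directory of 3,072,000 bytes would then have a bloat ratio of 2.0, at the low end of the 2-5x range described above.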

Recommended action:

1. Assess: Check the current database size with `du -sh <chroma_persist_directory>`. Estimate the expected size as (num_vectors * dimensions * 4 bytes) + metadata overhead. Calculate the bloat ratio as actual_size / expected_size.

2. Check version: If Chroma >= 0.5.6, verify whether the initial vacuum has been run to enable auto-pruning, and check the logs for vacuum events. If Chroma < 0.5.6, manual vacuuming is required.

3. Perform vacuum: Use the Chroma CLI, `chroma utils vacuum --path <chroma_persist_directory>`, or vacuum programmatically with `client.vacuum()` if your client version provides it. Vacuuming reclaims deleted space and, on v0.5.6+, enables auto-pruning. Expect significant disk space reclamation (30-70% in high-churn environments).

4. Schedule maintenance: On v0.5.6+, vacuum once to enable auto-pruning, then monitor to verify that auto-pruning works. On versions before 0.5.6, or under very high churn, schedule a periodic vacuum: weekly for a high update rate (>10% daily), monthly for a moderate update rate.

5. Monitor growth: Track database size daily. Alert if the growth rate exceeds 2x the ingestion rate, or if the size exceeds 3x the expected size.

6. Prevent: Upgrade to v0.5.6+ for auto-pruning. Design update patterns to minimize churn: batch updates and avoid frequent metadata updates. Plan disk capacity for 2-3x the working set size to accommodate bloat between vacuums.
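The version check, maintenance schedule, and alert thresholds described above can be expressed as plain decision logic. The sketch below is illustrative only and does not call Chroma: the function names and return labels are hypothetical, the update rate is assumed to be a fraction of records changed per day, and because "very high churn" has no stated threshold it is passed in as a caller-supplied flag. Any rate at or below 10% daily is treated here as "moderate", which is also an assumption.

```python
import re

AUTO_PRUNE_MIN_VERSION = (0, 5, 6)  # auto-pruning is available from v0.5.6
HIGH_DAILY_UPDATE_RATE = 0.10       # ">10% daily" counts as a high update rate


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string such as '0.5.6'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def vacuum_plan(chroma_version: str, daily_update_rate: float,
                initial_vacuum_done: bool, very_high_churn: bool = False) -> str:
    """Return 'vacuum-once', 'monitor', 'weekly', or 'monthly'."""
    supports_auto_prune = parse_version(chroma_version) >= AUTO_PRUNE_MIN_VERSION
    if supports_auto_prune and not initial_vacuum_done:
        return "vacuum-once"  # one manual vacuum enables auto-pruning
    if supports_auto_prune and not very_high_churn:
        return "monitor"      # rely on auto-pruning, but keep watching size
    # Older versions, or very high churn: fall back to a periodic vacuum.
    if daily_update_rate > HIGH_DAILY_UPDATE_RATE:
        return "weekly"
    return "monthly"


def growth_alerts(size_growth_bytes: int, ingested_bytes: int,
                  actual_bytes: int, expected_bytes: int) -> list:
    """Apply the two alert rules: growth > 2x ingestion, size > 3x expected."""
    alerts = []
    if size_growth_bytes > 2 * ingested_bytes:
        alerts.append("growth-exceeds-2x-ingestion")
    if actual_bytes > 3 * expected_bytes:
        alerts.append("size-exceeds-3x-expected")
    return alerts
```

A daily monitoring job could feed its measurements into `growth_alerts` and use `vacuum_plan` to decide whether a vacuum needs to be scheduled at all; the actual vacuum would still be run through the Chroma CLI.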