Database Size Bloat Without Vacuuming
Warning: Chroma's SQLite database accumulates space from deleted and updated records and does not reclaim it automatically. Before v0.5.6, no automatic pruning existed. From v0.5.6 onward, automatic pruning is enabled once a manual vacuum has been run, but legacy databases and high-churn workloads still require periodic manual vacuuming to reclaim disk space.
Database directory size grows significantly beyond expected data volume. Growth rate exceeds new data ingestion rate, indicating accumulation of deleted space. Disk usage is 2-5x larger than active data size. For v0.5.6+, check if initial vacuum has been run to enable auto-pruning.
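The growth symptom above can be checked with two size samples taken some time apart. The following is a minimal sketch, not a Chroma API: the helper names `dir_size_bytes` and `growth_outpaces_ingestion` are hypothetical, and the number of bytes ingested in the window is assumed to come from your own ingestion metrics.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the apparent size of all regular files under `path`.

    This reports file lengths rather than allocated disk blocks, so it can
    differ slightly from what `du -sh` prints.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def growth_outpaces_ingestion(prev_size: int, curr_size: int, ingested_bytes: int) -> bool:
    """Return True when on-disk growth between two samples exceeds the
    volume of data ingested during the same window."""
    return (curr_size - prev_size) > ingested_bytes
```

Sampling `dir_size_bytes` on the persist directory once a day and comparing consecutive samples against the day's ingested volume is one way to turn the symptom into a repeatable check.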
1. Assess: Check current database size: `du -sh <chroma_persist_directory>`. Estimate expected size: (num_vectors * dimensions * 4 bytes) + metadata overhead. Calculate bloat ratio: actual_size / expected_size.
2. Check version: If Chroma >= 0.5.6, verify whether the initial vacuum has been run to enable auto-pruning. Check logs for vacuum events. If < 0.5.6, manual vacuuming is required.
3. Perform vacuum: Use the Chroma CLI: `chroma utils vacuum --path <chroma_persist_directory>`. A programmatic `client.vacuum()` call may not be available in your client version, so verify that it exists before relying on it. Vacuuming reclaims deleted space and enables auto-pruning in v0.5.6+. Expect significant disk space reclamation (30-70% in high-churn environments).
4. Schedule maintenance: For v0.5.6+, vacuum once to enable auto-pruning, then monitor to verify that auto-pruning works. For < 0.5.6 or very high churn, schedule periodic vacuums: weekly for a high update rate (>10% of records daily), monthly for a moderate update rate.
5. Monitor growth: Track database size daily. Alert if the growth rate exceeds 2x the ingestion rate. Alert if size exceeds 3x the expected size.
6. Prevent: Upgrade to v0.5.6+ for auto-pruning. Design update patterns to minimize churn: batch updates and avoid frequent metadata updates. Plan disk capacity for 2-3x the working set size to accommodate bloat between vacuums.
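The arithmetic in steps 1, 4, and 5 can be sketched as follows. This is an illustration under stated assumptions rather than part of Chroma: all function names are hypothetical, the metadata overhead is a figure you supply yourself (the text above does not quantify it), and the thresholds are the ones given in the steps (3x expected size, 2x ingestion rate, 10% daily updates).

```python
BYTES_PER_FLOAT32 = 4  # the "4 bytes" per dimension from step 1


def expected_size_bytes(num_vectors: int, dimensions: int,
                        metadata_overhead_bytes: int = 0) -> int:
    """Step 1 estimate: raw vector bytes plus a caller-supplied metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 + metadata_overhead_bytes


def bloat_ratio(actual_size: int, expected_size: int) -> float:
    """Step 1 ratio: actual on-disk size divided by the expected size."""
    if expected_size <= 0:
        raise ValueError("expected_size must be positive")
    return actual_size / expected_size


def should_alert(actual_size: int, expected_size: int,
                 growth_bytes: int, ingested_bytes: int) -> bool:
    """Step 5 alerts: size above 3x expected, or growth above 2x ingestion."""
    size_alert = bloat_ratio(actual_size, expected_size) > 3.0
    growth_alert = growth_bytes > 2 * ingested_bytes
    return size_alert or growth_alert


def vacuum_interval_days(daily_update_fraction: float) -> int:
    """Step 4 schedule: weekly above 10% daily updates, otherwise monthly.

    Treating every rate at or below 10% as "monthly" is a simplification
    made for this sketch.
    """
    return 7 if daily_update_fraction > 0.10 else 30
```

For example, 1,000,000 vectors of 768 dimensions give `expected_size_bytes(1_000_000, 768) == 3_072_000_000` before metadata overhead. A persist directory of roughly 10 GB for that data would have a bloat ratio a little above 3 and would trip the size alert.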