Chroma

Collection Deletion Not Freeing Disk Space

info
storage
Updated Mar 3, 2026

Deleting a collection in Chroma removes its metadata and marks its vectors as deleted, but it may not reclaim disk space immediately, especially if vacuum has not been run. SQLite DELETE operations mark pages as free for reuse but do not shrink the database file, and HNSW indices may leave orphaned files behind. Disk usage therefore stays high after a collection is deleted until explicit cleanup (vacuum, manual file deletion) is performed.
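The SQLite behavior described above can be reproduced without Chroma. The following is a minimal sketch using Python's standard sqlite3 module; the table name `embeddings`, the row count, and the payload size are illustrative assumptions and not Chroma's actual schema.

```python
import os
import sqlite3
import tempfile


def demo_delete_vs_vacuum(rows: int = 2000) -> tuple[int, int, int]:
    """Return the file size in bytes after insert, after DELETE, and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        conn = sqlite3.connect(path)
        # auto_vacuum is off by default in SQLite; set explicitly for clarity.
        conn.execute("PRAGMA auto_vacuum = NONE")
        # Hypothetical table standing in for stored vector data.
        conn.execute("CREATE TABLE embeddings (id INTEGER PRIMARY KEY, payload BLOB)")
        conn.executemany(
            "INSERT INTO embeddings (payload) VALUES (?)",
            ((os.urandom(1024),) for _ in range(rows)),
        )
        conn.commit()
        size_full = os.path.getsize(path)

        # DELETE moves pages to SQLite's internal free list; the file keeps its size.
        conn.execute("DELETE FROM embeddings")
        conn.commit()
        size_after_delete = os.path.getsize(path)

        # VACUUM rewrites the database and returns free pages to the filesystem.
        conn.execute("VACUUM")
        size_after_vacuum = os.path.getsize(path)
        conn.close()
        return size_full, size_after_delete, size_after_vacuum


if __name__ == "__main__":
    full, after_delete, after_vacuum = demo_delete_vs_vacuum()
    print(f"after insert: {full} bytes")
    print(f"after DELETE: {after_delete} bytes")
    print(f"after VACUUM: {after_vacuum} bytes")
```

The size after DELETE should match the size after insert, and only the VACUUM step should shrink the file. The same pattern applies to the SQLite file in a Chroma persist directory.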

How to detect:

Collection deletion succeeds, but disk usage does not drop in proportion to the data removed. Space that belonged to deleted collections remains allocated in the database files. Disk usage keeps growing even though the number of active collections is stable. The expected space reclamation (roughly 1.5x the deleted collection's size) does not occur within hours of deletion.
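One way to check for these symptoms is to measure the persist directory before and after a deletion and compare the freed bytes against the expected figure. This is a sketch under stated assumptions: the function names are hypothetical, the 1.5x factor is the estimate quoted above, and the 50% tolerance is an arbitrary threshold chosen for illustration.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def space_not_reclaimed(
    before_bytes: int,
    after_bytes: int,
    collection_bytes: int,
    overhead_factor: float = 1.5,  # estimate taken from the text above
    tolerance: float = 0.5,        # assumed threshold, tune for your system
) -> bool:
    """Return True when the freed space is well below the expected reclamation."""
    expected = collection_bytes * overhead_factor
    freed = max(before_bytes - after_bytes, 0)
    return freed < expected * tolerance
```

In use, call `directory_size_bytes` on the persist directory before deleting a collection and again some time afterwards, then pass both values and the collection's approximate size to `space_not_reclaimed`.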

Recommended action:

1. Understand the behavior: Deleting a collection in Chroma removes its metadata from SQLite and marks its HNSW index files for deletion, but SQLite does not compact automatically. The space is marked as free internally while the file size stays unchanged. A vacuum is required to reclaim it: vacuum compacts the SQLite database, rebuilds index files, and actually frees disk space.
2. Perform vacuum: After deleting collections, run `chroma utils vacuum --path <persist-directory>` (CLI) or trigger a vacuum programmatically via the API. This rebuilds the database and reclaims space; expect disk usage to drop in proportion to the size of the deleted data. Vacuum is I/O intensive, so schedule it during low-traffic periods. It may take minutes to hours on large databases.
3. Manual cleanup (if needed): Check the persist directory for orphaned HNSW index files. Index files are typically named by UUID. If files for deleted collections were not cleaned up automatically, delete them, but first confirm that the UUID really belongs to a deleted collection to avoid data loss.
4. Monitor space reclamation: Track disk usage before and after vacuum. Verify that the expected space was freed: approximately deleted_collection_size * 1.5 (the factor includes metadata overhead). Alert if space is not reclaimed after vacuum, since this may indicate corruption or orphaned files.
5. Automate cleanup: Schedule periodic vacuum on systems with frequent collection deletion: weekly if deletions are common, monthly for stable environments. If space reclamation is critical, add a post-deletion vacuum to the application logic and run it asynchronously after each delete.
6. Capacity planning: Account for delayed space reclamation by provisioning 1.5-2x the working set size, so that deleted-but-unreclaimed space fits until the next vacuum. Plan vacuum windows so that space is reclaimed before capacity limits are reached.
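The manual cleanup step can be made safer by listing candidates before deleting anything. The sketch below is report-only and rests on several assumptions: index data sits in UUID-named subdirectories of the persist directory, the caller supplies the set of UUIDs still in use (how you obtain that set depends on your Chroma version and is not shown), and the function name `find_orphan_candidates` is hypothetical.

```python
import os
import uuid


def find_orphan_candidates(persist_dir: str, live_ids: set[str]) -> list[str]:
    """List UUID-named subdirectories of persist_dir that are not in live_ids.

    Nothing is deleted; the result is a list of paths to review by hand.
    """
    # Normalize the caller's IDs so case and formatting differences do not matter.
    live = {str(uuid.UUID(item)) for item in live_ids}
    candidates = []
    for entry in sorted(os.listdir(persist_dir)):
        full_path = os.path.join(persist_dir, entry)
        if not os.path.isdir(full_path):
            continue  # skip regular files such as the SQLite database
        try:
            normalized = str(uuid.UUID(entry))
        except ValueError:
            continue  # not UUID-named, so not an index directory by this assumption
        if normalized not in live:
            candidates.append(full_path)
    return candidates
```

Treat the output as a review list rather than a delete list: back up the persist directory and verify each UUID against the live collections before removing anything.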