HNSW Index Fragmentation from Updates

warning

performance

Updated Feb 18, 2025

HNSW indices are optimized for insert-heavy workloads. An update does not modify the graph in place: the old vector is marked as deleted and a new version is added, which leaves a dead node behind in the graph. Frequent updates (as opposed to pure adds) therefore fragment the graph structure over time, creating suboptimal routing paths, increasing query latency, reducing recall, and bloating disk usage.
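The accumulation of dead nodes can be illustrated with a toy model. This is a minimal sketch of the mark-deleted-then-append behaviour described above, not Chroma's actual storage code; `ToyIndex` and all of its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of mark-deleted-then-append updates (hypothetical, not Chroma's real storage)."""

    live: dict = field(default_factory=dict)  # record id -> node number of its current vector
    slots: int = 0  # total graph nodes ever allocated
    dead: int = 0   # nodes flagged as deleted but still present in the graph

    def add(self, record_id):
        # A pure add allocates one new node and leaves nothing behind.
        self.live[record_id] = self.slots
        self.slots += 1

    def update(self, record_id):
        if record_id not in self.live:
            raise KeyError(record_id)
        # The old node stays in the graph, flagged as deleted...
        self.dead += 1
        # ...and the new version is appended as a fresh node.
        self.live[record_id] = self.slots
        self.slots += 1

    def dead_fraction(self):
        return self.dead / self.slots if self.slots else 0.0


index = ToyIndex()
for i in range(1000):
    index.add(i)      # 1000 live nodes, no dead nodes
for i in range(250):
    index.update(i)   # each update adds one node and strands one

print(index.dead_fraction())  # 250 dead / 1250 total = 0.2
```

In this model the number of live records never changes, yet a quarter of them being updated once is enough to make one node in five dead weight.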

Sources

Performance Tips - Chroma Cookbook (cookbook.chromadb.dev)

[1603.09320] Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs (arxiv.org)

Technologies:

Chroma: the root cause of this issue originates in Chroma.

How to detect:

- High frequency of update operations (excluding pure adds).
- Query accuracy (recall@k) degrades over time relative to the baseline.
- Query latency increases by more than 30% compared to a fresh index.
- Disk usage grows faster than new data is added.
- Index fragmentation metrics show a high percentage of deleted nodes (>20%).
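The signals above can be combined into a single check. The following is a sketch under stated assumptions: `Snapshot` and `fragmentation_signals` are hypothetical names, and how each metric is collected (in particular the deleted-node fraction) is left to your own tooling. Only the 30% latency and 20% deleted-node thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Metrics captured at one point in time (hypothetical structure)."""
    recall_at_k: float
    p95_latency_ms: float
    disk_bytes: int
    record_count: int
    deleted_node_fraction: float


def fragmentation_signals(baseline, current):
    """Return the detection signals that fire when comparing current to a fresh-index baseline."""
    signals = []
    if current.recall_at_k < baseline.recall_at_k:
        signals.append("recall below baseline")
    if current.p95_latency_ms > 1.30 * baseline.p95_latency_ms:
        signals.append("latency up more than 30%")
    # Disk should grow roughly in proportion to the data it holds.
    disk_growth = current.disk_bytes / baseline.disk_bytes
    data_growth = current.record_count / baseline.record_count
    if disk_growth > data_growth:
        signals.append("disk growing faster than data")
    if current.deleted_node_fraction > 0.20:
        signals.append("deleted nodes above 20%")
    return signals


fresh = Snapshot(0.98, 20.0, 1_000_000_000, 1_000_000, 0.0)
aged = Snapshot(0.93, 29.0, 1_600_000_000, 1_100_000, 0.27)
for signal in fragmentation_signals(fresh, aged):
    print(signal)
```

Returning the list of fired signals, rather than a single boolean, makes it easier to see whether one noisy metric or a consistent pattern is behind an alert.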

Recommended action:

1. Diagnose: Track the frequency and type of update operations (metadata-only vs. vector updates). Measure query recall over time against a ground-truth evaluation set. Compare current query latency with the latency measured when the index was fresh. Look for patterns in the accuracy degradation: fragmentation affects complex queries more than simple ones.
2. Assess impact: Establish baseline metrics while the index is fresh (immediately after a rebuild). Set thresholds: recall below 95% of baseline means action is needed, and so does latency above 1.5x baseline. Prioritize according to the business impact of degraded accuracy.
3. Rebuild index: Use the `chops` CLI (Chroma Ops): `chops hnsw rebuild --collection <name>`, or rebuild programmatically via a rebuild API. This reconstructs the HNSW graph from the current vectors, eliminating deleted nodes and optimizing routing paths. Expect significant improvements: a 30-50% latency reduction, accuracy restored to baseline, and 20-40% of disk space reclaimed.
4. Schedule rebuilds: For update-heavy workloads (>5% daily updates), schedule weekly rebuilds during low-traffic windows. For moderate update rates (>5% weekly), monthly rebuilds are sufficient. Automate rebuild scheduling based on fragmentation metrics.
5. Optimize update patterns: Batch updates where possible to reduce rebuild frequency. Prefer metadata-only updates over full vector updates where the use case allows, since metadata changes don't fragment HNSW. Design the application to minimize unnecessary updates.
6. Monitor: Track update rate, query accuracy, and latency trends. Alert when accuracy drops below the threshold or latency increases by more than 30%. Maintain a dashboard showing time since the last rebuild and estimated fragmentation.
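Steps 1, 2, and 4 above reduce to a little arithmetic, sketched below. All function names are hypothetical, and the `"on demand"` fallback is an assumption, since the guidance above does not say what to do below the 5% weekly rate. The thresholds (95% of baseline recall, 1.5x baseline latency, 5% daily or weekly updates) are the ones stated in the steps.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbours that appear in the first k results."""
    truth = set(ground_truth[:k])
    if not truth:
        return 1.0
    return len(set(retrieved[:k]) & truth) / len(truth)


def mean_recall(all_retrieved, all_ground_truth, k):
    """Average recall@k over an evaluation set of queries."""
    scores = [recall_at_k(r, g, k) for r, g in zip(all_retrieved, all_ground_truth)]
    return sum(scores) / len(scores)


def needs_rebuild(recall, latency_ms, base_recall, base_latency_ms):
    """Apply the step 2 thresholds: recall < 95% of baseline or latency > 1.5x baseline."""
    return recall < 0.95 * base_recall or latency_ms > 1.5 * base_latency_ms


def rebuild_cadence(daily_update_fraction, weekly_update_fraction):
    """Map update rates to the step 4 schedule."""
    if daily_update_fraction > 0.05:
        return "weekly"
    if weekly_update_fraction > 0.05:
        return "monthly"
    return "on demand"  # assumption: not specified in the guidance above


# One query whose index missed the true neighbour "c": recall@4 is 3/4.
print(recall_at_k(["a", "b", "x", "d"], ["a", "b", "c", "d"], k=4))  # 0.75
print(needs_rebuild(recall=0.90, latency_ms=30.0, base_recall=0.98, base_latency_ms=25.0))  # True
print(rebuild_cadence(daily_update_fraction=0.08, weekly_update_fraction=0.40))  # weekly
```

The ground-truth neighbours would normally come from an exact (brute-force) search over the same vectors, computed once and reused for every evaluation run.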