Chroma

Query Performance Degradation on High-Cardinality Metadata Filters

warning
performance
Updated Mar 2, 2026

Chroma supports metadata filtering in queries through WHERE clauses. When a filtered field has high cardinality (e.g., document_id or user_id with millions of unique values) and no index covers it, SQLite falls back to a full table scan and query latency rises sharply. The effect is most visible when a vector similarity search is combined with metadata filters.

How to detect:

Query latency spikes when using WHERE clauses with high-cardinality metadata fields. Latency correlates with collection size and metadata cardinality rather than n_results. Query performance without filters is normal but degrades significantly with filters. SQLite query plans show SCAN operations on metadata tables.
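The plan check and the cardinality count can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not Chroma's actual schema: the `meta` table, its columns, and the index name are hypothetical stand-ins, and the helper names `plan_details` and `has_full_scan` are invented for this example.

```python
import sqlite3
from typing import List, Sequence


def plan_details(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return the human-readable detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail string is the last column of every plan row.
    return [row[-1] for row in rows]


def has_full_scan(details: List[str]) -> bool:
    """True if any plan step is a SCAN, i.e. it walks every row of a table or index."""
    return any(d.startswith("SCAN") for d in details)


# Hypothetical metadata table (assumption: NOT Chroma's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (id INTEGER, key TEXT, string_value TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [(i, "user_id", f"user-{i}") for i in range(1000)],
)

# Cardinality check: how many distinct values does the filtered field hold?
distinct_values = conn.execute(
    "SELECT COUNT(DISTINCT string_value) FROM meta WHERE key = ?", ("user_id",)
).fetchone()[0]

query = "SELECT id FROM meta WHERE key = ? AND string_value = ?"
args = ("user_id", "user-42")

# Without an index on the filtered columns, the plan reports a SCAN.
before = plan_details(conn, query, args)

# For contrast: with a matching index, the plan becomes an indexed SEARCH.
conn.execute("CREATE INDEX idx_meta_key_value ON meta (key, string_value)")
after = plan_details(conn, query, args)

print("distinct values:", distinct_values)
print("before index:", before, "full scan:", has_full_scan(before))
print("after index: ", after, "full scan:", has_full_scan(after))
```

Running the same plan check against a copy of the real database, with the table and column names taken from that database rather than from this sketch, shows whether a given filter is served by a scan or by an index.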

Recommended action:

1. Diagnose: Identify slow queries that use WHERE clauses. Check the cardinality of each filtered metadata field by counting its distinct values. Use SQLite EXPLAIN QUERY PLAN to identify full table scans.
2. Optimize: Create SQLite indexes on frequently filtered metadata fields (Chroma may not auto-index all metadata). Reduce metadata cardinality where possible (e.g., bucket values, use enums). Consider pre-filtering with a separate metadata index before the vector search.
3. Query optimization: Limit WHERE clause complexity. Use IN clauses with small value sets rather than OR chains.
4. Architectural: For very high-cardinality metadata, consider an external metadata store (PostgreSQL, Elasticsearch) with a two-phase query: metadata filter → vector IDs → Chroma query by IDs.
5. Monitor: Track query latency by filter presence and complexity. Profile queries to identify bottlenecks.
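The two-phase query from step 4 might look roughly like the sketch below. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for the external metadata store, the `doc_meta` schema is hypothetical, and `search_by_ids` is a hypothetical callable representing whatever vector lookup restricted to a set of IDs the deployment provides. It is not a Chroma API.

```python
import sqlite3
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical signature: (candidate ids, n_results) -> [(id, distance), ...]
SearchFn = Callable[[List[str], int], List[Tuple[str, float]]]


def build_metadata_store() -> sqlite3.Connection:
    """Stand-in for an external metadata store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_meta (vector_id TEXT PRIMARY KEY, user_id TEXT)")
    # Index the high-cardinality field so phase 1 is a lookup, not a scan.
    conn.execute("CREATE INDEX idx_doc_meta_user ON doc_meta (user_id)")
    rows = [(f"vec-{i}", f"user-{i % 1000}") for i in range(10_000)]
    conn.executemany("INSERT INTO doc_meta VALUES (?, ?)", rows)
    return conn


def ids_for_user(conn: sqlite3.Connection, user_id: str) -> List[str]:
    """Phase 1: resolve the metadata filter to a list of vector IDs."""
    cur = conn.execute(
        "SELECT vector_id FROM doc_meta WHERE user_id = ? ORDER BY vector_id",
        (user_id,),
    )
    return [row[0] for row in cur]


def batched(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield the IDs in small chunks so each lookup gets a short ID list."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def two_phase_query(
    conn: sqlite3.Connection,
    user_id: str,
    search_by_ids: SearchFn,
    n_results: int = 10,
    batch_size: int = 500,
) -> List[Tuple[str, float]]:
    """Phase 2: search only the pre-filtered IDs, then merge by distance."""
    candidates: List[Tuple[str, float]] = []
    for batch in batched(ids_for_user(conn, user_id), batch_size):
        candidates.extend(search_by_ids(batch, n_results))
    # Each batch returns its own top hits; re-rank across batches.
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:n_results]
```

Because each batch is ranked independently, the merge step re-sorts by distance before truncating. Requesting `n_results` hits from every batch keeps the merged top `n_results` the same as a single search over all candidate IDs would give.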