Chroma

Query Timeout from Inefficient Distance Calculations

warning, performance
Updated Mar 3, 2026

Complex distance functions (L2, cosine similarity) on high-dimensional vectors are computationally expensive. A large n_results combined with a high ef_search parameter forces HNSW to compute distances for thousands of candidates, causing query timeouts (>30s). The problem is worse without SIMD optimizations, on under-provisioned CPUs, or when multiple collections are queried sequentially rather than in parallel.

Technologies: Chroma
How to detect:

Query latency exceeds the timeout threshold (typically 30-60s). Timeouts correlate with a high n_results (>100) or a large ef_search (>500). CPU utilization spikes to 100% during queries. Query latency increases proportionally with dimensionality and n_results. The issue is worse on generic HNSW builds without SIMD optimizations.
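To confirm the correlation between n_results and latency, a minimal timing harness can be sketched as below. The `fake_query` stand-in is hypothetical; substitute your real `collection.query(..., n_results=n)` call.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# Hypothetical stand-in whose latency grows with n_results; replace with
# your real collection.query(query_embeddings=..., n_results=n) call.
def fake_query(n):
    time.sleep(0.002 * n)
    return list(range(n))

profile = []
for n in (10, 50, 100):
    _, dt = timed(fake_query, n)
    profile.append((n, dt))
# Latency rising roughly linearly with n_results points at distance
# calculations (CPU-bound), not I/O, as the bottleneck.
```

If latency stays flat as n_results grows, look at I/O or metadata filtering instead.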

Recommended action:

1. Diagnose: Profile queries to confirm that distance calculations are the bottleneck (vs. I/O or metadata filtering). Check the n_results and ef_search parameters of slow queries. Measure CPU utilization during queries — sustained 100% indicates a CPU-bound workload. Review HNSW build optimization (see 'cross-architecture-hnsw-performance-degradation').
2. Optimize query parameters: Reduce n_results to the minimum the application needs. Most RAG applications use the top 10-50 results, not 100+. Increase ef_search from the default of 10 to 50-100 only if recall requires it: higher ef_search means more candidates and therefore more distance calculations. Test the recall-vs-latency trade-off on your own dataset.
3. Enable SIMD optimizations: Rebuild HNSW with architecture-specific SIMD instructions (AVX2, AVX-512); see the 'cross-architecture-hnsw-performance-degradation' insight. Expect a 2-5x speedup in distance calculations.
4. Optimize the distance function: If using cosine similarity, ensure vectors are pre-normalized to unit length. Normalized vectors allow cosine similarity to be computed as a plain dot product, which is faster. Chroma may do this automatically — verify in the documentation.
5. Scale compute: Increase CPU resources: more cores for parallel query processing, faster cores (higher GHz) for single-query latency. Cloud options: compute-optimized instances (AWS c6i, GCP c2, Azure Fsv2). Consider GPU acceleration: some vector databases support GPUs for distance calculations; check the Chroma roadmap for GPU support.
6. Parallelize queries: If querying multiple collections, run the queries concurrently rather than sequentially: use ThreadPoolExecutor or asyncio. Each query is independent and can run in parallel. This improves throughput but not single-query latency.
7. Cache frequent queries: Cache results for frequently executed queries, keyed by a content hash of the query vector plus parameters. Store in Redis, Memcached, or an in-memory cache. Measure the cache hit rate — aim for >20% with stable query patterns.
8. Monitor: Track query timeout rate, latency p95/p99, and CPU utilization during queries. Alert on a timeout rate >0.5% or latency >5s (before the timeout fires). Profile the distribution of query parameters to identify inefficient queries.
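Step 4's pre-normalization can be sketched in plain Python (the helper names are illustrative): once vectors are unit length, the cheap dot product equals the full cosine similarity.

```python
import math

def normalize(v):
    """Scale v to unit length so cosine similarity reduces to a dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Full cosine similarity, including both norm computations."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
# After normalization, the dot product matches cosine similarity exactly,
# skipping two norm computations per comparison.
assert abs(dot(normalize(a), normalize(b)) - cosine(a, b)) < 1e-12
```

In practice you would normalize embeddings once at ingest time (with NumPy, in bulk) rather than per query.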
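Step 6's fan-out across collections can be sketched with the standard library. `query_collection` here is a hypothetical wrapper around your real per-collection query call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical wrapper; in practice this would call something like
# client.get_collection(name).query(query_embeddings=..., n_results=n).
def query_collection(name, n_results=10):
    return {"collection": name, "ids": list(range(n_results))}

collection_names = ["docs", "faqs", "tickets"]

# Queries against different collections are independent, so run them
# concurrently; this improves throughput, not single-query latency.
with ThreadPoolExecutor(max_workers=len(collection_names)) as pool:
    results = list(pool.map(query_collection, collection_names))
```

`pool.map` preserves input order, so results line up with `collection_names`.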
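Step 7's content-hash cache key can be sketched as follows; an in-memory dict stands in for Redis or Memcached, and all names are illustrative.

```python
import hashlib
import json

_cache = {}  # stand-in for Redis/Memcached

def cache_key(embedding, n_results, where=None):
    """Deterministic content hash of the query vector plus parameters."""
    payload = json.dumps(
        {"e": [round(x, 6) for x in embedding], "n": n_results, "w": where},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_query(embedding, n_results, run_query):
    """Return cached results when available; otherwise run and store."""
    key = cache_key(embedding, n_results)
    if key not in _cache:
        _cache[key] = run_query(embedding, n_results)  # cache miss
    return _cache[key]
```

Track the hit rate (hits / total lookups), and evict entries when the underlying collection changes, or stale results will be served.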