Chroma · OpenAI

Embedding Dimensionality Storage Overhead

Resource Contention · Updated Feb 18, 2025

High-dimensional embeddings (e.g., OpenAI text-embedding-3-large with 3072 dimensions) significantly increase storage, memory, and query costs compared to lower-dimensional alternatives. Many embedding models support dimensionality reduction without major accuracy loss. Reducing from 3072 to 1536 or 768 dimensions can halve storage and memory requirements while maintaining 95-98% retrieval accuracy.
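For text-embedding-3 models, the reduction works because the leading coordinates of the vector carry most of the information, so a vector can be truncated and re-normalized. A minimal sketch of that operation, using a synthetic stand-in vector rather than a real API response:

```python
import math

def truncate_embedding(vec, dims):
    """Truncate an embedding to its first `dims` entries and L2-renormalize.

    For OpenAI text-embedding-3 models this mirrors what the API's
    `dimensions` parameter returns: a shortened, unit-norm vector.
    """
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Synthetic stand-in for a 3072-dimensional embedding.
full = [math.sin(i) for i in range(3072)]
reduced = truncate_embedding(full, 1536)
print(len(reduced))                           # 1536
print(round(sum(x * x for x in reduced), 6))  # 1.0 (unit norm)
```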

How to detect:

- Embedding dimensionality is very high (>2000 dimensions) relative to retrieval accuracy requirements.
- Storage growth rate and memory usage are higher than necessary for application needs.
- Cost per query is elevated due to large vector operations.
- Alternative embedding models or dimensionality reduction could meet accuracy requirements at lower resource cost.
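The storage signal can be quantified with the raw-vector formula used in the assessment step below (size × dimensions × 4 bytes for float32). A small sketch, noting that real Chroma usage will be higher once the HNSW index and metadata are added:

```python
def embedding_footprint(collection_size, dimensions, bytes_per_value=4):
    """Estimate raw vector storage in bytes (float32 by default).

    This is a lower bound: actual disk and memory usage also include
    the HNSW graph and document metadata.
    """
    return collection_size * dimensions * bytes_per_value

# 1M documents at 3072 vs 1536 dimensions:
big = embedding_footprint(1_000_000, 3072)    # ~12.3 GB of raw vectors
small = embedding_footprint(1_000_000, 1536)  # ~6.1 GB
print(big / 1e9, small / 1e9)
```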

Recommended action:

1. Assess: Review current embedding dimensionality and model choice. Estimate storage cost: `collection_size * dimensions * 4 bytes` (float32). Estimate memory cost: the same calculation plus HNSW graph overhead. Review retrieval accuracy requirements: does the application need top-k exact matches, or are approximate neighbors sufficient?
2. Test dimensionality reduction: For OpenAI text-embedding-3 models, test the `dimensions` parameter at reduced sizes: 1536 (50% reduction), 768 (75% reduction), 512 (83% reduction). Evaluate retrieval accuracy against representative queries and ground truth, measuring recall@k, MRR, and NDCG. Most applications maintain >95% accuracy at 1536 dimensions.
3. Evaluate alternative models: Consider models with native lower dimensionality that meet accuracy needs: Cohere embed-english-v3.0 (768d or 1024d), OpenAI text-embedding-3-small (1536d), sentence-transformers models (384d-768d). Benchmark accuracy against the current high-dimensional embeddings.
4. Migrate (if justified): If testing shows acceptable accuracy at lower dimensions, plan the migration: re-embed the entire collection with the new settings, run an A/B test comparing retrieval quality, and switch traffic once validated.
5. Calculate savings: Storage reduction: `(old_dims - new_dims) / old_dims * 100%`. Memory reduction follows the same calculation. Cost reduction: storage plus compute savings. Document the expected ROI.
6. Prevent: Establish embedding dimensionality guidelines based on use case. Default to lower dimensions (768-1536) unless high accuracy requirements are proven. Review embedding choices during architecture design.
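The accuracy evaluation in step 2 can be sketched with plain implementations of recall@k and MRR. The query results and ground-truth sets here are hypothetical placeholders for your own labeled evaluation set:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant set found in the top-k retrieved IDs."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved_lists, relevant_sets):
    """Mean reciprocal rank of the first relevant hit per query."""
    total = 0.0
    for retrieved, relevant in zip(retrieved_lists, relevant_sets):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(retrieved_lists)

# Hypothetical retrieval results from a reduced-dimension index,
# scored against hand-labeled ground truth for two queries.
ground_truth = [{"d1", "d2"}, {"d9"}]
reduced_results = [["d1", "d5", "d2"], ["d7", "d9", "d3"]]
print(recall_at_k(reduced_results[0], ground_truth[0], 3))  # 1.0
print(mrr(reduced_results, ground_truth))                   # 0.75
```

Run the same metrics over the full-dimension baseline and the reduced-dimension candidate; the reduction is justified when the candidate stays within your accuracy budget (e.g., >95% of baseline recall@k).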
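The savings formula in step 5 is a one-liner; a quick sketch confirming the headline numbers for the reductions listed above:

```python
def savings_pct(old_dims, new_dims):
    """Percent reduction in raw vector storage (and, roughly, memory)."""
    return (old_dims - new_dims) / old_dims * 100

print(savings_pct(3072, 1536))  # 50.0
print(savings_pct(3072, 768))   # 75.0
```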