LlamaIndex Retrieval Result Quality Degradation

warning

performance

Updated Mar 2, 2026

LlamaIndex retrieval returns too few or irrelevant documents, which degrades answer quality. Common causes are poor index coverage, misconfigured similarity thresholds, and a stale index.

Technologies:

LlamaIndex (subject)

How to detect:

Monitor llama_index.retrieval.documents.count per query. Alert when the average document count falls below the expected threshold (e.g., fewer than 3 documents retrieved when top_k=10 is configured) or drops significantly from its baseline. Correlate with llama_index.retrieval.duration to determine whether performance constraints are limiting retrieval depth.
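The alert rule above can be sketched in plain Python. This is a minimal illustration, not part of LlamaIndex or any monitoring product: the function name check_retrieval_counts, the default floor of 3 documents, and the 50% drop ratio are assumptions chosen for the example, and the input is assumed to be a window of per-query values already exported from llama_index.retrieval.documents.count.

```python
from statistics import mean


def check_retrieval_counts(counts, min_avg=3.0, baseline_avg=None, max_drop_ratio=0.5):
    """Return alert messages for one window of per-query retrieved-document counts.

    counts         -- documents retrieved per query in the current window
    min_avg        -- absolute floor for the window average (hypothetical default)
    baseline_avg   -- historical average to compare against, or None to skip
    max_drop_ratio -- relative drop from baseline that triggers an alert (hypothetical default)
    """
    alerts = []
    if not counts:
        # No queries in the window: nothing to evaluate.
        return alerts

    avg = mean(counts)

    # Rule 1: the average is below the absolute floor.
    if avg < min_avg:
        alerts.append(f"average retrieved documents {avg:.2f} is below floor {min_avg}")

    # Rule 2: the average dropped by more than max_drop_ratio relative to baseline.
    if baseline_avg is not None and avg < baseline_avg * (1.0 - max_drop_ratio):
        alerts.append(
            f"average retrieved documents {avg:.2f} dropped more than "
            f"{max_drop_ratio:.0%} from baseline {baseline_avg:.2f}"
        )

    return alerts
```

Calling check_retrieval_counts([1, 2, 2, 3], baseline_avg=8.0) would trip both rules, since the window average of 2 is below the floor and also less than half the baseline. Keeping the absolute and relative rules separate makes it easier to tell a chronically thin index apart from a sudden regression.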

Recommended action:

1. Investigate: Query the distribution of llama_index.retrieval.documents.count across queries. Identify query patterns with consistently low retrieval counts. Check whether similarity score thresholds are filtering too aggressively.

2. Diagnose: Sample queries with low document counts and manually verify whether relevant documents exist in the index. Check index freshness (when was it last updated?). Review the embedding model's quality and whether it matches the domain. Test with different similarity thresholds.

3. Remediate: Relax the similarity score threshold if it is too restrictive. Implement hybrid search (keyword + semantic) to improve recall. Re-index with a better chunking strategy or higher-quality embeddings. Increase the top_k parameter if latency permits. Add query expansion or rewriting for better semantic matching.

4. Prevent: Build a dashboard of the llama_index.retrieval.documents.count distribution per query type. Set alerts for drops in average retrieval count. Implement A/B testing for retrieval parameter tuning. Monitor index coverage metrics (custom: documents indexed, index age).
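The "test with different similarity thresholds" step can be explored offline before changing any retriever configuration. The sketch below assumes you have already logged the raw similarity scores returned for a sample of queries (before any cutoff was applied). The helper name sweep_cutoffs and the report fields are hypothetical and not part of LlamaIndex.

```python
def sweep_cutoffs(scores_per_query, cutoffs):
    """Estimate how many documents survive each candidate similarity cutoff.

    scores_per_query -- list of lists; each inner list holds the similarity
                        scores of the documents retrieved for one query
    cutoffs          -- candidate thresholds to evaluate

    Returns a dict mapping each cutoff to:
      avg_kept       -- mean number of documents with score >= cutoff
      empty_fraction -- share of queries left with zero documents
    """
    if not scores_per_query:
        return {}

    n_queries = len(scores_per_query)
    report = {}
    for cutoff in cutoffs:
        # Count the documents per query that would pass this cutoff.
        kept = [sum(1 for s in scores if s >= cutoff) for scores in scores_per_query]
        report[cutoff] = {
            "avg_kept": sum(kept) / n_queries,
            "empty_fraction": sum(1 for k in kept if k == 0) / n_queries,
        }
    return report


# Example with made-up scores for three sampled queries.
sample_scores = [[0.9, 0.8, 0.6], [0.7, 0.5], [0.4]]
for cutoff, stats in sweep_cutoffs(sample_scores, [0.5, 0.75]).items():
    print(cutoff, stats)
```

A cutoff that leaves a large share of queries with zero documents is a likely candidate for being too restrictive. The sweep only measures how many documents pass, not whether they are relevant, so pair it with the manual relevance check from the diagnosis step.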