LlamaIndex Query Engine Request Failure
Severity: critical. Query engine failures prevent users from receiving answers, caused by LLM API errors, retrieval failures, or agent execution errors when proper error handling and fallback mechanisms are missing.
Monitor query success rate by comparing llama_index.query_engine.requests against error traces. Alert when the error rate stays above 5%, or when llama_index.query_engine.duration shows extreme outliers (P99 > 30s, suggesting timeouts). A complete query failure (requests dropping to 0 during active user traffic) indicates a critical outage.
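The alert conditions above can be sketched as a small evaluation function. This is an illustrative sketch, not a monitoring-vendor API: the function names and input shapes are assumptions, while the thresholds (error rate >5%, P99 > 30s, zero requests during active traffic) come from the guidance above.

```python
def p99(durations_s):
    """Nearest-rank P99 of a non-empty list of durations (seconds)."""
    ranked = sorted(durations_s)
    idx = max(0, int(0.99 * len(ranked)) - 1)  # nearest-rank index
    return ranked[idx]

def evaluate_alerts(total_requests, error_count, durations_s, active_traffic):
    """Return the list of alert names that should fire for this window."""
    alerts = []
    if total_requests == 0 and active_traffic:
        alerts.append("critical_outage")           # requests dropped to 0
    elif total_requests > 0:
        if error_count / total_requests > 0.05:    # sustained error rate >5%
            alerts.append("high_error_rate")
        if durations_s and p99(durations_s) > 30:  # P99 > 30s suggests timeouts
            alerts.append("latency_outliers")
    return alerts
```

In a real deployment these inputs would come from your metrics backend over a sliding window; the point is that the outage check (zero requests) must take precedence over the rate checks, which are undefined at zero traffic.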
1. Investigate: Check query engine error logs for exception types (LLM API error, retrieval timeout, parsing failure, agent error). Verify that all dependencies are healthy (LLM API, vector store, agent tools). Review recent deployments or configuration changes.
2. Diagnose: Use distributed tracing to identify the failure point in the query pipeline (retrieval → LLM → response parsing). Check whether specific query types fail consistently. Verify that authentication and API keys are valid.
3. Remediate: Implement graceful degradation (e.g., return cached responses or simpler non-LLM responses when the LLM fails). Add retry logic with exponential backoff for transient errors. Configure circuit breakers to prevent cascading failures. For persistent failures: roll back recent changes, verify configuration, and contact the LLM provider if the issue is on the API side.
4. Prevent: Set up synthetic monitoring with test queries to detect failures before users are impacted. Configure alerts on query_engine.requests = 0 or error rate >1%. Implement a health check endpoint that exercises the full query path. Dashboard query success rate and failure reasons. Build fallback query modes (cached, simplified, alternative model).
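The remediation step (retry with exponential backoff, then degrade gracefully to a cached response) can be sketched as a wrapper around any query callable. This is a minimal sketch under assumptions: `query_fn`, the in-process cache, and the catch-all exception handling are hypothetical placeholders, not LlamaIndex APIs; a production version would catch only transient error types and use a shared cache.

```python
import random
import time

_cache: dict[str, str] = {}  # hypothetical last-good-answer cache

def query_with_fallback(query_fn, question, max_retries=3, base_delay=0.5):
    """Call query_fn with retries and backoff; fall back to a cached answer."""
    for attempt in range(max_retries):
        try:
            answer = query_fn(question)
            _cache[question] = answer  # refresh cache on success
            return answer
        except Exception:
            if attempt == max_retries - 1:
                break
            # exponential backoff with jitter: ~0.5s, ~1s, ~2s, ...
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    # graceful degradation: serve the last good answer if one exists
    if question in _cache:
        return _cache[question]
    return "The answer service is temporarily unavailable. Please retry."
```

The jitter term spreads retries from concurrent clients so a recovering LLM API is not hit by a synchronized thundering herd; a circuit breaker would sit in front of this wrapper to skip the retries entirely while the dependency is known to be down.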