LlamaIndex Query Engine Request Failure
Severity: critical. Query engine failures prevent users from receiving answers, caused by LLM API errors, retrieval failures, or agent execution errors when proper error handling and fallback mechanisms are missing.
Monitor query success rate by comparing llama_index.query_engine.requests against error traces. Alert when the error rate stays above 5%, or when llama_index.query_engine.duration shows extreme outliers (P99 > 30s, suggesting timeouts). A complete query failure (requests dropping to 0 during active user traffic) indicates a critical outage.
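The alert conditions above can be sketched as a small evaluation function. This is an illustrative sketch, not a monitoring-vendor API: the function names and input shapes are assumptions, while the thresholds (error rate >5%, P99 > 30s, zero requests during active traffic) come from the guidance above.

```python
def p99(durations_s):
    """Nearest-rank P99 of a non-empty list of durations (seconds)."""
    ranked = sorted(durations_s)
    idx = max(0, int(0.99 * len(ranked)) - 1)  # nearest-rank index
    return ranked[idx]

def evaluate_alerts(total_requests, error_count, durations_s, active_traffic):
    """Return the list of alert names that should fire for this window."""
    alerts = []
    if total_requests == 0 and active_traffic:
        alerts.append("critical_outage")           # requests dropped to 0
    elif total_requests > 0:
        if error_count / total_requests > 0.05:    # sustained error rate >5%
            alerts.append("high_error_rate")
        if durations_s and p99(durations_s) > 30:  # P99 > 30s suggests timeouts
            alerts.append("latency_outliers")
    return alerts
```

In a real deployment these inputs would come from your metrics backend over a sliding window; the point is that the outage check (zero requests) must take precedence over the rate checks, which are undefined at zero traffic.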
1. Investigate: Check query engine error logs for exception types (LLM API error, retrieval timeout, parsing failure, agent error). Verify that all dependencies are healthy (LLM API, vector store, agent tools). Review recent deployments or configuration changes.
2. Diagnose: Use distributed tracing to identify the failure point in the query pipeline (retrieval → LLM → response parsing). Check whether specific query types fail consistently. Verify that authentication and API keys are valid.
3. Remediate: Implement graceful degradation (e.g., return cached responses or simpler non-LLM responses when the LLM fails). Add retry logic with exponential backoff for transient errors. Configure circuit breakers to prevent cascading failures. For persistent failures: roll back recent changes, verify configuration, and contact the LLM provider if the issue is on the API side.
4. Prevent: Set up synthetic monitoring with test queries to detect failures before users are impacted. Configure alerts on query_engine.requests = 0 or error rate >1%. Implement a health check endpoint that exercises the full query path. Dashboard query success rate and failure reasons. Build fallback query modes (cached, simplified, alternative model).
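The remediation step (retry with exponential backoff, then degrade gracefully to a cached response) can be sketched as a wrapper around any query callable. This is a minimal sketch under assumptions: `query_fn`, the in-process cache, and the catch-all exception handling are hypothetical placeholders, not LlamaIndex APIs; a production version would catch only transient error types and use a shared cache.

```python
import random
import time

_cache: dict[str, str] = {}  # hypothetical last-good-answer cache

def query_with_fallback(query_fn, question, max_retries=3, base_delay=0.5):
    """Call query_fn with retries and backoff; fall back to a cached answer."""
    for attempt in range(max_retries):
        try:
            answer = query_fn(question)
            _cache[question] = answer  # refresh cache on success
            return answer
        except Exception:
            if attempt == max_retries - 1:
                break
            # exponential backoff with jitter: ~0.5s, ~1s, ~2s, ...
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    # graceful degradation: serve the last good answer if one exists
    if question in _cache:
        return _cache[question]
    return "The answer service is temporarily unavailable. Please retry."
```

The jitter term spreads retries from concurrent clients so a recovering LLM API is not hit by a synchronized thundering herd; a circuit breaker would sit in front of this wrapper to skip the retries entirely while the dependency is known to be down.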