LlamaIndex Embedding Request Failure

critical

availabilityUpdated Mar 2, 2026

Embedding API failures during document indexing or query time cause incomplete indexes or degraded retrieval quality without proper error handling and retry logic.

Technologies:

LlamaIndexsubject

How to detect:

Monitor embedding request success rate by tracking llama_index.embedding.requests alongside error metrics (derived from traces/logs). Alert when error rate >5% sustained or when llama_index.embedding.duration shows timeouts (P99 > 10s). During indexing, complete embedding failure (requests = 0 when documents are being processed) indicates critical failure.

Recommended action:

1. Investigate: Check embedding API health and rate limits. Review error logs for API response codes (429 rate limit, 500 server error, timeout). Verify network connectivity to embedding service. 2. Diagnose: Identify if failure is transient (temporary API issue) or persistent (configuration error, auth failure). Check if batch size is overwhelming API. Review API quota/billing status. 3. Remediate: Implement exponential backoff retry for transient errors (max 3 retries with 2x backoff). Reduce batch size if API is rate-limiting. Switch to alternative embedding provider as fallback. For indexing failures: implement checkpoint/resume logic to avoid re-processing all documents. 4. Prevent: Set up embedding API health checks independent of application metrics. Configure alerts on embedding.requests = 0 during expected indexing windows. Implement circuit breaker pattern to fail fast when API is unhealthy. Dashboard embedding success rate (custom metric: successful requests / total attempts).