LangChain, OpenAI

Embedding Latency from Token Encoding Download

warning
latency
Updated Sep 2, 2023

The first embedding requests can time out while waiting for tiktoken encoding files to download from an external CDN (e.g., openaipublic.blob.core.windows.net), causing those initial requests to fail.

How to detect:

Monitor langchain_embedding_time for initial requests with unusually high latency (>10s). Track connection timeout errors to openaipublic.blob.core.windows.net or similar CDN endpoints. Alert on first-request failures in embedding operations.
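
The first-request check described above can be sketched as a small wrapper around whatever callable performs the embedding. This is an illustrative sketch, not part of LangChain: FirstCallTimer, SLOW_FIRST_CALL_SECONDS, and the injected clock are hypothetical names, and the 10-second threshold is simply the figure quoted above.

```python
import time
from typing import Callable, List, Optional, Sequence

# Threshold taken from the detection guidance above (assumption: seconds).
SLOW_FIRST_CALL_SECONDS = 10.0


class FirstCallTimer:
    """Hypothetical helper: times only the first call to an embedding function."""

    def __init__(
        self,
        embed: Callable[[Sequence[str]], List[List[float]]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._embed = embed
        self._clock = clock
        self.first_call_seconds: Optional[float] = None
        self.first_call_failed = False

    def __call__(self, texts: Sequence[str]) -> List[List[float]]:
        # After the first call has been measured, pass through untouched.
        if self.first_call_seconds is not None:
            return self._embed(texts)
        start = self._clock()
        try:
            return self._embed(texts)
        except Exception:
            # Record the failure so it can be alerted on, then re-raise.
            self.first_call_failed = True
            raise
        finally:
            self.first_call_seconds = self._clock() - start

    @property
    def slow_first_call(self) -> bool:
        """True if the first call took longer than the threshold."""
        return (
            self.first_call_seconds is not None
            and self.first_call_seconds > SLOW_FIRST_CALL_SECONDS
        )
```

One possible use is to wrap the embed_documents method of an embeddings object and export first_call_seconds, first_call_failed, and slow_first_call to the metrics system already in place.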

Recommended action:

Pre-download and cache tiktoken encoding files during application initialization. Configure HTTP timeouts and retries for encoding file downloads. Use local encoding file caching to avoid CDN dependency. Monitor encoding cache hit rates and refresh stale caches proactively.
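
A minimal sketch of the pre-download step, assuming the embedding model uses the cl100k_base encoding (as text-embedding-ada-002 does) and that tiktoken is installed. tiktoken reads its cache location from the TIKTOKEN_CACHE_DIR environment variable, and tiktoken.get_encoding triggers the download when the file is not cached yet. The helper names configure_tiktoken_cache and warm_encodings, the retry counts, and the loader parameter are hypothetical and exist only for illustration. The retry loop wraps the whole load and does not change tiktoken's own HTTP settings.

```python
import os
import time
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Optional


def configure_tiktoken_cache(cache_dir: str) -> Path:
    """Point tiktoken at a persistent local cache directory.

    Call this before the first encoding is loaded.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["TIKTOKEN_CACHE_DIR"] = str(path)
    return path


def warm_encodings(
    names: Iterable[str] = ("cl100k_base",),
    attempts: int = 3,
    backoff_seconds: float = 2.0,
    loader: Optional[Callable[[str], Any]] = None,
) -> Dict[str, Any]:
    """Load each encoding at startup, retrying on failure.

    `loader` defaults to tiktoken.get_encoding; it is injectable so the
    retry logic can be exercised without network access.
    """
    if loader is None:
        import tiktoken  # imported lazily so the cache dir can be set first

        loader = tiktoken.get_encoding

    loaded: Dict[str, Any] = {}
    for name in names:
        for attempt in range(1, attempts + 1):
            try:
                loaded[name] = loader(name)
                break
            except Exception:
                if attempt == attempts:
                    raise  # give up and fail startup loudly
                time.sleep(backoff_seconds * attempt)  # linear backoff
    return loaded
```

Calling configure_tiktoken_cache followed by warm_encodings during application initialization (or at image build time, with the cache directory baked into the image) moves the CDN dependency out of the first embedding request. The cache path passed in is an assumption and should match wherever the deployment keeps persistent files.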