langchain_request_time
Request duration distribution.Dimensions:None
Available on:
Datadog (1)
Interface Metrics (1)
Related Insights (2)
Time-to-First-Token (TTFT) Spikes Under Loadcritical
TTFT combines scheduling delay and prompt processing time, making it highly sensitive to system load and prompt length. Spikes indicate resource contention (GPU memory, queuing) or unexpectedly large prompts, directly degrading user-perceived responsiveness.
▸
Error Rate Masking Latency Degradationwarning
High langchain_request_error or langchain_chain_error rates can suppress latency metrics (fast-failing requests skew averages downward), hiding underlying performance issues that affect successful requests.
▸