langchain_tokens_prompt
Number of tokens used in the prompt of a request.
Dimensions: None
Available on:
Datadog (1)
Interface Metrics (1)
Related Insights (6)
Time-to-First-Token (TTFT) Spikes Under Load (critical)
TTFT combines scheduling delay and prompt processing time, making it highly sensitive to system load and prompt length. Spikes indicate resource contention (GPU memory, queuing) or unexpectedly large prompts, directly degrading user-perceived responsiveness.
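The decomposition above (scheduling delay plus prompt processing time) can be made concrete with a small measurement helper. This is an illustrative sketch, not part of any library: `measure_ttft` and the token-stream shape are assumptions, and it works with any iterator of streamed tokens.

```python
import time


def measure_ttft(token_stream):
    """Measure time-to-first-token for any token iterator.

    TTFT covers everything that happens before the first token
    arrives: queuing/scheduling delay plus prompt processing, which
    is why it grows with both system load and prompt length.
    """
    start = time.monotonic()
    iterator = iter(token_stream)
    first = next(iterator)  # blocks until the first token is emitted
    ttft = time.monotonic() - start
    # Hand back the elapsed time, the first token, and the rest of
    # the stream so the caller can keep consuming it.
    return ttft, first, iterator
```

In practice the wall-clock start should be taken when the request is submitted, so that client-side queuing is included in the measurement.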
Prompt Cache Metrics Misreporting (warning)
LangChain's usage_metadata for Anthropic prompt caching incorrectly aggregates input_tokens (includes cached reads/writes), requiring manual reconstruction. This breaks cost and token analysis in observability dashboards and alerts.
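The manual reconstruction mentioned above can be sketched as follows. This is a minimal illustration assuming the `usage_metadata` dict shape used by langchain-core, where cache activity is reported under `input_token_details` with `cache_read` and `cache_creation` keys; the function name and the assumption that `input_tokens` already folds in cached reads and writes are taken from the description here, not from library documentation.

```python
def reconstruct_prompt_tokens(usage_metadata: dict) -> dict:
    """Split an aggregated input_tokens count back into uncached,
    cache-read, and cache-write portions for Anthropic prompt caching.

    Assumes input_tokens includes cached reads and writes, so the
    truly uncached portion is what remains after subtracting both.
    """
    details = usage_metadata.get("input_token_details") or {}
    cache_read = details.get("cache_read", 0)
    cache_creation = details.get("cache_creation", 0)
    total_input = usage_metadata.get("input_tokens", 0)
    uncached = total_input - cache_read - cache_creation
    return {
        "uncached_input_tokens": uncached,
        "cache_read_tokens": cache_read,
        "cache_creation_tokens": cache_creation,
    }
```

Emitting the three components as separate metric streams, rather than the aggregated count, keeps dashboards and cost alerts consistent when caching is enabled.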
Token Usage Forecast Drift from Model Changes (warning)
LangSmith's token usage forecasting assumes stable model behavior. Untagged model version changes (e.g., GPT-4 → GPT-4-turbo, Claude updates) can shift token distributions, invalidating forecasts and triggering false cost alerts.
Usage metadata extraction from serialized tracer outputs now supported (info)
OpenAI automatic server-side compaction now supported (info)
Anthropic max input tokens updated for 1M context beta (info)