langchain_llm_cost
Estimated USD cost of an LLM call, computed from token counts and per-model pricing.
Available on:
OpenTelemetry (1)
Interface Metrics (1)
Dimensions: None
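The cost calculation this metric describes can be sketched as below. The pricing table and function name are illustrative placeholders, not values from this system; real per-model rates change over time and must come from the provider's published pricing.

```python
# Hypothetical per-1M-token pricing table; real rates vary by model and date.
PRICING_USD_PER_MTOK = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}

def estimate_llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one LLM call from token counts and model pricing."""
    rates = PRICING_USD_PER_MTOK[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000
```

Note that when prompt caching is in play, cached input tokens are usually billed at a different rate, so a production version would price cache reads and writes separately.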
Knowledge Base (2 documents, 0 chunks)
reference: Time to First Token (TTFT) in LLM Inference (2183 words, score: 0.75)
This page provides a comprehensive technical reference on Time to First Token (TTFT) as a performance metric for LLM inference systems. It covers TTFT's definition, components (scheduling delay and prompt processing time), relationship to other metrics like TBT and TPOT, optimization strategies including dynamic token pruning and cache management, and advanced temporal analysis approaches like fluidity-index for better user experience assessment.
troubleshooting: Anthropic: Usage metadata is inaccurate for prompt cache reads/writes · Issue #32818 · langchain-ai/langchain · GitHub (1181 words, score: 0.75)
This GitHub issue reports a bug in LangChain's Anthropic integration where usage_metadata incorrectly reports input_tokens when using prompt caching. The actual input tokens should be calculated by subtracting cache_read and cache_creation tokens, but LangChain reports the sum instead. This affects the accuracy of token usage tracking for Anthropic models with prompt caching enabled.
Related Insights (6)
Runaway Token Consumption Cost Spike (critical)
Recursive chains, agent loops, or unbounded context windows can generate thousands of tokens in seconds, causing unexpected cost explosions (e.g., $12k-$30k bills).
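One common mitigation for this failure mode is a hard per-run token ceiling checked after every LLM call. The sketch below is a minimal, framework-agnostic guard; the class and method names are hypothetical, not part of LangChain's API.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a run's cumulative token usage crosses its ceiling."""

class TokenBudget:
    """Cumulative token guard for agent loops and recursive chains.

    Call charge() after every LLM call; it raises once the run's
    running total crosses the configured ceiling, cutting off
    runaway spend before it compounds.
    """

    def __init__(self, max_total_tokens: int):
        self.max_total_tokens = max_total_tokens
        self.total = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        self.total += input_tokens + output_tokens
        if self.total > self.max_total_tokens:
            raise TokenBudgetExceeded(
                f"run used {self.total} tokens; ceiling is {self.max_total_tokens}"
            )
```

Raising an exception (rather than logging) is deliberate: it unwinds the agent loop immediately instead of letting one more iteration through.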
Prompt Cache Metrics Misreporting (warning)
LangChain's usage_metadata for Anthropic prompt caching incorrectly aggregates input_tokens (includes cached reads/writes), requiring manual reconstruction. This breaks cost and token analysis in observability dashboards and alerts.
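The manual reconstruction described in issue #32818 can be sketched as below. This assumes usage_metadata follows langchain-core's UsageMetadata shape, with cache counts under input_token_details; verify the exact keys against your langchain-core version before relying on this.

```python
def corrected_input_tokens(usage_metadata: dict) -> int:
    """Reconstruct non-cached input tokens from LangChain usage_metadata.

    Works around the behavior in langchain-ai/langchain#32818: with
    Anthropic prompt caching, input_tokens includes cache reads and
    writes, so both are subtracted to recover the uncached count.
    """
    details = usage_metadata.get("input_token_details", {})
    return (
        usage_metadata["input_tokens"]
        - details.get("cache_read", 0)
        - details.get("cache_creation", 0)
    )
```

Dashboards and cost alerts built on input_tokens should apply this correction (or price the three buckets separately) until the upstream fix lands.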
Token Usage Forecast Drift from Model Changes (warning)
LangSmith's token usage forecasting assumes stable model behavior. Untagged model version changes (e.g., GPT-4 → GPT-4-turbo, Claude updates) can shift token distributions, invalidating forecasts and triggering false cost alerts.
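A cheap sanity check before trusting a cost alert is to compare recent tokens-per-call against the forecast's baseline window. The function and threshold below are an illustrative sketch, not part of LangSmith; a production check would use a proper distribution test rather than a mean ratio.

```python
from statistics import mean

def token_drift_ratio(baseline: list[int], recent: list[int]) -> float:
    """Ratio of recent mean tokens-per-call to the forecast baseline mean.

    A ratio far from 1.0 (e.g. > 1.3 or < 0.7) suggests the token
    distribution shifted, often from an untagged model version change,
    and the forecast should be re-fit before alerting on it.
    """
    return mean(recent) / mean(baseline)
```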
Usage metadata extraction from serialized tracer outputs now supported (info)
OpenAI automatic server-side compaction now supported (info)
Anthropic cache_control now hoisted to tool_result level (warning)