Langfuse · LangChain · OpenAI

Cached prompt tokens counted twice in cost calculation

warning · configuration · Updated Jan 31, 2025 (via Exa)
How to detect:

When LLM API calls use prompt caching (input_cache_read tokens), Langfuse incorrectly calculates cost by adding the cached tokens to the input count: Input Usage displays as (input + input_cache_read) instead of just input, inflating reported costs. Affects langfuse-langchain v3.34.0 and earlier with models that support prompt caching, such as gpt-4o-2024-08-06.
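The arithmetic behind the miscount can be sketched as follows. This is an illustrative model only: the field names mirror Langfuse's usage details (`input`, `input_cache_read`), but the functions are assumptions, not the SDK's actual internals.

```typescript
// Illustrative sketch of the double-counting bug; not Langfuse's real code.
interface UsageDetails {
  input: number;             // non-cached input tokens
  input_cache_read: number;  // input tokens served from the prompt cache
}

// Buggy accounting: cache reads folded back into the input count, so the
// displayed Input Usage (and cost derived from it) is inflated.
const buggyInputCount = (u: UsageDetails): number =>
  u.input + u.input_cache_read;

// Correct accounting: bill each bucket at its own rate; cache reads are
// discounted (e.g. 50% of the input price for gpt-4o-2024-08-06).
const inputCost = (u: UsageDetails, inPrice: number, cachePrice: number): number =>
  u.input * inPrice + u.input_cache_read * cachePrice;

const u: UsageDetails = { input: 400, input_cache_read: 600 };
console.log(buggyInputCount(u));               // inflated token count
console.log(inputCost(u, 2.5e-6, 1.25e-6));    // per-bucket cost in USD
```

With the example numbers, the buggy path reports 1,000 input tokens even though only 400 were billed at the full input rate.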

Recommended action:

Upgrade the langfuse-js SDK to v3.35.3 or later, then verify cost calculations in the Langfuse UI for traces with cached input. Note: as of June 2025, similar issues may persist in other integrations (see related issues #6436 for OpenRouter and #10592 for general cached-token counting).
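After upgrading, one way to sanity-check a trace is to recompute the expected input cost from its usage details and compare it with what the UI reports. A minimal sketch, assuming Langfuse-style field names; the per-token prices shown are the gpt-4o-2024-08-06 list prices at the time of writing ($2.50/1M input, $1.25/1M cached input) and should be verified against current pricing.

```typescript
// Recompute expected input cost from a trace's usage details.
// Field names and prices are assumptions for illustration.
function expectedInputCost(
  usage: { input: number; input_cache_read?: number },
  inputPrice: number,      // USD per non-cached input token
  cacheReadPrice: number,  // discounted USD per cached input token
): number {
  const cached = usage.input_cache_read ?? 0;
  return usage.input * inputPrice + cached * cacheReadPrice;
}

// Example: 400 uncached + 600 cached input tokens at assumed gpt-4o prices.
const check = expectedInputCost(
  { input: 400, input_cache_read: 600 },
  2.5e-6,
  1.25e-6,
);
console.log(check);
```

If the UI's input cost materially exceeds this recomputed value, cached tokens may still be double-counted in your integration.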