gen_ai_anthropic_cache_read_input_tokens
Number of input tokens read from cache.
Related Insights (4)
Multi-agent research systems consume 15× more tokens than single chat sessions, rapidly filling context windows and triggering memory-limit errors. Token usage alone explains 80% of performance variance, but it can also exhaust budgets unexpectedly.
Multi-agent research systems consume 15× more tokens than single chat sessions, with individual agents using 4× the chat baseline. Token usage explains 80% of performance variance but creates an unsustainable cost trajectory without monitoring.
Low cache hit rates keep cache_read_input_tokens low while cache_creation_input_tokens stays high, multiplying ITPM (input tokens per minute) consumption. With effective caching (an 80%+ hit rate), effective throughput can be 5-10× higher, since cached reads don't count toward ITPM limits.
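The relationship above can be sketched numerically. This is a minimal illustration, not an official formula: the helper names are hypothetical, and it assumes cached reads are fully excluded from ITPM accounting, as the insight states.

```python
def cache_hit_rate(cache_read_tokens: int, cache_creation_tokens: int) -> float:
    """Fraction of cacheable input tokens served from cache rather than written."""
    total = cache_read_tokens + cache_creation_tokens
    return cache_read_tokens / total if total else 0.0

def effective_itpm_multiplier(hit_rate: float) -> float:
    """Effective throughput gain, assuming cached reads don't count toward ITPM.

    At an 80% hit rate only 20% of tokens consume the limit, so effective
    throughput is 1 / 0.2 = 5x the nominal ITPM.
    """
    uncached_fraction = 1.0 - hit_rate
    return 1.0 / uncached_fraction if uncached_fraction > 0 else float("inf")

# 8,000 tokens read from cache vs 2,000 written -> 80% hit rate -> ~5x throughput
rate = cache_hit_rate(8000, 2000)
print(round(effective_itpm_multiplier(rate), 2))
```

At a 90% hit rate the same arithmetic yields a 10× multiplier, matching the 5-10× range cited for effective caching.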
LangChain's usage_metadata for Anthropic prompt caching incorrectly aggregates input_tokens (it includes cached reads and writes), requiring manual reconstruction; this skews cost and token analysis in observability dashboards and alerts.
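A hedged sketch of the manual reconstruction: the dictionary keys (`input_token_details`, `cache_read`, `cache_creation`) follow one LangChain usage_metadata convention and should be verified against the installed version before relying on this in a dashboard.

```python
def reconstruct_input_tokens(usage: dict) -> dict:
    """Split an aggregated input_tokens count back into its components.

    Assumes input_tokens = uncached + cache_read + cache_creation, which is
    the aggregation behavior the insight above describes.
    """
    details = usage.get("input_token_details", {})
    cache_read = details.get("cache_read", 0)
    cache_write = details.get("cache_creation", 0)
    uncached = usage["input_tokens"] - cache_read - cache_write
    return {
        "uncached": uncached,        # tokens billed at the full input rate
        "cache_read": cache_read,    # tokens read from cache (discounted)
        "cache_write": cache_write,  # tokens written to cache (surcharged)
    }

# Example usage_metadata where most of the prompt was served from cache:
usage = {
    "input_tokens": 11000,
    "input_token_details": {"cache_read": 9000, "cache_creation": 1500},
}
print(reconstruct_input_tokens(usage))
```

Emitting the three components as separate metrics keeps dashboards and alerts consistent with per-tier pricing, instead of treating all 11,000 tokens as full-price input.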