anthropic_time_time_to_first_token
Time from request to first token receivedKnowledge Base (5 documents, 0 chunks)
Related Insights (4)
Sequential tool execution in Claude Code agents causes 90% longer research times compared to parallel execution. Enabling parallel tool calling for both subagent spawning (3-5 agents) and tool usage (3+ tools) dramatically reduces latency.
Initial response latency (TTFT) increases when backend processing saturates, creating poor user experience even when total request time remains acceptable. Critical for streaming applications where perceived responsiveness depends on first token delivery.
High time-to-first-token from LLM providers indicates queuing, rate limiting, or model cold starts, causing user-perceived delays even when total generation time is acceptable.
Elevated anthropic_time_time_to_first_token indicates backend strain, throttling, or network issues. Latency above 500ms may signal infrastructure problems. This metric is distinct from total request time and specifically captures model initialization and first response delays.