gen_ai_client_token_usage
Measures number of input and output tokens used

Interface Metrics (1)
Knowledge Base (1 document, 0 chunks)
Technical Annotations (35)
Configuration Parameters (6)
stream (recommended: False)
model (recommended: gpt-4o-mini)
max_tokens (recommended: 500)
temperature (recommended: 0)
retrieval_depth (recommended: 2-3)
context_token_budget (recommended: 2000)

Error Signatures (2)
429 (HTTP status)
quota (log pattern)

CLI Commands (1)
Diagnostic: curl -H "X-API-Key: $ORG_API_KEY" "https://api.cloudact.ai/api/v1/costs/acme_inc/genai/summary?period=last_30d"

Technical References (26)
Concepts: token consumption, system prompt, embedding, content hash, evaluation
Components: ChatCompletion.create, message_from_stream, tiktoken, W&B Billing page, claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, gpt-4o, gpt-4o-mini, DeepSeek-V3, max_tokens, Helicone, LangSmith, OpenAI Usage Dashboard, reranking model, vector database, leaderboard, input_tokens, output_tokens, total_cost
Protocols: FOCUS 1.3

Related Insights (26)
Multi-agent research systems consume 15× more tokens than single chats, rapidly filling context windows and causing memory limit errors. Token usage alone explains 80% of performance variance but can exhaust budgets unexpectedly.
Recursive chains, agent loops, or unbounded context windows can generate thousands of tokens in seconds, causing unexpected cost explosions (e.g., $12k-$30k bills).
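One defense against such runaway loops is a hard cap on both iterations and cumulative token spend. A minimal sketch follows; the `step` callable, its `(tokens, done)` return shape, and the default limits are illustrative assumptions, not from any particular framework:

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a loop hits its iteration or token ceiling."""


def run_agent_loop(step, max_iterations=10, token_budget=8000):
    """Run `step` repeatedly, refusing to let the loop run away.

    `step` is a hypothetical callable returning (tokens_used, done).
    """
    total_tokens = 0
    for _ in range(max_iterations):  # hard cap on iterations
        tokens, done = step()
        total_tokens += tokens
        if total_tokens > token_budget:  # hard cap on cumulative spend
            raise TokenBudgetExceeded(
                f"budget {token_budget} exceeded: {total_tokens} tokens")
        if done:
            return total_tokens
    raise TokenBudgetExceeded(
        f"no result after {max_iterations} iterations ({total_tokens} tokens)")
```

Raising rather than silently truncating makes the overrun visible in monitoring instead of showing up later as an unexplained bill.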
LLM provider rate limits cause request failures that aren't retried with appropriate backoff, leading to cascading failures during usage spikes.
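The standard remedy is exponential backoff with jitter around 429 responses. A sketch, assuming a hypothetical `request` callable that raises `RateLimitError` on HTTP 429 (real SDKs expose their own exception types):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 rate-limit error."""


def call_with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `request` on rate limiting with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the failure
            # base delay doubles each attempt; jitter (0.5x-1x) desynchronizes
            # clients so a usage spike does not retry in lockstep
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` applies.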
Uncontrolled token usage from buggy loops, malicious users, or missing input validation can cause unexpected cost spikes. Tracking per-request and aggregate token consumption enables budget protection.
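Such tracking can be as simple as an in-process accumulator keyed by caller. The class below is a minimal sketch; the attribution key ("user") and budget semantics are assumptions a real deployment would adapt:

```python
from collections import defaultdict


class TokenUsageTracker:
    """Track per-request and aggregate token usage against a budget."""

    def __init__(self, budget_tokens):
        self.budget_tokens = budget_tokens
        self.total = 0
        self.by_user = defaultdict(int)

    def record(self, user, input_tokens, output_tokens):
        """Record one request's usage; returns the tokens it consumed."""
        used = input_tokens + output_tokens
        self.total += used
        self.by_user[user] += used
        return used

    def over_budget(self):
        return self.total > self.budget_tokens

    def top_consumers(self, n=3):
        """Largest consumers first, to spot buggy loops or abusive users."""
        return sorted(self.by_user.items(), key=lambda kv: -kv[1])[:n]
```

Checking `over_budget()` on every `record` call (and alerting on `top_consumers`) is what turns raw counts into budget protection.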
When PydanticAI agents consume excessive tokens due to validation retries or complex tool interactions, costs spike and latency increases. This is detectable through token usage metrics and operation cost tracking.
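Capping validation retries while accounting for what each attempt cost makes this failure mode observable. The wrapper below is a generic sketch, not PydanticAI's API; `call` returning `(result, tokens)` and the `validate` predicate are hypothetical interfaces:

```python
def call_with_retry_cost(call, validate, max_retries=2):
    """Retry on validation failure, charging every attempt to one total.

    `call` returns (result, tokens); `validate` returns True when the
    result is acceptable. Both are illustrative stand-ins.
    """
    total_tokens = 0
    for attempt in range(max_retries + 1):
        result, tokens = call()
        total_tokens += tokens  # failed attempts still cost tokens
        if validate(result):
            return result, total_tokens, attempt
    raise ValueError(
        f"validation failed after {max_retries + 1} attempts "
        f"({total_tokens} tokens spent)")
```

Returning the attempt index alongside the token total lets a metrics layer distinguish "succeeded, but only after expensive retries" from clean first-try successes.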
Uncontrolled token usage from recursive chains, unbounded context windows, or validation retry loops can cause unexpected cost spikes. Without per-request and aggregate monitoring, organizations can face bill shock (e.g., $12k-$30k unexpected charges).
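Bill shock is avoidable once token counts are converted to an estimated spend and compared against a budget with a warning band. A sketch, with hypothetical per-million-token prices; real prices vary by provider and change over time:

```python
# Hypothetical (input, output) USD prices per 1M tokens; illustrative only.
PRICES_PER_MTOK = {"gpt-4o-mini": (0.15, 0.60)}


def estimated_cost_usd(model, input_tokens, output_tokens, prices=PRICES_PER_MTOK):
    """Rough spend estimate from token counts and a price table."""
    in_price, out_price = prices[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def check_budget(spend_usd, budget_usd, warn_ratio=0.8):
    """Classify spend: alert before the budget is breached, not after."""
    if spend_usd > budget_usd:
        return "over_budget"
    if spend_usd > budget_usd * warn_ratio:
        return "warning"
    return "ok"
```

The `warn_ratio` band is the part that prevents surprises: the $12k-$30k scenarios above are exactly what an 80%-of-budget alert is meant to catch early.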
High gen_ai_client_token_usage and gen_ai_client_operation_time indicate expensive or slow AI model calls, causing both cost overruns and user-facing latency. Large context windows or inefficient prompt engineering amplify this issue.
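Capturing both signals per call is straightforward with a thin wrapper. The sketch below records token usage and wall-clock operation time together and flags threshold breaches; the `model_call` return shape and the thresholds are illustrative assumptions:

```python
import time


def instrumented_call(model_call, token_threshold=4000, latency_threshold_s=5.0,
                      clock=time.perf_counter):
    """Wrap a model call, emitting token-usage and operation-time metrics.

    `model_call` is a hypothetical callable returning
    (response, input_tokens, output_tokens).
    """
    start = clock()
    response, input_tokens, output_tokens = model_call()
    elapsed = clock() - start
    metrics = {
        "gen_ai_client_token_usage": input_tokens + output_tokens,
        "gen_ai_client_operation_time": elapsed,
        "expensive": input_tokens + output_tokens > token_threshold,  # cost overrun signal
        "slow": elapsed > latency_threshold_s,  # user-facing latency signal
    }
    return response, metrics
```

Emitting both metrics from the same wrapper is what makes the correlation visible: a call that is expensive but fast points at context size or prompt design, while expensive and slow together point at the model call itself.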