LLM Token Budget Burn Rate Alert
Severity: critical

Multi-agent CrewAI systems, especially hierarchical processes with extended deliberations, can consume API tokens at unsustainable rates. Without proper monitoring, this leads to budget overruns and operational cost spikes.
Monitor token consumption per agent, per task, and per session, and track the daily burn rate against budget. Alert when cost per request exceeds a threshold, when hierarchical agent deliberations exceed expected token counts, or when projected monthly spend exceeds the budget by a configurable percentage.
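The monitoring logic above can be sketched as a small tracker that aggregates token counts per agent and per task, converts them to dollars, and projects monthly spend from the observed daily burn rate. This is a minimal illustration, not a CrewAI API: the class name, pricing, and thresholds are all placeholder assumptions.

```python
from collections import defaultdict


class TokenBudgetMonitor:
    """Illustrative burn-rate tracker; pricing and thresholds are placeholders."""

    def __init__(self, monthly_budget_usd: float, cost_per_1k_tokens: float,
                 alert_pct: float = 0.8):
        self.monthly_budget = monthly_budget_usd
        self.cost_per_1k = cost_per_1k_tokens
        self.alert_pct = alert_pct          # alert at 80% of budget by default
        self.tokens_by_agent = defaultdict(int)
        self.tokens_by_task = defaultdict(int)
        self.total_tokens = 0

    def record(self, agent: str, task: str, tokens: int) -> None:
        # Call this from your LLM-call hook with the usage the API reports.
        self.tokens_by_agent[agent] += tokens
        self.tokens_by_task[task] += tokens
        self.total_tokens += tokens

    def spend_usd(self) -> float:
        return self.total_tokens / 1000 * self.cost_per_1k

    def projected_monthly_spend(self, days_elapsed: int) -> float:
        # Linear extrapolation of the daily burn rate over a 30-day month.
        daily = self.spend_usd() / max(days_elapsed, 1)
        return daily * 30

    def should_alert(self, days_elapsed: int) -> bool:
        return (self.projected_monthly_spend(days_elapsed)
                > self.monthly_budget * self.alert_pct)


# Example: a $300/month budget at $0.01 per 1k tokens.
monitor = TokenBudgetMonitor(monthly_budget_usd=300, cost_per_1k_tokens=0.01)
monitor.record("researcher", "market_analysis", 1_000_000)
print(monitor.spend_usd())                    # → 10.0
print(monitor.should_alert(days_elapsed=1))   # → True (projected $300/mo > 80% cap)
```

Per-agent deliberation checks (e.g. flagging a hierarchical manager agent whose token count exceeds an expected ceiling) can be layered on top of `tokens_by_agent`.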
Implement smart model selection (use cheaper models for simpler tasks). Optimize prompts for conciseness. Enable response caching for identical tool calls. Decompose large tasks into smaller sub-tasks. Set cost-based circuit breakers to halt crews exceeding budget thresholds. Monitor cost per agent and adjust model assignments accordingly.