CrewAI

CrewAI Tool Invocation Failure Detection

Severity: critical
Category: reliability
Updated: Jul 14, 2025

Agents appear to execute tools (they generate Action/Observation traces), but the tools are never actually invoked, resulting in fabricated observations and silent failures. This breaks the tool-use contract and produces incorrect outputs with no obvious error.

How to detect:

Detect when CrewAI agents produce Action/Observation sequences in traces but corresponding tool execution spans are missing. Look for LLM-generated observations without matching tool execution logs, spans, or side effects. Monitor for discrepancies between agent trace count and actual tool invocation count.
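The core detection idea above can be sketched in plain Python: count the Action steps the agent claims in its trace text and compare against the number of times the tool callable was actually executed. The `InvocationCounter` wrapper and `detect_fabricated_observations` helper below are hypothetical names for illustration, not part of the CrewAI API.

```python
import re


def count_trace_actions(trace_text: str) -> int:
    """Count the Action steps the agent claims in its trace output."""
    return len(re.findall(r"^Action:", trace_text, flags=re.MULTILINE))


class InvocationCounter:
    """Wraps a tool callable and records how often it is really executed."""

    def __init__(self, tool_fn):
        self.tool_fn = tool_fn
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.tool_fn(*args, **kwargs)


def detect_fabricated_observations(trace_text, counters):
    """Return True if the trace claims more tool actions than really ran."""
    claimed = count_trace_actions(trace_text)
    actual = sum(c.calls for c in counters)
    return claimed > actual
```

In practice the "actual" side would come from tool execution spans in your tracing backend rather than an in-process counter, but the comparison is the same: claimed actions minus observed executions should be zero.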

Recommended action:

Enable verbose logging and distributed tracing to verify tool execution. Validate that Action events correlate with actual tool.run() calls. Implement internal validation to prevent agents from returning Final Answer without confirmed tool execution. Add monitoring for tool invocation success/failure rates and alert when observation count exceeds actual tool execution count.
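The validation and alerting steps above can be sketched as a small monitor that tracks confirmed tool executions against observations emitted in the trace, and blocks a Final Answer when the counts diverge. `ToolInvocationMonitor` is a hypothetical helper sketched here, not something CrewAI ships; wiring it into real tool callbacks is left to your instrumentation layer.

```python
class ToolInvocationMonitor:
    """Tracks real tool executions against observations in the agent trace."""

    def __init__(self):
        self.executed = 0  # tool calls that completed
        self.failed = 0    # tool calls that raised
        self.observed = 0  # Observation lines the agent emitted

    def record_execution(self, ok: bool) -> None:
        if ok:
            self.executed += 1
        else:
            self.failed += 1

    def record_observation(self) -> None:
        self.observed += 1

    @property
    def failure_rate(self) -> float:
        total = self.executed + self.failed
        return self.failed / total if total else 0.0

    def should_alert(self) -> bool:
        # Fabrication signal: more observations than real invocations.
        return self.observed > self.executed + self.failed

    def check_final_answer(self) -> None:
        """Raise before accepting a Final Answer backed by fabricated steps."""
        if self.should_alert():
            raise RuntimeError(
                f"Final Answer blocked: {self.observed} observations vs "
                f"{self.executed + self.failed} real tool invocations"
            )
```

Feeding `should_alert()` into your alerting pipeline covers the monitoring recommendation, while calling `check_final_answer()` before returning results covers the internal validation step.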