CrewAI

CrewAI Tool Invocation Failure Detection

Severity: critical
Category: reliability
Updated: Jul 14, 2025

Agents appear to execute tools (they generate Action/Observation traces), but the tools are never actually invoked, resulting in fabricated observations and silent failures. This breaks the tool-use contract and produces incorrect outputs with no obvious error.

How to detect:

Detect when CrewAI agents produce Action/Observation sequences in traces but corresponding tool execution spans are missing. Look for LLM-generated observations without matching tool execution logs, spans, or side effects. Monitor for discrepancies between agent trace count and actual tool invocation count.
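The core detection idea above can be sketched in plain Python: count the Action steps the agent claims in its trace text and compare against the number of times the tool callable was actually executed. The `InvocationCounter` wrapper and `detect_fabricated_observations` helper below are hypothetical names for illustration, not part of the CrewAI API.

```python
import re


def count_trace_actions(trace_text: str) -> int:
    """Count the Action steps the agent claims in its trace output."""
    return len(re.findall(r"^Action:", trace_text, flags=re.MULTILINE))


class InvocationCounter:
    """Wraps a tool callable and records how often it is really executed."""

    def __init__(self, tool_fn):
        self.tool_fn = tool_fn
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.tool_fn(*args, **kwargs)


def detect_fabricated_observations(trace_text, counters):
    """Return True if the trace claims more tool actions than really ran."""
    claimed = count_trace_actions(trace_text)
    actual = sum(c.calls for c in counters)
    return claimed > actual
```

In practice the "actual" side would come from tool execution spans in your tracing backend rather than an in-process counter, but the comparison is the same: claimed actions minus observed executions should be zero.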

Recommended action:

Enable verbose logging and distributed tracing to verify tool execution. Validate that Action events correlate with actual tool.run() calls. Implement internal validation to prevent agents from returning Final Answer without confirmed tool execution. Add monitoring for tool invocation success/failure rates and alert when observation count exceeds actual tool execution count.
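The validation and alerting steps above can be sketched as a small monitor that tracks confirmed tool executions against observations emitted in the trace, and blocks a Final Answer when the counts diverge. `ToolInvocationMonitor` is a hypothetical helper sketched here, not something CrewAI ships; wiring it into real tool callbacks is left to your instrumentation layer.

```python
class ToolInvocationMonitor:
    """Tracks real tool executions against observations in the agent trace."""

    def __init__(self):
        self.executed = 0  # tool calls that completed
        self.failed = 0    # tool calls that raised
        self.observed = 0  # Observation lines the agent emitted

    def record_execution(self, ok: bool) -> None:
        if ok:
            self.executed += 1
        else:
            self.failed += 1

    def record_observation(self) -> None:
        self.observed += 1

    @property
    def failure_rate(self) -> float:
        total = self.executed + self.failed
        return self.failed / total if total else 0.0

    def should_alert(self) -> bool:
        # Fabrication signal: more observations than real invocations.
        return self.observed > self.executed + self.failed

    def check_final_answer(self) -> None:
        """Raise before accepting a Final Answer backed by fabricated steps."""
        if self.should_alert():
            raise RuntimeError(
                f"Final Answer blocked: {self.observed} observations vs "
                f"{self.executed + self.failed} real tool invocations"
            )
```

Feeding `should_alert()` into your alerting pipeline covers the monitoring recommendation, while calling `check_final_answer()` before returning results covers the internal validation step.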