CrewAI Tool Invocation Failure Detection
Severity: critical

Agents appear to execute tools (their traces contain Action/Observation sequences), but the tools are never actually invoked: the LLM fabricates the observations, producing silent failures. This breaks the tool-use contract and leads to incorrect outputs with no obvious error.
Detect when CrewAI agents produce Action/Observation sequences in their traces but the corresponding tool execution spans are missing. Look for LLM-generated observations without matching tool execution logs, spans, or side effects. Monitor for discrepancies between the number of Action/Observation pairs in agent traces and the number of tool invocations actually recorded.
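The trace-versus-invocation comparison above can be sketched as a small check. This is a minimal illustration, not CrewAI's API: it assumes the agent trace is available as plain text with `Observation:` lines, and that the caller tracks the actual tool-call count separately (e.g., from execution spans or an instrumented tool wrapper).

```python
import re


def count_observations(trace: str) -> int:
    """Count Observation lines in an agent's ReAct-style trace."""
    return len(re.findall(r"^Observation:", trace, flags=re.MULTILINE))


def has_fabricated_observations(trace: str, actual_tool_calls: int) -> bool:
    """Flag a trace that reports more observations than tools actually ran.

    `actual_tool_calls` must come from a trusted source (tracing spans,
    instrumented tool wrappers), not from the trace itself.
    """
    return count_observations(trace) > actual_tool_calls
```

For example, a trace containing two `Observation:` lines while only one tool execution span was recorded would be flagged as suspicious.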
Enable verbose logging and distributed tracing to verify tool execution. Validate that Action events correlate with actual tool.run() calls. Implement internal validation to prevent agents from returning Final Answer without confirmed tool execution. Add monitoring for tool invocation success/failure rates and alert when observation count exceeds actual tool execution count.
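One way to confirm that Action events correspond to real executions is to wrap each tool so that every successful run is counted and logged. The sketch below assumes tools are plain Python callables; `ToolInvocationAuditor` is a hypothetical helper, not part of CrewAI.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool-audit")


class ToolInvocationAuditor:
    """Records confirmed tool executions for comparison against agent traces."""

    def __init__(self) -> None:
        self.invocations = 0

    def wrap(self, tool_fn):
        """Return a wrapped tool that logs and counts each real execution."""
        @functools.wraps(tool_fn)
        def audited(*args, **kwargs):
            result = tool_fn(*args, **kwargs)
            self.invocations += 1  # incremented only after the tool returns
            log.info("tool %s executed (total: %d)", tool_fn.__name__, self.invocations)
            return result
        return audited
```

After a run, `auditor.invocations` gives a ground-truth execution count to compare against the observation count in the trace; alerting when observations exceed this number catches fabricated results.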