Agent Task Success Rate Degradation

warning

reliabilityUpdated Dec 19, 2025

Non-deterministic LLM behavior and tool failures can cause agent task success rates to drift downward over time without obvious errors, impacting output quality and user satisfaction.

Sources

Introduction - AgentOpsdocs.agentops.ai

AgentOps Roadmap: A 6-Month Guide to Mastering AI Agentswww.analyticsvidhya.com

Agent Monitoring and Debugging with AgentOps - AG2docs.ag2.ai

CrewAI: Scaling Human‑Centric AI Agents in Production - Mediummedium.com

Technologies:

CrewAIThe root cause of this issue originates in CrewAI

crewai.task.success_rate

crewai.task.failure_rate

crewai.agent.task.completion.count

crewai.agent.task.failure.count

crewai.tool.invocation.failure_rate

How to detect:

Track task completion success/failure rates per agent and per task type. Monitor for degradation trends (rolling 7-day success rate dropping below baseline). Alert when success rate falls below SLO threshold (e.g., <95%). Correlate with LLM model changes, prompt modifications, or tool availability issues.

Recommended action:

Implement continuous evaluation using LLM-as-Judge or human review for quality validation. Create benchmark test suites to detect regressions. Monitor prompt effectiveness and iterate on prompt engineering. Implement retry logic with exponential backoff for transient failures. Set up regression tests to block deployments when quality drops.