Tool Error Rate as Dependency Health Proxy

warning

reliability

Updated Feb 12, 2026

Spikes in langchain_tool_error often reflect downstream API failures, rate limits, or network issues rather than application bugs. Treating tool errors in isolation misses the root cause, which is degradation of an external service.

Sources

AI Observability - Dynatrace Docs (docs.dynatrace.com)

Technologies:

LangChain: Symptoms of this issue are visible in LangChain metrics and logs.

How to detect:

Alert when the langchain_tool_error rate exceeds its baseline (e.g., more than 5% of langchain_tool_invocation). Group errors by tool name to identify which external services are failing, then cross-reference the affected tools with the status pages of the services they depend on (e.g., third-party APIs and SaaS tools).
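The detection rule above can be sketched in plain Python. This is a minimal illustration, not a Dynatrace or LangChain API: it assumes tool calls have already been exported as (tool_name, succeeded) pairs, and the names error_rates_by_tool, tools_to_alert, and MIN_INVOCATIONS are hypothetical. The minimum-invocation guard is an added assumption so that a handful of failures on a rarely used tool does not trigger an alert.

```python
from collections import defaultdict

ERROR_RATE_THRESHOLD = 0.05  # the ">5%" example threshold from the text
MIN_INVOCATIONS = 20         # assumption: ignore tools with too few calls


def error_rates_by_tool(events):
    """Aggregate (tool_name, succeeded) pairs into per-tool statistics.

    Returns {tool_name: (errors, invocations, error_rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # tool -> [errors, invocations]
    for tool, succeeded in events:
        counts[tool][1] += 1
        if not succeeded:
            counts[tool][0] += 1
    return {
        tool: (errors, total, errors / total)
        for tool, (errors, total) in counts.items()
    }


def tools_to_alert(events, threshold=ERROR_RATE_THRESHOLD,
                   min_invocations=MIN_INVOCATIONS):
    """Return the sorted names of tools whose error rate exceeds the threshold."""
    stats = error_rates_by_tool(events)
    return sorted(
        tool
        for tool, (errors, total, rate) in stats.items()
        if total >= min_invocations and rate > threshold
    )


# Hypothetical sample data: tool names are illustrative only.
sample_events = (
    [("search_api", True)] * 90
    + [("search_api", False)] * 10
    + [("calculator", True)] * 50
    + [("weather_api", False)] * 3
)
alerting = tools_to_alert(sample_events)
```

With this sample data, only search_api is flagged: calculator has no errors, and weather_api falls below the minimum-invocation guard. In practice the fixed threshold would likely be replaced by a baseline derived from each tool's own history.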

Recommended action:

Implement retry logic with exponential backoff for transient tool failures. Add circuit breakers to prevent cascading failures. Monitor external service SLAs and set up uptime checks. Use LangSmith traces to capture error payloads and identify patterns (e.g., 429 rate-limit responses, 503 service-unavailable responses).
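The first two recommendations can be sketched together in plain Python. This is an illustrative sketch under stated assumptions, not LangChain's built-in retry support: TransientToolError, CircuitOpenError, retry_with_backoff, and CircuitBreaker are hypothetical names, and the wrapped function is assumed to raise TransientToolError for retryable conditions such as 429 or 503 responses. The backoff uses random jitter, a common choice that is not specified in the text above.

```python
import random
import time


class TransientToolError(Exception):
    """Hypothetical marker for retryable failures (e.g., HTTP 429 or 503)."""


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call fn(), retrying transient errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientToolError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay cap doubles each attempt: 0.5s, 1s, 2s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter spreads out retry bursts


class CircuitBreaker:
    """Stop calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

One way to combine the two is breaker.call(lambda: retry_with_backoff(call_tool)), where call_tool is a placeholder for the actual tool invocation. With that ordering the breaker counts a failure only after the retries are exhausted, so brief blips are absorbed by the retry loop while sustained outages open the circuit.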