Tool Error Rate as Dependency Health Proxy
warning | reliability | Updated Feb 12, 2026
Spikes in langchain_tool_error often reflect downstream API failures, rate limits, or network issues rather than application bugs. Treating tool errors in isolation misses the likely root cause: degradation of an external service.
How to detect:
Alert when the langchain_tool_error rate exceeds its baseline (e.g., more than 5% of langchain_tool_invocation). Group by tool name to identify which external services are failing, then cross-reference with the status pages of those services (e.g., API providers or SaaS tools).
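The per-tool rate check described above can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring integration: the event shape (a tool name paired with an error flag), the function name tools_to_alert, and the min_invocations guard are all assumptions introduced here, and the 5% threshold is just the example figure from the text.

```python
from collections import Counter

def tools_to_alert(events, threshold=0.05, min_invocations=20):
    """Return {tool_name: error_rate} for tools whose error rate exceeds threshold.

    events: iterable of (tool_name, is_error) pairs, one per tool invocation.
    This shape is hypothetical; adapt it to however your metrics are exported.
    """
    invocations = Counter()
    errors = Counter()
    for tool_name, is_error in events:
        invocations[tool_name] += 1
        if is_error:
            errors[tool_name] += 1

    alerts = {}
    for tool_name, count in invocations.items():
        # Skip low-traffic tools so a single failure does not trigger an alert.
        if count < min_invocations:
            continue
        rate = errors[tool_name] / count
        if rate > threshold:
            alerts[tool_name] = rate
    return alerts

# Example: search_api fails 10 of 100 calls, calculator never fails,
# and rarely_used has too few calls to judge.
events = (
    [("search_api", False)] * 90
    + [("search_api", True)] * 10
    + [("calculator", False)] * 50
    + [("rarely_used", True)]
)
alerts = tools_to_alert(events)
```

Grouping by tool name before comparing against the threshold is what points at a specific dependency; an aggregate rate across all tools would dilute a failing service behind healthy ones.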
Recommended action:
Implement retry logic with exponential backoff for transient tool failures. Add circuit breakers to prevent cascading failures. Monitor external service SLAs and set up uptime checks. Use LangSmith traces to capture error payloads and identify patterns (e.g., 429 rate limits, 503 unavailability).
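As a rough sketch of the first two recommendations, the following standard-library Python combines exponential backoff (with jitter) and a simple circuit breaker. Everything named here is illustrative rather than part of LangChain or LangSmith: TransientToolError stands in for whatever exception your tool raises on a 429 or 503, and the thresholds and delays are placeholder values to tune.

```python
import random
import time

class TransientToolError(Exception):
    """Hypothetical marker for retryable failures (e.g., HTTP 429 or 503)."""

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open state).
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()

def call_with_retry(fn, breaker, max_attempts=4, base_delay=0.5,
                    max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except TransientToolError:
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Delay cap doubles each attempt; random jitter avoids
            # synchronized retries from many clients.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
        else:
            breaker.record_success()
            return result
```

Usage would look like call_with_retry(lambda: my_tool(query), breaker), with one CircuitBreaker instance shared per external service so that repeated failures against one dependency stop further calls to it without affecting other tools. Only errors classified as transient are retried; anything else propagates immediately.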