LlamaIndex

LlamaIndex Agent Workflow Observability Gap

critical
availabilityUpdated Oct 28, 2025

Multi-agent LlamaIndex systems lack visibility into agent handoffs, tool execution timing, and workflow failures without proper instrumentation, leading to slow debugging and degraded production reliability.

How to detect:

Monitor for missing trace data across agent interactions: llama_index.agent.steps.count = 0 when requests are occurring, undefined agent handoff reasons, absence of per-agent latency metrics via llama_index.agent.step.duration, and inability to correlate llama_index.agent.tool.calls to specific agents in multi-agent workflows.

Recommended action:

1. Investigate: Check if OpenTelemetry instrumentation is properly initialized and trace context is propagated across async boundaries. Verify agent callbacks are registered. 2. Diagnose: Enable debug logging to confirm agent step execution is actually happening. Check for missing trace IDs in logs. 3. Remediate: Implement structured logging with trace IDs linking agent decisions to outcomes. Configure OpenTelemetry auto-instrumentation or manual spans for each agent operation. 4. Prevent: Establish baseline dashboards showing agent.steps.count and agent.step.duration per workflow type, alerting when zero activity occurs during expected execution windows.