Prometheus

OpenTelemetry Collector Backpressure and Data Loss

warning
reliability
Updated Feb 23, 2026

AgentOps uses the OpenTelemetry (OTEL) Collector for trace and metric ingestion. Under high agent load, the collector can come under backpressure: data arrives faster than the exporters can send it, so the collector's buffers and queues fill up. Once a queue is full, traces are dropped and metric delivery is delayed. This is most likely when the exporters to Supabase or ClickHouse are slow.
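To make the failure mode concrete, here is a minimal sketch of a bounded queue that fills faster than it drains. It is a toy model rather than the collector's actual implementation; the function name, the per-tick rates, and the capacity are all hypothetical.

```python
def simulate_queue(arrivals_per_tick, drain_per_tick, capacity, ticks):
    """Toy model of a bounded exporter queue.

    Returns (final_depth, dropped). All parameters are illustrative
    assumptions, not values taken from a real collector.
    """
    depth = 0
    dropped = 0
    for _ in range(ticks):
        # Producer side: accept only what still fits in the queue.
        accepted = min(arrivals_per_tick, capacity - depth)
        dropped += arrivals_per_tick - accepted
        depth += accepted
        # Consumer side: the exporter drains what the backend can absorb.
        depth -= min(drain_per_tick, depth)
    return depth, dropped


if __name__ == "__main__":
    # Backend keeps up: nothing accumulates, nothing is dropped.
    print(simulate_queue(10, 10, capacity=100, ticks=50))
    # Backend is slow: the queue saturates and drops begin.
    print(simulate_queue(10, 4, capacity=100, ticks=50))
```

In the second call the queue absorbs the mismatch for the first several ticks, then stays near capacity and drops the excess on every tick. A real collector shows the same pattern: drops begin only after the buffer is exhausted.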

How to detect:

Monitor the OTEL Collector's queue length, exporter send latency, and dropped spans and metrics. Alert when queue utilization exceeds 80% or when dropped_spans_count increases. Also check whether a slow exporter backend (Supabase or ClickHouse) is the source of the backpressure.
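The two alert conditions above can be sketched as a small check over consecutive metric samples. The `CollectorSample` type and its field names are hypothetical; map them to the queue-size, queue-capacity, and dropped-span metrics your collector version actually exposes.

```python
from dataclasses import dataclass


@dataclass
class CollectorSample:
    # Hypothetical field names; map them to your collector's real metrics.
    queue_size: int       # batches currently waiting in the exporter queue
    queue_capacity: int   # configured maximum queue size
    dropped_spans: int    # cumulative dropped-span counter


def evaluate(prev, curr, utilization_threshold=0.80):
    """Return a list of alert messages for the current sample."""
    alerts = []
    if curr.queue_capacity > 0:
        utilization = curr.queue_size / curr.queue_capacity
        if utilization > utilization_threshold:
            alerts.append(
                f"queue utilization {utilization:.0%} exceeds "
                f"{utilization_threshold:.0%}"
            )
    # The counter is cumulative, so only a positive delta means new drops.
    # A negative delta (counter reset after a restart) is ignored here.
    delta = curr.dropped_spans - prev.dropped_spans
    if delta > 0:
        alerts.append(f"{delta} spans dropped since last sample")
    return alerts


if __name__ == "__main__":
    prev = CollectorSample(queue_size=400, queue_capacity=1000, dropped_spans=10)
    curr = CollectorSample(queue_size=850, queue_capacity=1000, dropped_spans=25)
    for message in evaluate(prev, curr):
        print(message)
```

In practice this logic usually lives in an alerting rule rather than in application code. The sketch only makes the threshold and the counter-delta semantics explicit.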

Recommended action:

Increase the OTEL Collector's memory and queue size, and enable persistent queues so that buffered data survives a collector restart. Tune exporter batch sizes and timeouts, scale the collector horizontally, and sample high-volume traces. Monitor backend (Supabase/ClickHouse) latency and connection health, since a slow backend is what fills the queue in the first place.
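When choosing a queue size, one rough approach is to work backwards from how long a backend outage the queue should absorb. The sketch below assumes the queue is sized in batches and that a batching step sits in front of the exporter. The function names, traffic figures, and average span size are hypothetical inputs, not measured AgentOps values.

```python
import math


def required_queue_size(spans_per_second, spans_per_batch, outage_seconds):
    """Batches the queue must hold to ride out an outage of the given length.

    Assumes the queue counts batches, not individual spans.
    """
    batches_per_second = spans_per_second / spans_per_batch
    return math.ceil(batches_per_second * outage_seconds)


def estimated_queue_bytes(queue_size, spans_per_batch, avg_span_bytes):
    """Rough upper bound on memory (or disk) used by a full queue."""
    return queue_size * spans_per_batch * avg_span_bytes


if __name__ == "__main__":
    # Hypothetical load: 5000 spans/s, 500 spans per batch, 2-minute outage.
    size = required_queue_size(5000, 500, 120)
    print("queue size (batches):", size)
    # Hypothetical average span size of 1000 bytes.
    print("approx bytes when full:", estimated_queue_bytes(size, 500, 1000))
```

The memory estimate matters as much as the queue size: a larger in-memory queue raises the collector's memory requirement, which is one reason to pair bigger queues with more memory or with persistent queues.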