Agent CPU Throttling During High-Volume Trace Collection
warningAI agents and observability sidecars consume excessive CPU when processing high trace volumes, leading to throttling that impacts agent decision-making latency and reliability.
Monitor cpu_throttled_periods_total and cpu_throttling_periods_total metrics increasing. Correlate with high trace ingestion rates (traces per second) and agent latency percentiles (P95, P99). Watch for cpu_usage_total_jiffies_total growth rate exceeding allocated CPU units.
Implement trace sampling and filtering at agent level before persistence. Configure CPU limits that account for peak trace volumes. Use distributed tracing context propagation to reduce redundant trace collection. Monitor token generation speed and adjust resources when agent reasoning latency increases.