Observer effect from indiscriminate tracing competes for resources masking true bottlenecks
Tags: warning, performance. Updated Jan 20, 2026 (via Exa)
How to detect:
Latency-sensitive LLM inference is highly susceptible to observer effects. Indiscriminate event collection competes for CPU, memory, and I/O resources, introducing contention that can mask the true performance bottlenecks being investigated. Heavy profiling overhead can itself become the primary source of latency.
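One way to detect the observer effect is an A/B comparison of latency with event collection enabled versus disabled. A minimal sketch, assuming a synthetic stand-in workload (the handler names, workload, and naive in-process trace log below are illustrative, not any particular tracer's API):

```python
import time
import statistics

def measure(handler, requests, runs=50):
    """Median wall-clock time to process all requests once."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        for req in requests:
            handler(req)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def base_handler(req):
    return sum(req)  # stand-in for real inference work

trace_log = []
def traced_handler(req):
    # Indiscriminate per-request event capture: the overhead under test.
    trace_log.append(("start", time.perf_counter_ns()))
    out = base_handler(req)
    trace_log.append(("end", time.perf_counter_ns()))
    return out

requests = [list(range(256))] * 50
base = measure(base_handler, requests)
traced = measure(traced_handler, requests)
print(f"tracing overhead: {(traced - base) / base * 100:.1f}%")
```

If the measured overhead is a meaningful fraction of the latency gap being investigated, the tracer itself is a suspect bottleneck.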
Recommended action:
Implement an adaptive two-tier collection strategy:
1. Always-on Sentinel Mode (<0.5% CPU overhead): collect only workload metadata and key events.
2. On-demand Deep-Dive Mode (~7% overhead): triggered only when anomalies are detected.
Use dynamic probe injection/unloading and asynchronous data export via shared memory to minimize interference.
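The two-tier strategy can be sketched as a small mode switch. This is a sketch under stated assumptions: the class name, thresholds, and the anomaly rule (latency exceeding a multiple of a rolling baseline) are illustrative, not a specific tool's behavior:

```python
from collections import deque

class AdaptiveCollector:
    """Two-tier collection sketch.

    Sentinel mode: keep only coarse per-request latency metadata.
    Deep-dive mode: also keep full event detail; entered when a
    latency anomaly is detected, exited after a fixed burst of samples.
    """

    def __init__(self, anomaly_factor=3.0, window=100, burst=50):
        self.window = deque(maxlen=window)    # rolling sentinel data
        self.anomaly_factor = anomaly_factor  # multiple of baseline that trips deep-dive
        self.burst = burst                    # deep-dive samples before reverting
        self.deep_remaining = 0
        self.deep_events = []

    @property
    def mode(self):
        return "deep" if self.deep_remaining > 0 else "sentinel"

    def record(self, latency_s, detail=None):
        if self.deep_remaining > 0:
            # Deep-dive: retain full detail for this request.
            self.deep_events.append({"latency": latency_s, "detail": detail})
            self.deep_remaining -= 1
        baseline = sum(self.window) / len(self.window) if self.window else None
        self.window.append(latency_s)
        # Anomaly: latency far above the rolling baseline triggers a deep-dive burst.
        if baseline is not None and latency_s > self.anomaly_factor * baseline:
            self.deep_remaining = self.burst
```

In a real system the mode switch would attach/detach probes and ship `deep_events` asynchronously (e.g. over shared memory) rather than buffer them in-process; the sketch only shows the triggering logic.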