Static latency thresholds fail under variable request length, causing false positives
warning | performance | Updated Jan 20, 2026 (via Exa)
Technologies:
How to detect:
Static threshold alerting breaks down when normal processing time varies widely with input and output token length. A single fixed latency threshold raises false alarms on healthy long-text requests and misses performance regressions in short-text requests, because a slowed short request can still finish below the threshold.
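The failure mode can be illustrated with a small sketch. All numbers here are hypothetical assumptions chosen for illustration (a 2.0 s static threshold, 100 ms fixed overhead, 20 ms per output token); they are not measurements from any real system.

```python
# Hypothetical static alert threshold, in seconds.
STATIC_THRESHOLD_S = 2.0


def nominal_latency_s(output_tokens: int) -> float:
    """Assumed healthy latency: fixed overhead plus a per-output-token cost."""
    return 0.1 + 0.02 * output_tokens


def static_alert(latency_s: float) -> bool:
    """Fire whenever latency exceeds the fixed threshold."""
    return latency_s > STATIC_THRESHOLD_S


# A healthy long request (500 output tokens) still exceeds the threshold.
long_healthy_s = nominal_latency_s(500)

# A short request (20 output tokens) running 3x slower than nominal
# stays under the threshold, so the regression goes unnoticed.
short_regressed_s = 3 * nominal_latency_s(20)

print(static_alert(long_healthy_s))     # false positive
print(static_alert(short_regressed_s))  # missed regression
```

Under these assumptions the healthy long request triggers the alert while the regressed short request does not, which is the pattern to look for in alert history.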
Recommended action:
Implement workload-aware dynamic baselines that account for input token length, output token length, and KV-cache hit rate. Instead of a static threshold, model the theoretical expected duration from the characteristics of the current batch, then compare actual execution time against that workload-adjusted expectation.
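A minimal sketch of such a baseline follows, assuming a simple linear cost model: prefill cost scales with the input tokens not served from the KV cache, and decode cost scales with output tokens. The `Request` type, the per-token constants, and the 1.5x alert ratio are all hypothetical; in practice the coefficients would be fitted from recent healthy traffic, and batch-level effects (such as concurrent batch size) are left out here for brevity.

```python
from dataclasses import dataclass


@dataclass
class Request:
    input_tokens: int
    output_tokens: int
    kv_cache_hit_rate: float  # fraction of input tokens served from cache, 0.0 to 1.0
    actual_s: float           # measured execution time in seconds


# Hypothetical cost coefficients (assumptions, not measured values).
FIXED_OVERHEAD_S = 0.05
PREFILL_S_PER_TOKEN = 0.0005
DECODE_S_PER_TOKEN = 0.02


def expected_duration_s(req: Request) -> float:
    """Workload-adjusted expectation: overhead + uncached prefill + decode."""
    uncached_tokens = req.input_tokens * (1.0 - req.kv_cache_hit_rate)
    return (
        FIXED_OVERHEAD_S
        + PREFILL_S_PER_TOKEN * uncached_tokens
        + DECODE_S_PER_TOKEN * req.output_tokens
    )


def slowdown_ratio(req: Request) -> float:
    """Actual time relative to the expectation for this request's workload."""
    return req.actual_s / expected_duration_s(req)


def is_anomalous(req: Request, max_ratio: float = 1.5) -> bool:
    """Alert on relative slowdown rather than on absolute latency."""
    return slowdown_ratio(req) > max_ratio


# Long request that is slow in absolute terms but close to its expectation.
long_healthy = Request(input_tokens=4000, output_tokens=500,
                       kv_cache_hit_rate=0.0, actual_s=12.5)

# Short request that is fast in absolute terms but well above its expectation.
short_regressed = Request(input_tokens=200, output_tokens=20,
                          kv_cache_hit_rate=0.5, actual_s=1.5)

print(is_anomalous(long_healthy))
print(is_anomalous(short_regressed))
```

Alerting on the ratio of actual to expected time keeps one rule meaningful across request sizes: the long request is not flagged despite taking over 12 s, while the short request is flagged despite finishing in 1.5 s.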