FastAPI

High-Percentile Latency Divergence

warning
latencyUpdated Jan 10, 2026

P95 and P99 latencies diverge significantly from median/P50 latencies, indicating tail latency problems that affect user experience despite healthy average metrics.

How to detect:

Monitor when P95 latency exceeds 150ms for ingestion endpoints or 250ms for query endpoints. Alert when P99/P50 ratio exceeds 5x, indicating severe tail latency. Track latency distribution histograms to identify bimodal patterns.

Recommended action:

Instrument slow request paths with detailed tracing to identify blocking operations causing tail latency. Set explicit latency SLOs for P95/P99. Implement request timeouts and circuit breakers to prevent cascading delays. Use load shedding for non-critical requests during spikes.