Latency Percentile Divergence Indicating Bimodal Performance
latency
When P95/P99 latency rises significantly while P50/average latency stays roughly stable, performance has become bimodal: most requests complete at normal speed while a subset experiences degraded latency. This pattern suggests queueing, rate limiting, or resource contention that affects only the tail of the distribution.
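As a rough illustration of the check described above, the following Python sketch compares a baseline window of latency samples with a current one. The function names, the nearest-rank percentile method, and the `tail_factor` and `median_tolerance` thresholds are all assumptions made for this example, not values taken from any monitoring product; tune them to your own traffic.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, for integer p in 1..100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def tail_divergence(baseline, current, tail_factor=1.5, median_tolerance=0.2):
    """Return True when P99 grew by at least tail_factor while P50 stayed
    within median_tolerance (as a fraction) of its baseline value.

    Both thresholds are illustrative assumptions.
    """
    p50_base, p50_now = percentile(baseline, 50), percentile(current, 50)
    p99_base, p99_now = percentile(baseline, 99), percentile(current, 99)
    median_stable = abs(p50_now - p50_base) <= median_tolerance * p50_base
    tail_degraded = p99_now >= tail_factor * p99_base
    return median_stable and tail_degraded


# Synthetic latencies in milliseconds, 100 requests per window.
baseline = [100] * 100
bimodal = [100] * 95 + [900] * 5   # 5% of requests hit a slow path
uniformly_slow = [300] * 100       # everything is slower, so the median moves too

print(tail_divergence(baseline, bimodal))         # True
print(tail_divergence(baseline, uniformly_slow))  # False
```

The second case shows why the median check matters: when every request slows down, the cause is more likely a general regression than the tail-only contention described here.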