OpenAI

Latency Percentile Divergence Indicating Bimodal Performance

latency

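When P95/P99 latency rises significantly while P50 (median) or average latency stays stable, the latency distribution has likely become bimodal. Most requests still complete normally, but a subset experiences much higher latency. This pattern usually points to queueing, rate limiting, or resource contention, which affect the tail of the distribution rather than the typical request.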
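A minimal sketch of how this divergence could be detected, assuming you have per-request latency samples (in milliseconds, all positive) for a baseline window and a current window. The function names and the thresholds `tail_growth=2.0` and `median_tolerance=1.2` are hypothetical illustration values, not settings from any OpenAI tooling. Percentiles use the nearest-rank method.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile; q is in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least q% of samples at or below it.
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


@dataclass
class DivergenceResult:
    p50_ratio: float   # current median / baseline median
    tail_ratio: float  # current tail percentile / baseline tail percentile
    diverging: bool    # True when the tail grew but the median did not


def detect_tail_divergence(
    baseline: Sequence[float],
    current: Sequence[float],
    tail_q: float = 99.0,
    tail_growth: float = 2.0,       # hypothetical: tail must at least double
    median_tolerance: float = 1.2,  # hypothetical: median may grow at most 20%
) -> DivergenceResult:
    base_p50 = percentile(baseline, 50)
    base_tail = percentile(baseline, tail_q)
    if base_p50 <= 0 or base_tail <= 0:
        raise ValueError("baseline latencies must be positive")
    p50_ratio = percentile(current, 50) / base_p50
    tail_ratio = percentile(current, tail_q) / base_tail
    # Bimodal signature: the slow subset inflates the tail while the
    # typical request (the median) is essentially unchanged.
    diverging = tail_ratio >= tail_growth and p50_ratio <= median_tolerance
    return DivergenceResult(p50_ratio, tail_ratio, diverging)
```

For example, a baseline of 95 requests at 100 ms and 5 at 150 ms, compared with a current window of 90 requests at 100 ms and 10 at 1200 ms, yields a median ratio of 1.0 and a P99 ratio of 8.0, so it is flagged. A uniform slowdown (every request twice as slow) is not flagged, because the median moves together with the tail; that case points to a general regression rather than a degraded subset of requests.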
