Adaptive batching queue depth causes latency spikes

warning

performance

Updated Mar 7, 2026 (via Exa)

Sources

Request Processing Pipeline | bentoml/BentoML | DeepWiki (deepwiki.com)

Technologies:

BentoML (subject)

How to detect:

CorkDispatcher queues incoming requests and uses the CORK algorithm to decide when to release each batch. While adaptive batching is active, a request waits in the queue until the dispatcher, working from historical performance data, decides to release the batch it belongs to. That wait trades individual request latency for throughput, so the symptom to look for is spikes in p99 latency.
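The effect of queueing on individual requests can be illustrated with a toy model. The sketch below is not BentoML's CORK implementation: it assumes a much simpler fixed rule (release a batch when it is full or when its oldest request has waited `max_wait_ms`), and the function name and parameters are hypothetical. Times are integer milliseconds and arrivals are assumed to be sorted.

```python
from typing import List


def simulate_batch_waits(
    arrivals_ms: List[int], max_batch_size: int, max_wait_ms: int
) -> List[int]:
    """Return the queue wait of each request under a toy batching rule.

    Assumption (not BentoML's actual algorithm): a batch is released when
    it is full, or when its oldest request has waited max_wait_ms,
    whichever comes first. arrivals_ms must be sorted ascending.
    """
    if max_batch_size < 1 or max_wait_ms < 0:
        raise ValueError("max_batch_size must be >= 1 and max_wait_ms >= 0")

    waits: List[int] = []
    i = 0
    n = len(arrivals_ms)
    while i < n:
        deadline = arrivals_ms[i] + max_wait_ms
        j = i
        # Collect requests that arrive before the deadline, up to the size cap.
        while j < n and j - i < max_batch_size and arrivals_ms[j] <= deadline:
            j += 1
        if j - i == max_batch_size:
            # Batch filled: it is released as soon as its last member arrives.
            release = arrivals_ms[j - 1]
        else:
            # Batch never filled: it is released at the deadline.
            release = deadline
        waits.extend(release - arrivals_ms[k] for k in range(i, j))
        i = j
    return waits
```

For example, `simulate_batch_waits([0, 1, 2, 500], max_batch_size=4, max_wait_ms=10)` returns `[10, 9, 8, 10]`: under light traffic the batch never fills, so every request pays close to the full wait. Under this simplified rule, sparse traffic is where the tail latency cost shows up.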

Recommended action:

Monitor bentoml.runner.adaptive_batch.wait_duration to track queuing delay. Review the distribution of bentoml.runner.adaptive_batch.size to understand batching behavior. For latency-sensitive workloads, consider disabling batching or tuning the batch timeout parameters.
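One way to act on the wait-duration metric is to compare its p99 against the p99 of end-to-end latency. The sketch below assumes you have already exported raw samples (in milliseconds) from your metrics backend; the helper names and the 0.5 threshold are hypothetical choices for illustration, not BentoML defaults.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of samples, with pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of
    # the samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def wait_dominates_tail(
    wait_ms: Sequence[float],
    total_ms: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, tune for your workload
) -> bool:
    """True if p99 queue wait exceeds threshold * p99 total latency."""
    return percentile(wait_ms, 99) > threshold * percentile(total_ms, 99)
```

If the check returns true, queueing accounts for a large share of tail latency, which suggests the batching settings are worth revisiting before looking at model execution time.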