BentoML Metric

bentoml.runner.adaptive_batch.wait_duration

Batch queue wait time
Dimensions: None
Available on: Datadog (1)
Interface Metrics (1)
Datadog
Time requests wait in the queue before batch processing, in seconds
Dimensions: None

Technical Annotations (32)

Configuration Parameters (11)
max_batch_size (recommended: 50)
Maximum number of requests to batch together for inference, in Runner method_configs
max_latency_ms (recommended: 600)
Maximum time to wait for batch accumulation before processing, in Runner method_configs
max_latency
Current batching parameter with limited control over batch composition logic
runner.batching.target_latency_ms (recommended: 0)
Controls dispatcher wait time before executing requests; 0 minimizes wait after bursts
runner.batching.strategy
Strategy selection option added but requires load testing to determine the optimal value
runner.batching.max_batch_size
Moved into strategy_options; impacts batch formation
runner.batching.max_latency
Moved into strategy_options; controls maximum acceptable latency
runners.batching.max_latency_ms (recommended: 60000)
Default is 60000 ms (60 seconds); reduce if latency SLAs are tighter
batchable (recommended: True)
Must be True on the Runnable.method decorator for batching to work
batch_dim (recommended: 0)
Specifies which tensor dimension to batch along
runners.<runner_name>.batching.max_latency_ms (recommended: increase the value when 503s occur)
Maximum latency in milliseconds that a batch waits before being released for inference
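The batching keys above can be combined in a BentoML configuration.yaml. A minimal sketch, assuming the BentoML 1.x runner configuration schema; the runner name `iris_clf` is hypothetical:

```yaml
# configuration.yaml sketch (BentoML 1.x schema); runner name "iris_clf" is hypothetical
runners:
  iris_clf:
    batching:
      enabled: true
      max_batch_size: 50     # cap on requests batched together (recommended value above)
      max_latency_ms: 600    # cap on batch accumulation wait (recommended value above)
```

Batching additionally requires batchable=True on the Runnable.method decorator, with batch_dim selecting the tensor dimension to batch along.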
Error Signatures (3)
ServiceUnavailable (exception)
raise ServiceUnavailable(body.decode()) (exception)
503 (HTTP status)
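When the dispatcher cannot honor its latency constraint, requests are shed with ServiceUnavailable / HTTP 503, so clients should retry with backoff rather than fail hard. A minimal client-side sketch; the function names and endpoint handling are illustrative, not part of BentoML:

```python
import time
import urllib.error
import urllib.request

def backoff_delays(retries, base=0.1, cap=2.0):
    # Deterministic exponential backoff schedule in seconds;
    # add jitter in production to avoid synchronized retries.
    return [min(cap, base * (2 ** i)) for i in range(retries)]

def post_with_retry(url, data, retries=4):
    # Retry only on HTTP 503, the status BentoML returns when the
    # batch queue cannot meet its max_latency_ms constraint.
    for delay in backoff_delays(retries) + [None]:
        try:
            req = urllib.request.Request(url, data=data, method="POST")
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 503 or delay is None:
                raise
            time.sleep(delay)
```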
Technical References (18)
adaptive batching (concept)
background task (concept)
async method (concept)
task.get() (component)
task.get_status() (component)
CorkDispatcher (component)
CORK algorithm (concept)
adaptive batching algorithm (component)
batch engine (component)
runner (component)
@bentoml.api decorator (component)
RunnerApp (component)
micro-batching (concept)
async_run_method (component)
runner_handle (component)
NLURunnable (component)
bentoml.Runnable.method (component)
async_run (component)
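The CorkDispatcher / CORK-style behavior referenced above can be sketched as a toy asyncio micro-batcher: hold requests until either max_batch_size is reached or max_latency_ms expires, then serve the whole batch with one call. This is an illustrative model only, not BentoML's actual implementation; all names here are hypothetical:

```python
import asyncio
import time

class MicroBatcher:
    # Toy micro-batching dispatcher: requests queue up until either
    # max_batch_size is reached or max_latency_ms has elapsed, then the
    # whole batch is served by a single call to batch_fn. The time a
    # request spends in self.queue is what a "batch queue wait time"
    # metric would measure.
    def __init__(self, batch_fn, max_batch_size=8, max_latency_ms=50):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_latency_s = max_latency_ms / 1000.0
        self.queue = asyncio.Queue()
        self._task = None

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        if self._task is None:  # first request arms the dispatch timer
            self._task = asyncio.create_task(self._dispatch())
        return await fut

    async def _dispatch(self):
        deadline = time.monotonic() + self.max_latency_s
        batch = []
        while len(batch) < self.max_batch_size:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(self.queue.get(), timeout))
            except asyncio.TimeoutError:
                break  # latency budget exhausted: release a partial batch
        self._task = None
        if batch:
            items = [item for item, _ in batch]
            for (_, fut), result in zip(batch, self.batch_fn(items)):
                fut.set_result(result)
```

Concurrent submits that arrive within the latency window are served by one batch_fn call, which is the throughput win adaptive batching aims for.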
Related Insights (13)
BentoML adaptive batching configuration for optimal throughput (info)
BentoML tasks block synchronous execution flow instead of running in background (warning)
Adaptive batching queue depth causes latency spikes (warning)
BentoML 0.13-LTS batching latency excluded from request duration metrics (warning)
Variable-length inputs cause adaptive batching to underperform or slow down inference (warning)
Adaptive batching wait duration increases latency under low throughput (warning)
Batch size constraint prevents optimal throughput under high load (warning)
Insufficient visibility into adaptive batching decisions impacts troubleshooting (warning)
Batching strategy configuration requires load testing before production deployment (warning)
Increased latency from excessive max_latency_ms allowing oversized batches (warning)
Batching configuration causes ServiceUnavailable errors under concurrent load (critical)
Adaptive batching shows no performance improvement over sequential processing (warning)
HTTP 503 errors when adaptive batching cannot meet max_latency_ms constraint (warning)
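The low-throughput and oversized-max_latency_ms insights above follow from simple arithmetic: under sparse traffic a batch only releases when it fills or the latency budget expires. A toy back-of-envelope model, assuming uniform arrivals; this is not BentoML's dispatcher logic:

```python
def batch_release_ms(arrival_interval_ms, max_batch_size, max_latency_ms):
    # Toy model with uniform request arrivals: a batch releases when it
    # fills (max_batch_size - 1 further arrivals after the first request)
    # or when max_latency_ms expires, whichever comes first. The first
    # request's wait is what wait_duration captures.
    return min(arrival_interval_ms * (max_batch_size - 1), max_latency_ms)
```

At 1 request/s with max_batch_size 50 and the 60000 ms default, the first request waits min(49000, 60000) = 49000 ms; tightening max_latency_ms to 600 caps that wait at 600 ms.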