BentoML Metric

bentoml.runner.adaptive_batch.wait_duration

Batch queue wait time
Dimensions: None
Available on: Datadog (1)
Interface Metrics (1)
Datadog
Time requests wait in the queue before batch processing, in seconds
Dimensions: None

Technical Annotations (32)

Configuration Parameters (11)
max_batch_size (recommended: 50)
Maximum number of requests to batch together for inference, in Runner method_configs
max_latency_ms (recommended: 600)
Maximum time to wait for batch accumulation before processing, in Runner method_configs
max_latency
Current batching parameter with limited control over batch composition logic
runner.batching.target_latency_ms (recommended: 0)
Controls dispatcher wait time before executing requests; 0 minimizes wait after bursts
runner.batching.strategy
Strategy selection option added but requires load testing to determine the optimal value
runner.batching.max_batch_size
Moved into strategy_options; impacts batch formation
runner.batching.max_latency
Moved into strategy_options; controls maximum acceptable latency
runners.batching.max_latency_ms (recommended: 60000)
Default is 60000 ms (60 seconds); reduce if latency SLAs are tighter
batchable (recommended: True)
Must be True on the Runnable.method decorator for batching to work
batch_dim (recommended: 0)
Specifies which tensor dimension to batch along
runners.<runner_name>.batching.max_latency_ms (recommended: increase the value when 503s occur)
Maximum latency in milliseconds that a batch waits before being released for inference
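The batching keys above can be combined in a BentoML configuration.yaml. A minimal sketch, assuming the BentoML 1.x runner configuration schema; the runner name `iris_clf` is hypothetical:

```yaml
# configuration.yaml sketch (BentoML 1.x schema); runner name "iris_clf" is hypothetical
runners:
  iris_clf:
    batching:
      enabled: true
      max_batch_size: 50     # cap on requests batched together (recommended value above)
      max_latency_ms: 600    # cap on batch accumulation wait (recommended value above)
```

Batching additionally requires batchable=True on the Runnable.method decorator, with batch_dim selecting the tensor dimension to batch along.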
Error Signatures (3)
ServiceUnavailable (exception)
raise ServiceUnavailable(body.decode()) (exception)
503 (HTTP status)
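When the dispatcher cannot honor its latency constraint, requests are shed with ServiceUnavailable / HTTP 503, so clients should retry with backoff rather than fail hard. A minimal client-side sketch; the function names and endpoint handling are illustrative, not part of BentoML:

```python
import time
import urllib.error
import urllib.request

def backoff_delays(retries, base=0.1, cap=2.0):
    # Deterministic exponential backoff schedule in seconds;
    # add jitter in production to avoid synchronized retries.
    return [min(cap, base * (2 ** i)) for i in range(retries)]

def post_with_retry(url, data, retries=4):
    # Retry only on HTTP 503, the status BentoML returns when the
    # batch queue cannot meet its max_latency_ms constraint.
    for delay in backoff_delays(retries) + [None]:
        try:
            req = urllib.request.Request(url, data=data, method="POST")
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 503 or delay is None:
                raise
            time.sleep(delay)
```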
Technical References (18)
adaptive batching (concept)
background task (concept)
async method (concept)
task.get() (component)
task.get_status() (component)
CorkDispatcher (component)
CORK algorithm (concept)
adaptive batching algorithm (component)
batch engine (component)
runner (component)
@bentoml.api decorator (component)
RunnerApp (component)
micro-batching (concept)
async_run_method (component)
runner_handle (component)
NLURunnable (component)
bentoml.Runnable.method (component)
async_run (component)
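The CorkDispatcher / CORK-style behavior referenced above can be sketched as a toy asyncio micro-batcher: hold requests until either max_batch_size is reached or max_latency_ms expires, then serve the whole batch with one call. This is an illustrative model only, not BentoML's actual implementation; all names here are hypothetical:

```python
import asyncio
import time

class MicroBatcher:
    # Toy micro-batching dispatcher: requests queue up until either
    # max_batch_size is reached or max_latency_ms has elapsed, then the
    # whole batch is served by a single call to batch_fn. The time a
    # request spends in self.queue is what a "batch queue wait time"
    # metric would measure.
    def __init__(self, batch_fn, max_batch_size=8, max_latency_ms=50):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_latency_s = max_latency_ms / 1000.0
        self.queue = asyncio.Queue()
        self._task = None

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        if self._task is None:  # first request arms the dispatch timer
            self._task = asyncio.create_task(self._dispatch())
        return await fut

    async def _dispatch(self):
        deadline = time.monotonic() + self.max_latency_s
        batch = []
        while len(batch) < self.max_batch_size:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(self.queue.get(), timeout))
            except asyncio.TimeoutError:
                break  # latency budget exhausted: release a partial batch
        self._task = None
        if batch:
            items = [item for item, _ in batch]
            for (_, fut), result in zip(batch, self.batch_fn(items)):
                fut.set_result(result)
```

Concurrent submits that arrive within the latency window are served by one batch_fn call, which is the throughput win adaptive batching aims for.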
Related Insights (13)
BentoML adaptive batching configuration for optimal throughput (info)
BentoML tasks block synchronous execution flow instead of running in background (warning)
Adaptive batching queue depth causes latency spikes (warning)
BentoML 0.13-LTS batching latency excluded from request duration metrics (warning)
Variable-length inputs cause adaptive batching to underperform or slow down inference (warning)
Adaptive batching wait duration increases latency under low throughput (warning)
Batch size constraint prevents optimal throughput under high load (warning)
Insufficient visibility into adaptive batching decisions impacts troubleshooting (warning)
Batching strategy configuration requires load testing before production deployment (warning)
Increased latency from excessive max_latency_ms allowing oversized batches (warning)
Batching configuration causes ServiceUnavailable errors under concurrent load (critical)
Adaptive batching shows no performance improvement over sequential processing (warning)
HTTP 503 errors when adaptive batching cannot meet max_latency_ms constraint (warning)
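The low-throughput and oversized-max_latency_ms insights above follow from simple arithmetic: under sparse traffic a batch only releases when it fills or the latency budget expires. A toy back-of-envelope model, assuming uniform arrivals; this is not BentoML's dispatcher logic:

```python
def batch_release_ms(arrival_interval_ms, max_batch_size, max_latency_ms):
    # Toy model with uniform request arrivals: a batch releases when it
    # fills (max_batch_size - 1 further arrivals after the first request)
    # or when max_latency_ms expires, whichever comes first. The first
    # request's wait is what wait_duration captures.
    return min(arrival_interval_ms * (max_batch_size - 1), max_latency_ms)
```

At 1 request/s with max_batch_size 50 and the 60000 ms default, the first request waits min(49000, 60000) = 49000 ms; tightening max_latency_ms to 600 caps that wait at 600 ms.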