Technologies/BentoML/bentoml.runner.adaptive_batch.size
BentoML · Metric

bentoml.runner.adaptive_batch.size

Adaptive batch size
Dimensions: None
Available on: Prometheus (1), OpenTelemetry (1), Datadog (1)
Interface Metrics (3)
Prometheus
Current adaptive batch size for the runner
Dimensions: None
OpenTelemetry
Total batch size for adaptive batching in runner
Dimensions: None
Datadog
Size of batches processed by adaptive batching runner
Dimensions: None

Technical Annotations (49)

Configuration Parameters (19)
max_batch_size (recommended: 50)
Maximum number of requests to batch together for inference, in Runner method_configs
max_latency_ms (recommended: 600)
Maximum time to wait for batch accumulation before processing, in Runner method_configs
batch_size
Must be copied to a separate variable before the dispatcher's training logic overrides it
runners.batching.max_batch_size (recommended: 5)
Maximum number of requests to batch together for inference; must account for pre-batched client inputs
max_latency
Current batching parameter with limited control over batch composition logic
threads
Parameter in the @bentoml.service decorator that sets the concurrency level for synchronous endpoints
runners.<runner_name>.batching.max_batch_size (recommended: 32)
Must account for input batch dimensions; effective batch size = configured value × input batch dimension
runners.<runner_name>.batching.enabled (recommended: true)
Enables adaptive batching, where this bug manifests
runners.<runner_name>.batching.max_latency_ms (recommended: 100000)
Example value used in reproduction; controls the batching window
batchable (recommended: True)
Must be True for adaptive batching; should be disabled if batching is not needed
traffic.concurrency (recommended: match batch size)
For adaptive/continuous batching Services, should equal the batch size for optimal throughput
runner.batching.target_latency_ms (recommended: 0)
Controls how long the dispatcher waits before executing requests; 0 minimizes wait after bursts
runner.batching.strategy
Strategy selection option; requires load testing to determine the optimal value
runner.batching.max_batch_size
Moved into strategy_options; affects batch formation
runner.batching.max_latency
Moved into strategy_options; controls the maximum acceptable latency
@bentoml.service.threads (recommended: N)
Set to enable concurrent requests from sync endpoints to batchable services
batch_dim (recommended: 0)
Specifies which tensor dimension to batch along
signatures.__call__.batchable (recommended: True)
Enables adaptive batching for models that support batched input
signatures.__call__.batch_dim (recommended: (0, 0))
2-tuple specifying input and output batch dimensions (defaults to 0)
Error Signatures (4)
UnboundLocalError: local variable 'batch_size' referenced before assignment (exception)
AssertionError (exception)
assert start < end (exception)
bentoml/_internal/utils/metrics.py", line 44, in exponential_buckets (exception)
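The first signature above is the standard Python scoping pitfall: assigning to a name anywhere in a function makes it local for the whole function body, so an earlier read fails. A minimal sketch of the pattern, not BentoML's actual dispatcher code:

```python
batch_size = 8  # outer-scope value, akin to a configured parameter


def train_dispatcher(records):
    # Reading batch_size here raises UnboundLocalError: the assignment
    # below makes the name local to the entire function, so the outer
    # value is never visible and the read happens before any assignment.
    history = [batch_size] * 2
    if records:
        batch_size = len(records)  # "training logic overrides" the name
    return history


try:
    train_dispatcher([1, 2, 3])
except UnboundLocalError as exc:
    print(type(exc).__name__)  # UnboundLocalError
```

This is why the batch_size configuration note above says the value must be copied to a separately named variable before the training logic reassigns it.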
CLI Commands (1)
BENTOML_CONFIG=./configuration.yaml bentoml serve --production (diagnostic)
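The configuration.yaml passed via BENTOML_CONFIG would carry the runner batching keys listed above. A sketch of such a file, with "mymodel" as a placeholder runner name and values taken from the recommendations above:

```yaml
# Illustrative configuration.yaml for the serve command above.
runners:
  mymodel:            # placeholder runner name
    batching:
      enabled: true          # turn on adaptive batching
      max_batch_size: 32     # remember: effective size scales with input batch dim
      max_latency_ms: 100000 # batching window used in the reproduction above
```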
Technical References (25)
adaptive batching (concept)
CorkDispatcher (component)
CORK algorithm (concept)
dispatcher loop (component)
adaptive batching algorithm (component)
batch engine (component)
@bentoml.service decorator (component)
synchronous endpoint (concept)
@bentoml.api decorator (component)
onnxruntime.InferenceSession (component)
bentoml._internal.marshal.dispatcher (component)
runner.async_run (component)
exponential_buckets (component)
bentoml._internal.utils.metrics (file path)
continuous batching (concept)
RunnerApp (component)
micro-batching (concept)
@bentoml.service (component)
@bentoml.api(batchable=True) (component)
v1/default_configuration.yaml (file path)
NLURunnable (component)
bentoml.Runnable.method (component)
async_run (component)
Runner (component)
Runnable (component)
Related Insights (22)
BentoML adaptive batching configuration for optimal throughput (info)
Adaptive batching disabled prevents batch size metrics collection (info)
Adaptive batching queue depth causes latency spikes (warning)
BentoML 0.13-LTS batching latency excluded from request duration metrics (warning)
UnboundLocalError in dispatcher training logic when batch_size overridden (critical)
Pre-batched client requests can exceed configured max_batch_size limit (warning)
Variable-length inputs cause adaptive batching to underperform or slow down inference (warning)
Batch splitting behavior may fragment requests across multiple execution cycles (info)
Suboptimal batch sizes reduce throughput efficiency (warning)
Synchronous endpoints limit batching throughput to one request at a time (warning)
Batch size constraint prevents optimal throughput under high load (warning)
Adaptive batching exceeds max_batch_size causing OOM when inputs have a batch dimension (critical)
Adaptive batching crashes with max_batch_size=1 due to metric bucket assertion failure (critical)
Replica count undershoots or overshoots when concurrency mismatches batch size (warning)
Insufficient visibility into adaptive batching decisions impacts troubleshooting (warning)
Batching strategy configuration requires load testing before production deployment (warning)
Insufficient batching throughput when sync endpoints call batchable services with default concurrency (warning)
Memory exhaustion from excessive max_batch_size relative to GPU capacity (warning)
Adaptive batching shows no performance improvement over sequential processing (warning)
Max batch size determines memory capacity limits for GPU/memory resources (warning)
Non-batchable parameter types bypass adaptive batching optimization (info)
Models must be explicitly declared batchable for adaptive batching to work (info)
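The max_batch_size=1 crash flagged as critical above pairs with the "assert start < end" error signature: exponential histogram bucket generation needs a non-degenerate range, and a batch-size histogram spanning [1, max_batch_size] collapses when max_batch_size is 1. A simplified stand-in for the bucket generator, not the actual bentoml._internal.utils.metrics code:

```python
def exponential_buckets(start, factor, end):
    # Simplified Prometheus-style bucket generator; mirrors the
    # `assert start < end` guard quoted in the error signatures above.
    assert start < end
    buckets = []
    bound = start
    while bound < end:
        buckets.append(bound)
        bound *= factor
    return buckets + [end]


# A normal range yields exponentially spaced bucket bounds.
print(exponential_buckets(1, 2.0, 8))  # [1, 2, 4, 8]

# With max_batch_size=1 the range degenerates to start == end
# and the assertion fires before any buckets are produced.
try:
    exponential_buckets(1, 2.0, 1)
except AssertionError:
    print("AssertionError: assert start < end")
```

This is why configuring max_batch_size=1 effectively disables batching but can still crash metric setup rather than degrading gracefully.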