bentoml.runner.adaptive_batch.size
Adaptive batch size (Dimensions: None)

Interface Metrics (3)
Sources
Technical Annotations (49)
Configuration Parameters (19)
max_batch_size (recommended: 50)
max_latency_ms (recommended: 600)
batch_size
runners.batching.max_batch_size (recommended: 5)
max_latency
threads
runners.<runner_name>.batching.max_batch_size (recommended: 32)
runners.<runner_name>.batching.enabled (recommended: true)
runners.<runner_name>.batching.max_latency_ms (recommended: 100000)
batchable (recommended: True)
traffic.concurrency (recommended: match batch size)
runner.batching.target_latency_ms (recommended: 0)
runner.batching.strategy
runner.batching.max_batch_size
runner.batching.max_latency
@bentoml.service.threads (recommended: N)
batch_dim (recommended: 0)
signatures.__call__.batchable (recommended: True)
signatures.__call__.batch_dim (recommended: (0, 0))

Error Signatures (4)
UnboundLocalError: local variable 'batch_size' referenced before assignment (exception)
AssertionError (exception)
assert start < end (exception)
bentoml/_internal/utils/metrics.py", line 44, in exponential_buckets (exception)

CLI Commands (1)
BENTOML_CONFIG=./configuration.yaml bentoml serve --production (diagnostic)

Technical References (25)
adaptive batching (concept)
CorkDispatcher (component)
CORK algorithm (concept)
dispatcher loop (component)
adaptive batching algorithm (component)
batch engine (component)
@bentoml.service decorator (component)
synchronous endpoint (concept)
@bentoml.api decorator (component)
onnxruntime.InferenceSession (component)
bentoml._internal.marshal.dispatcher (component)
runner.async_run (component)
exponential_buckets (component)
bentoml._internal.utils.metrics (file path)
continuous batching (concept)
RunnerApp (component)
micro-batching (concept)
@bentoml.service (component)
@bentoml.api(batchable=True) (component)
v1/default_configuration.yaml (file path)
NLURunnable (component)
bentoml.Runnable.method (component)
async_run (component)
Runner (component)
Runnable (component)

Related Insights (22)
BentoML adaptive batching configuration for optimal throughput (info)
Adaptive batching disabled prevents batch size metrics collection (info)
Adaptive batching queue depth causes latency spikes (warning)
BentoML 0.13-LTS batching latency excluded from request duration metrics (warning)
UnboundLocalError in dispatcher training logic when batch_size overridden (critical)
Pre-batched client requests can exceed configured max_batch_size limit (warning)
Variable-length inputs cause adaptive batching to underperform or slow down inference (warning)
Batch splitting behavior may fragment requests across multiple execution cycles (info)
Suboptimal batch sizes reduce throughput efficiency (warning)
Synchronous endpoints limit batching throughput to one request at a time (warning)
Batch size constraint prevents optimal throughput under high load (warning)
Adaptive batching exceeds max_batch_size causing OOM when inputs have batch dimension (critical)
Adaptive batching crashes with max_batch_size=1 due to metric bucket assertion failure (critical)
Replica count undershoots or overshoots when concurrency mismatches batch size (warning)
Insufficient visibility into adaptive batching decisions impacts troubleshooting (warning)
Batching strategy configuration requires load testing before production deployment (warning)
Insufficient batching throughput when sync endpoints call batchable services with default concurrency (warning)
Memory exhaustion from excessive max_batch_size relative to GPU capacity (warning)
Adaptive batching shows no performance improvement over sequential processing (warning)
Max batch size determines memory capacity limits for GPU/memory resources (warning)
Non-batchable parameter types bypass adaptive batching optimization (info)
Models must be explicitly declared batchable for adaptive batching to work (info)
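The configuration parameters listed above can be combined into a single runner-level batching block. A minimal sketch of a configuration.yaml, assuming the runners.<runner_name>.batching schema from the parameter list; the runner name "nlu" is hypothetical:

```yaml
# Hypothetical configuration.yaml sketch; keys follow the
# runners.<runner_name>.batching parameters listed above.
runners:
  nlu:
    batching:
      enabled: true           # recommended: true
      max_batch_size: 32      # recommended: 32; bound by GPU memory (see insights)
      max_latency_ms: 100000  # recommended: 100000
```

Served with the CLI command shown above, e.g. BENTOML_CONFIG=./configuration.yaml bentoml serve --production.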
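The AssertionError signature above (assert start < end in exponential_buckets, tied to the max_batch_size=1 crash insight) can be illustrated with a stdlib-only sketch. This is a hypothetical reimplementation of exponential histogram bucket generation, not BentoML's actual bentoml._internal.utils.metrics code:

```python
def exponential_buckets(start: float, factor: float, end: float) -> tuple:
    """Hypothetical sketch of exponential histogram bucket generation,
    modeled on the `assert start < end` signature above; not the actual
    BentoML implementation."""
    assert start < end, "start must be strictly smaller than end"
    buckets = []
    value = start
    while value < end:
        buckets.append(value)
        value *= factor
    return tuple(buckets) + (float("inf"),)


# A batch-size histogram spanning [1, max_batch_size) with
# max_batch_size=1 yields start == end, tripping the assertion:
try:
    exponential_buckets(1, 2.0, 1)
except AssertionError as exc:
    print("AssertionError:", exc)
```

This mirrors why a degenerate batch-size range produces the assertion failure rather than an empty histogram.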
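Several insights above turn on the same mechanism: a dispatcher queues requests and flushes a batch either when it reaches max_batch_size or when the oldest request has waited max_latency_ms (which is how queue depth translates into latency spikes). A minimal stdlib sketch of that flush rule; this is an illustration, not BentoML's CorkDispatcher:

```python
import time
from collections import deque


class ToyBatcher:
    """Illustrative size-or-deadline flush rule behind adaptive /
    micro-batching; not BentoML's CorkDispatcher."""

    def __init__(self, max_batch_size: int, max_latency_ms: float):
        self.max_batch_size = max_batch_size
        self.max_latency = max_latency_ms / 1000.0
        self.queue = deque()  # entries: (arrival_time, request)

    def submit(self, request) -> None:
        self.queue.append((time.monotonic(), request))

    def maybe_flush(self, now=None) -> list:
        """Return a batch if the queue is full, or if the oldest queued
        request has waited longer than max_latency; else return []."""
        if not self.queue:
            return []
        now = time.monotonic() if now is None else now
        oldest_arrival, _ = self.queue[0]
        full = len(self.queue) >= self.max_batch_size
        overdue = (now - oldest_arrival) >= self.max_latency
        if not (full or overdue):
            return []
        batch = [req for _, req in list(self.queue)[: self.max_batch_size]]
        for _ in batch:
            self.queue.popleft()
        return batch
```

Note how raising max_batch_size without bounding max_latency_ms lets the oldest request wait arbitrarily long under light load, the latency-spike pattern flagged in the insights.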