bentoml.runner.adaptive_batch.wait_duration
Batch queue wait time
Dimensions: None
Available on:
Datadog (1)
Interface Metrics (1)
Dimensions: None
Technical Annotations (32)
Configuration Parameters (11)
max_batch_size (recommended: 50)
max_latency_ms (recommended: 600)
max_latency
runner.batching.target_latency_ms (recommended: 0)
runner.batching.strategy
runner.batching.max_batch_size
runner.batching.max_latency
runners.batching.max_latency_ms (recommended: 60000)
batchable (recommended: True)
batch_dim (recommended: 0)
runners.<runner_name>.batching.max_latency_ms (recommended: increase value when 503s occur)
Error Signatures (3)
ServiceUnavailable [exception]
raise ServiceUnavailable(body.decode()) [exception]
503 [http status]
Technical References (18)
adaptive batching [concept]
background task [concept]
async method [concept]
task.get() [component]
task.get_status() [component]
CorkDispatcher [component]
CORK algorithm [concept]
adaptive batching algorithm [component]
batch engine [component]
runner [component]
@bentoml.api decorator [component]
RunnerApp [component]
micro-batching [concept]
async_run_method [component]
runner_handle [component]
NLURunnable [component]
bentoml.Runnable.method [component]
async_run [component]
Related Insights (13)
BentoML adaptive batching configuration for optimal throughput [info]
BentoML tasks block synchronous execution flow instead of running in background [warning]
Adaptive batching queue depth causes latency spikes [warning]
BentoML 0.13-LTS batching latency excluded from request duration metrics [warning]
Variable-length inputs cause adaptive batching to underperform or slow down inference [warning]
Adaptive batching wait duration increases latency under low throughput [warning]
Batch size constraint prevents optimal throughput under high load [warning]
Insufficient visibility into adaptive batching decisions impacts troubleshooting [warning]
Batching strategy configuration requires load testing before production deployment [warning]
Increased latency from excessive max_latency_ms allowing oversized batches [warning]
Batching configuration causes ServiceUnavailable errors under concurrent load [critical]
Adaptive batching shows no performance improvement over sequential processing [warning]
HTTP 503 errors when adaptive batching cannot meet max_latency_ms constraint [warning]
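
Several of the insights above connect queue wait time, max_batch_size, max_latency_ms, and 503 responses. The following is a minimal sketch of that relationship, assuming a single worker with a fixed per-batch service time. It is a toy model for reasoning about the metric, not BentoML's CorkDispatcher or its actual scheduling logic; the names simulate, BatchResult, and service_ms are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchResult:
    # Queue wait (ms) of each request that was served.
    waits_ms: List[float] = field(default_factory=list)
    # Requests dropped because they waited too long (analogous to a 503).
    rejected: int = 0
    # Size of each dispatched batch.
    batch_sizes: List[int] = field(default_factory=list)


def simulate(arrivals_ms, max_batch_size, max_latency_ms, service_ms):
    """Toy single-worker batch queue (an assumption, not BentoML's algorithm).

    The worker starts a batch as soon as it is free and a request is queued,
    taking up to max_batch_size requests that have already arrived. A request
    whose queue wait exceeds max_latency_ms at dispatch time is rejected.
    """
    result = BatchResult()
    pending = sorted(arrivals_ms)
    i = 0                  # index of the next unhandled request
    worker_free_at = 0.0   # time at which the worker can start a new batch

    while i < len(pending):
        start = max(worker_free_at, pending[i])
        batch = []
        while (i < len(pending) and pending[i] <= start
               and len(batch) < max_batch_size):
            wait = start - pending[i]
            if wait > max_latency_ms:
                result.rejected += 1   # waited too long: drop it
            else:
                batch.append(wait)
            i += 1
        if batch:
            result.waits_ms.extend(batch)
            result.batch_sizes.append(len(batch))
            worker_free_at = start + service_ms
    return result


# Four simultaneous requests, batches of two, 50 ms per batch.
tight = simulate([0, 0, 0, 0], max_batch_size=2, max_latency_ms=30, service_ms=50)
loose = simulate([0, 0, 0, 0], max_batch_size=2, max_latency_ms=100, service_ms=50)
print(tight.rejected, loose.rejected)
```

In this model the second pair of requests waits 50 ms for the worker, so a 30 ms latency budget rejects them while a 100 ms budget serves them. That is the same trade-off behind the "increase value when 503s occur" recommendation listed for runners.<runner_name>.batching.max_latency_ms, though real behavior depends on the actual dispatcher and should be confirmed by load testing.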