bentoml.api_server.request.in_progress
Active API server requests
Dimensions: None
Available on:
- OpenTelemetry (1)
- Interface Metrics (1)
Technical Annotations (34)
Configuration Parameters (6)
- workers (recommended: cpu_count)
- traffic.max_concurrency (recommended: 50)
- sleep (recommended: greater than 0.001)
- threads
- services.<service_name>.scaling.min_replicas (recommended: 0)
- @bentoml.service.threads (recommended: N)

Error Signatures (7)
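The configuration parameters above map onto BentoML's `@bentoml.service` decorator and its `traffic` option. A minimal sketch of how they fit together, assuming BentoML's 1.2+ service API; the service name `Summarizer` and its endpoint are illustrative, not taken from any real deployment:

```python
import bentoml

# Hypothetical service showing the recommended settings from the list above.
@bentoml.service(
    workers="cpu_count",              # one worker per CPU core
    threads=4,                        # per-worker thread pool size (the "N" above)
    traffic={"max_concurrency": 50},  # bound on in-flight requests per worker
)
class Summarizer:
    @bentoml.api
    def predict(self, text: str) -> str:
        # Blocking work here runs on the worker's thread pool.
        return text.upper()
```

`scaling.min_replicas: 0` belongs to the deployment configuration rather than the decorator; setting it to 0 enables scale-to-zero, which is what makes the readiness probe below relevant.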
- 503 (http status)
- [CRITICAL] WORKER TIMEOUT (log pattern)
- Worker exiting (log pattern)
- 502 (http status)
- 504 (http status)
- ValueError: unexpected end of stream (exception)
- Exception on /predict [POST] (log pattern)

CLI Commands (1)
- curl -X GET https://<deployment-url>/readyz (remediation)

Technical References (20)
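For the scale-from-zero case, a single probe is often not enough: the first request after a cold start can arrive before the service is ready. A hedged remediation sketch that polls the readiness endpoint before sending traffic (`<deployment-url>` stays a placeholder, as in the command above):

```shell
# Poll /readyz until the service reports ready after scale-from-zero.
until curl -sf -o /dev/null "https://<deployment-url>/readyz"; do
  sleep 1   # keep the poll interval well above the 0.001 s floor noted above
done
echo "service ready"
```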
- request queue (component)
- concurrency-based autoscaling (concept)
- workers (component)
- @bentoml.service (component)
- max_concurrency (component)
- traffic (component)
- anyio.to_thread.run_sync (component)
- capacity limiter (component)
- Gunicorn worker (component)
- worker timeout (concept)
- MultiFileInput adapter (component)
- werkzeug.formparser (component)
- /predict (component)
- http-server (component)
- per service limiter (component)
- @bentoml.service decorator (component)
- synchronous endpoint (concept)
- /readyz endpoint (component)
- scale-to-zero (concept)
- @bentoml.api(batchable=True) (component)

Related Insights (14)
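Several of these references describe one mechanism: synchronous endpoints are dispatched to a thread pool (via `anyio.to_thread.run_sync`) guarded by a capacity limiter, so an exhausted pool stalls every sync call. A minimal stdlib analogue of that pattern, using `asyncio` with a bounded `ThreadPoolExecutor` standing in for anyio's limiter (the function names are illustrative):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Bounded pool playing the role of the capacity limiter: at most
# 2 blocking calls run at once; extra calls queue behind them.
POOL = ThreadPoolExecutor(max_workers=2)

def blocking_predict(x: int) -> int:
    time.sleep(0.05)   # stand-in for model inference
    return x * 2

async def handle(x: int) -> int:
    loop = asyncio.get_running_loop()
    # Same role as anyio.to_thread.run_sync(blocking_predict, x)
    return await loop.run_in_executor(POOL, blocking_predict, x)

async def main() -> list[int]:
    # 6 concurrent requests share 2 threads, so they drain in waves of 2.
    return await asyncio.gather(*(handle(i) for i in range(6)))

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6, 8, 10]
```

This is why "Thread pool exhaustion prevents synchronous API method execution" is marked critical below: once every pool slot is held by a slow call, new sync requests cannot even begin.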
- Request overload without queuing causes service instability (critical)
- Synchronous API functions create throughput bottleneck in production (warning)
- Single worker configuration causes request queuing and poor throughput (warning)
- Unconfigured max_concurrency allows unbounded request processing causing resource exhaustion (warning)
- Thread pool exhaustion prevents synchronous API method execution (critical)
- MaxConcurrencyMiddleware returns 503 under load (warning)
- Worker timeout and crash loop under concurrent request load (critical)
- Multipart form parsing failure under concurrent load (warning)
- Unbounded thread allocation per service causes resource contention (warning)
- Extremely low sleep interval in load tests may exhaust connection pool (info)
- API server request backlog indicates upstream bottleneck (warning)
- Synchronous endpoints limit batching throughput to one request at a time (warning)
- Service unavailable during scale-from-zero without manual readiness probe (info)
- Insufficient batching throughput when sync endpoints call batchable services with default concurrency (warning)
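Several of these insights (503s from MaxConcurrencyMiddleware, unbounded processing when max_concurrency is unset) reduce to one admission-control pattern: reject requests beyond a fixed in-flight bound instead of letting them pile up. A minimal sketch of that pattern with a non-blocking semaphore; the names and the 503 response are illustrative, not BentoML's actual middleware:

```python
import threading

MAX_CONCURRENCY = 2
_slots = threading.Semaphore(MAX_CONCURRENCY)

def handle_request(payload: str) -> tuple[int, str]:
    # Non-blocking acquire: fail fast rather than queue without bound.
    if not _slots.acquire(blocking=False):
        return (503, "max_concurrency reached")
    try:
        return (200, payload.upper())   # stand-in for model inference
    finally:
        _slots.release()

ok = handle_request("hello")           # capacity available
_slots.acquire(); _slots.acquire()     # simulate 2 requests already in flight
overloaded = handle_request("hello")   # limiter exhausted -> 503
print(ok, overloaded)  # (200, 'HELLO') (503, 'max_concurrency reached')
```

Failing fast with a 503 is what lets concurrency-based autoscaling react: the signal surfaces as rejected requests rather than as silent queue growth and eventual worker timeouts.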