http.server.duration
HTTP request duration
Dimensions: None
Interface Metrics (3)
Dimensions: None
Dimensions: None
Dimensions: None
Sources
Technical Annotations (37)
Configuration Parameters (8)
mb_max_latency (recommended: 720000)
timeout (recommended: value appropriate for prediction duration)
traffic.timeout (recommended: 120)
traffic.max_concurrency (recommended: set below thread pool capacity)
max_latency_ms
traffic.external_queue (recommended: true)
traffic.concurrency (recommended: required when external_queue is enabled)
runner.max_latency (recommended: 60000)
Error Signatures (10)
WORKER TIMEOUT (log pattern)
Server disconnected (exception)
503 (http status)
aiohttp.client_exceptions.ServerDisconnectedError (exception)
bentoml.exceptions.RemoteException (exception)
CRITICAL] WORKER TIMEOUT (log pattern)
Worker exiting (log pattern)
502 (http status)
504 (http status)
BentoML has detected that a service has a max latency that is likely too low for serving (log pattern)
CLI Commands (1)
bentoml serve-gunicorn --timeout <seconds> (remediation)
Technical References (18)
mb_max_latency (component)
gunicorn worker timeout (concept)
prediction latency (concept)
resource exhaustion (concept)
API server timeout config (component)
middleware (component)
@bentoml.service (component)
traffic (component)
anyio.to_thread.run_sync (component)
capacity limiter (component)
Gunicorn worker (component)
worker timeout (concept)
@bentoml.api decorator (component)
adaptive batching (concept)
external request queue (component)
CORK algorithm (component)
CorkDispatcher (component)
dispatcher.py (file path)
Related Insights (12)
Gunicorn worker timeout causes 503 errors despite high configured mb_max_latency (critical)
Infrastructure anomalies correlate with prediction service degradation (critical)
API server requests timeout without server-side configuration (warning)
Timeout expiration causes request failures on long-running inference tasks (warning)
Thread pool exhaustion prevents synchronous API method execution (critical)
Request timeout configuration prevents long-running inference (warning)
Worker timeout and crash loop under concurrent request load (critical)
BentoML metrics exclude socket IO time from request duration (info)
API server request backlog indicates upstream bottleneck (warning)
HTTP 503 errors when adaptive batching exceeds max_latency_ms (warning)
External queue increases Service latency due to extra I/O operations (info)
HTTP 503 errors when max_latency_ms is too low for model processing time (critical)
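The recommendations listed under Configuration Parameters can be read as a small set of consistency checks on the traffic settings. The sketch below is one possible way to encode them in plain Python. The function name check_traffic_config and the inputs thread_pool_capacity and expected_prediction_s are hypothetical names introduced for illustration, and the assumption that traffic.timeout is expressed in seconds is not stated on this page. It does not call BentoML itself.

```python
from typing import Any


def check_traffic_config(
    traffic: dict[str, Any],
    thread_pool_capacity: int,    # hypothetical input: size of the worker thread pool
    expected_prediction_s: float, # hypothetical input: typical prediction duration
) -> list[str]:
    """Return warnings for a traffic settings dict, following the listed recommendations."""
    warnings: list[str] = []

    # timeout: should be appropriate for the prediction duration
    # (assumption: the value is in seconds).
    timeout = traffic.get("timeout")
    if timeout is None:
        warnings.append("traffic.timeout is not set")
    elif timeout <= expected_prediction_s:
        warnings.append(
            f"traffic.timeout={timeout} does not exceed the expected "
            f"prediction duration of {expected_prediction_s}s"
        )

    # max_concurrency: recommended to be set below thread pool capacity.
    max_concurrency = traffic.get("max_concurrency")
    if max_concurrency is None:
        warnings.append("traffic.max_concurrency is not set")
    elif max_concurrency >= thread_pool_capacity:
        warnings.append(
            f"traffic.max_concurrency={max_concurrency} is not below the "
            f"thread pool capacity of {thread_pool_capacity}"
        )

    # concurrency: required when external_queue is enabled.
    if traffic.get("external_queue") and traffic.get("concurrency") is None:
        warnings.append("traffic.concurrency is required when external_queue is enabled")

    return warnings


if __name__ == "__main__":
    candidate = {"timeout": 10, "max_concurrency": 64, "external_queue": True}
    for message in check_traffic_config(
        candidate, thread_pool_capacity=40, expected_prediction_s=30
    ):
        print("WARNING:", message)
```

A dict that passes these checks would presumably be the one supplied as the traffic setting of @bentoml.service. The thresholds here only mirror the recommendations on this page and are not a substitute for load testing.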