DataHub Metric

http.server.active_requests

Number of active HTTP server requests
Dimensions: None
Available on: OpenTelemetry (1)
Interface Metrics (1)
OpenTelemetry
Number of active HTTP requests on the GMS server
Dimensions: None
Knowledge Base (1 document, 0 chunks)
Blog post: Case Study: Fixing FastAPI Event Loop Blocking in a High-Traffic API (techbuddies.io), 5990 words, score: 0.75
This case study describes diagnosing and fixing event loop blocking issues in a high-traffic FastAPI production service. It covers recognizing symptoms like latency spikes and degraded throughput, instrumenting the API to identify blocking code paths, and refactoring synchronous operations (database access, SDKs, CPU-heavy logic) to non-blocking patterns.

Technical Annotations (48)

Configuration Parameters (19)
workers (recommended: cpu_count)
Enables process-level parallelism for high-throughput or compute-intensive services
sleep (recommended: greater than 0.001)
k6 test script sleep interval between iterations; 0.001 s may be too aggressive
DATABASES['default']['CONN_MAX_AGE'] (recommended: 10)
Connection reuse timeout in seconds; it was set but did not prevent connection spikes after the upgrade
DATABASES['default']['ATOMIC_REQUESTS'] (recommended: False)
Disables automatic transaction wrapping per request
pool_mode (recommended: transaction)
Transaction pooling works best with Django's request-response cycle
default_pool_size (recommended: 20)
Number of actual PostgreSQL connections PgBouncer maintains
max_client_conn (recommended: 1000)
Maximum client connections PgBouncer will accept
listen_port (recommended: 6432)
Port where PgBouncer listens for client connections
CONN_MAX_AGE (recommended: 0)
Must be 0 when using PgBouncer to avoid connection pooling conflicts
PORT (recommended: 6432)
Django must connect to the PgBouncer port instead of directly to PostgreSQL
--workers (recommended: 4 for single CPU; 16 for multi-core)
Controls the number of worker processes; each handles one request at a time
CELERY_BROKER_URL (recommended: amqp://guest@localhost//)
Message broker URL for the Celery task queue
db_pool_size (recommended: 32)
Increased from 10 to handle async I/O concurrency (2-4× workers)
metrics[].pods.target.averageValue (recommended: 10)
Target requests per second (RPS) per pod for the HPA
maxReplicas (recommended: 40)
HPA maximum for burst capacity
spec.minAvailable (recommended: 6)
PodDisruptionBudget setting to maintain availability during disruptions
MetricInstruments.HTTP_SERVER_ACTIVE_REQUESTS.unit (recommended: {request})
Correct unit format matching the 1.21.0 OpenTelemetry spec
MetricInstruments.HTTP_SERVER_ACTIVE_REQUESTS.description (recommended: Number of active HTTP server requests.)
Correct description matching the 1.21.0 OpenTelemetry spec
starlette.version (recommended: >=0.40.0)
Version 0.40.0 fixes the unbounded memory buffering vulnerability
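The Django and PgBouncer entries in the list above can be read together as one configuration. The following is a minimal sketch of a Django settings fragment, assuming PgBouncer runs on the same host. The database name, user, and host are hypothetical placeholders, and the sketch follows the PgBouncer entry (CONN_MAX_AGE of 0) rather than the standalone value of 10.

```python
# settings.py fragment: Django talks to PgBouncer, not to PostgreSQL directly.
# NAME, USER, and HOST are hypothetical placeholders for illustration.
#
# The matching PgBouncer values from the list above would be:
#   pool_mode = transaction
#   default_pool_size = 20
#   max_client_conn = 1000
#   listen_port = 6432

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",           # hypothetical
        "USER": "app",             # hypothetical
        "HOST": "127.0.0.1",       # assumes PgBouncer is local
        "PORT": "6432",            # PgBouncer listen_port, not 5432
        "CONN_MAX_AGE": 0,         # let PgBouncer own connection reuse
        "ATOMIC_REQUESTS": False,  # no automatic per-request transaction
    }
}
```

With this shape, Django opens and closes a cheap client connection per request, while PgBouncer keeps the number of real PostgreSQL connections at its configured pool size.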
Error Signatures (5)
OperationalError: sorry, too many clients already (exception)
FATAL: sorry, too many clients already (log pattern)
FATAL: memory quota exceeded (log pattern)
OOM error (error code)
CVE-2024-47874 (error code)
CLI Commands (2)
sudo apt-get install pgbouncer (remediation)
curl http://localhost:8000 -F 'big=</dev/urandom' (diagnostic)
Technical References (22)
Server-Sent Events (protocol)
SSE (protocol)
TaskGroup (component)
async context manager (concept)
exit stack (concept)
request queue (component)
concurrency-based autoscaling (concept)
workers (component)
@bentoml.service (component)
CONN_MAX_AGE (configuration parameter)
ASGI (component)
PgBouncer (component)
max_connections (concept)
process worker (concept)
Celery (component)
AMQP (protocol)
StreamingResponse.stream_response (component)
create_http_server_active_requests (component)
semconv package (component)
MetricInstruments.HTTP_SERVER_ACTIVE_REQUESTS (component)
multipart/form-data (protocol)
filename (component)
Related Insights (22)
Event Loop Blocking Under Concurrent Load (critical)

FastAPI async endpoints exhibit serial-like behavior and inflated tail latency when synchronous operations (ORM calls, CPU-heavy tasks, blocking SDKs) execute directly on the event loop. Throughput plateaus while p95/p99 latencies climb despite moderate CPU usage.
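The serial-like behavior described above can be reproduced without FastAPI. The sketch below assumes a blocking call simulated with `time.sleep`; `blocking_query`, `handler_blocking`, `handler_offloaded`, and `timed` are hypothetical names for illustration. It contrasts calling the blocking function directly on the event loop with offloading it through `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import time


def blocking_query(delay: float) -> str:
    """Stand-in for a synchronous ORM call or blocking SDK request."""
    time.sleep(delay)
    return "row"


async def handler_blocking(delay: float) -> str:
    # Runs on the event loop thread: nothing else can progress meanwhile.
    return blocking_query(delay)


async def handler_offloaded(delay: float) -> str:
    # Runs in the default thread pool: the event loop stays free.
    return await asyncio.to_thread(blocking_query, delay)


async def timed(handler, n: int, delay: float) -> float:
    """Run n concurrent calls of handler and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    serial = asyncio.run(timed(handler_blocking, 4, 0.1))
    offloaded = asyncio.run(timed(handler_offloaded, 4, 0.1))
    print(f"blocking on loop: {serial:.2f}s, offloaded: {offloaded:.2f}s")
```

Four direct calls take roughly four times the single-call delay because they run one after another, while the offloaded version overlaps them. Thread offloading suits blocking I/O; CPU-heavy work would more likely need separate processes.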

Confusing Resource Metrics During Event Loop Blocking (warning)

FastAPI services experiencing event loop blocking show counterintuitive metrics: moderate CPU utilization (50-60%), healthy dependency performance, but rising tail latency and timeouts. This pattern indicates worker starvation rather than resource exhaustion.
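Because CPU and dependency metrics look healthy in this pattern, one way to make the starvation visible is to measure event loop lag directly. The following is a minimal sketch, not the case study's instrumentation: `measure_loop_lag`, `blocking_handler`, and `demo` are hypothetical names, and the blocking call is simulated with `time.sleep`, which holds the loop while using almost no CPU.

```python
import asyncio
import time


async def measure_loop_lag(interval: float, samples: int) -> float:
    """Return the worst overshoot (seconds) of asyncio.sleep(interval)."""
    loop = asyncio.get_running_loop()
    worst = 0.0
    for _ in range(samples):
        start = loop.time()
        await asyncio.sleep(interval)
        # Any time beyond the requested interval is time the loop was busy.
        worst = max(worst, loop.time() - start - interval)
    return worst


async def blocking_handler() -> None:
    # Simulated synchronous call made from async code; blocks the loop.
    time.sleep(0.2)


async def demo() -> float:
    probe = asyncio.create_task(measure_loop_lag(0.01, 20))
    await asyncio.sleep(0.05)  # let the probe take a few clean samples
    await blocking_handler()   # the probe's next wake-up is delayed
    return await probe


if __name__ == "__main__":
    print(f"worst loop lag: {asyncio.run(demo()):.3f}s")
```

A worst-case lag close to the length of the blocking call, alongside modest CPU usage, is consistent with the worker starvation described above.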

RestLI Server Error Rate Spike API Reliability (critical)

The DataHub backend API is experiencing elevated error rates that impact metadata ingestion, UI operations, and external integrations, potentially indicating service degradation or infrastructure issues.

Request Queue Buildup Under Burst Traffic (critical)

Request queue times increase at the load balancer during traffic bursts despite moderate server resource utilization, indicating insufficient concurrency handling or event loop saturation.
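An in-flight request count is the kind of signal that http.server.active_requests exposes, and it can help separate queue buildup from resource exhaustion. The sketch below is an in-process stand-in for such an up-down counter, under the assumption that each request is wrapped on entry and exit; `ActiveRequests`, `handle`, and `burst` are hypothetical names and no OpenTelemetry SDK is involved.

```python
import asyncio
from contextlib import asynccontextmanager


class ActiveRequests:
    """In-process stand-in for an up-down counter of in-flight requests."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    @asynccontextmanager
    async def track(self):
        # Increment on request start, decrement on finish (even on error).
        self.current += 1
        self.peak = max(self.peak, self.current)
        try:
            yield
        finally:
            self.current -= 1


async def handle(active: ActiveRequests, work_seconds: float) -> None:
    async with active.track():
        await asyncio.sleep(work_seconds)  # simulated non-blocking work


async def burst(n: int, work_seconds: float) -> ActiveRequests:
    """Fire n simulated requests at once and return the counter."""
    active = ActiveRequests()
    await asyncio.gather(*(handle(active, work_seconds) for _ in range(n)))
    return active


if __name__ == "__main__":
    result = asyncio.run(burst(5, 0.01))
    print(f"peak in-flight: {result.peak}, now: {result.current}")
```

If the count stays high while CPU stays moderate during a burst, the requests are waiting rather than working, which matches the queue buildup described above.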

Function Concurrency Rate Limiting (critical)

Default concurrent execution limits for serverless functions trigger 429 errors under traffic spikes, causing request failures and degraded user experience when new function instances cannot be spawned fast enough to handle load.

Server-Sent Events (SSE) support added for real-time streaming (info)
TaskGroup yield fix prevents request context leaks (warning)
Request overload without queuing causes service instability (critical)
Synchronous API functions create throughput bottleneck in production (warning)
Single worker configuration causes request queuing and poor throughput (warning)
Extremely low sleep interval in load tests may exhaust connection pool (info)
API server request backlog indicates upstream bottleneck (warning)
PostgreSQL connection spike after Django 3.2 to 5.2 upgrade despite CONN_MAX_AGE setting (critical)
PostgreSQL connection exhaustion from per-request Django connections (critical)
Single-worker Flask/FastAPI apps degrade severely under concurrent load (critical)
Synchronous task execution blocks request handling (warning)
Server stops responding to in-flight requests during request spikes (warning)
Payments API p95 latency at 420ms during bursts with idle CPU (warning)
Inconsistent http.server.active_requests metric unit and description breaks monitoring (warning)
Unbounded multipart form field buffering causes memory exhaustion DoS (critical)
Unbounded memory allocation from multipart form data without filename causes OOM (critical)
Unbounded form field buffering causes memory exhaustion (critical)
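For the unit and description inconsistency listed above, a small check can flag instruments whose metadata drifts from the recommended values. This is a sketch under stated assumptions: `EXPECTED` copies the unit and description recommended on this page, `metadata_mismatches` is a hypothetical helper, and the `legacy` values are invented for illustration only.

```python
# Recommended metadata for the metric, as listed on this page.
EXPECTED = {
    "name": "http.server.active_requests",
    "unit": "{request}",
    "description": "Number of active HTTP server requests.",
}


def metadata_mismatches(instrument: dict) -> dict:
    """Return {field: (actual, expected)} for every field that differs."""
    return {
        field: (instrument.get(field), expected)
        for field, expected in EXPECTED.items()
        if instrument.get(field) != expected
    }


if __name__ == "__main__":
    # Hypothetical out-of-date metadata, not taken from any real release.
    legacy = {
        "name": "http.server.active_requests",
        "unit": "requests",
        "description": "active requests",
    }
    for field, (actual, expected) in metadata_mismatches(legacy).items():
        print(f"{field}: {actual!r} should be {expected!r}")
```

Running a check like this against exported metric metadata would catch a mismatch before dashboards or alerts that key on the unit stop matching.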