http.server.request.duration
Duration of HTTP requests to GMS (GraphQL and REST endpoints)

Knowledge Base (2 documents, 0 chunks)
Technical Annotations (119)
Configuration Parameters (30)
- exporters.otlp.sending_queue.queue_size (recommended: 30000)
- exporters.otlp.sending_queue.num_consumers (recommended: 50)
- exporters.otlp.max_idle_conns (recommended: 100)
- exporters.otlp.max_idle_conns_per_host (recommended: 50)
- spec.replicas (recommended: 10)
- strict_content_type (recommended: False)
- iterator.chunk_size (recommended: 1000)
- TEMPLATES[0]['OPTIONS']['loaders'] (recommended: [('django.template.loaders.cached.Loader', [...])])
- DEBUG (recommended: False)
- PROMETHEUS_LATENCY_BUCKETS (recommended: (0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.5, 0.75, 1.0, 2.0, 5.0, 10.0, float("inf")))
- chunk_size (recommended: 1000)
- DATABASES.default.ENGINE (recommended: dj_db_conn_pool.backends.postgresql)
- DATABASES.default.CONN_MAX_AGE (recommended: 0)
- DATABASES.default.POOL_OPTIONS.POOL_SIZE (recommended: 10)
- DATABASES.default.POOL_OPTIONS.MAX_OVERFLOW (recommended: 5)
- CONN_MAX_AGE (recommended: 0)
- wait_timeout
- query_count_threshold (recommended: 50)
- NPLUSONE_DETECTOR.THRESHOLD (recommended: 5)
- manager (recommended: FastCountManager)
- django.db.backends (recommended: DEBUG)
- --workers (recommended: 4 for single CPU; 16 for multi-core)
- SQLALCHEMY_DATABASE_URI (recommended: postgresql://user:password@host:port/db)
- REDIS_URL (recommended: redis://localhost:6379/0)
- CELERY_BROKER_URL (recommended: amqp://guest@localhost//)
- include_in_schema (recommended: False)
- db_pool_size (recommended: 32)
- metrics[].pods.target.averageValue (recommended: 10)
- maxReplicas (recommended: 40)
- spec.minAvailable (recommended: 6)

Error Signatures (5)
- 415 (HTTP status)
- 400 (HTTP status)
- OperationalError: MySQL server has gone away (exception)
- OOM error (error code)
- CVE-2024-47874 (error code)

CLI Commands (5)
- python manage.py dbshell (diagnostic)
- EXPLAIN ANALYZE (diagnostic)
- python -m cProfile -o app.profile myapp.py (diagnostic)
- app.openapi() (remediation)
- curl http://localhost:8000 -F 'big=</dev/urandom' (diagnostic)

Technical References (79)
- Components: SQLAlchemy, Pydantic, Rust, QuerySet.iterator(), models.Index, django.template.loaders.cached.Loader, select_related, prefetch_related, Django REST Framework, .values(), .only(), Redis, django_http_requests_latency_seconds_by_view_method_bucket, orders model, select_related(), prefetch_related(), Django templates, iterator(), QuerySet, dj_db_conn_pool, CONN_MAX_AGE, django-debug-toolbar, QueryAnalysisMiddleware, connection.queries, ForeignKey, QuerySet.count(), FastCountManager, Django admin, FastCountQuerySet, Django Debug Toolbar, DATABASES, joinedload, Flask-Redis, Celery, cProfile, pstats, Firefox Inspect Element Network tab, pymongo, end__gte, basin__exists, functools.lru_cache, cachetools, uWSGI, BackgroundTasks, app.openapi(), APIRoute, BaseHTTPMiddleware, limit_req, StreamingResponse.stream_response, ASGI receive callable, starlette_request_duration_seconds_sum, starlette_request_duration_seconds_count, starlette.responses.FileResponse, starlette.staticfiles.StaticFiles, _parse_range_header(), _RANGE_PATTERN, filename
- Protocols: Content-Type, application/json, JSON Lines, JSONL, AMQP, Range, multipart/form-data, ASGI
- Concepts: yield, ORM, Seq Scan, template inheritance, N+1 query problem, pagination, histogram_quantile, process worker, lazy loading, indexing, APM transaction, increase, floor
- File paths: /openapi.json

Related Insights (64)
Reaching MongoDB's concurrent connection limits causes 'connection refused because too many open connections' errors, freezing application operations and causing timeouts.
Vercel Functions experience significant latency spikes on first invocation after idle periods due to cold starts. This affects both serverless and edge functions, with visible impact on user-facing response times and potential timeout risks.
DataHub performance degrades immediately after code deployments due to introduced regressions, configuration changes, or schema migrations. Traditional metrics show symptoms but don't correlate with deployment timing.
DataHub experiences severe latency spikes immediately after pod restarts when entity cache is cold. Every GraphQL query hits the database directly, causing connection pool exhaustion and cascading timeouts.
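One common mitigation for this pattern is request coalescing (single-flight) on cache misses, so a cold cache produces one database query per key instead of one per concurrent request. A stdlib-only sketch with a simulated query (the URN and timings are invented):

```python
import asyncio

db_hits = {"count": 0}
cache = {}
locks = {}

async def fetch_entity(key):
    # Cold cache: without single-flight, N concurrent requests for the
    # same entity become N database queries and can exhaust the pool.
    if key in cache:
        return cache[key]
    lock = locks.setdefault(key, asyncio.Lock())
    async with lock:
        if key in cache:           # another coroutine already filled it
            return cache[key]
        db_hits["count"] += 1
        await asyncio.sleep(0.01)  # simulated database query
        cache[key] = {"urn": key}
        return cache[key]

async def main():
    # 20 concurrent requests for the same entity right after restart.
    await asyncio.gather(*(fetch_entity("urn:li:dataset:x") for _ in range(20)))

asyncio.run(main())
print(db_hits["count"])  # 1, not 20
```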
FastAPI async endpoints exhibit serial-like behavior and inflated tail latency when synchronous operations (ORM calls, CPU-heavy tasks, blocking SDKs) execute directly on the event loop. Throughput plateaus while p95/p99 latencies climb despite moderate CPU usage.
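The fix is to keep blocking work off the loop. A stdlib sketch of the difference, using asyncio.to_thread as a stand-in for FastAPI's run_in_threadpool and a hypothetical blocking_db_call:

```python
import asyncio
import time

def blocking_db_call():
    # Stand-in for a synchronous ORM call or blocking SDK.
    time.sleep(0.2)
    return "row"

async def bad_handler():
    # Runs the blocking call on the event loop: every other
    # coroutine stalls until it returns.
    return blocking_db_call()

async def good_handler():
    # Offloads to a worker thread, keeping the loop free.
    return await asyncio.to_thread(blocking_db_call)

async def main():
    start = time.perf_counter()
    await asyncio.gather(*(good_handler() for _ in range(5)))
    offloaded = time.perf_counter() - start

    start = time.perf_counter()
    await asyncio.gather(*(bad_handler() for _ in range(5)))
    on_loop = time.perf_counter() - start
    return offloaded, on_loop

offloaded, on_loop = asyncio.run(main())
print(f"offloaded: {offloaded:.2f}s, on-loop: {on_loop:.2f}s")
```

Five "concurrent" bad_handler calls execute serially (roughly 1 s total here), while the offloaded version finishes in about one call's duration: exactly the serial-like behavior the insight describes.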
FastAPI services with defined SLOs (success rate and latency objectives) can detect reliability degradation before total failure by monitoring error budget burn rate. A burn rate exceeding 1.0 indicates the service is consuming its error budget faster than sustainable.
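Burn rate is just the observed error rate divided by the error budget (1 - SLO). A minimal sketch with invented numbers:

```python
def burn_rate(errors: int, total: int, slo: float) -> float:
    """Ratio of observed error rate to the error budget (1 - slo).
    A value above 1.0 means the budget is being consumed faster
    than sustainable over the measurement window."""
    budget = 1.0 - slo
    observed = errors / total
    return observed / budget

# 120 failed requests out of 40_000 against a 99.9% success SLO:
rate = burn_rate(errors=120, total=40_000, slo=0.999)
print(f"burn rate: {rate:.1f}")  # 0.003 / 0.001 -> 3.0
```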
FastAPI applications grouping multiple endpoints into a single latency SLO may violate targets when one slow endpoint drags down the aggregate percentile. The 99th percentile latency objective (e.g., P99 < 250ms) can fail even when most endpoints perform well.
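A small simulation of the effect, with invented latencies: three percent of traffic from one slow endpoint pushes the shared P99 well past a 250 ms objective even though the fast endpoints are comfortably inside it:

```python
import random

random.seed(7)

# Hypothetical latencies in seconds: fast endpoints plus one slow one.
fast = [random.uniform(0.01, 0.05) for _ in range(970)]
slow = [random.uniform(0.4, 0.9) for _ in range(30)]

def p99(samples):
    # Nearest-rank 99th percentile.
    ordered = sorted(samples)
    return ordered[int(0.99 * len(ordered)) - 1]

print(f"fast endpoints alone: p99 = {p99(fast):.3f}s")
print(f"aggregated SLO view:  p99 = {p99(fast + slow):.3f}s")
```

With 1000 aggregated samples, the 99th percentile falls inside the slow endpoint's range (index 989 of the sorted list), so the shared objective fails. Per-endpoint SLOs, or excluding known-slow routes from the aggregate, avoid this.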
FastAPI services experiencing event loop blocking show counterintuitive metrics: moderate CPU utilization (50-60%), healthy dependency performance, but rising tail latency and timeouts. This pattern indicates worker starvation rather than resource exhaustion.
DataHub backend API experiencing elevated error rates impacting metadata ingestion, UI operations, and external integrations, potentially indicating service degradation or infrastructure issues.
Each middleware layer in FastAPI creates coroutine boundaries and adds latency overhead. Production stacks with authentication, logging, CORS, and monitoring middleware can reduce throughput by 80% compared to baseline.
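The layering can be sketched with a stdlib-only chain in which each middleware awaits the next app; every await below is one coroutine boundary, which real BaseHTTPMiddleware stacks pay on every request (layer names are illustrative):

```python
import asyncio

async def handler(request):
    # Innermost "endpoint".
    return {"status": 200, "body": "ok"}

def make_middleware(name, app):
    # Each layer records that it ran, then awaits the wrapped app:
    # one extra coroutine boundary (and context switch) per layer.
    async def middleware(request):
        request.setdefault("trace", []).append(name)
        return await app(request)
    return middleware

app = handler
for layer in ["monitoring", "cors", "logging", "auth"]:
    app = make_middleware(layer, app)

request = {}
response = asyncio.run(app(request))
print(request["trace"])      # outermost layer runs first
print(response["status"])
```

Four layers means every request traverses four wrapper coroutines before reaching the handler and four more on the way out, which is why flattening middleware into fewer, cheaper layers recovers throughput.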
SLO-based alerts on error budget burn rate provide early warning of degrading service health before complete failures. A burn rate >1 indicates the service is consuming error budget faster than sustainable.
P95 and P99 latencies diverge significantly from median/P50 latencies, indicating tail latency problems that affect user experience despite healthy average metrics.
Request queue times increase at load balancer during traffic bursts despite moderate server resource utilization, indicating insufficient concurrency handling or event loop saturation.
Deep dependency trees in FastAPI dependency injection cause redundant validation and initialization overhead on every request, visible as pre-handler latency in traces.
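FastAPI already caches a dependency's result within a single request, but expensive process-wide setup (parsed settings, validated config) can additionally be memoized with functools.lru_cache, one of the references listed above. A stdlib sketch with a hypothetical get_settings dependency:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=1)
def get_settings():
    # Stand-in for an expensive dependency: reading env vars,
    # validating config, constructing clients, etc.
    calls["count"] += 1
    return {"db_pool_size": 32}

# Simulate three requests, each resolving the dependency tree.
for _ in range(3):
    settings = get_settings()

print(settings["db_pool_size"], calls["count"])  # 32 1
```

The expensive body runs once; subsequent requests pay only a dictionary lookup. This only suits dependencies whose result is immutable for the process lifetime, not per-request state.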
Vercel production deployments hide error details for security, showing only generic '500: INTERNAL_SERVER_ERROR' messages without stack traces, making root cause analysis extremely difficult without proper error tracking infrastructure.
Heavy reliance on Vercel's on-demand image optimization without proper caching or excessive unique image transformations can hit concurrency limits or cause slow image serving, impacting page load performance and LCP.
Slow or misconfigured Next.js middleware running on edge intercepts every request, creating a performance bottleneck that affects all routes including static assets. This manifests as uniformly elevated latency across all endpoints.