CPU-only HPA fails to scale I/O-bound async FastAPI under load

warning

performance

Updated Mar 25, 2026

Sources

8 HPA/Autoscaling Tactics for FastAPI on K8s | by Nexumo - Medium (medium.com)

Technologies:

Kubernetes (the root cause of this issue originates in Kubernetes)

How to detect:

Applies to async Python FastAPI services whose requests are I/O-bound (database, cache, or outbound HTTP calls). CPU utilization stays low (e.g., ~45%) while request queues grow and latency degrades, so a CPU-based HPA never reaches its threshold and does not scale up.
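To see why the CPU signal never triggers a scale-up, it can help to run the numbers through the replica formula documented for the HPA, ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below is a simplified illustration, not the controller's actual code. The 70% CPU target, the 4-pod deployment, and the function name hpa_desired_replicas are assumptions chosen for the example, and minReplicas/maxReplicas bounds are ignored.

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Simplified HPA decision: ceil(currentReplicas * current / target)."""
    ratio = current_value / target_value
    # The controller skips scaling when the ratio is within the tolerance
    # of 1.0 (0.1 is the Kubernetes default tolerance).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Assumed scenario: 4 pods at ~45% CPU against an assumed 70% CPU target.
cpu_decision = hpa_desired_replicas(4, 45, 70)
print(cpu_decision)  # 3: the CPU signal argues for fewer pods, not more
```

Under these assumptions the CPU ratio stays below 1.0 no matter how long requests queue, so the computed replica count can only hold or shrink.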

Recommended action:

Add a custom request-rate (QPS) metric to the HPA via Prometheus Adapter. Record http_requests_total in the app and expose rate(http_requests_total[30s]) as a Pods metric. Set the HPA target to 70-80% of the maximum sustainable RPS per pod at your latency SLO (e.g., 10 RPS/pod if the maximum is ~12-14).
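The target arithmetic above can be sketched in a few lines. This is a rough illustration rather than a sizing tool. The function names target_rps_range and replicas_for_load and the 95 RPS total load are hypothetical, and the replica estimate ignores the HPA's tolerance band and minReplicas/maxReplicas bounds.

```python
import math


def target_rps_range(max_rps_low, max_rps_high, low_frac=0.7, high_frac=0.8):
    """Return the 70-80% band of the measured max sustainable RPS per pod."""
    return round(max_rps_low * low_frac, 1), round(max_rps_high * high_frac, 1)


def replicas_for_load(total_rps, target_rps_per_pod):
    """Pods needed so the per-pod average request rate stays at or below target."""
    return math.ceil(total_rps / target_rps_per_pod)


# Source example: max sustainable RPS per pod is ~12-14 at the latency SLO.
low, high = target_rps_range(12, 14)
print(low, high)  # 8.4 11.2, so a 10 RPS/pod target sits inside the band

# Assumed load: 95 RPS in total across the deployment, 10 RPS/pod target.
replicas = replicas_for_load(95, 10)
print(replicas)  # 10
```

Picking a target inside the band leaves headroom for pods to absorb traffic while new replicas start.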