CPU-only HPA fails to scale I/O-bound async FastAPI under load
warning | performance | Updated Mar 25, 2026
How to detect:
In async Python FastAPI services whose requests are I/O-bound (database, cache, or HTTP calls), CPU utilization stays low (e.g., ~45%) while request queues grow and latency degrades. Because utilization never reaches the HPA's CPU target, a CPU-only HPA does not trigger scale-up.
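The symptom can be reproduced in miniature with the standard library alone. The sketch below is illustrative and not FastAPI code: `asyncio.sleep` stands in for a database, cache, or HTTP call, and the semaphore models a hypothetical fixed-size connection pool (both are assumptions, as are the names `handle_request`, `run_load`, and `measure`). Requests queue behind the pool, so the slowest request waits far longer than the fastest, while the process spends almost no CPU time.

```python
import asyncio
import time


async def handle_request(pool: asyncio.Semaphore, io_seconds: float) -> float:
    """Simulate one I/O-bound request and return its total latency in seconds."""
    start = time.perf_counter()
    async with pool:                     # queue for a free (hypothetical) connection
        await asyncio.sleep(io_seconds)  # stand-in for a DB/cache/HTTP call
    return time.perf_counter() - start


async def run_load(n_requests: int, pool_size: int, io_seconds: float) -> list:
    """Fire all requests concurrently against a pool of limited size."""
    pool = asyncio.Semaphore(pool_size)
    tasks = (handle_request(pool, io_seconds) for _ in range(n_requests))
    return await asyncio.gather(*tasks)


def measure(n_requests: int = 50, pool_size: int = 5, io_seconds: float = 0.02) -> dict:
    """Compare CPU time with wall-clock time while the simulated load runs."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    latencies = asyncio.run(run_load(n_requests, pool_size, io_seconds))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return {
        "cpu_fraction": cpu / wall,       # share of wall time spent on CPU
        "min_latency": min(latencies),    # request that got a connection at once
        "max_latency": max(latencies),    # request that queued the longest
    }


if __name__ == "__main__":
    print(measure())
```

Exact numbers vary by machine, but the gap between `min_latency` and `max_latency` grows with the number of queued requests while `cpu_fraction` stays small, which is the pattern a CPU-only HPA cannot see.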
Recommended action:
Add a custom request-rate (QPS) metric to the HPA via Prometheus Adapter. Record http_requests_total in the app and expose rate(http_requests_total[30s]) as a Pods metric. Set the HPA target to 70-80% of the maximum sustainable RPS per pod at the latency SLO (e.g., 10 RPS/pod if the maximum is ~12-14).
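The target arithmetic and its effect on scaling can be sketched as follows. The replica calculation uses the documented HPA rule, desiredReplicas = ceil(currentReplicas * currentValue / targetValue), with the default 10% tolerance. The function names, the 0.75 headroom default, the 4-pod deployment, the 15 RPS load, and the 70% CPU target are assumptions chosen for illustration, not values from this insight.

```python
import math


def hpa_target_rps(max_rps_per_pod: float, headroom: float = 0.75) -> float:
    """Per-pod RPS target: a fraction (70-80% suggested) of the max sustainable RPS."""
    return max_rps_per_pod * headroom


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Replica count from the HPA rule ceil(current * currentValue / targetValue)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: the HPA leaves the count alone
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Max sustainable RPS of ~12-14 per pod gives a target range around 10 RPS/pod.
    print(hpa_target_rps(12), hpa_target_rps(14))

    # Assumed scenario: 4 pods, each averaging 15 RPS against a 10 RPS/pod target.
    print(desired_replicas(4, current_value=15, target_value=10))

    # Same pods judged by CPU: 45% utilization against an assumed 70% target.
    print(desired_replicas(4, current_value=45, target_value=70))
```

Under these assumptions the request-rate metric asks for more replicas, while the CPU metric at 45% never asks for more than the current count, which matches the failure described in this insight.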