Kubernetes / FastAPI

Pod CPU Throttling Despite Available Node Capacity

Severity: critical
Category: Resource Contention
Updated: Jan 10, 2026

Pods hit their CPU limits and are throttled even though the underlying node has spare CPU capacity, indicating application-level saturation rather than node exhaustion. The result is serialized request processing, inflated tail latency, and degraded throughput.

How to detect:

Monitor pod CPU usage approaching pod limits (>95%) while node CPU utilization remains moderate (<60%). Watch for increasing p95/p99 latency, request queue buildup, and throughput plateauing despite available node capacity. Check for CPU throttling metrics and event loop blocking patterns.
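If Prometheus scrapes cAdvisor metrics, the throttling described above can be quantified with the CFS counters below; this is a sketch, and the pod label selector is a placeholder to be replaced with your own workload's labels:

```promql
# Fraction of CFS scheduling periods in which the container was throttled.
# A sustained value well above 0 while node CPU is moderate points at the
# pod limit, not the node, as the bottleneck.
rate(container_cpu_cfs_throttled_periods_total{pod=~"my-api-.*"}[5m])
  / rate(container_cpu_cfs_periods_total{pod=~"my-api-.*"}[5m])
```

Pairing this ratio with node-level CPU utilization on the same dashboard makes the "throttled pod on an idle node" pattern easy to spot.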

Recommended action:

Increase pod CPU limits vertically (e.g., from 3 to 4-6 cores) or scale replicas horizontally to distribute load. For event-loop applications, refactor blocking operations to non-blocking patterns using worker pools. Monitor thread counts and identify CPU-intensive code paths for optimization.
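For an asyncio-based service such as FastAPI, the worker-pool refactor mentioned above can be sketched as follows. The function and pool names are illustrative; a thread pool is shown for simplicity, while truly CPU-bound pure-Python work would use a ProcessPoolExecutor to sidestep the GIL:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Illustrative worker pool; size it to the pod's CPU limit.
# For CPU-bound pure-Python code paths, prefer ProcessPoolExecutor.
pool = ThreadPoolExecutor(max_workers=4)

def cpu_heavy(n: int) -> int:
    # Placeholder for a blocking, CPU-intensive code path.
    return sum(i * i for i in range(n))

async def handler(n: int) -> int:
    # Offload the blocking call to the pool so the event loop
    # keeps serving other requests instead of processing serially.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, cpu_heavy, n)
```

Inside a FastAPI route, awaiting `handler(...)` keeps the event loop responsive while the pool absorbs the blocking work, which is what restores concurrent request processing.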