Disk I/O Bottleneck Masquerading as Application Slowness
Severity: critical
Application latency increases and containers restart due to disk stalls or slow persistent-volume performance, but the problem manifests as generic timeouts or OOM kills. The underlying storage bottleneck is hidden behind these higher-level symptoms.
When kubernetes_filesystem_usage_pct is moderate (below 80%) but application response times degrade significantly, check for disk I/O contention. Look for disk-stall log entries (operations taking more than 20 seconds) or elevated kubernetes_diskio_io_service_size_stats. Container restarts without a clear memory or CPU cause often indicate I/O timeouts. Correlate application slowness with storage-backend metrics such as volume IOPS, throughput, and queue depth.
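One way to surface the stall pattern described above is to scan application or node logs for disk operations exceeding the 20-second threshold. A minimal sketch follows; the `op=... duration_ms=...` log format and field names are assumptions, so adapt the regex to your actual logger:

```python
import re

# 20 s, matching the stall definition above.
STALL_THRESHOLD_MS = 20_000

# Hypothetical structured-log fields; adjust to your log format.
LINE_RE = re.compile(r"op=(?P<op>\S+)\s+duration_ms=(?P<ms>\d+)")

def find_disk_stalls(log_lines, threshold_ms=STALL_THRESHOLD_MS):
    """Return (operation, duration_ms) pairs for operations at or above the threshold."""
    stalls = []
    for line in log_lines:
        m = LINE_RE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            stalls.append((m.group("op"), int(m.group("ms"))))
    return stalls

if __name__ == "__main__":
    sample = [
        "2024-05-01T12:00:00Z op=fsync duration_ms=25000 path=/data/wal",
        "2024-05-01T12:00:01Z op=read duration_ms=12",
    ]
    # Only the 25 s fsync qualifies as a stall.
    print(find_disk_stalls(sample))
```

Feeding such a detector recent logs from pods that restarted without a memory or CPU spike is a quick way to confirm the I/O-timeout hypothesis before digging into storage-backend metrics.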
Provision storage with sufficient IOPS and bandwidth for the workload. In cloud environments, ensure PersistentVolumes are backed by appropriate storage classes (e.g., gp3 instead of gp2 on AWS). Implement disk-operation timeouts at the application level so that I/O stalls surface as explicit errors rather than indefinite hangs. For I/O-intensive workloads, consider faster storage tiers or a caching layer to reduce disk pressure.
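The application-level timeout recommendation can be sketched by running blocking disk operations on a worker thread and bounding the wait with Python's standard library. This is an illustrative pattern, not a prescribed implementation; the 2-second default and the `DiskTimeout` exception name are assumptions:

```python
import concurrent.futures
import os
import tempfile

# Single worker so queued disk operations serialize, roughly mirroring
# one device queue; size this for your workload.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=1)

class DiskTimeout(Exception):
    """Raised when a disk operation exceeds its deadline (hypothetical name)."""

def _write(path, data):
    # Blocking write; fsync forces the data through to the device,
    # so a stalled disk shows up here rather than in the page cache.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return len(data)

def write_with_timeout(path, data, timeout_s=2.0):
    """Write bytes to `path`, failing fast if the disk stalls.

    The worker thread itself cannot be interrupted; the timeout converts
    a silent stall into an explicit error the caller can handle (retry,
    shed load, alert) instead of hanging indefinitely.
    """
    future = _EXECUTOR.submit(_write, path, data)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise DiskTimeout(f"write to {path} exceeded {timeout_s}s")

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        target = tmp.name
    print(write_with_timeout(target, b"checkpoint"))  # bytes written
    os.remove(target)
```

The same wrapper applies to reads or fsync-heavy checkpoint paths; the key design point is that the deadline lives in the application, where a stall can be translated into backpressure or an alert instead of an opaque container restart.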