Disk I/O Bottleneck Masquerading as Application Slowness
Severity: critical
Application latency increases and containers restart due to disk stalls or slow persistent-volume performance, but the problem manifests as generic timeouts or OOM kills. The underlying storage bottleneck is hidden behind these higher-level symptoms.
When kubernetes_filesystem_usage_pct is moderate (below 80%) but application response times degrade significantly, check for disk I/O contention. Look for disk-stall log entries (operations taking more than 20 seconds) or elevated kubernetes_diskio_io_service_size_stats. Container restarts without a clear memory or CPU cause often indicate I/O timeouts. Correlate application slowness with storage-backend metrics such as volume IOPS, throughput, and queue depth.
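One way to surface the stall pattern described above is to scan application or node logs for disk operations exceeding the 20-second threshold. A minimal sketch follows; the `op=... duration_ms=...` log format and field names are assumptions, so adapt the regex to your actual logger:

```python
import re

# 20 s, matching the stall definition above.
STALL_THRESHOLD_MS = 20_000

# Hypothetical structured-log fields; adjust to your log format.
LINE_RE = re.compile(r"op=(?P<op>\S+)\s+duration_ms=(?P<ms>\d+)")

def find_disk_stalls(log_lines, threshold_ms=STALL_THRESHOLD_MS):
    """Return (operation, duration_ms) pairs for operations at or above the threshold."""
    stalls = []
    for line in log_lines:
        m = LINE_RE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            stalls.append((m.group("op"), int(m.group("ms"))))
    return stalls

if __name__ == "__main__":
    sample = [
        "2024-05-01T12:00:00Z op=fsync duration_ms=25000 path=/data/wal",
        "2024-05-01T12:00:01Z op=read duration_ms=12",
    ]
    # Only the 25 s fsync qualifies as a stall.
    print(find_disk_stalls(sample))
```

Feeding such a detector recent logs from pods that restarted without a memory or CPU spike is a quick way to confirm the I/O-timeout hypothesis before digging into storage-backend metrics.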
Provision storage with sufficient IOPS and bandwidth for the workload. In cloud environments, ensure PersistentVolumes are backed by appropriate storage classes (e.g., gp3 instead of gp2 on AWS). Implement disk-operation timeouts at the application level so that I/O stalls surface as explicit errors rather than indefinite hangs. For I/O-intensive workloads, consider faster storage tiers or a caching layer to reduce disk pressure.
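The application-level timeout recommendation can be sketched by running blocking disk operations on a worker thread and bounding the wait with Python's standard library. This is an illustrative pattern, not a prescribed implementation; the 2-second default and the `DiskTimeout` exception name are assumptions:

```python
import concurrent.futures
import os
import tempfile

# Single worker so queued disk operations serialize, roughly mirroring
# one device queue; size this for your workload.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=1)

class DiskTimeout(Exception):
    """Raised when a disk operation exceeds its deadline (hypothetical name)."""

def _write(path, data):
    # Blocking write; fsync forces the data through to the device,
    # so a stalled disk shows up here rather than in the page cache.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return len(data)

def write_with_timeout(path, data, timeout_s=2.0):
    """Write bytes to `path`, failing fast if the disk stalls.

    The worker thread itself cannot be interrupted; the timeout converts
    a silent stall into an explicit error the caller can handle (retry,
    shed load, alert) instead of hanging indefinitely.
    """
    future = _EXECUTOR.submit(_write, path, data)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise DiskTimeout(f"write to {path} exceeded {timeout_s}s")

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        target = tmp.name
    print(write_with_timeout(target, b"checkpoint"))  # bytes written
    os.remove(target)
```

The same wrapper applies to reads or fsync-heavy checkpoint paths; the key design point is that the deadline lives in the application, where a stall can be translated into backpressure or an alert instead of an opaque container restart.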