Misleading Infrastructure Health During Application-Layer Blocking
warningWhen event loop blocking is the bottleneck, traditional infrastructure metrics (CPU, Redis ops, network I/O) appear healthy while request latency and timeouts increase - creating a diagnostic blind spot that delays root cause identification.
Alert when request timeout rate increases or p95/p99 latency degrades while simultaneously: CPU utilization remains moderate (50-60%), redis.stats.instantaneous_ops_per_sec shows headroom, redis.memory.used is stable, and redis.net.input/output show no saturation. This divergence pattern indicates application-layer rather than infrastructure issues.
Implement event loop monitoring instrumentation (asyncio task tracking, loop lag metrics) alongside infrastructure metrics. When this pattern appears, focus investigation on application code paths rather than scaling infrastructure. Add structured logging around blocking operations to identify hot paths.