Network Error Rate Spike During Pod Churn
Warning: Connection failures and network errors spike during rolling deployments or node failures when pod IPs change faster than service mesh or load balancer endpoint updates can propagate. Downstream services continue sending traffic to terminating pods or stale endpoints.
Monitor for increases in kubernetes_network_errors during periods of high pod creation and termination. Correlate network error spikes with deployment rollouts or node failures. Check application logs for error patterns such as 'connection refused' or 'no route to host' coinciding with these network error increases.
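The correlation above can be sketched as a pair of Prometheus queries, graphed side by side. The error metric name comes from this alert; the churn metric assumes kube-state-metrics is installed, and the `node` label is an assumption that may differ per environment:

```promql
# Network error rate per node (metric name from this alert; label is an assumption):
sum(rate(kubernetes_network_errors[5m])) by (node)

# Pod churn proxy: transitions in/out of the Running phase, via kube-state-metrics.
sum(changes(kube_pod_status_phase{phase="Running"}[5m])) by (node)
```

If the two series spike together during rollouts, the errors are churn-driven rather than a steady-state network problem.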
Implement preStop lifecycle hooks with sufficient delay (10-30 seconds) to allow connection draining before pod termination. Set terminationGracePeriodSeconds to exceed the preStop delay plus the application's shutdown time. Ensure load balancers and service meshes use aggressive health check intervals during deployments. Consider using pod readiness gates to coordinate rollouts with external load balancers.
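A minimal sketch of these settings in a Deployment pod spec. All names, ports, and durations are illustrative examples, not prescribed values; the readiness gate conditionType shown is the one used by the AWS Load Balancer Controller and only applies if that controller manages your targets:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                        # hypothetical name
spec:
  template:
    spec:
      # Must exceed the preStop sleep plus the app's graceful-shutdown time.
      terminationGracePeriodSeconds: 45
      readinessGates:
        # Example: wait for external LB target health (AWS LB Controller).
        - conditionType: target-health.elbv2.k8s.aws/my-target-group
      containers:
        - name: app
          image: my-service:latest        # hypothetical image
          lifecycle:
            preStop:
              exec:
                # Delay SIGTERM so endpoint removal propagates to the
                # Service, mesh, and load balancer before shutdown begins.
                command: ["sleep", "15"]
          readinessProbe:
            httpGet:
              path: /healthz              # assumed health endpoint
              port: 8080
            periodSeconds: 2              # aggressive interval during rollouts
            failureThreshold: 2
```

The preStop sleep buys time for the endpoint controllers to stop routing new connections to the pod, while the grace period ensures in-flight requests can drain before the kubelet sends SIGKILL.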