Kubernetes

Kubernetes Pod Memory OOM Before Limits Reached

critical
Resource Contention
Updated Nov 18, 2024

Kubernetes pods may be terminated by the kernel's out-of-memory (OOM) killer before reaching their configured memory limits, especially when memory requests are set significantly lower than limits and the node becomes overcommitted. This gap between what a pod reserves and what it actually uses leads to unexpected pod terminations.

How to detect:

Monitor kubernetes_memory_usage approaching kubernetes_memory_requested while still below kubernetes_memory_limits. If pods are terminated with exit code 137 (128 + SIGKILL, typically an OOM kill) while memory usage is below limits but close to requests, this points to a node-level kernel OOM kill rather than cgroup enforcement of the pod's own limit. Cross-reference with node-level allocatable memory exhaustion to confirm.
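The detection logic above can be sketched as a small check over per-pod metrics. This is a minimal illustration, not a Kubernetes API: the function name, the memory values (in MiB), and the 10% tolerance are assumptions chosen for the example.

```python
def likely_kernel_oom(exit_code: int, usage: float, request: float,
                      limit: float, tolerance: float = 0.1) -> bool:
    """Heuristic: exit code 137 with usage below the limit but at or near
    the request suggests a node-level kernel OOM kill rather than the
    container hitting its own cgroup memory limit.

    usage, request, limit are in the same unit (e.g. MiB); tolerance is
    how far below the request usage may sit and still be flagged.
    """
    if exit_code != 137:          # 137 = 128 + SIGKILL
        return False
    if usage >= limit:            # at/over the limit: ordinary limit enforcement
        return False
    return usage >= request * (1 - tolerance)

# Usage: pod killed at 900 MiB with a 1000 MiB request and 2000 MiB limit
# is flagged; a pod that actually hit its limit is not.
print(likely_kernel_oom(137, usage=900, request=1000, limit=2000))   # True
print(likely_kernel_oom(137, usage=2000, request=1000, limit=2000))  # False
```

In practice you would feed this from your metrics backend and alert only when the node's allocatable memory is also exhausted, since exit code 137 alone does not distinguish the two cases.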

Recommended action:

Increase memory requests to match realistic usage patterns (aim for requests at 70-80% of limits rather than 50%). For critical workloads, set requests equal to limits so the pod gets the Guaranteed QoS class, which makes it the last candidate for kernel OOM kills and node-pressure eviction. Review pod termination logs for OOM signatures and correlate them with node memory pressure events.
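As an illustrative sketch of the sizing guidance above (the pod name, image, and sizes are hypothetical), a spec with requests at roughly 75% of limits might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app        # hypothetical name
spec:
  containers:
  - name: app
    image: example/app:1.0 # hypothetical image
    resources:
      requests:
        memory: "768Mi"    # ~75% of the limit, per the guidance above
      limits:
        memory: "1Gi"
```

For a critical workload, setting `requests.memory` to the same "1Gi" as the limit would instead place the pod in the Guaranteed QoS class.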