CrewAI · Kubernetes

Agent Memory Usage Spike and OOMKill

critical
Resource Contention · Updated Feb 23, 2026

CrewAI web pods are OOMKilled and restarted when their memory limits are too low for concurrent agent workloads, particularly with high WEB_CONCURRENCY and RAILS_MAX_THREADS settings; the repeated restarts disrupt service.

How to detect:

Monitor pod memory usage trends and OOMKilled events. Track memory consumption per agent session and correlate with concurrency settings. Alert when memory usage exceeds 85% of limits or when OOMKilled events occur. Watch for memory leak patterns (steady increase over time).
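One way to wire up the 85% threshold and OOMKill detection is a Prometheus alert rule along these lines. This is a sketch, assuming cAdvisor and kube-state-metrics are scraped; the `crewai` namespace, group name, and alert names are placeholders, not from the source:

```yaml
groups:
  - name: crewai-memory
    rules:
      # Fire when working-set memory exceeds 85% of the container's limit.
      - alert: CrewAIMemoryNearLimit
        expr: |
          container_memory_working_set_bytes{namespace="crewai", container!=""}
            / on (namespace, pod, container)
          kube_pod_container_resource_limits{resource="memory", namespace="crewai"}
            > 0.85
        for: 5m
        labels:
          severity: warning
      # Fire when a container's last termination reason was the OOM killer.
      - alert: CrewAIOOMKilled
        expr: |
          kube_pod_container_status_last_terminated_reason{reason="OOMKilled", namespace="crewai"} == 1
        labels:
          severity: critical
```

Trending working-set bytes per pod over time with the same metric also surfaces the leak pattern (a steady increase that never plateaus between deploys).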

Recommended action:

Increase pod memory limits (e.g., from 12Gi to 16Gi) based on observed usage. Reduce WEB_CONCURRENCY and RAILS_MAX_THREADS to lower memory footprint per pod. Implement horizontal pod autoscaling based on memory usage. Profile agent workloads to identify memory leaks. Monitor actual memory usage patterns to right-size limits without over-provisioning.
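The limit increase, reduced concurrency, and memory-based autoscaling above can be sketched as manifest fragments. Deployment and container names, replica counts, and the concurrency values are illustrative assumptions, not taken from the source:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crewai-web              # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: web
          env:
            # Fewer workers/threads lowers the per-pod memory footprint.
            - name: WEB_CONCURRENCY
              value: "2"
            - name: RAILS_MAX_THREADS
              value: "5"
          resources:
            requests:
              memory: 16Gi
            limits:
              memory: 16Gi      # raised from 12Gi per observed usage
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: crewai-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: crewai-web
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```

Setting the memory request equal to the limit keeps the pod in the Guaranteed QoS class, which makes OOM behavior predictable and avoids eviction surprises while you profile actual usage to right-size the values.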