Memory Exhaustion and OOM Killer Prevention
critical · Incident Response
Preventing PostgreSQL crashes caused by the Linux OOM killer when work_mem allocations multiply across concurrent sessions and exhaust system memory.
Prompt: “My PostgreSQL server keeps crashing and the Linux OOM killer is terminating the postgres process. I have work_mem set to 256MB but we have hundreds of concurrent connections. How do I calculate safe memory limits and prevent these crashes?”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
Memory exhaustion from OOM killer is almost always caused by work_mem multiplying across concurrent connections beyond available RAM. Start by calculating your total potential memory consumption (max_connections × work_mem × operations_per_query), then verify actual usage patterns, and finally tune Linux overcommit settings to prevent unpredictable kills.
1. Calculate total potential work_mem consumption
With work_mem=256MB and hundreds of connections, you're at immediate risk. Because work_mem is allocated per operation, not per connection, a single query with 3 hash joins can consume 768MB. Check `postgresql.connection.max` and calculate the worst case: if max_connections=200, that's 200 × 256MB × 3 operations = 150GB of potential consumption. The insight on work_mem concurrent exhaustion shows this is the primary cause of OOM kills. Start by reducing work_mem to 64MB globally and only increase it for specific analytical workloads via session-level SET commands.
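As a quick sanity check, the worst-case arithmetic above can be run in shell; the values are the ones assumed in this scenario, not measurements from your server:

```shell
# Worst-case work_mem exposure (scenario values; substitute your own).
MAX_CONNECTIONS=200
WORK_MEM_MB=256
OPS_PER_QUERY=3          # e.g. three hash joins in a single plan

TOTAL_MB=$((MAX_CONNECTIONS * WORK_MEM_MB * OPS_PER_QUERY))
echo "Worst case: ${TOTAL_MB} MB (~$((TOTAL_MB / 1024)) GB)"   # 153600 MB (~150 GB)
```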
2. Check actual connection usage and patterns
Look at `connections_by_process` to see how many connections are actually active versus idle. Many applications over-provision max_connections when connection pooling would be better. If you're seeing >300 connections with `postgresql.connection.max`, you're burning ~10MB per connection just for overhead before any work_mem allocation. The max-connections-memory-exhaustion insight confirms that keeping max_connections between 100-200 and using pgBouncer for connection pooling prevents this waste.
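A sketch of the check, assuming psql access to the server. `pg_stat_activity` is PostgreSQL's standard activity view; the ~10MB per-backend figure is the rough estimate used above, not an exact measurement:

```shell
# Active vs. idle backends (uncomment and add your connection flags):
#   psql -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"

# Per-connection overhead before any work_mem is allocated:
CONNECTIONS=300
OVERHEAD_MB_PER_CONN=10   # rough estimate, not a measured value
echo "Backend overhead alone: $((CONNECTIONS * OVERHEAD_MB_PER_CONN)) MB"   # 3000 MB
```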
3. Review temp file usage to validate work_mem adequacy
Check `temp_bytes` to see whether queries are spilling to disk because work_mem is too small. High temp file creation means work_mem is insufficient for actual workloads. In your scenario, though, with 256MB work_mem causing OOM kills, you likely have the opposite problem: work_mem is too generous for your connection count. If temp_bytes is low, queries aren't even using the memory you've allocated, so reduce work_mem to prevent OOM. If temp_bytes climbs after the reduction, use session-level adjustments for the heavy queries rather than raising the global setting.
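One way to run the check, sketched against PostgreSQL's standard `pg_stat_database` view (the psql invocations are commented out since they need a live server; the SET example is illustrative):

```shell
# Per-database temp spill since the last stats reset:
SQL="SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS spilled
FROM pg_stat_database ORDER BY temp_bytes DESC;"
echo "$SQL"
#   psql -c "$SQL"

# Session-level bump for a known-heavy analytical query, instead of
# raising the global setting:
#   psql -c "SET work_mem = '256MB'; SELECT ...;"
```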
4. Configure Linux memory overcommit to prevent unpredictable OOM kills
Set vm.overcommit_memory=2 and vm.overcommit_ratio=50-80 in /etc/sysctl.conf to make the kernel refuse allocations that would exceed available RAM rather than allowing overcommit and invoking OOM killer unpredictably. The linux-overcommit-oom-kills-postgres insight shows that default overcommit settings (vm.overcommit_memory=0) cause OOM killer to terminate PostgreSQL even when operating within configured limits. With strict overcommit, you'll get predictable malloc failures instead of random process terminations, making memory issues visible before they become crashes.
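A minimal sketch of the change, with example numbers showing how the kernel's resulting commit limit is derived under strict mode (the 32GB RAM / 4GB swap / ratio 75 figures are assumptions, not your server):

```shell
# /etc/sysctl.conf (then run `sysctl -p` as root) — strict overcommit:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio  = 75

# Under mode 2, the commit limit is: swap + RAM * ratio / 100.
RAM_GB=32
SWAP_GB=4
RATIO=75
COMMIT_LIMIT_GB=$((SWAP_GB + RAM_GB * RATIO / 100))
echo "CommitLimit: ${COMMIT_LIMIT_GB} GB"   # 28 GB
```

Allocations beyond that limit fail with a malloc error the application can see, rather than succeeding and being killed later.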
5. Calculate total memory allocation across all PostgreSQL settings
Review shared_buffers (check `buffercache_summary_buffer_used`), maintenance_work_mem, and autovacuum_work_mem in addition to work_mem. The oom-killer-invoked-postgresql insight confirms you need to calculate total requirements versus available RAM. Typical safe allocation: shared_buffers=25% of RAM, then allocate remaining for work_mem considering max_connections. On a 32GB server, that's ~8GB for shared_buffers, leaving 20GB (after OS overhead) for work_mem—with max_connections=200, that means 100MB max per connection, but remember operations multiply this.
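The 32GB budget above works out as follows; the ~4GB OS overhead is an assumption:

```shell
# Memory budget for a 32GB server.
RAM_MB=$((32 * 1024))
SHARED_BUFFERS_MB=$((RAM_MB / 4))        # 25% of RAM = 8192 MB
OS_OVERHEAD_MB=$((4 * 1024))             # assumed OS/overhead reserve
WORK_MEM_POOL_MB=$((RAM_MB - SHARED_BUFFERS_MB - OS_OVERHEAD_MB))   # 20480 MB
MAX_CONNECTIONS=200
PER_CONN_MB=$((WORK_MEM_POOL_MB / MAX_CONNECTIONS))                 # ~102 MB
echo "work_mem ceiling per connection: ${PER_CONN_MB} MB (before operation multipliers)"
```

Remember this is a per-connection ceiling; a plan with three sorts or hash joins triples the actual draw.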
6. Monitor for memory pressure and swapping
Track system memory usage and swap activity continuously—any swapping means you're already in trouble. The low-memory-swapping-crash insight shows that swapping degrades performance before causing crashes. Use `vmstat 1` or similar to watch for swap activity and `free -m` to check available memory. If you see swapping with your current settings, you've confirmed memory exhaustion is imminent, and the OOM killer will strike when a memory spike occurs during concurrent query execution.
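A quick point-in-time swap check, assuming a Linux host with /proc/meminfo:

```shell
# Current swap usage; any sustained nonzero value warrants investigation.
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
swap_used_kb=$((swap_total_kb - swap_free_kb))
echo "swap used: $((swap_used_kb / 1024)) MB"

# For a continuous view, watch the si/so (swap-in/out) columns:
#   vmstat 1
```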
Related Insights
Linux memory overcommit can cause OOM killer to terminate PostgreSQL
critical
Linux OOM killer terminates PostgreSQL processes
critical
Insufficient memory allocation when work_mem multiplies across concurrent sessions
critical
Low memory availability causes swapping and database crashes
critical
Excessive work_mem allocation can exhaust system memory under concurrent load
prerequisite · critical
Excessive work_mem allocation causes OOM Killer to terminate PostgreSQL
outcome · critical
Excessive database queries slow response times
supporting_evidence · warning
Excessive max_connections consumes memory and requires connection pooling
prerequisite · warning
Monitoring Interfaces
PostgreSQL Native