MultiXact Member Space Exhaustion

Critical Incident Response

Emergency response when PostgreSQL approaches MultiXact member wraparound, which prevents all write operations and requires immediate vacuum intervention.

Prompt: My PostgreSQL database is throwing errors about MultiXact member space exhaustion and all writes are failing. I see messages about emergency autovacuum in the logs. What is MultiXact member space and how do I fix this before we have a complete outage?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

All figures are without Schema vs with Schema:

Tokens: 1,450 vs 1,132
Latency: 27.8 s vs 24.5 s
Turns: 2 vs 2
Tool Calls: 0 vs 0
Schema Calls: 0 vs 0
Response: 4,216 chars vs 3,091 chars
Evaluation
Without Schema: B vs With Schema: A-
The response with Schema provides a more focused emergency response that includes a critical single-user mode option for completely locked databases, which the response without Schema omits entirely. While both responses offer good technical depth and diagnostic queries, the response without Schema asks clarifying questions at the end rather than providing immediate actionable guidance. The response with Schema also better communicates urgency with specific thresholds (95% usage, "hours at most") and organizes the information more effectively for emergency triage. The single-user mode fallback is a significant differentiator that could save critical time in a production outage scenario.
Without Schema
This is a critical PostgreSQL issue that needs immediate attention. Let me explain what's happening and how to fix it.

## What is MultiXact Member Space?

**MultiXact** (Multi-Transaction) is PostgreSQL's mechanism for tracking multiple transactions that hold row-level locks on the same tuple. This happens when:

- Multiple transactions use `SELECT ... FOR SHARE` or `SELECT ... FOR KEY SHARE`
- Foreign keys are being checked
- Multiple transactions are updating different columns of the same row

The "member space" stores the list of transaction IDs that share locks. It's a fixed-size circular buffer (similar to transaction ID space), and when it wraps around without being vacuumed, you get exhaustion.

## Immediate Diagnosis

Run these commands to check the severity:

```sql
-- Check current MultiXact age and limits
SELECT datname,
       age(datminmxid) as mxid_age,
       mxid_age(datminmxid) as multixact_age
FROM pg_database
ORDER BY mxid_age(datminmxid) DESC;

-- Check which tables need vacuuming most urgently
SELECT schemaname, tablename,
       n_tup_ins + n_tup_upd + n_tup_del as total_changes,
       last_autovacuum, last_vacuum,
       age(relfrozenxid) as xid_age,
       mxid_age(relminmxid) as mxid_age
FROM pg_stat_user_tables t
JOIN pg_class c ON c.relname = t.relname
WHERE c.relkind = 'r'
ORDER BY mxid_age DESC
LIMIT 20;
```

## Emergency Fixes (In Order)

### 1. Immediate: Aggressive Manual VACUUM

```sql
-- Run VACUUM on the most problematic tables immediately
-- Get the list from the query above and vacuum them
VACUUM (VERBOSE, ANALYZE) your_table_name;

-- If very urgent, run VACUUM FREEZE on critical tables
VACUUM (FREEZE, VERBOSE, ANALYZE) your_table_name;
```

### 2. Temporarily Boost Autovacuum Aggressiveness

```sql
-- Make autovacuum more aggressive (requires superuser)
ALTER SYSTEM SET autovacuum_max_workers = 6;           -- increase workers
ALTER SYSTEM SET autovacuum_naptime = '10s';           -- run more frequently
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;  -- allow more I/O
ALTER SYSTEM SET vacuum_multixact_failsafe_age = 400000000;  -- emergency vacuum threshold

-- Reload configuration
SELECT pg_reload_conf();
```

### 3. Emergency: Kill Long-Running Transactions

Long-running transactions prevent vacuum from cleaning up:

```sql
-- Find long-running transactions blocking cleanup
SELECT pid, datname, usename, state,
       now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - xact_start > interval '5 minutes'
ORDER BY duration DESC;

-- Kill problematic ones (carefully!)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = <problematic_pid>;
```

### 4. Per-Table Emergency Vacuum

For the worst offenders:

```sql
-- Set aggressive vacuum settings per table
ALTER TABLE problematic_table SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum at 1% changes
  autovacuum_vacuum_cost_delay = 0,       -- no delay
  autovacuum_multixact_freeze_max_age = 400000000
);

-- Then immediately vacuum it
VACUUM (FREEZE, VERBOSE, ANALYZE) problematic_table;
```

## Long-Term Prevention

After the emergency is resolved:

1. **Monitor MultiXact age** - set up alerts when it approaches 1.5 billion
2. **Tune autovacuum** - ensure it runs frequently enough for your workload
3. **Avoid long transactions** - they prevent cleanup
4. **Review locking patterns** - excessive FOR SHARE locks can cause buildup
5. **Consider increasing `autovacuum_multixact_freeze_max_age`** from default (400M) to 600M if needed

## Check Progress

Monitor the situation:

```sql
-- Watch vacuum progress
SELECT * FROM pg_stat_progress_vacuum;

-- Monitor MultiXact age trending down
SELECT datname, mxid_age(datminmxid) as multixact_age
FROM pg_database
WHERE datname = current_database();
```

**Critical**: Start with the manual VACUUM on your highest mxid_age tables immediately. The emergency autovacuum should be kicking in, but manual intervention is faster. Once writes are restored, tune autovacuum to prevent recurrence.

What database version are you running? And can you share the output of the first diagnostic query? This will help me give more specific guidance.
With Schema
This is a **critical production issue** that needs immediate action. Here's what's happening and how to fix it:

## What is MultiXact Member Space?

MultiXact (Multiple Transaction) IDs track situations where multiple transactions need to lock the same row simultaneously - typically with `SELECT FOR SHARE` locks or foreign key checks. PostgreSQL has limited space (2 billion entries) for these IDs, and when exhausted, all writes fail.

## Immediate Diagnosis

Run these queries to check the severity:

```sql
-- Check how close you are to wraparound
SELECT setting::bigint AS max_members,
       ROUND(100.0 * members_used / max_members, 2) AS pct_used
FROM pg_control_checkpoint(),
     (SELECT setting::bigint AS max_members
      FROM pg_settings WHERE name = 'max_multixact_members') s,
     (SELECT SUM(age(mxid)) AS members_used FROM pg_database) u;

-- Check which tables need vacuuming most urgently
SELECT schemaname, tablename,
       age(datminmxid) as mxid_age,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables t
JOIN pg_database d ON d.datname = current_database()
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY age(datminmxid) DESC
LIMIT 20;
```

## Immediate Fix

**Option 1: Aggressive Manual VACUUM (preferred if database is still accepting some commands)**

```sql
-- In psql, run with multiple workers
SET maintenance_work_mem = '2GB';  -- adjust based on available RAM
VACUUM (VERBOSE, FREEZE) your_problem_tables;
```

**Option 2: Emergency single-user mode (if database is completely locked)**

```bash
# Stop PostgreSQL
sudo systemctl stop postgresql

# Start in single-user mode (replace 'dbname' with your database)
sudo -u postgres postgres --single -D /var/lib/postgresql/data dbname

# In single-user mode, run:
VACUUM FREEZE;
\q

# Restart normally
sudo systemctl start postgresql
```

## Configuration Changes (prevent recurrence)

Add these to `postgresql.conf`:

```conf
# Trigger autovacuum earlier
autovacuum_multixact_freeze_max_age = 200000000  # down from 400M
autovacuum_max_workers = 5       # increase if you have CPU available

# Make autovacuum more aggressive
autovacuum_vacuum_cost_limit = 1000  # increase from default 200
autovacuum_naptime = 10s             # check more frequently

# Monitor settings
log_autovacuum_min_duration = 0  # log all autovacuum activity
```

Then reload: `SELECT pg_reload_conf();`

## Monitor Progress

```sql
-- Watch autovacuum activity
SELECT pid, datname, usename, state, query,
       now() - query_start AS duration
FROM pg_stat_activity
WHERE query LIKE '%autovacuum%' OR query LIKE '%VACUUM%';
```

## Root Cause Investigation (after fixing)

Check for:

- **Long-running transactions** blocking vacuum: `SELECT * FROM pg_stat_activity WHERE state != 'idle' ORDER BY xact_start;`
- **Tables with foreign keys** (heavy MultiXact users)
- **Replication lag** preventing old MultiXact cleanup
- **Disabled autovacuum** on specific tables

**Time-sensitive**: If over 95% used, you have hours at most. Act immediately with Option 1 or 2 above.

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

MultiXact member space exhaustion is a critical emergency requiring immediate action. Start by checking how close you are to the ~4 billion member limit and identifying long-running transactions that block member space reclamation. Then verify emergency autovacuum is running, audit foreign key relationships causing quadratic member growth, and if necessary initiate manual data-only vacuums for faster recovery.

1. Assess MultiXact member space consumption and confirm write blockage
Check `database_wraparound_age_datminmxid` immediately to see how close you are to the ~4 billion member limit that triggers write lockout. The member space is separate from MultiXact ID and fills quadratically when multiple transactions lock the same row. If you're seeing 'members space is nearly exhausted' errors, you're already in emergency territory and writes requiring new MultiXacts are blocked. This metric isn't exposed by standard PostgreSQL monitoring, so you may need to query pg_controldata or check member file storage directly.
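When the metric isn't exposed, the usage percentage can be estimated from the number of segment files under `pg_multixact/members`. A minimal Python sketch; the per-page and per-segment constants are taken from PostgreSQL's `multixact.c` and should be verified against your server version:

```python
# Rough estimate of MultiXact member space usage from the count of
# segment files in $PGDATA/pg_multixact/members. Assumed constants:
# 1636 members per 8 KB SLRU page, 32 pages per segment file.
MEMBERS_PER_PAGE = 1636
PAGES_PER_SEGMENT = 32
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * PAGES_PER_SEGMENT  # 52,352
MEMBER_SPACE_LIMIT = 2**32  # ~4.29 billion members total

def member_space_pct(segment_file_count: int) -> float:
    """Approximate percentage of member space consumed."""
    used = segment_file_count * MEMBERS_PER_SEGMENT
    return round(100.0 * used / MEMBER_SPACE_LIMIT, 1)

# A healthy cluster has a handful of member files; tens of thousands
# means you are approaching the write lockout threshold.
print(member_space_pct(78000))
```

On a live system you would feed this the file count from `ls pg_multixact/members | wc -l` (or your monitoring agent's equivalent).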
2. Identify and terminate long-running transactions blocking member reclamation
Query for long-running transactions using `postgresql.relation.xmin` and pg_stat_activity, focusing on transactions holding old MultiXacts. PostgreSQL can only reclaim member space in contiguous segments, so a single long-running transaction prevents vacuum from reclaiming newer unused member spaces. Terminate any non-critical long transactions immediately—this is an emergency. Avoid mixing long-running analytics queries with high-frequency transactional workloads during this crisis.
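The triage logic for this step can be sketched against rows fetched from `pg_stat_activity`. The row shape, the five-minute cutoff, and the helper name `termination_candidates` are illustrative assumptions:

```python
from datetime import datetime, timedelta

def termination_candidates(rows, now, max_age=timedelta(minutes=5)):
    """Return pids of non-idle transactions older than max_age,
    oldest first -- candidates for pg_terminate_backend().
    Each row mimics a pg_stat_activity record (pid, state, xact_start)."""
    old = [r for r in rows
           if r["xact_start"] is not None
           and r["state"] != "idle"
           and now - r["xact_start"] > max_age]
    old.sort(key=lambda r: r["xact_start"])  # oldest transaction first
    return [r["pid"] for r in old]

# Hypothetical sample resembling pg_stat_activity output
now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"pid": 101, "state": "active", "xact_start": now - timedelta(hours=3)},
    {"pid": 102, "state": "idle", "xact_start": None},
    {"pid": 103, "state": "idle in transaction",
     "xact_start": now - timedelta(minutes=45)},
]
print(termination_candidates(rows, now))  # [101, 103]
```

In practice, review each candidate's query text before terminating; the sketch only encodes the age and state filter.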
3. Verify emergency autovacuum is running and progressing
Check `autovacuumed` metric and pg_stat_progress_vacuum to confirm emergency autovacuum workers are active and not blocked. PostgreSQL should automatically trigger emergency vacuums when member space is exhausted, but these can be blocked by locks or insufficient worker capacity. If autovacuum isn't running or is making no progress, you'll need to manually intervene. On a 30TB+ database, these vacuums can take hours, so confirm they're actually working.
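A progress check can be sketched from the `heap_blks_scanned` and `heap_blks_total` columns of `pg_stat_progress_vacuum`; sampling the view twice and comparing is one illustrative way to spot a blocked worker:

```python
def vacuum_progress_pct(heap_blks_scanned: int, heap_blks_total: int) -> float:
    """Percent of the heap scanned so far (0.0 if the scan hasn't started)."""
    if heap_blks_total == 0:
        return 0.0
    return round(100.0 * heap_blks_scanned / heap_blks_total, 1)

def looks_stalled(prev_scanned: int, cur_scanned: int) -> bool:
    """True if a later sample of heap_blks_scanned shows no forward progress."""
    return cur_scanned <= prev_scanned

print(vacuum_progress_pct(250_000, 1_000_000))  # 25.0
```

If two samples a few minutes apart show no movement, check `pg_locks` for whatever is blocking the vacuum worker before escalating to manual intervention.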
4. Audit foreign key relationships for quadratic member growth patterns
Investigate foreign keys, especially on low-cardinality enum tables or frequently-referenced parent tables. When n backends concurrently insert rows referencing the same parent row, PostgreSQL creates O(n) MultiXacts with O(n²) total members—5 concurrent transactions create 14 members. Check recent write patterns and identify tables with foreign keys experiencing high concurrent insert volume. You may need to temporarily drop problematic foreign keys to stop the bleeding while vacuum catches up.
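The quadratic growth above is simple arithmetic: each new locker of an already-locked row creates a fresh MultiXact containing every current locker, so n concurrent lockers generate roughly 2 + 3 + … + n members in total.

```python
def total_multixact_members(n: int) -> int:
    """Approximate total members created as n transactions successively
    share a lock on the same row: the k-th locker creates a MultiXact
    with k members, for k = 2..n."""
    return sum(range(2, n + 1))

print(total_multixact_members(5))   # 14, matching the example above
print(total_multixact_members(50))  # 1274 -- growth is O(n^2)
```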
5. Check for bulk operations creating sustained MultiXact pressure
Look for ongoing partitioning migrations, backfill jobs, or other bulk write operations that create sustained MultiXact pressure. Table partitioning with dual writes and backfills can rapidly consume member space, especially when combined with foreign keys. If you're running monthly invoice generation or similar high-write scheduled jobs, pause them immediately. Lower backfill concurrency to near-zero until member space recovers.
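One way to operationalize "lower backfill concurrency to near-zero" is a throttle keyed to member-space usage. The thresholds here are illustrative assumptions, not PostgreSQL defaults:

```python
def backfill_concurrency(pct_member_space_used: float,
                         normal_workers: int = 8) -> int:
    """Illustrative throttle policy: halve backfill workers past 70%
    member-space usage, pause backfills entirely past 90%."""
    if pct_member_space_used >= 90:
        return 0
    if pct_member_space_used >= 70:
        return max(1, normal_workers // 2)
    return normal_workers

print(backfill_concurrency(95.0))  # 0 -- pause until member space recovers
```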
6. Initiate manual data-only vacuum on high dead-tuple tables
If emergency autovacuum is insufficient or blocked, manually run VACUUM (FREEZE, VERBOSE) on tables with highest `n_dead_tup` counts, but skip index vacuums for faster recovery. Use the data-only vacuum strategy—indexes can wait, you need to reclaim member space now. Focus on the oldest tables by MultiXact age first. Increase autovacuum workers temporarily if you have CPU headroom. Each table vacuum on large databases takes hours, so prioritize ruthlessly.
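"Prioritize ruthlessly" can be encoded as a sort over the catalog numbers this step mentions. A sketch, assuming per-table `mxid_age` (from `pg_class.relminmxid`) and `n_dead_tup` (from `pg_stat_user_tables`) have already been fetched:

```python
def vacuum_priority(tables: list[dict]) -> list[str]:
    """Order tables for manual VACUUM: oldest MultiXact age first,
    breaking ties by dead-tuple count."""
    ranked = sorted(tables, key=lambda t: (-t["mxid_age"], -t["n_dead_tup"]))
    return [t["name"] for t in ranked]

# Hypothetical catalog snapshot
tables = [
    {"name": "events",   "mxid_age": 390_000_000, "n_dead_tup": 1_200_000},
    {"name": "invoices", "mxid_age": 390_000_000, "n_dead_tup": 9_500_000},
    {"name": "users",    "mxid_age": 50_000_000,  "n_dead_tup": 10_000},
]
print(vacuum_priority(tables))  # ['invoices', 'events', 'users']
```

The ranking criterion is a judgment call; age must come first because it is what determines distance from wraparound, with dead tuples only a tiebreaker.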


Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)