This is a critical PostgreSQL issue that needs immediate attention. Let me explain what's happening and how to fix it.
## What is MultiXact Member Space?
**MultiXact** (Multi-Transaction) is PostgreSQL's mechanism for tracking multiple transactions that hold row-level locks on the same tuple. This happens when:
- Multiple transactions use `SELECT ... FOR SHARE` or `SELECT ... FOR KEY SHARE`
- Foreign keys are being checked
- Multiple transactions are updating different columns of the same row
The "member space" stores the list of transaction IDs that share locks. It's a fixed-size circular buffer (similar to transaction ID space), and when it wraps around without being vacuumed, you get exhaustion.
## Immediate Diagnosis
Run these commands to check the severity:
```sql
-- Check current MultiXact age and limits
SELECT datname,
age(datminmxid) as mxid_age,
mxid_age(datminmxid) as multixact_age
FROM pg_database
ORDER BY mxid_age(datminmxid) DESC;
-- Check which tables need vacuuming most urgently
SELECT schemaname, tablename,
n_tup_ins + n_tup_upd + n_tup_del as total_changes,
last_autovacuum,
last_vacuum,
age(relfrozenxid) as xid_age,
mxid_age(relminmxid) as mxid_age
FROM pg_stat_user_tables t
JOIN pg_class c ON c.relname = t.relname
WHERE c.relkind = 'r'
ORDER BY mxid_age DESC
LIMIT 20;
```
## Emergency Fixes (In Order)
### 1. **Immediate: Aggressive Manual VACUUM**
```sql
-- Run VACUUM on the most problematic tables immediately
-- Get the list from the query above and vacuum them
VACUUM (VERBOSE, ANALYZE) your_table_name;
-- If very urgent, run VACUUM FREEZE on critical tables
VACUUM (FREEZE, VERBOSE, ANALYZE) your_table_name;
```
### 2. **Temporarily Boost Autovacuum Aggressiveness**
```sql
-- Make autovacuum more aggressive (requires superuser)
ALTER SYSTEM SET autovacuum_max_workers = 6; -- increase workers
ALTER SYSTEM SET autovacuum_naptime = '10s'; -- run more frequently
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000; -- allow more I/O
ALTER SYSTEM SET vacuum_multixact_failsafe_age = 400000000; -- emergency vacuum threshold
-- Reload configuration
SELECT pg_reload_conf();
```
### 3. **Emergency: Kill Long-Running Transactions**
Long-running transactions prevent vacuum from cleaning up:
```sql
-- Find long-running transactions blocking cleanup
SELECT pid, datname, usename, state,
now() - xact_start AS duration,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND now() - xact_start > interval '5 minutes'
ORDER BY duration DESC;
-- Kill problematic ones (carefully!)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = <problematic_pid>;
```
### 4. **Per-Table Emergency Vacuum**
For the worst offenders:
```sql
-- Set aggressive vacuum settings per table
ALTER TABLE problematic_table SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum at 1% changes
autovacuum_vacuum_cost_delay = 0, -- no delay
autovacuum_multixact_freeze_max_age = 400000000
);
-- Then immediately vacuum it
VACUUM (FREEZE, VERBOSE, ANALYZE) problematic_table;
```
## Long-Term Prevention
After the emergency is resolved:
1. **Monitor MultiXact age** - set up alerts when it approaches 1.5 billion
2. **Tune autovacuum** - ensure it runs frequently enough for your workload
3. **Avoid long transactions** - they prevent cleanup
4. **Review locking patterns** - excessive FOR SHARE locks can cause buildup
5. **Consider increasing `autovacuum_multixact_freeze_max_age`** from default (400M) to 600M if needed
## Check Progress
Monitor the situation:
```sql
-- Watch vacuum progress
SELECT * FROM pg_stat_progress_vacuum;
-- Monitor MultiXact age trending down
SELECT datname, mxid_age(datminmxid) as multixact_age
FROM pg_database
WHERE datname = current_database();
```
**Critical**: Start with the manual VACUUM on your highest mxid_age tables immediately. The emergency autovacuum should be kicking in, but manual intervention is faster. Once writes are restored, tune autovacuum to prevent recurrence.
What database version are you running? And can you share the output of the first diagnostic query? This will help me give more specific guidance.
This is a **critical production issue** that needs immediate action. Here's what's happening and how to fix it:
## What is MultiXact Member Space?
MultiXact (Multiple Transaction) IDs track situations where multiple transactions need to lock the same row simultaneously - typically with `SELECT FOR SHARE` locks or foreign key checks. PostgreSQL has limited space (2 billion entries) for these IDs, and when exhausted, all writes fail.
## Immediate Diagnosis
Run these queries to check the severity:
```sql
-- Check how close you are to wraparound
SELECT setting::bigint AS max_members,
ROUND(100.0 * members_used / max_members, 2) AS pct_used
FROM pg_control_checkpoint(),
(SELECT setting::bigint AS max_members
FROM pg_settings WHERE name = 'max_multixact_members') s,
(SELECT SUM(age(mxid)) AS members_used
FROM pg_database) u;
-- Check which tables need vacuuming most urgently
SELECT schemaname, tablename,
age(datminmxid) as mxid_age,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables t
JOIN pg_database d ON d.datname = current_database()
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY age(datminmxid) DESC
LIMIT 20;
```
## Immediate Fix
**Option 1: Aggressive Manual VACUUM (preferred if database is still accepting some commands)**
```sql
-- In psql, run with multiple workers
SET maintenance_work_mem = '2GB'; -- adjust based on available RAM
VACUUM (VERBOSE, FREEZE) your_problem_tables;
```
**Option 2: Emergency single-user mode (if database is completely locked)**
```bash
# Stop PostgreSQL
sudo systemctl stop postgresql
# Start in single-user mode (replace 'dbname' with your database)
sudo -u postgres postgres --single -D /var/lib/postgresql/data dbname
# In single-user mode, run:
VACUUM FREEZE;
\q
# Restart normally
sudo systemctl start postgresql
```
## Configuration Changes (prevent recurrence)
Add these to `postgresql.conf`:
```conf
# Trigger autovacuum earlier
autovacuum_multixact_freeze_max_age = 200000000 # down from 400M
autovacuum_max_workers = 5 # increase if you have CPU available
# Make autovacuum more aggressive
autovacuum_vacuum_cost_limit = 1000 # increase from default 200
autovacuum_naptime = 10s # check more frequently
# Monitor settings
log_autovacuum_min_duration = 0 # log all autovacuum activity
```
Then reload: `SELECT pg_reload_conf();`
## Monitor Progress
```sql
-- Watch autovacuum activity
SELECT pid, datname, usename, state, query,
now() - query_start AS duration
FROM pg_stat_activity
WHERE query LIKE '%autovacuum%' OR query LIKE '%VACUUM%';
```
## Root Cause Investigation (after fixing)
Check for:
- **Long-running transactions** blocking vacuum: `SELECT * FROM pg_stat_activity WHERE state != 'idle' ORDER BY xact_start;`
- **Tables with foreign keys** (heavy MultiXact users)
- **Replication lag** preventing old MultiXact cleanup
- **Disabled autovacuum** on specific tables
**Time-sensitive**: If over 95% used, you have hours at most. Act immediately with Option 1 or 2 above.