In a transaction ID wraparound emergency with only 1 million transactions remaining, your first priority is to eliminate anything blocking VACUUM from advancing relfrozenxid: primarily long-running transactions and prepared transactions. Then immediately identify and manually VACUUM FREEZE the specific tables with the oldest transaction IDs. Speed is critical; the database will stop accepting commands once the remaining safety margin is exhausted.
1. Immediately terminate all long-running transactions
Query `pg_stat_activity` and check `age(backend_xid)` and `age(backend_xmin)` for every active session. Any transaction with age > 10 million is blocking vacuum from advancing. Use `pg_terminate_backend(pid)` to kill these sessions immediately—don't wait for graceful completion. Even a single idle-in-transaction session can prevent the entire database from freezing old XIDs. Check `postgresql.activity.xact_start_age` to find transactions that have been open for hours or days.
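A triage query along these lines (standard `pg_stat_activity` columns; the example pid is a placeholder) surfaces the sessions holding back the XID horizon:

```sql
-- Sessions ordered by how far back they hold the XID horizon.
SELECT pid, state, age(backend_xid) AS xid_age, age(backend_xmin) AS xmin_age,
       now() - xact_start AS xact_duration, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
ORDER BY greatest(age(backend_xid), age(backend_xmin)) DESC NULLS LAST;

-- Kill an offender (substitute a pid from the query above):
SELECT pg_terminate_backend(12345);
```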
2. Clean up old prepared transactions
Check `pg_prepared_xacts` for any rows where `age(transaction)` is large (> 1 million). Prepared transactions from two-phase commit protocols can sit indefinitely and block vacuum just like regular transactions. Either `COMMIT PREPARED` or `ROLLBACK PREPARED` each one immediately. In my experience, these are often forgotten transactions from failed application deployments or crashed transaction coordinators.
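A sketch of the cleanup, using the standard `pg_prepared_xacts` view (the gid shown is a placeholder):

```sql
-- Stale prepared transactions, oldest first.
SELECT gid, prepared, age(transaction) AS xid_age, owner, database
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

-- Resolve each one by its gid:
ROLLBACK PREPARED 'failed-deploy-2024-01-15';
-- or COMMIT PREPARED 'failed-deploy-2024-01-15' if it must be committed
```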
3. Identify the tables closest to wraparound
Query `SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 20`. The tables at the top of this list are your problem: as the oldest frozen XID approaches the ~2.1 billion wraparound limit, PostgreSQL stops accepting commands to protect your data, leaving roughly a million XIDs of safety margin. Check `database_wraparound_age_datfrozenxid` and `before_xid_wraparound` metrics to see exactly how many transactions remain. Focus your VACUUM efforts on the oldest tables first; you don't need to freeze everything, just enough to buy breathing room.
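The same check can be run per database, since the wraparound limit applies to each database's `datfrozenxid` (the remaining-XID arithmetic here is approximate):

```sql
-- Per-database distance to wraparound, worst first.
SELECT datname, age(datfrozenxid) AS xid_age,
       2147483647 - age(datfrozenxid) AS approx_xids_until_wraparound
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```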
4. Manually VACUUM FREEZE the worst tables immediately
Don't wait for autovacuum—run `VACUUM (FREEZE, VERBOSE)` manually on the top 5-10 tables with the oldest `relfrozenxid`. The VERBOSE output reports pages scanned and tuples removed for each table, which helps you track progress during an emergency. Large tables may take hours; if you're truly out of time, consider `VACUUM FREEZE` on just the system catalogs first to prevent immediate shutdown. Monitor `autovacuumed` and `n_dead_tup` to track progress. Be aware that emergency autovacuum will also be running at this point, possibly causing load spikes.
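Assuming the worst offenders from the previous query were, say, `public.events` and `public.audit_log` (table names here are placeholders), the emergency freeze looks like:

```sql
-- Freeze the tables with the oldest relfrozenxid first.
VACUUM (FREEZE, VERBOSE) public.events;
VACUUM (FREEZE, VERBOSE) public.audit_log;

-- If shutdown is imminent, the system catalogs can be frozen first:
VACUUM (FREEZE, VERBOSE) pg_catalog.pg_class;
VACUUM (FREEZE, VERBOSE) pg_catalog.pg_attribute;
```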
5. Monitor for new blockers during the vacuum
While your manual VACUUM FREEZE runs, continuously re-check `pg_stat_activity` for new long-running transactions that could stop the vacuum from advancing `relfrozenxid`. A vacuum that completes but can't update the frozen XID is wasted work. Kill any new sessions with large `backend_xid` age immediately. Use `postgresql.transactions_open` to track the current transaction count—if it's still climbing rapidly, you may run out of XIDs before the vacuum completes.
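A watch query to re-run every few minutes while the freeze is in progress; on PostgreSQL 9.6 and later, `pg_stat_progress_vacuum` also shows the vacuum's own progress:

```sql
-- New or lingering sessions holding back the XID horizon.
SELECT pid, state, age(backend_xid) AS xid_age, now() - xact_start AS duration
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL
ORDER BY age(backend_xid) DESC;

-- Progress of the running vacuum (PostgreSQL 9.6+).
SELECT pid, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;
```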
6. Check for disabled autovacuum on problem tables
Once the immediate crisis is contained, query `pg_class` and check if any tables have `reloptions` containing `autovacuum_enabled=false`. Even a single unused test table with autovacuum disabled can be the root cause of the entire database reaching wraparound. The insight here is that forgotten or abandoned tables are often the culprit, not your active production tables. Drop genuinely unused tables, or re-enable autovacuum and set a lower, more aggressive per-table `autovacuum_freeze_max_age` if they must remain.
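A sketch of that check and fix, using the standard `reloptions` array (the table name in the ALTER statements is a placeholder):

```sql
-- Find tables where autovacuum was explicitly disabled.
SELECT relname, reloptions
FROM pg_class
WHERE 'autovacuum_enabled=false' = ANY (reloptions);

-- Re-enable it and freeze more aggressively on this table:
ALTER TABLE old_test_table SET (autovacuum_enabled = true);
ALTER TABLE old_test_table SET (autovacuum_freeze_max_age = 100000000);
```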
7. Set up monitoring to prevent recurrence
Transaction ID wraparound provides no gradual performance warning—CPU, memory, and I/O look normal until sudden failure. Do not rely on traditional capacity monitoring. Immediately implement alerts on `database_wraparound_age_datfrozenxid` at 150-180 million XID age per table, well before the 200 million emergency threshold. This failure mode cannot be reproduced in staging or testing; it only emerges after months of production use, so production monitoring is your only defense.
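If your monitoring system can run SQL checks, the alert condition can be as simple as a query that returns rows once any database crosses the 150 million threshold (the threshold value is the one suggested above):

```sql
-- Alert-condition sketch: non-empty result means a database needs attention.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
WHERE age(datfrozenxid) > 150000000;
```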