Transaction ID Wraparound Emergency Response

criticalIncident Response

Emergency response when PostgreSQL approaches or reaches transaction ID wraparound, requiring immediate vacuum operations to prevent database shutdown.

Prompt: I'm getting FATAL errors about transaction ID wraparound in PostgreSQL and the database is warning it will shut down in 1 million transactions. The age of the oldest transaction is over 2 billion. What do I need to do right now to prevent an outage?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens
823vs1,018
Latency
18.9smsvs22.7sms
Turns
2vs2
Tool Calls
0vs0
Schema Calls
0vs0
Response
1,884 charsvs2,476 chars
Evaluation
Without Schema
B+
vs
With Schema
A-
The response with Schema provides more sophisticated emergency handling by offering both full-database and targeted table-by-table approaches, along with specific vacuum performance tuning parameters (vacuum_cost_delay, maintenance_work_mem) that could significantly speed up recovery. It includes more comprehensive autovacuum configuration and explicitly warns against common mistakes like restarting PostgreSQL. Both responses correctly identify the emergency and provide actionable commands, but the Schema-enhanced version offers more efficient recovery paths and better monitoring tools (pg_stat_progress_vacuum). The additional technical depth and optimization options could save critical time during an emergency where every transaction counts.
Without Schema
This is a **critical emergency** - you're very close to an automatic PostgreSQL shutdown. Here's what you need to do immediately: ## Immediate Actions (RIGHT NOW) ### 1. Check Current Status ```bash psql -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;" ``` ### 2. Emergency VACUUM FREEZE Run this on the database with the highest age (likely ~2.1 billion): ```bash # For the most critical database: vacuumdb --freeze --verbose --dbname=<your_database_name> # Or if you need to do all databases: vacuumdb --all --freeze --verbose ``` **WARNING**: This will be I/O intensive and may slow your database, but it's necessary to prevent shutdown. ### 3. Monitor Progress In another terminal, watch the age decrease: ```bash watch -n 30 'psql -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;"' ``` ## Critical Context - **Danger zone**: age > 2 billion (you're here) - **Automatic shutdown**: age reaches ~2.14 billion (2^31 - 10 million safety margin) - **You have**: ~1 million transactions left = minutes to hours depending on your transaction rate ## Why This Happened Your autovacuum likely isn't keeping up due to: - Very large tables - Long-running transactions blocking VACUUM - Autovacuum disabled or misconfigured - High transaction rate ## After the Emergency Once age is below 200 million, check: ```bash # Find long-running transactions: psql -c "SELECT pid, usename, datname, state, age(backend_xid) as xid_age, age(backend_xmin) as xmin_age FROM pg_stat_activity WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL ORDER BY GREATEST(age(backend_xid), age(backend_xmin)) DESC;" # Check autovacuum settings: psql -c "SHOW autovacuum; SHOW autovacuum_freeze_max_age;" ``` **Do you need help running these commands, or do you have questions about your specific PostgreSQL setup?**
With Schema
This is a **critical emergency** - you have very little time before automatic shutdown. Here's what to do immediately: ## Immediate Actions (DO NOW) ### 1. Check which database(s) are at risk: ```bash psql -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;" ``` ### 2. Start aggressive VACUUM FREEZE on the problem database(s): For each database showing age > 2 billion: ```bash # Connect to the problem database psql -d <database_name> # Run VACUUM FREEZE on all tables (this will take time but is necessary) VACUUM FREEZE VERBOSE; ``` **OR** if you need more control, target the worst tables first: ```sql -- Find tables with oldest transaction IDs SELECT schemaname, relname, age(relfrozenxid) as xid_age, n_dead_tup FROM pg_stat_all_tables WHERE age(relfrozenxid) > 1000000000 ORDER BY age(relfrozenxid) DESC LIMIT 20; -- Then VACUUM FREEZE each one: VACUUM FREEZE VERBOSE schema.table_name; ``` ### 3. Temporarily boost vacuum performance: ```sql -- Increase vacuum resources (adjust based on your system) SET vacuum_cost_delay = 0; -- Remove vacuum throttling SET maintenance_work_mem = '2GB'; -- More memory for vacuum ``` ## Prevent Future Wraparound Once the immediate crisis is resolved: 1. **Enable autovacuum** (if disabled): ```sql ALTER DATABASE <dbname> SET autovacuum = on; ``` 2. **Make autovacuum more aggressive** in `postgresql.conf`: ```ini autovacuum_max_workers = 5 autovacuum_naptime = 15s autovacuum_vacuum_cost_delay = 10ms autovacuum_vacuum_scale_factor = 0.05 autovacuum_freeze_max_age = 200000000 ``` 3. **Check for blocking issues**: ```sql -- Long-running transactions prevent vacuum from advancing SELECT pid, usename, state, age(backend_xid), query_start, query FROM pg_stat_activity WHERE age(backend_xid) IS NOT NULL ORDER BY age(backend_xid) DESC; ``` ## Critical Notes - **Don't restart PostgreSQL** - that won't help and wastes time - **VACUUM FREEZE cannot be interrupted** once started without losing progress - **Monitor progress**: Watch the `age(datfrozenxid)` value decrease - **This will cause I/O load** - the database may be slower but this is necessary - If you have replicas, they'll catch up after the vacuum completes ## How long will this take? VACUUM FREEZE can take hours on large databases. Monitor with: ```sql SELECT * FROM pg_stat_progress_vacuum; ``` Do you need help identifying which database is affected or running these commands?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

In a transaction ID wraparound emergency with only 1 million transactions remaining, your first priority is to eliminate anything blocking VACUUM from advancing relfrozenxid—primarily long-running transactions and prepared transactions. Then immediately identify and manually VACUUM FREEZE the specific tables with the oldest transaction IDs. Speed is critical; the database will shut down when the counter reaches zero.

1Immediately terminate all long-running transactions
Query `pg_stat_activity` and check `age(backend_xid)` and `age(backend_xmin)` for every active session. Any transaction with age > 10 million is blocking vacuum from advancing. Use `pg_terminate_backend(pid)` to kill these sessions immediately—don't wait for graceful completion. Even a single idle-in-transaction session can prevent the entire database from freezing old XIDs. Check `postgresql.activity.xact_start_age` to find transactions that have been open for hours or days.
2Clean up old prepared transactions
Check `pg_prepared_xacts` for any rows where `age(transaction)` is large (> 1 million). Prepared transactions from two-phase commit protocols can sit indefinitely and block vacuum just like regular transactions. Either `COMMIT PREPARED` or `ROLLBACK PREPARED` each one immediately. In my experience, these are often forgotten transactions from failed application deployments or crashed transaction coordinators.
3Identify the tables closest to wraparound
Query `SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 20`. The tables at the top of this list are your problem—when any table reaches ~2.14 billion XID age, the database will refuse writes. Check `database_wraparound_age_datfrozenxid` and `before_xid_wraparound` metrics to see exactly how many transactions remain. Focus your VACUUM efforts on the oldest tables first; you don't need to freeze everything, just enough to buy breathing room.
4Manually VACUUM FREEZE the worst tables immediately
Don't wait for autovacuum—run `VACUUM (FREEZE, VERBOSE)` manually on the top 5-10 tables with the oldest `relfrozenxid`. The VERBOSE output shows progress per block, which is critical for tracking during an emergency. Large tables may take hours; if you're truly out of time, consider `VACUUM FREEZE` on just the system catalogs first to prevent immediate shutdown. Monitor `autovacuumed` and `n_dead_tup` to track progress. Be aware that emergency autovacuum will also be running at this point, possibly causing load spikes.
5Monitor for new blockers during the vacuum
While your manual VACUUM FREEZE runs, continuously re-check `pg_stat_activity` for new long-running transactions that could stop the vacuum from advancing `relfrozenxid`. A vacuum that completes but can't update the frozen XID is wasted work. Kill any new sessions with large `backend_xid` age immediately. Use `postgresql.transactions_open` to track the current transaction count—if it's still climbing rapidly, you may run out of XIDs before the vacuum completes.
6Check for disabled autovacuum on problem tables
Once the immediate crisis is contained, query `pg_class` and check if any tables have `reloptions` containing `autovacuum_enabled=false`. Even a single unused test table with autovacuum disabled can be the root cause of the entire database reaching wraparound. The insight here is that forgotten or abandoned tables are often the culprit, not your active production tables. Drop genuinely unused tables, or re-enable autovacuum and set aggressive `autovacuum_freeze_max_age` if they must remain.
7Set up monitoring to prevent recurrence
Transaction ID wraparound provides no gradual performance warning—CPU, memory, and I/O look normal until sudden failure. Do not rely on traditional capacity monitoring. Immediately implement alerts on `database_wraparound_age_datfrozenxid` at 150-180 million XID age per table, well before the 200 million emergency threshold. This failure mode cannot be reproduced in staging or testing; it only emerges after months of production use, so production monitoring is your only defense.

Technologies

Related Insights

Transaction ID wraparound provides no gradual performance warning before failure
warning
Transaction ID wraparound cannot be reproduced in short-term testing
info
Long-running transactions prevent vacuum from advancing relfrozenxid
critical
Old prepared transactions prevent vacuum from advancing relfrozenxid
critical
Disabled autovacuum causes severe bloat and eventual outage risk
critical
Unused tables with disabled autovacuum block entire database at wraparound
critical
Transaction ID wraparound forces emergency autovacuum despite disabled autovacuum
diagnostic_stepcritical
Transaction ID wraparound risk causes database shutdown
outcomecritical
PostgreSQL's 32-bit transaction ID counter approaches exhaustion (2 billion limit), risking forced database shutdown. Tables with age(relfrozenxid) approaching autovacuum_freeze_max_age require aggressive freezing VACUUMs to prevent wraparound.
long-running transactions preventing freeze
prerequisite
Stub insight created from scenario discovery hint: long-running transactions preventing freeze
autovacuum freeze not running
prerequisite
Stub insight created from scenario discovery hint: autovacuum freeze not running
risk of forced database shutdown
outcome
Stub insight created from scenario discovery hint: risk of forced database shutdown

Relevant Metrics

Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration