In a transaction ID wraparound emergency with only 1 million transactions remaining, your first priority is to eliminate anything blocking VACUUM from advancing relfrozenxid: primarily long-running transactions and prepared transactions. Then immediately identify and manually VACUUM FREEZE the specific tables with the oldest transaction IDs. Speed is critical; the database will stop accepting commands once the remaining safety margin is exhausted.
1. Immediately terminate all long-running transactions
Query `pg_stat_activity` and check `age(backend_xid)` and `age(backend_xmin)` for every active session. Any transaction with age > 10 million is blocking vacuum from advancing. Use `pg_terminate_backend(pid)` to kill these sessions immediately—don't wait for graceful completion. Even a single idle-in-transaction session can prevent the entire database from freezing old XIDs. Check `postgresql.activity.xact_start_age` to find transactions that have been open for hours or days.
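A triage query along these lines (standard `pg_stat_activity` columns; the example pid is a placeholder) surfaces the sessions holding back the XID horizon:

```sql
-- Sessions ordered by how far back they hold the XID horizon.
SELECT pid, state, age(backend_xid) AS xid_age, age(backend_xmin) AS xmin_age,
       now() - xact_start AS xact_duration, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
ORDER BY greatest(age(backend_xid), age(backend_xmin)) DESC NULLS LAST;

-- Kill an offender (substitute a pid from the query above):
SELECT pg_terminate_backend(12345);
```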
2. Clean up old prepared transactions
Check `pg_prepared_xacts` for any rows where `age(transaction)` is large (> 1 million). Prepared transactions from two-phase commit protocols can sit indefinitely and block vacuum just like regular transactions. Either `COMMIT PREPARED` or `ROLLBACK PREPARED` each one immediately. In my experience, these are often forgotten transactions from failed application deployments or crashed transaction coordinators.
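A sketch of the cleanup, using the standard `pg_prepared_xacts` view (the gid shown is a placeholder):

```sql
-- Stale prepared transactions, oldest first.
SELECT gid, prepared, age(transaction) AS xid_age, owner, database
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

-- Resolve each one by its gid:
ROLLBACK PREPARED 'failed-deploy-2024-01-15';
-- or COMMIT PREPARED 'failed-deploy-2024-01-15' if it must be committed
```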
3. Identify the tables closest to wraparound
Query `SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 20`. The tables at the top of this list are your problem: as the oldest frozen XID approaches the ~2.1 billion wraparound limit, PostgreSQL stops accepting commands to protect your data, leaving roughly a million XIDs of safety margin. Check `database_wraparound_age_datfrozenxid` and `before_xid_wraparound` metrics to see exactly how many transactions remain. Focus your VACUUM efforts on the oldest tables first; you don't need to freeze everything, just enough to buy breathing room.
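The same check can be run per database, since the wraparound limit applies to each database's `datfrozenxid` (the remaining-XID arithmetic here is approximate):

```sql
-- Per-database distance to wraparound, worst first.
SELECT datname, age(datfrozenxid) AS xid_age,
       2147483647 - age(datfrozenxid) AS approx_xids_until_wraparound
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```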
4. Manually VACUUM FREEZE the worst tables immediately
Don't wait for autovacuum—run `VACUUM (FREEZE, VERBOSE)` manually on the top 5-10 tables with the oldest `relfrozenxid`. The VERBOSE output reports pages scanned and tuples removed for each table, which helps you track progress during an emergency. Large tables may take hours; if you're truly out of time, consider `VACUUM FREEZE` on just the system catalogs first to prevent immediate shutdown. Monitor `autovacuumed` and `n_dead_tup` to track progress. Be aware that emergency autovacuum will also be running at this point, possibly causing load spikes.
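Assuming the worst offenders from the previous query were, say, `public.events` and `public.audit_log` (table names here are placeholders), the emergency freeze looks like:

```sql
-- Freeze the tables with the oldest relfrozenxid first.
VACUUM (FREEZE, VERBOSE) public.events;
VACUUM (FREEZE, VERBOSE) public.audit_log;

-- If shutdown is imminent, the system catalogs can be frozen first:
VACUUM (FREEZE, VERBOSE) pg_catalog.pg_class;
VACUUM (FREEZE, VERBOSE) pg_catalog.pg_attribute;
```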
5. Monitor for new blockers during the vacuum
While your manual VACUUM FREEZE runs, continuously re-check `pg_stat_activity` for new long-running transactions that could stop the vacuum from advancing `relfrozenxid`. A vacuum that completes but can't update the frozen XID is wasted work. Kill any new sessions with large `backend_xid` age immediately. Use `postgresql.transactions_open` to track the current transaction count—if it's still climbing rapidly, you may run out of XIDs before the vacuum completes.
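A watch query to re-run every few minutes while the freeze is in progress; on PostgreSQL 9.6 and later, `pg_stat_progress_vacuum` also shows the vacuum's own progress:

```sql
-- New or lingering sessions holding back the XID horizon.
SELECT pid, state, age(backend_xid) AS xid_age, now() - xact_start AS duration
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL
ORDER BY age(backend_xid) DESC;

-- Progress of the running vacuum (PostgreSQL 9.6+).
SELECT pid, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;
```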
6. Check for disabled autovacuum on problem tables
Once the immediate crisis is contained, query `pg_class` and check if any tables have `reloptions` containing `autovacuum_enabled=false`. Even a single unused test table with autovacuum disabled can be the root cause of the entire database reaching wraparound. The insight here is that forgotten or abandoned tables are often the culprit, not your active production tables. Drop genuinely unused tables, or re-enable autovacuum and set a lower, more aggressive per-table `autovacuum_freeze_max_age` if they must remain.
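A sketch of that check and fix, using the standard `reloptions` array (the table name in the ALTER statements is a placeholder):

```sql
-- Find tables where autovacuum was explicitly disabled.
SELECT relname, reloptions
FROM pg_class
WHERE 'autovacuum_enabled=false' = ANY (reloptions);

-- Re-enable it and freeze more aggressively on this table:
ALTER TABLE old_test_table SET (autovacuum_enabled = true);
ALTER TABLE old_test_table SET (autovacuum_freeze_max_age = 100000000);
```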
7. Set up monitoring to prevent recurrence
Transaction ID wraparound provides no gradual performance warning—CPU, memory, and I/O look normal until sudden failure. Do not rely on traditional capacity monitoring. Immediately implement alerts on `database_wraparound_age_datfrozenxid` at 150-180 million XID age per table, well before the 200 million emergency threshold. This failure mode cannot be reproduced in staging or testing; it only emerges after months of production use, so production monitoring is your only defense.
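If your monitoring system can run SQL checks, the alert condition can be as simple as a query that returns rows once any database crosses the 150 million threshold (the threshold value is the one suggested above):

```sql
-- Alert-condition sketch: non-empty result means a database needs attention.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
WHERE age(datfrozenxid) > 150000000;
```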