Transaction ID wraparound emergency approaching

Critical Incident Response

Database approaching 2 billion unvacuumed transactions, risking automatic read-only shutdown

Prompt: My PostgreSQL database is showing warnings about transaction ID wraparound with age at 1.8 billion. How urgent is this, what happens if I hit the limit, and what's the safest way to prevent emergency autovacuum or shutdown?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

| Metric | Without Schema | With Schema |
|---|---|---|
| Tokens | 1,098 | 814 |
| Latency | 24.3s | 18.7s |
| Turns | 2 | 2 |
| Tool Calls | 0 | 0 |
| Schema Calls | 0 | 0 |
| Response | 2,971 chars | 2,097 chars |
Evaluation: Without Schema B+ vs With Schema A-
Both responses are highly actionable with specific SQL queries and VACUUM commands that would successfully guide an operator to resolve the issue. The response with Schema provides more accurate threshold numbers (2.1 billion for emergency autovacuum vs. 1.5 billion in the baseline), which is critical for urgency assessment. The response without Schema offers more comprehensive diagnostics (database-level and table-level queries) and practical timeline calculations based on transaction rate. The response with Schema includes advanced monitoring techniques (pg_stat_progress_vacuum) and per-table autovacuum tuning, demonstrating deeper operational knowledge. The technical accuracy advantage on thresholds gives the Schema version a slight edge, though both responses would reliably prevent the wraparound crisis.
Without Schema
This is **moderately urgent** and requires action soon. Here's what you need to know:

## Urgency Assessment

**Your situation (1.8 billion):**

- **Warning threshold**: 1 billion (you've passed this)
- **Autovacuum forced**: ~1.5 billion (you may be experiencing this)
- **Emergency shutdown**: 2.14 billion (~2^31)
- **Your buffer**: ~340 million XIDs remaining (~16-19% safety margin)

You're in the warning zone but not critical yet. However, depending on your transaction rate, you could have days to weeks before hitting the hard limit.

## What Happens at the Limit

**At ~2 billion XIDs:**

- PostgreSQL triggers **emergency autovacuum** that cannot be canceled
- This blocks normal operations and can cause severe performance degradation
- All other database operations slow to a crawl

**At 2.14 billion (2^31 - 3 million):**

- PostgreSQL **refuses all new transactions** and effectively shuts down
- The database becomes read-only until the problem is resolved
- This prevents data loss from XID wraparound corruption

## Safest Prevention Strategy

**1. Check current state:**

```sql
-- See which databases are worst
SELECT datname, age(datfrozenxid),
       2147483648 - age(datfrozenxid) AS xids_remaining
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- Find problem tables (relfrozenxid lives in pg_class, not pg_stat_all_tables)
SELECT n.nspname AS schemaname, c.relname, age(c.relfrozenxid),
       pg_size_pretty(pg_total_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 't')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```

**2. Immediate action, manual VACUUM:**

```sql
-- For specific problem tables (fastest, least disruptive)
VACUUM FREEZE VERBOSE tablename;
```

```shell
# Or for the entire database (more thorough); vacuumdb is a
# shell utility, not SQL:
vacuumdb --all --freeze --verbose
```

**3. Monitor progress:**

```sql
-- Watch the age decrease
SELECT datname, age(datfrozenxid)
FROM pg_database
WHERE datname = current_database();
```

**4. Tune autovacuum to prevent recurrence:**

```sql
-- Check current settings
SHOW autovacuum_freeze_max_age;           -- default 200M
SHOW autovacuum_multixact_freeze_max_age;

-- In postgresql.conf, consider:
--   autovacuum_freeze_max_age = 200000000
--   autovacuum_vacuum_cost_delay = 10  (more aggressive)
--   autovacuum_max_workers = 3 or more
```

## Best Practices During Recovery

- Run `VACUUM FREEZE` during low-traffic periods if possible
- Target the oldest tables first (from the query above)
- Monitor I/O impact: vacuum can be I/O intensive
- Don't disable autovacuum globally (a common mistake)
- For large tables, use `vacuum_cost_delay` to throttle I/O

## Timeline

With 340M XIDs remaining and typical transaction rates:

- **Low-traffic database** (1K TPS): ~3-4 days
- **Medium-traffic** (10K TPS): ~9 hours
- **High-traffic** (50K TPS): ~2 hours

**Recommendation**: Start manual VACUUM now on the oldest tables, monitor progress, and tune autovacuum settings to prevent this from happening again.

Would you like me to help you check your database's current state or create a monitoring/remediation script?
With Schema
This is **urgent** and needs immediate attention. Here's what you're facing:

## Urgency Level: HIGH

At **1.8 billion transactions**, you're approaching critical thresholds:

- **2 billion**: PostgreSQL starts issuing warnings
- **2.1 billion (2^31 - 100M)**: Emergency autovacuum kicks in automatically
- **2.14 billion (2^31 - 10M)**: Database goes into read-only mode to prevent data loss

You likely have **200-300 million transactions** of buffer before emergency mode.

## What Happens at the Limit

If you hit the wraparound limit:

1. **Emergency autovacuum** (at 2.1B): PostgreSQL force-launches autovacuum regardless of settings, causing severe performance degradation
2. **Read-only mode** (at 2.14B): The database refuses all writes to prevent catastrophic data corruption from transaction ID wraparound
3. **Data loss risk**: If wraparound occurs, old data can appear "newer" than recent data, causing logical corruption

## Safe Prevention Strategy

**Immediate actions:**

1. **Check which tables are problematic:**

   ```sql
   SELECT relname, age(relfrozenxid),
          pg_size_pretty(pg_total_relation_size(oid))
   FROM pg_class
   WHERE relkind IN ('r', 't')
   ORDER BY age(relfrozenxid) DESC
   LIMIT 20;
   ```

2. **Run manual VACUUM FREEZE on the worst offenders:**

   ```sql
   VACUUM FREEZE VERBOSE table_name;
   ```

   - Start with the smallest tables that have high age
   - This is I/O intensive but safe
   - Monitor with `pg_stat_progress_vacuum`

3. **For large tables**, temporarily use aggressive autovacuum settings:

   ```sql
   ALTER TABLE large_table SET (autovacuum_vacuum_cost_delay = 0);
   ALTER TABLE large_table SET (autovacuum_vacuum_scale_factor = 0.01);
   ```

4. **Check that autovacuum is running:**

   ```sql
   SELECT * FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';
   ```

**Prevention going forward:**

- Tune `autovacuum_freeze_max_age` (the default of 200M may be too high)
- Increase `autovacuum_max_workers`
- Monitor `age(datfrozenxid)` regularly
- Consider more aggressive vacuum schedules for high-churn tables

Would you like me to help you check your specific database status or create a remediation script?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

At 1.8 billion transaction IDs, you're in a critical window—PostgreSQL starts warning at 40 million from wraparound and goes read-only at 3 million remaining. Start by confirming the exact age and urgency via datfrozenxid metrics, then immediately hunt for blockers preventing vacuum from advancing the frozen XID horizon: long-running transactions, prepared transactions, and replication slots. Finally, assess whether autovacuum is running and if manual VACUUM FREEZE is needed on the oldest tables.

1. Confirm current XID age and calculate time to emergency
Check `database_wraparound_age_datfrozenxid` to get the exact transaction age—if it's above 1.5 billion (75% of the 2 billion limit), you're in the danger zone. Cross-reference with `before_xid_wraparound` to see how many transactions remain before forced shutdown. At 1.8 billion, you likely have 200 million XIDs left, which sounds like a lot but can burn through quickly on high-transaction workloads. The `transaction-id-wraparound-risk` insight flags tables above 150 million as requiring immediate action, and `transaction-id-wraparound-prevention` notes that PostgreSQL refuses new transactions at 3 million remaining.
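The names `database_wraparound_age_datfrozenxid` and `before_xid_wraparound` are Schema's metric identifiers; against a bare PostgreSQL instance, a minimal sketch of the same check using the standard catalogs would be:

```sql
-- Per-database XID age and remaining headroom before the 2^31 limit
SELECT datname,
       age(datfrozenxid)              AS xid_age,
       2147483648 - age(datfrozenxid) AS xids_until_wraparound
FROM pg_database
ORDER BY xid_age DESC;
```

Dividing `xids_until_wraparound` by your sustained transactions-per-second rate gives a rough time-to-emergency estimate.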
2. Hunt for long-running and prepared transactions blocking vacuum
Check `backend_xid` and `backend_xmin` across all sessions in pg_stat_activity—any transaction with age(backend_xmin) in the hundreds of millions is preventing vacuum from advancing the freeze horizon. Also query pg_prepared_xacts for orphaned prepared transactions, which are a common silent killer. The `transaction-id-wraparound-prevention` insight specifically calls out that old prepared transactions and long-running snapshots are the primary blockers. Terminate or resolve these immediately—even a single ancient transaction can prevent the entire database from advancing its frozen XID.
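A sketch of this blocker hunt against the stock PostgreSQL views (no monitoring integration assumed):

```sql
-- Sessions whose snapshot is holding back the freeze horizon
SELECT pid, datname, usename, state,
       age(backend_xid)   AS xid_age,
       age(backend_xmin)  AS xmin_age,
       now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Orphaned prepared transactions: a common silent blocker
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```

A session surfaced by the first query can be terminated with `pg_terminate_backend(pid)`; a stale prepared transaction must be resolved with `COMMIT PREPARED` or `ROLLBACK PREPARED` using its `gid`.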
3. Verify autovacuum is running and not being suppressed
Check `postgresql-vacuum-count` and `postgresql-autovacuumed` to confirm vacuum activity is happening, and use `postgresql-last-vacuum-age` to identify tables that haven't been vacuumed in weeks or months. If last vacuum age is extremely high on large tables, autovacuum may be disabled, too conservative (high vacuum_scale_factor), or getting canceled by statement_timeout. The `transaction-id-wraparound-emergency-autovacuum` insight warns that emergency autovacuum kicks in at autovacuum_freeze_max_age (default 200M), but if you're at 1.8B, that emergency vacuum has either failed or is being blocked.
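Where those Schema metric names aren't wired up, the underlying statistics view gives the same signal; a sketch:

```sql
-- Last vacuum activity per table: very old or NULL timestamps on
-- large tables suggest autovacuum is disabled, throttled, or canceled
SELECT schemaname, relname,
       last_vacuum, last_autovacuum,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
ORDER BY GREATEST(COALESCE(last_vacuum, 'epoch'),
                  COALESCE(last_autovacuum, 'epoch'))
LIMIT 20;

-- Confirm autovacuum is enabled and check the emergency threshold
SHOW autovacuum;
SHOW autovacuum_freeze_max_age;
```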
4. Identify the specific tables closest to wraparound
Query pg_class for age(relfrozenxid) on all tables and sort descending; tables with age above 1.5 billion are your immediate fire drill. The `postgresql-relation-xmin` metric can help track transaction visibility, but you specifically need relfrozenxid age here. Focus manual VACUUM FREEZE efforts on the oldest tables first. As the `transaction-id-wraparound-risk` and `transaction-id-wraparound-write-outage` insights explain, per-table relfrozenxid ages drive the emergency: anti-wraparound autovacuum triggers once a table's age exceeds autovacuum_freeze_max_age (default 200M, which you're far past), and the database stops accepting writes once only ~3 million XIDs remain.
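A sketch of the per-table triage query described above (straight from pg_class, no monitoring layer assumed):

```sql
-- Tables sorted by relfrozenxid age: the top rows are the fire drill
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')   -- tables, TOAST tables, matviews
ORDER BY xid_age DESC
LIMIT 20;
```

Work the list top-down with `VACUUM FREEZE VERBOSE <table>;`, watching `pg_stat_progress_vacuum` for long-running passes.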
5. Check for replication slots holding back the XID horizon
Query pg_replication_slots for any slots with non-NULL xmin or catalog_xmin values—these hold back vacuum globally even if no other transactions are running. The `transaction-id-wraparound-prevention` insight specifically mentions dropping old replication slots as a required action. Unused or abandoned logical replication slots are a classic cause of sudden wraparound emergencies. If you find old slots, drop them immediately unless they're actively needed for replication.
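A sketch of the slot check; the slot name in the drop example is a hypothetical placeholder:

```sql
-- Slots pinning the global xmin horizon
SELECT slot_name, slot_type, active, xmin, catalog_xmin,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
WHERE xmin IS NOT NULL OR catalog_xmin IS NOT NULL;

-- Drop an abandoned slot (irreversible; confirm it is truly unused first)
-- SELECT pg_drop_replication_slot('stale_slot_name');
```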
6. Assess multixact ID wraparound risk separately
Check `database_wraparound_age_datminmxid` to see if you also have a multixact ID problem—multixacts have a separate 2 billion limit and can cause wraparound independently. The `multixact-id-wraparound-emergency-autovacuum` insight notes that multixact wraparound triggers emergency vacuum at autovacuum_multixact_freeze_max_age (default 400M). If datminmxid age is also high (above 1 billion), you have two separate emergencies and need to run VACUUM to advance both datfrozenxid and datminmxid.
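The `database_wraparound_age_datminmxid` metric maps to a one-line catalog query; a sketch (`mxid_age()` is available in PostgreSQL 9.5+):

```sql
-- Multixact age per database (a separate 2^31 counter from XIDs)
SELECT datname,
       mxid_age(datminmxid) AS multixact_age
FROM pg_database
ORDER BY multixact_age DESC;
```

A plain `VACUUM` (or `VACUUM FREEZE`) of the affected tables advances both datfrozenxid and datminmxid.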

Technologies

Related Insights

- Transaction ID wraparound causes forced downtime for emergency vacuuming (critical)
- Transaction ID wraparound forces emergency autovacuum despite disabled autovacuum (critical)
- Transaction ID wraparound prevents new transactions (critical)
- Transaction ID wraparound prevents new transactions and causes database shutdown (critical)
- Transaction ID wraparound causes complete write outage (critical)
- Transaction ID wraparound risk causes database shutdown (critical)
  PostgreSQL's 32-bit transaction ID counter approaches exhaustion (2 billion limit), risking forced database shutdown. Tables with age(relfrozenxid) approaching autovacuum_freeze_max_age require aggressive freezing VACUUMs to prevent wraparound.
- Transaction ID wraparound shutdown risk (critical)
- Multixact ID wraparound forces emergency autovacuum despite disabled autovacuum (critical)
- Disabled autovacuum prevents transaction ID freezing and leads to wraparound (critical)
- Long-running or idle transactions prevent transaction ID freezing (warning)

Relevant Metrics

Monitoring Interfaces

PostgreSQL Prometheus
PostgreSQL Datadog
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)