Transaction ID Wraparound Risk

criticalIncident Response

Database is approaching transaction ID wraparound limit, which could force emergency shutdown if not addressed.

Prompt: “I'm getting warnings about transaction ID wraparound in my PostgreSQL logs. The age of the oldest transaction is over 1.5 billion. How urgent is this and what do I need to do?”

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens

826vs802

Latency

17.7smsvs17.7sms

Turns

2vs2

Tool Calls

0vs0

Schema Calls

0vs0

Response

1,635 charsvs1,784 chars

Evaluation

Without Schema

A-

With Schema

Both responses correctly identify this as urgent and provide concrete VACUUM commands. The response without Schema gives solid immediate steps and explains the danger, though it's slightly imprecise about the 2.1B threshold (PostgreSQL shuts down before corruption occurs). The response with Schema provides more diagnostic depth, particularly in its dedicated section for checking blockers like long-running transactions, abandoned prepared transactions, and replication slots — common root causes that prevent autovacuum from working. It also provides a more refined table-checking query that filters by age threshold and includes size information.

Without Schema

This is **very urgent** - you need to act immediately. At 1.5 billion transaction age, you're roughly 75% toward the 2 billion limit where PostgreSQL will shut down and enter read-only mode to prevent data loss.

## Immediate Actions

**1. Check the current situation:**
```bash
psql -d your_database -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;"
```

**2. Identify problem tables:**
```bash
psql -d your_database -c "SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind IN ('r', 't') ORDER BY age(relfrozenxid) DESC LIMIT 20;"
```

**3. Run VACUUM FREEZE immediately** on the worst offenders:
```bash
# For specific tables with highest age
psql -d your_database -c "VACUUM FREEZE VERBOSE your_table_name;"

# Or for entire database (this will take longer)
psql -d your_database -c "VACUUM FREEZE VERBOSE;"
```

## Timeline
- **1.5 billion**: You're here - urgent action needed
- **2 billion**: PostgreSQL will refuse new transactions and go read-only
- **2.1 billion**: Risk of data corruption/loss

You likely have days to weeks depending on transaction rate, but **start vacuuming now**.

## Why This Happened
- Autovacuum isn't keeping up (disabled, too slow, or blocked)
- Long-running transactions preventing cleanup
- Very large tables that take too long to vacuum

## After Emergency Is Resolved
1. Tune autovacuum settings to be more aggressive
2. Monitor `age(datfrozenxid)` regularly
3. Investigate what's blocking or slowing autovacuum

Do you have access to the database right now? I can help you check the specific situation and prioritize which tables to vacuum first.

With Schema

This is **extremely urgent** - you're approaching a critical PostgreSQL failure point. Here's what's happening and what to do:

## The Danger

- PostgreSQL uses 32-bit transaction IDs (max ~2.1 billion)
- At **2 billion XIDs**, PostgreSQL will go into emergency shutdown mode to prevent data loss
- At **1.5 billion**, you're ~75% of the way there
- You likely have **days to weeks** to fix this, not months

## Immediate Steps

### 1. Check which databases/tables are the problem:

```sql
SELECT datname, age(datfrozenxid) 
FROM pg_database 
ORDER BY age(datfrozenxid) DESC;
```

### 2. Find the worst tables in each problematic database:

```sql
SELECT 
  schemaname, 
  relname, 
  age(relfrozenxid) as xid_age,
  pg_size_pretty(pg_total_relation_size(relid)) as size
FROM pg_stat_all_tables 
WHERE age(relfrozenxid) > 1000000000
ORDER BY age(relfrozenxid) DESC 
LIMIT 20;
```

### 3. Run aggressive VACUUM on the worst offenders:

```sql
VACUUM FREEZE VERBOSE tablename;
```

For entire databases:
```bash
vacuumdb --all --freeze --verbose
```

### 4. Check for blockers:

```sql
-- Long-running transactions
SELECT pid, now() - xact_start as duration, state, query 
FROM pg_stat_activity 
WHERE xact_start IS NOT NULL 
ORDER BY duration DESC;

-- Check autovacuum is enabled
SHOW autovacuum;
```

## Common Causes

- Autovacuum disabled or too conservative
- Long-running transactions preventing VACUUM from advancing frozen XIDs
- Abandoned prepared transactions
- Replication slots not advancing

## Recovery Timeline

Vacuuming large tables can take hours. Start immediately and monitor progress. Once you're back below 200 million, adjust autovacuum settings to prevent recurrence.

**Do you want me to help you diagnose which specific tables or databases are causing this?**

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

Transaction ID wraparound at 1.5 billion is a critical emergency — you're 75% to forced shutdown at 2 billion. First, assess exact risk across all databases and tables to understand your timeline. Then identify and terminate any long-running transactions blocking freeze operations, check if autovacuum is working, and immediately run manual VACUUM FREEZE on the oldest tables. This is a drop-everything priority.

1Assess immediate wraparound risk across all databases

Check `database-wraparound-age-datfrozenxid` and `database-wraparound-age-datminmxid` metrics to see exactly how close each database is to the wraparound limit. At 1.5 billion, you're at ~75% of the 2 billion hard limit where PostgreSQL shuts down to prevent data corruption. Also check `postgresql-before-xid-wraparound` to see how many transactions remain before emergency mode. If any database shows age above 1.8 billion or fewer than 200 million transactions remaining, you have hours to days, not weeks — this is an all-hands emergency.

Transaction ID wraparound causes forced downtime for emergency vacuuming Transaction ID wraparound prevents new transactions and causes database shutdown Transaction ID wraparound prevents new transactions database_wraparound_age_datfrozenxiddatabase_wraparound_age_datminmxidbefore_xid_wraparound

2Identify and terminate blocking long-running transactions

Query `pg_stat_activity` for transactions with high `age(backend_xid)` or `age(backend_xmin)` — these prevent autovacuum from advancing relfrozenxid even if it's running. Also check `long-running-transaction-oldest-timestamp` to find the oldest active transaction. Long-running analytics queries, forgotten idle transactions, or stale replication slots are common culprits. Terminate these immediately unless they're absolutely critical — the database's survival depends on allowing freeze operations to proceed.

Long-running or idle transactions prevent transaction ID freezing Transaction ID wraparound prevents new transactions backend_xidbackend_xminlong_running_transaction_oldest_timestamp

3Verify autovacuum is enabled and running freeze operations

Check `postgresql-autovacuumed` and `postgresql-vacuum-count` to confirm autovacuum is actually running. Then check `postgresql-last-vacuum` and `postgresql-last-vacuum-age` on your largest tables — if last vacuum was days or weeks ago, autovacuum isn't keeping up or is disabled. The `disabled-autovacuum-prevents-freezing` insight warns that disabling autovacuum (even temporarily) on tables silently ages transaction IDs until sudden failure. Emergency autovacuum should kick in automatically around 200M age, but if it's not running, you need to enable it immediately or run manual vacuums.

Disabled autovacuum prevents transaction ID freezing and leads to wraparound Transaction ID wraparound forces emergency autovacuum despite disabled autovacuum Autovacuum falling behind on tables causes bloat accumulation autovacuumedpostgresql.vacuum.countpostgresql.last_vacuumpostgresql.last_vacuum_age

4Identify tables with oldest frozen transaction IDs

Query `pg_class` to find tables with highest `age(relfrozenxid)` using the `postgresql-relation-xmin` metric as a guide. Tables with age above 1.5 billion are your highest priorities for immediate freeze operations. Large tables take hours to vacuum freeze, so you need to start on the worst offenders NOW. Also check `dead-tuple` counts — tables with many dead tuples will take longer to vacuum and are likely why autovacuum fell behind in the first place.

Transaction ID wraparound risk causes database shutdown Transaction ID wraparound shutdown risk postgresql.relation.xminn_dead_tup

5Run emergency manual VACUUM FREEZE on critical tables

Execute `VACUUM (FREEZE, VERBOSE)` on tables identified in step 4, starting with the oldest relfrozenxid. For large tables, this can take hours and will generate heavy I/O — but there's no choice. Run these in parallel if you have I/O capacity (`vacuumdb --analyze --verbose --jobs=4`), but don't starve production workload completely. Monitor `postgresql-operations` to ensure the database is still serving queries. The `transaction-wraparound` insight recommends aggressive manual vacuum as the immediate action when approaching wraparound limits.

Transaction ID wraparound shutdown risk Transaction ID wraparound causes complete write outage postgresql.operationspostgresql.vacuum.count

6Tune autovacuum to prevent recurrence

Once immediate danger passes, tune autovacuum settings to prevent this from happening again. Lower `autovacuum_vacuum_scale_factor` from default 0.2 to 0.01-0.05 so vacuum triggers on smaller table changes. Increase `autovacuum_max_workers` from 3 to 5-8 depending on your workload. Set `autovacuum_freeze_max_age` conservatively (150M instead of 200M) to give yourself more warning time. Monitor `database-wraparound-age-datfrozenxid` continuously and alert when any database exceeds 1 billion — by 1.5 billion you're already in crisis mode.

Transaction ID wraparound causes forced downtime for emergency vacuuming Autovacuum falling behind on tables causes bloat accumulation database_wraparound_age_datfrozenxidautovacuumedpostgresql.last_vacuum_age