Autovacuum falling behind causing bloat

Incident Response

Table and index bloat accumulates faster than autovacuum can clean it up, degrading performance and consuming disk space.

Prompt: My PostgreSQL tables are showing 40% bloat and autovacuum doesn't seem to be keeping up with our high-update workload. Should I tune autovacuum settings, run manual VACUUM, or is there a blocking transaction preventing cleanup?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens
2,380 vs 1,591
Latency
41.2s vs 31.8s
Turns
9 vs 5
Tool Calls
3 vs 1
Schema Calls
0 vs 0
Response
3,065 chars vs 3,506 chars
Evaluation
Without Schema
A-
vs
With Schema
A-
Both responses provide comprehensive, actionable guidance with specific SQL queries and parameter values. The response without Schema includes pg_repack as a zero-downtime alternative to VACUUM FULL, which is valuable for production systems. The response with Schema adds transaction wraparound checks with specific thresholds (200M/1B), per-table autovacuum tuning directly addressing the high-update workload scenario, and more detailed monitoring including dead tuple percentage calculations. Both correctly identify blocking transactions as the primary culprit and provide the same core autovacuum tuning parameters. The per-table tuning in the Schema response is particularly relevant for the workload described, while pg_repack addresses the immediate bloat problem more elegantly.
Without Schema
3 tool calls
I'll help you diagnose and fix the PostgreSQL bloat issue. Let me first check your current environment and then help you investigate the root cause. Let me create a diagnostic script to help you identify the root cause of your bloat issue: Let me provide you with a diagnostic approach and solutions directly:

## **Step 1: Diagnose the Root Cause**

Run these queries to identify what's blocking VACUUM:

### **Check for Long-Running Transactions** (most common cause):

```sql
SELECT pid, usename, application_name, state, backend_xmin,
       age(backend_xmin) AS xmin_age,
       now() - xact_start AS transaction_duration,
       query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
```

### **Check for Idle-in-Transaction**:

```sql
SELECT pid, usename, application_name,
       now() - state_change AS idle_duration,
       query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;
```

### **Check Replication Slots** (can hold back VACUUM):

```sql
SELECT slot_name, active, xmin, age(xmin) AS xmin_age
FROM pg_replication_slots
WHERE xmin IS NOT NULL
ORDER BY age(xmin) DESC;
```

---

## **Step 2: Solutions Based on Root Cause**

### **If you find long-running transactions (xmin_age > 1000000)**:

- **Immediate**: Terminate the blocking transaction: `SELECT pg_terminate_backend(pid);`
- **Prevention**: Fix application code to avoid long transactions, especially with ORM frameworks

### **If autovacuum settings are too conservative**:

```sql
-- For high-update workloads, tune these settings:
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- from default 0.2
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;       -- from default 50
ALTER SYSTEM SET autovacuum_naptime = 10;                -- from default 60s
ALTER SYSTEM SET autovacuum_max_workers = 6;             -- from default 3
ALTER SYSTEM SET maintenance_work_mem = '1GB';           -- from default 64MB
SELECT pg_reload_conf();
```

### **If tables are already bloated (immediate fix)**:

```sql
-- Run manual VACUUM (doesn't lock, safe during production):
VACUUM (VERBOSE, ANALYZE) your_table_name;
-- For severe bloat, run VACUUM FULL (requires ACCESS EXCLUSIVE lock):
VACUUM FULL your_table_name;  -- WARNING: locks table, causes downtime
```

### **Better alternative to VACUUM FULL** (no downtime):

```sql
-- Use pg_repack extension instead:
pg_repack -t your_table_name -d your_database
```

---

## **Recommended Action Plan**

1. **Run diagnostic queries** to find blocking transactions
2. **Kill any long-running transactions** preventing cleanup
3. **Tune autovacuum** for your workload (settings above)
4. **Run manual VACUUM** on bloated tables tonight
5. **Monitor** with: `SELECT * FROM pg_stat_progress_vacuum;`

The **most likely culprit** with high-update workloads is either:

- Long-running analytical queries holding old transaction snapshots
- Application connection poolers with `idle in transaction` connections
- Inactive replication slots

Check those first before tuning autovacuum settings.
With Schema
1 tool call
I'll help you diagnose and fix this PostgreSQL bloat issue. Let me gather some information first. Let me help you diagnose and resolve this PostgreSQL bloat issue. Here's a systematic approach:

## 1. First, check for blocking transactions

Run this query to identify long-running transactions that prevent vacuum from cleaning up dead tuples:

```sql
-- Check for long-running transactions
SELECT pid, now() - xact_start AS duration, state, query,
       wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle'
  AND xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- Check for transaction ID wraparound issues
SELECT datname, age(datfrozenxid)
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```

**Red flags:**
- Transactions running for hours/days
- `age(datfrozenxid)` > 200 million (critical if > 1 billion)

## 2. Review current autovacuum settings

```sql
-- Check current autovacuum configuration
SHOW autovacuum;
SHOW autovacuum_vacuum_scale_factor;
SHOW autovacuum_vacuum_threshold;
SHOW autovacuum_vacuum_cost_delay;
SHOW autovacuum_max_workers;

-- Check per-table stats to see if autovacuum is running
SELECT schemaname, relname, last_vacuum, last_autovacuum,
       n_tup_upd, n_tup_del, n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;
```

## 3. Immediate action vs Long-term fix

**For immediate relief (run manual VACUUM):**

```sql
-- On specific bloated tables
VACUUM ANALYZE your_table_name;
-- Or if bloat is severe and you can afford a brief lock:
VACUUM FULL your_table_name;  -- Warning: acquires exclusive lock
```

**For high-update workloads, tune autovacuum aggressively:**

```sql
-- Global settings (requires PostgreSQL restart or reload)
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- Default is 0.2
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;       -- Default is 50
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 10;      -- Default is 20ms (lower = faster)
ALTER SYSTEM SET autovacuum_max_workers = 6;             -- Default is 3
SELECT pg_reload_conf();

-- Per-table tuning for your hottest tables:
ALTER TABLE your_hot_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 100,
    autovacuum_vacuum_cost_delay = 5
);
```

**Explanation:**
- **scale_factor = 0.01**: Vacuum when 1% of rows are dead (vs 20% default)
- **Lower cost_delay**: Vacuum runs faster but uses more I/O
- **More workers**: Handle more tables concurrently

## 4. Monitor the fix

```sql
-- Watch autovacuum activity
SELECT * FROM pg_stat_progress_vacuum;

-- Check if bloat is decreasing over time
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```

## My recommendation:

1. **Run the blocking transaction check first** - if you find long-running idle transactions, kill them (`SELECT pg_terminate_backend(pid)`)
2. **Tune autovacuum settings** as shown above (especially for high-update tables)
3. **Run manual VACUUM ANALYZE** on the worst tables now for immediate relief
4. **Avoid VACUUM FULL** unless absolutely necessary (it locks the table)

Would you like me to create a script to monitor this, or do you have specific table names you want me to help optimize?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When autovacuum falls behind and bloat accumulates, start by confirming the bloat severity and dead tuple counts, then check if autovacuum is actually running and when it last completed. The most common culprits are long-running transactions blocking cleanup or default autovacuum settings that can't keep pace with high-write workloads—fix those before resorting to manual VACUUM FULL.

1. Confirm bloat severity and dead tuple accumulation
Start by checking `table_bloat` to see if you're truly at 40% bloat, and examine `n_dead_tup` and `postgresql.dead_rows` to quantify dead tuple accumulation. Calculate the dead tuple percentage by comparing `postgresql.dead_rows` to `postgresql.live_rows`—if you're seeing 20%+ dead tuples on large tables, that's a clear signal autovacuum isn't keeping up. This gives you a baseline to measure improvement after tuning.
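A query along these lines against the standard `pg_stat_user_tables` view surfaces the worst offenders; the 10,000-dead-tuple floor is an arbitrary cutoff chosen here to skip small tables, not a fixed recommendation:

```sql
-- Rank tables by dead tuples; 20%+ dead_pct on large tables
-- indicates autovacuum is not keeping up.
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;
```

Record these numbers before making any changes so you can verify the tuning actually moved them.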
2. Check when autovacuum last ran successfully
Look at `postgresql.last_vacuum_age` and `autovacuum_timestamp` to see how long it's been since autovacuum touched your tables. If `postgresql.last_vacuum_age` is measured in hours or days on high-churn tables, autovacuum is definitely falling behind. Also check `postgresql.vacuum.count` over time—if the count isn't increasing regularly, autovacuum might not be enabled or is being blocked from starting.
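One way to pull the underlying timestamps and run counts directly from PostgreSQL, sorting never-vacuumed tables first (`GREATEST` in PostgreSQL ignores NULL arguments, so the mixed manual/auto timestamps compare cleanly):

```sql
-- Last completed vacuum per table, manual or automatic, plus lifetime counts.
SELECT schemaname,
       relname,
       last_vacuum,
       last_autovacuum,
       vacuum_count,
       autovacuum_count,
       now() - GREATEST(last_vacuum, last_autovacuum) AS time_since_vacuum
FROM pg_stat_user_tables
ORDER BY GREATEST(last_vacuum, last_autovacuum) ASC NULLS FIRST
LIMIT 20;
```

If `autovacuum_count` stays flat between snapshots while writes continue, autovacuum is disabled or blocked from starting.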
3. Identify long-running transactions blocking autovacuum
This is the #1 silent killer of autovacuum effectiveness. Query `pg_stat_activity` for transactions with `state_change` timestamps older than several hours—even a single idle transaction can prevent autovacuum from reclaiming space. The `blocked-autovacuum-bloat-accumulation` insight warns that these cause unbounded bloat. Terminate these transactions if they're not critical, as no amount of autovacuum tuning will help if transactions are holding back the cleanup horizon.
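A sketch of that check; the one-hour cutoff is an illustrative threshold, and `backend_xmin` is what actually pins the cleanup horizon:

```sql
-- Sessions (including idle-in-transaction) holding back vacuum for over an hour.
SELECT pid,
       usename,
       state,
       now() - xact_start   AS xact_duration,
       now() - state_change AS since_state_change,
       age(backend_xmin)    AS xmin_age,
       left(query, 80)      AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
  AND now() - xact_start > interval '1 hour'
ORDER BY xact_start;

-- If a session is expendable, terminate it by pid:
-- SELECT pg_terminate_backend(pid);
```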
4. Assess write workload intensity and vacuum lag
Check `n_ins_since_vacuum` and `n_mod_since_analyze` to see how many modifications are piling up between vacuum runs. If you're seeing 100K+ inserts or 50K+ modifications since the last vacuum on tables with high update rates, your workload is overwhelming default autovacuum settings. Tables with 10,000+ inserts/min need much more aggressive autovacuum parameters than the defaults provide.
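These counters also live in `pg_stat_user_tables`; note that `n_ins_since_vacuum` only exists on PostgreSQL 13 and later:

```sql
-- Accumulated churn since the last vacuum/analyze per table.
SELECT relname,
       n_ins_since_vacuum,        -- PostgreSQL 13+
       n_mod_since_analyze,
       n_tup_upd + n_tup_del AS upd_del_total
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 20;
```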
5. Tune autovacuum parameters for your workload
Lower `autovacuum_vacuum_scale_factor` from 0.2 to 0.05-0.1 so vacuum triggers earlier, and increase `autovacuum_max_workers` to 5-10 to handle more tables concurrently. For high-write tables, also bump `autovacuum_vacuum_cost_limit` to 1000-2000 and set `autovacuum_vacuum_cost_delay` to 0 to make vacuum more aggressive. The `autovacuum-lag-bloat` and `table-bloat-autovacuum-configuration` insights provide specific threshold guidance—on large tables with `postgresql.table.size` > 10GB, the default scale_factor delays vacuum too long.
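A minimal sketch of that tuning; the values are starting points from the ranges above, not universal recommendations, and `orders` is a placeholder table name:

```sql
-- Global settings; pg_reload_conf() applies most of them without a restart,
-- but autovacuum_max_workers only takes effect after a server restart.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
ALTER SYSTEM SET autovacuum_max_workers = 6;
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
SELECT pg_reload_conf();

-- Per-table override for the hottest tables ("orders" is hypothetical):
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);
```

Per-table storage parameters override the global defaults, so you can keep conservative settings cluster-wide and only make the high-churn tables aggressive.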
6. Consider manual VACUUM FULL for severe bloat
If `table_bloat` is above 40-50% after fixing blocking transactions and tuning autovacuum, standard VACUUM won't reclaim the space—you'll need VACUUM FULL, CLUSTER, or a table rewrite. Be aware this requires an ACCESS EXCLUSIVE lock (blocking all access) and temporarily uses disk space equal to the table size, so schedule it during maintenance windows. After the one-time cleanup, the tuned autovacuum settings should prevent bloat from accumulating again.
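The one-time rewrite can be sketched as follows (`orders` is again a placeholder); on PostgreSQL 12+ a VACUUM FULL reports progress through the `pg_stat_progress_cluster` view:

```sql
-- Rewrites the table to reclaim space: takes an ACCESS EXCLUSIVE lock and
-- temporarily needs free disk roughly equal to the table's size.
VACUUM (FULL, VERBOSE, ANALYZE) orders;

-- From another session, watch the rewrite progress (PostgreSQL 12+):
SELECT * FROM pg_stat_progress_cluster;
```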


Monitoring Interfaces

PostgreSQL Prometheus
PostgreSQL Datadog
PostgreSQL Native
PostgreSQL CloudWatch Metrics
PostgreSQL Prometheus Exporter (pg_exporter)