Autovacuum falling behind causing bloat

Incident Response

Table and index bloat accumulates faster than autovacuum can clean it up, degrading performance and consuming disk space.

Prompt: My PostgreSQL tables are showing 40% bloat and autovacuum doesn't seem to be keeping up with our high-update workload. Should I tune autovacuum settings, run manual VACUUM, or is there a blocking transaction preventing cleanup?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens
2,380 vs 1,591
Latency
41.2s vs 31.8s
Turns
9 vs 5
Tool Calls
3 vs 1
Schema Calls
0 vs 0
Response
3,065 chars vs 3,506 chars
Evaluation
Without Schema
A-
vs
With Schema
A-
Both responses provide comprehensive, actionable guidance with specific SQL queries and parameter values. The response without Schema includes pg_repack as a zero-downtime alternative to VACUUM FULL, which is valuable for production systems. The response with Schema adds transaction wraparound checks with specific thresholds (200M/1B), per-table autovacuum tuning directly addressing the high-update workload scenario, and more detailed monitoring including dead tuple percentage calculations. Both correctly identify blocking transactions as the primary culprit and provide the same core autovacuum tuning parameters. The per-table tuning in the Schema response is particularly relevant for the workload described, while pg_repack addresses the immediate bloat problem more elegantly.
Without Schema
3 tool calls
I'll help you diagnose and fix the PostgreSQL bloat issue. Let me first check your current environment and then help you investigate the root cause. Let me create a diagnostic script to help you identify the root cause of your bloat issue: Let me provide you with a diagnostic approach and solutions directly:

## **Step 1: Diagnose the Root Cause**

Run these queries to identify what's blocking VACUUM:

### **Check for Long-Running Transactions** (most common cause):

```sql
SELECT pid, usename, application_name, state, backend_xmin,
       age(backend_xmin) AS xmin_age,
       now() - xact_start AS transaction_duration,
       query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
```

### **Check for Idle-in-Transaction**:

```sql
SELECT pid, usename, application_name,
       now() - state_change AS idle_duration,
       query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;
```

### **Check Replication Slots** (can hold back VACUUM):

```sql
SELECT slot_name, active, xmin, age(xmin) AS xmin_age
FROM pg_replication_slots
WHERE xmin IS NOT NULL
ORDER BY age(xmin) DESC;
```

---

## **Step 2: Solutions Based on Root Cause**

### **If you find long-running transactions (xmin_age > 1000000)**:

- **Immediate**: Terminate the blocking transaction: `SELECT pg_terminate_backend(pid);`
- **Prevention**: Fix application code to avoid long transactions, especially with ORM frameworks

### **If autovacuum settings are too conservative**:

```sql
-- For high-update workloads, tune these settings:
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- from default 0.2
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;       -- from default 50
ALTER SYSTEM SET autovacuum_naptime = 10;                -- from default 60s
ALTER SYSTEM SET autovacuum_max_workers = 6;             -- from default 3
ALTER SYSTEM SET maintenance_work_mem = '1GB';           -- from default 64MB
SELECT pg_reload_conf();
```

### **If tables are already bloated (immediate fix)**:

```sql
-- Run manual VACUUM (doesn't lock, safe during production):
VACUUM (VERBOSE, ANALYZE) your_table_name;
-- For severe bloat, run VACUUM FULL (requires ACCESS EXCLUSIVE lock):
VACUUM FULL your_table_name;  -- WARNING: locks table, causes downtime
```

### **Better alternative to VACUUM FULL** (no downtime):

```sql
-- Use pg_repack extension instead:
pg_repack -t your_table_name -d your_database
```

---

## **Recommended Action Plan**

1. **Run diagnostic queries** to find blocking transactions
2. **Kill any long-running transactions** preventing cleanup
3. **Tune autovacuum** for your workload (settings above)
4. **Run manual VACUUM** on bloated tables tonight
5. **Monitor** with: `SELECT * FROM pg_stat_progress_vacuum;`

The **most likely culprit** with high-update workloads is either:

- Long-running analytical queries holding old transaction snapshots
- Application connection poolers with `idle in transaction` connections
- Inactive replication slots

Check those first before tuning autovacuum settings.
With Schema
1 tool call
I'll help you diagnose and fix this PostgreSQL bloat issue. Let me gather some information first. Let me help you diagnose and resolve this PostgreSQL bloat issue. Here's a systematic approach:

## 1. First, check for blocking transactions

Run this query to identify long-running transactions that prevent vacuum from cleaning up dead tuples:

```sql
-- Check for long-running transactions
SELECT pid, now() - xact_start AS duration, state, query,
       wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle'
  AND xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- Check for transaction ID wraparound issues
SELECT datname, age(datfrozenxid)
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```

**Red flags:**
- Transactions running for hours/days
- `age(datfrozenxid)` > 200 million (critical if > 1 billion)

## 2. Review current autovacuum settings

```sql
-- Check current autovacuum configuration
SHOW autovacuum;
SHOW autovacuum_vacuum_scale_factor;
SHOW autovacuum_vacuum_threshold;
SHOW autovacuum_vacuum_cost_delay;
SHOW autovacuum_max_workers;

-- Check per-table stats to see if autovacuum is running
SELECT schemaname, relname, last_vacuum, last_autovacuum,
       n_tup_upd, n_tup_del, n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;
```

## 3. Immediate action vs Long-term fix

**For immediate relief (run manual VACUUM):**

```sql
-- On specific bloated tables
VACUUM ANALYZE your_table_name;
-- Or if bloat is severe and you can afford a brief lock:
VACUUM FULL your_table_name;  -- Warning: acquires exclusive lock
```

**For high-update workloads, tune autovacuum aggressively:**

```sql
-- Global settings (requires PostgreSQL restart or reload)
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- Default is 0.2
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;       -- Default is 50
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 10;      -- Default is 20ms (lower = faster)
ALTER SYSTEM SET autovacuum_max_workers = 6;             -- Default is 3
SELECT pg_reload_conf();

-- Per-table tuning for your hottest tables:
ALTER TABLE your_hot_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 100,
    autovacuum_vacuum_cost_delay = 5
);
```

**Explanation:**
- **scale_factor = 0.01**: Vacuum when 1% of rows are dead (vs 20% default)
- **Lower cost_delay**: Vacuum runs faster but uses more I/O
- **More workers**: Handle more tables concurrently

## 4. Monitor the fix

```sql
-- Watch autovacuum activity
SELECT * FROM pg_stat_progress_vacuum;

-- Check if bloat is decreasing over time
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```

## My recommendation:

1. **Run the blocking transaction check first** - if you find long-running idle transactions, kill them (`SELECT pg_terminate_backend(pid)`)
2. **Tune autovacuum settings** as shown above (especially for high-update tables)
3. **Run manual VACUUM ANALYZE** on the worst tables now for immediate relief
4. **Avoid VACUUM FULL** unless absolutely necessary (it locks the table)

Would you like me to create a script to monitor this, or do you have specific table names you want me to help optimize?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When autovacuum falls behind and bloat accumulates, start by confirming the bloat severity and dead tuple counts, then check if autovacuum is actually running and when it last completed. The most common culprits are long-running transactions blocking cleanup or default autovacuum settings that can't keep pace with high-write workloads—fix those before resorting to manual VACUUM FULL.

1. Confirm bloat severity and dead tuple accumulation
Start by checking `table_bloat` to see if you're truly at 40% bloat, and examine `n_dead_tup` and `postgresql.dead_rows` to quantify dead tuple accumulation. Calculate the dead tuple percentage by comparing `postgresql.dead_rows` to `postgresql.live_rows`—if you're seeing 20%+ dead tuples on large tables, that's a clear signal autovacuum isn't keeping up. This gives you a baseline to measure improvement after tuning.
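A query along these lines against the standard `pg_stat_user_tables` view surfaces the worst offenders; the 10,000-dead-tuple floor is an arbitrary cutoff chosen here to skip small tables, not a fixed recommendation:

```sql
-- Rank tables by dead tuples; 20%+ dead_pct on large tables
-- indicates autovacuum is not keeping up.
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;
```

Record these numbers before making any changes so you can verify the tuning actually moved them.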
2. Check when autovacuum last ran successfully
Look at `postgresql.last_vacuum_age` and `autovacuum_timestamp` to see how long it's been since autovacuum touched your tables. If `postgresql.last_vacuum_age` is measured in hours or days on high-churn tables, autovacuum is definitely falling behind. Also check `postgresql.vacuum.count` over time—if the count isn't increasing regularly, autovacuum might not be enabled or is being blocked from starting.
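One way to pull the underlying timestamps and run counts directly from PostgreSQL, sorting never-vacuumed tables first (`GREATEST` in PostgreSQL ignores NULL arguments, so the mixed manual/auto timestamps compare cleanly):

```sql
-- Last completed vacuum per table, manual or automatic, plus lifetime counts.
SELECT schemaname,
       relname,
       last_vacuum,
       last_autovacuum,
       vacuum_count,
       autovacuum_count,
       now() - GREATEST(last_vacuum, last_autovacuum) AS time_since_vacuum
FROM pg_stat_user_tables
ORDER BY GREATEST(last_vacuum, last_autovacuum) ASC NULLS FIRST
LIMIT 20;
```

If `autovacuum_count` stays flat between snapshots while writes continue, autovacuum is disabled or blocked from starting.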
3. Identify long-running transactions blocking autovacuum
This is the #1 silent killer of autovacuum effectiveness. Query `pg_stat_activity` for transactions with `state_change` timestamps older than several hours—even a single idle transaction can prevent autovacuum from reclaiming space. The `blocked-autovacuum-bloat-accumulation` insight warns that these cause unbounded bloat. Terminate these transactions if they're not critical, as no amount of autovacuum tuning will help if transactions are holding back the cleanup horizon.
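A sketch of that check; the one-hour cutoff is an illustrative threshold, and `backend_xmin` is what actually pins the cleanup horizon:

```sql
-- Sessions (including idle-in-transaction) holding back vacuum for over an hour.
SELECT pid,
       usename,
       state,
       now() - xact_start   AS xact_duration,
       now() - state_change AS since_state_change,
       age(backend_xmin)    AS xmin_age,
       left(query, 80)      AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
  AND now() - xact_start > interval '1 hour'
ORDER BY xact_start;

-- If a session is expendable, terminate it by pid:
-- SELECT pg_terminate_backend(pid);
```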
4. Assess write workload intensity and vacuum lag
Check `n_ins_since_vacuum` and `n_mod_since_analyze` to see how many modifications are piling up between vacuum runs. If you're seeing 100K+ inserts or 50K+ modifications since the last vacuum on tables with high update rates, your workload is overwhelming default autovacuum settings. Tables with 10,000+ inserts/min need much more aggressive autovacuum parameters than the defaults provide.
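These counters also live in `pg_stat_user_tables`; note that `n_ins_since_vacuum` only exists on PostgreSQL 13 and later:

```sql
-- Accumulated churn since the last vacuum/analyze per table.
SELECT relname,
       n_ins_since_vacuum,        -- PostgreSQL 13+
       n_mod_since_analyze,
       n_tup_upd + n_tup_del AS upd_del_total
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 20;
```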
5. Tune autovacuum parameters for your workload
Lower `autovacuum_vacuum_scale_factor` from 0.2 to 0.05-0.1 so vacuum triggers earlier, and increase `autovacuum_max_workers` to 5-10 to handle more tables concurrently. For high-write tables, also bump `autovacuum_vacuum_cost_limit` to 1000-2000 and set `autovacuum_vacuum_cost_delay` to 0 to make vacuum more aggressive. The `autovacuum-lag-bloat` and `table-bloat-autovacuum-configuration` insights provide specific threshold guidance—on large tables with `postgresql.table.size` > 10GB, the default scale_factor delays vacuum too long.
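A minimal sketch of that tuning; the values are starting points from the ranges above, not universal recommendations, and `orders` is a placeholder table name:

```sql
-- Global settings; pg_reload_conf() applies most of them without a restart,
-- but autovacuum_max_workers only takes effect after a server restart.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
ALTER SYSTEM SET autovacuum_max_workers = 6;
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
SELECT pg_reload_conf();

-- Per-table override for the hottest tables ("orders" is hypothetical):
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);
```

Per-table storage parameters override the global defaults, so you can keep conservative settings cluster-wide and only make the high-churn tables aggressive.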
6. Consider manual VACUUM FULL for severe bloat
If `table_bloat` is above 40-50% after fixing blocking transactions and tuning autovacuum, standard VACUUM won't reclaim the space—you'll need VACUUM FULL, CLUSTER, or a table rewrite. Be aware this requires an ACCESS EXCLUSIVE lock (blocking all access) and temporarily uses disk space equal to the table size, so schedule it during maintenance windows. After the one-time cleanup, the tuned autovacuum settings should prevent bloat from accumulating again.
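The one-time rewrite can be sketched as follows (`orders` is again a placeholder); on PostgreSQL 12+ a VACUUM FULL reports progress through the `pg_stat_progress_cluster` view:

```sql
-- Rewrites the table to reclaim space: takes an ACCESS EXCLUSIVE lock and
-- temporarily needs free disk roughly equal to the table's size.
VACUUM (FULL, VERBOSE, ANALYZE) orders;

-- From another session, watch the rewrite progress (PostgreSQL 12+):
SELECT * FROM pg_stat_progress_cluster;
```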


Monitoring Interfaces

PostgreSQL Prometheus
PostgreSQL Datadog
PostgreSQL Native
PostgreSQL CloudWatch Metrics
PostgreSQL Prometheus Exporter (pg_exporter)