Table Bloat Diagnosis and Remediation

Warning · Proactive Health

Tables have grown much larger than expected due to bloat from dead tuples, impacting query performance and storage costs.

Prompt: One of my PostgreSQL tables is taking up 50GB of disk space but only has 10GB of actual data when I dump it. How do I measure bloat and safely reclaim this space?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens: 1,362 vs 1,385
Latency: 23.9s vs 24.4s
Turns: 2 vs 2
Tool Calls: 0 vs 0
Schema Calls: 0 vs 0
Response: 3,676 chars vs 3,500 chars
Evaluation
Without Schema: B+ vs With Schema: A-
Both responses provide comprehensive, technically accurate guidance covering bloat measurement, reclamation methods, and prevention strategies. The response with Schema has better organization, featuring a clear size-based decision matrix (< 1GB, 1-100GB, > 100GB) and directly addressing the specific 50GB→10GB scenario with a pg_repack recommendation. However, neither response shows obvious signs of leveraging authoritative documentation sources—both lack links to official PostgreSQL docs, which would be expected from a knowledge graph integration. The response with Schema edges ahead primarily on structure and decision clarity rather than access to unique infrastructure intelligence.
Without Schema
I'll help you measure and reclaim that bloated space in your PostgreSQL table. This is a common issue caused by PostgreSQL's MVCC (Multi-Version Concurrency Control) system leaving behind dead tuples.

## Measuring Bloat

**1. Check bloat with this query:**

```sql
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)
                   - pg_relation_size(schemaname||'.'||tablename)) AS indexes_size,
    round(100 * pg_relation_size(schemaname||'.'||tablename)
              / pg_total_relation_size(schemaname||'.'||tablename)) AS table_pct
FROM pg_tables
WHERE tablename = 'your_table_name' AND schemaname = 'public';
```

**2. More detailed bloat estimation:**

```sql
SELECT
    current_database() AS db,
    schemaname,
    tablename,
    round(100 * (pg_relation_size(schemaname||'.'||tablename)
                 - (pg_stat_get_live_tuples(c.oid) * (current_setting('block_size')::integer) / 1024))::numeric
              / pg_relation_size(schemaname||'.'||tablename), 2) AS bloat_pct,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_stat_get_live_tuples(c.oid) AS live_tuples,
    pg_stat_get_dead_tuples(c.oid) AS dead_tuples
FROM pg_tables t
JOIN pg_class c ON c.relname = t.tablename
WHERE tablename = 'your_table_name' AND schemaname = 'public';
```

## Reclaiming Space Safely

### Option 1: VACUUM FULL (requires exclusive lock)

```sql
-- This locks the table and rewrites it
VACUUM FULL VERBOSE your_table_name;
```

**Pros:** Built-in, guaranteed to work
**Cons:** Requires exclusive lock (table unavailable during operation), needs 2x disk space temporarily

### Option 2: pg_repack (recommended for production)

```sql
-- Install extension
CREATE EXTENSION pg_repack;
```

```bash
# Run from command line (minimal locking)
pg_repack -t your_table_name -d your_database
```

**Pros:** Minimal locking, works online, also rebuilds indexes
**Cons:** Requires extension installation, needs extra disk space

### Option 3: CLUSTER (reorders table)

```sql
-- Rewrite table ordered by an index
CLUSTER your_table_name USING your_index_name;
```

**Pros:** Also optimizes data order for better query performance
**Cons:** Requires exclusive lock like VACUUM FULL

### Option 4: Regular VACUUM + autovacuum tuning

```sql
-- Non-blocking but doesn't return space to OS immediately
VACUUM VERBOSE your_table_name;
```

This won't reclaim the 40GB immediately, but prevents future bloat.

## Best Practice Workflow

```sql
-- 1. Check current bloat and dead tuples
SELECT n_dead_tup, n_live_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'your_table_name';

-- 2. If high bloat, run regular vacuum first
VACUUM ANALYZE your_table_name;

-- 3. For immediate space reclaim:
-- During maintenance window:
VACUUM FULL VERBOSE your_table_name;
-- OR with pg_repack during production:
-- pg_repack -t your_table_name -d your_database
```

## Prevent Future Bloat

```sql
-- Tune autovacuum for this table
ALTER TABLE your_table_name SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- More aggressive
    autovacuum_vacuum_threshold = 50
);

-- Check autovacuum settings
SELECT relname, reloptions FROM pg_class WHERE relname = 'your_table_name';
```

**Which approach should you use?**

- **Development/off-hours:** VACUUM FULL (simplest)
- **Production with downtime window:** VACUUM FULL or CLUSTER
- **Production 24/7:** pg_repack (requires installation but near-zero downtime)

Would you like me to help you check the current bloat level or set up pg_repack?
With Schema
I'll help you measure and reclaim that PostgreSQL table bloat. Here's a comprehensive approach:

## Measuring Bloat

First, let's check which tables have bloat using this query:

```sql
-- Check table bloat
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)
                   - pg_relation_size(schemaname||'.'||tablename)) AS indexes_size,
    round(100 * pg_relation_size(schemaname||'.'||tablename)
              / pg_total_relation_size(schemaname||'.'||tablename)) AS table_pct
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
```

For more detailed bloat estimation:

```sql
-- Estimate dead tuples and bloat percentage
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    round(100 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
    pg_size_pretty(pg_relation_size(schemaname||'.'||relname)) AS size,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;
```

## Safe Methods to Reclaim Space

### Option 1: VACUUM FULL (requires table lock)

**Pros:** Built-in, thorough
**Cons:** Exclusive lock blocks reads/writes, requires 2x disk space temporarily

```sql
VACUUM FULL VERBOSE your_table_name;
```

Or analyze first:

```sql
VACUUM FULL ANALYZE your_table_name;
```

### Option 2: pg_repack (recommended for production)

**Pros:** No long-term locks, works online
**Cons:** Requires extension installation

```bash
# Install extension (Debian/Ubuntu)
sudo apt-get install postgresql-<version>-repack

# Or via source
git clone https://github.com/reorg/pg_repack.git
```

```sql
-- In PostgreSQL
CREATE EXTENSION pg_repack;
```

```bash
# Repack specific table
pg_repack -t your_table_name -d your_database

# Repack entire database
pg_repack -d your_database
```

### Option 3: CLUSTER (reorders by index)

**Pros:** Also optimizes table order
**Cons:** Requires exclusive lock

```sql
CLUSTER your_table_name USING your_primary_key_index;
```

## Recommended Workflow

1. **Check current bloat:**

```bash
psql -d your_database -f bloat_check.sql
```

2. **If bloat > 20% and table is large:**
   - **Production:** Use `pg_repack` (minimal downtime)
   - **Dev/Maintenance window:** Use `VACUUM FULL`

3. **Prevent future bloat:**

```sql
-- Check autovacuum settings
SELECT relname, reloptions FROM pg_class WHERE relname = 'your_table_name';

-- Tune autovacuum for high-churn tables
ALTER TABLE your_table_name SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- Vacuum at 5% dead tuples (default 20%)
    autovacuum_analyze_scale_factor = 0.05
);
```

4. **Monitor space after reclaim:**

```sql
SELECT pg_size_pretty(pg_total_relation_size('your_table_name'));
```

## Quick Decision Guide

- **< 1GB table:** VACUUM FULL during off-hours
- **1-100GB table with downtime:** VACUUM FULL
- **1-100GB table in production:** pg_repack
- **> 100GB table:** pg_repack or scheduled maintenance window

Given your 50GB→10GB situation, **pg_repack is likely your best option** if this is a production system. It will rebuild the table online without blocking operations.

Would you like help setting up pg_repack, or do you have a maintenance window where VACUUM FULL would work?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When diagnosing table bloat in PostgreSQL, start by quantifying the actual bloat percentage using dead versus live tuple counts, then investigate why autovacuum isn't keeping up — check its run frequency and look for long-running transactions that block cleanup. Based on bloat severity, choose between manual VACUUM (< 30%), scheduled maintenance VACUUM (30-50%), or disruptive VACUUM FULL/pg_repack (> 50%), and tune autovacuum settings to prevent recurrence.

1. Quantify the bloat percentage
Start by calculating bloat as (`n_dead_tup` / (`n_live_tup` + `n_dead_tup`)) * 100 using the metrics from pg_stat_user_tables, or check the `postgresql-table-bloat` metric directly if available. In your case, 50GB on disk versus a 10GB dump puts you at roughly 80% bloat, well above the 50% threshold where manual intervention is required. Compare the `postgresql-table-size` metric to your expected data volume to confirm the gap. This measurement tells you how aggressive your remediation needs to be.
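A minimal sketch of this check, assuming the table is named `your_table_name` (a placeholder, substitute your own):

```sql
-- Bloat percentage from the statistics collector: dead / (dead + live) * 100
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS bloat_pct,
       pg_size_pretty(pg_relation_size(relid)) AS on_disk_size
FROM pg_stat_user_tables
WHERE relname = 'your_table_name';
```

These counters are estimates maintained by the statistics collector, so treat the result as a starting point rather than an exact figure.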
2. Check autovacuum effectiveness and history
Look at `postgresql-last-vacuum-age` to see when the table was last vacuumed — if it's been days or weeks, autovacuum isn't keeping up with your workload. Check the `postgresql-autovacuumed` counter to verify autovacuum is actually running on this table, not just configured globally. If autovacuum is running but bloat is still accumulating, the default thresholds (typically 50 base + 20% of table) are too high for a high-churn table. This tells you whether the problem is autovacuum not running at all versus not running frequently enough.
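One way to pull the vacuum history and autovacuum counters in a single query (table name again a placeholder):

```sql
-- When and how often (auto)vacuum has run on this table
SELECT relname,
       last_vacuum,
       last_autovacuum,
       vacuum_count,
       autovacuum_count,
       n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'your_table_name';
```

If `autovacuum_count` is climbing but `n_dead_tup` stays high, the per-table thresholds need lowering rather than autovacuum being broken.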
3. Look for long-running transactions blocking cleanup
Query pg_stat_activity for transactions older than a few minutes, especially idle-in-transaction sessions — even a single long-running transaction prevents VACUUM from reclaiming tuples deleted after that transaction started. Check `postgresql-locks` for any locks that might be blocking autovacuum workers from accessing the table. A transaction that's been open for hours can create a bloat backlog of gigabytes on a busy table, so this is often the root cause when autovacuum settings look reasonable but bloat keeps growing.
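A sketch of that pg_stat_activity check, flagging anything open longer than five minutes (the threshold is an assumption; tune it to your workload):

```sql
-- Long-running and idle-in-transaction sessions that can pin dead tuples
SELECT pid,
       state,
       now() - xact_start AS xact_age,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```

A stale session can be ended with `SELECT pg_terminate_backend(pid);`, but coordinate with the application owner before killing anything.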
4. Select the appropriate remediation strategy
With 80% bloat, regular VACUUM won't reclaim disk space — it only marks space for reuse but doesn't shrink the table file. You need either VACUUM FULL (requires exclusive lock, table unavailable during operation) or pg_repack (online, minimal locking, but requires extension install and free disk space equal to table size). I typically use pg_repack for production tables during business hours and VACUUM FULL during maintenance windows. Check `postgresql-locks` before running either to ensure no active long transactions will cause the operation to block or fail.
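The pre-flight lock check might look like this (a sketch; `your_table_name` is a placeholder), showing who currently holds locks on the table:

```sql
-- Sessions holding locks on the target table
SELECT l.pid, l.mode, l.granted, a.state,
       now() - a.xact_start AS xact_age
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'your_table_name'::regclass;
```

If anything shows an old `xact_age`, resolve it first; otherwise VACUUM FULL will queue behind it and block every other session on the table in the meantime.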
5. Rebuild indexes and tune autovacuum to prevent recurrence
After reclaiming table space, rebuild indexes with REINDEX CONCURRENTLY (PostgreSQL 12+) since they bloat alongside tables and degrade query performance even after table cleanup. Then tune autovacuum for this table specifically: lower autovacuum_vacuum_threshold to 50 and autovacuum_vacuum_scale_factor to 0.05-0.10 so it triggers more frequently on high-churn tables. Monitor `dead-tuple` growth and `postgresql-autovacuumed` frequency over the next few days to verify the new settings prevent bloat from building up again.
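The rebuild-and-tune step might look like this; the index name is a placeholder and the per-table settings are the values suggested above, not universal defaults:

```sql
-- Rebuild a bloated index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY your_index_name;
-- or rebuild every index on the table at once:
REINDEX TABLE CONCURRENTLY your_table_name;

-- Make autovacuum trigger at ~5% dead tuples for this table only
ALTER TABLE your_table_name SET (
    autovacuum_vacuum_threshold = 50,
    autovacuum_vacuum_scale_factor = 0.05
);
```

Note that `REINDEX ... CONCURRENTLY` needs roughly the index's size in free disk space while it builds the replacement.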

Technologies

Related Insights

- Index bloat wastes disk space and degrades query performance (warning)
- Long-running transactions preventing cleanup (stub insight from scenario discovery hint)
- Storage cost impact from bloat (stub insight from scenario discovery hint)
- Table bloat exceeding healthy threshold (diagnostic step; stub insight from scenario discovery hint)
- Table bloat exceeds 50% requiring manual VACUUM (diagnostic step, warning)
- Need for VACUUM FULL or pg_repack (prerequisite; stub insight from scenario discovery hint)
- Table bloat wastes disk space without VACUUM FULL or pg_repack (outcome, info)
- Autovacuum not reclaiming space effectively (prerequisite; stub insight from scenario discovery hint)
- Table bloat from suboptimal autovacuum settings slows query performance (supporting evidence, warning)
- Table bloat increases sequential scan cost linearly (outcome, warning)

Relevant Metrics

Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL Prometheus
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration