I'll help you choose the right approach for handling PostgreSQL index bloat without impacting production.
## Quick Recommendation
**Use REINDEX CONCURRENTLY** if you're on PostgreSQL 12+ - it's built-in, simpler, and sufficient for most cases.
**Use pg_repack** if you:
- Need to rebuild tables (not just indexes)
- Are on PostgreSQL < 12
- Want slightly lower lock requirements during the rebuild
- Have very large indexes where you want more control
## Comparison
**REINDEX CONCURRENTLY (PostgreSQL 12+)**
- ✅ Built-in, no extensions needed
- ✅ No exclusive locks during rebuild
- ✅ Simpler to use
- ⚠️ Requires 2x disk space temporarily
- ⚠️ Takes `ShareUpdateExclusiveLock` briefly at start/end (blocks schema changes, not reads/writes)
- ⚠️ Cannot rebuild invalid indexes or system catalogs
**pg_repack**
- ✅ Can rebuild both tables and indexes
- ✅ Slightly more granular control
- ✅ Works on PostgreSQL < 12
- ⚠️ Requires extension installation
- ⚠️ Also needs temporary space
- ⚠️ More complex setup
## Production-Safe Approach
```sql
-- 1. First, verify bloat on a specific index
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE indexrelname = 'your_index_name';
-- 2. Set a statement timeout to prevent runaway operations
SET statement_timeout = '2h';
-- 3. Reindex concurrently (one index at a time)
REINDEX INDEX CONCURRENTLY your_index_name;
-- 4. For multiple indexes on a table
REINDEX TABLE CONCURRENTLY your_table_name;
```
## Best Practices to Avoid Locking
1. **Run during low-traffic periods** - even CONCURRENTLY has brief lock moments
2. **Do indexes one at a time** rather than all at once:
```bash
# Script to reindex sequentially with monitoring
psql -c "SELECT indexname FROM pg_indexes
WHERE tablename = 'high_traffic_table'" -t | \
while read index; do
echo "Reindexing $index..."
psql -c "REINDEX INDEX CONCURRENTLY $index"
sleep 5 # Brief pause between operations
done
```
3. **Monitor active locks** in another session:
```sql
SELECT
pid,
usename,
wait_event_type,
query,
state
FROM pg_stat_activity
WHERE datname = current_database()
AND query ILIKE '%reindex%';
```
4. **Check for blocking queries**:
```sql
SELECT
blocked_locks.pid AS blocked_pid,
blocking_locks.pid AS blocking_pid,
blocked_activity.query AS blocked_query,
blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```
## If Using pg_repack
```bash
# Install extension first
CREATE EXTENSION pg_repack;
# Repack a specific table (rebuilds table + indexes)
pg_repack -t your_table_name -d your_database --no-kill-backend
# Just indexes
pg_repack -t your_table_name -d your_database --index-only
```
## Risk Mitigation
1. **Ensure sufficient disk space** - you need roughly 2x the index size
2. **Set connection limits** appropriately - REINDEX CONCURRENTLY uses 2 connections
3. **Have a rollback plan** - the old index stays until completion, so canceling is safe
4. **Test on a replica first** if you have streaming replication
## Long-term Prevention
Consider addressing root causes:
- Tune `fillfactor` for heavily updated indexes (e.g., 70 instead of 100)
- Schedule regular VACUUM/ANALYZE
- Monitor bloat with pg_stat_user_tables.n_dead_tup
- Consider partitioning very large tables
Would you like me to help you create a monitoring script or generate the specific REINDEX commands for your bloated indexes?
I'll help you address PostgreSQL index bloat safely in production. Here's a comparison and practical guidance:
## **REINDEX CONCURRENTLY vs pg_repack**
### **REINDEX CONCURRENTLY** (PostgreSQL 12+)
**Pros:**
- Built into PostgreSQL (no extension needed)
- Truly concurrent - allows reads and writes
- Simpler to use
**Cons:**
- Temporarily requires 2x disk space (builds new index alongside old)
- Cannot be run in a transaction (can't roll back)
- Slower than regular REINDEX
- Cannot reindex system catalogs or invalid indexes
**Locks:** Takes ShareUpdateExclusiveLock (blocks other DDL, allows DML)
### **pg_repack**
**Pros:**
- Can repack tables AND indexes together
- More space-efficient for tables
- Can handle invalid indexes
**Cons:**
- Requires extension installation
- More complex setup
- Creates triggers temporarily (slight write overhead)
**Locks:** Brief exclusive locks at start and end (usually milliseconds)
---
## **Recommended Approach for 30%+ Bloat**
For your situation with high-traffic tables, I recommend:
### **1. Use REINDEX CONCURRENTLY for individual indexes:**
```sql
-- Check disk space first (need ~2x current index size)
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Reindex one at a time during lower traffic periods
REINDEX INDEX CONCURRENTLY index_name;
```
### **2. Safe execution pattern:**
```sql
-- Monitor blocking queries
SELECT
pid,
usename,
state,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
-- Set statement timeout as safety net
SET statement_timeout = '2h';
-- Execute reindex
REINDEX INDEX CONCURRENTLY users_email_idx;
-- Verify success
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE indexname = 'users_email_idx';
```
### **3. For multiple indexes, use pg_repack:**
```sql
-- Install extension (requires superuser)
CREATE EXTENSION pg_repack;
-- Dry run first
pg_repack --table tablename --dry-run -d dbname
-- Repack table (rebuilds all indexes)
pg_repack --table tablename -d dbname --no-superuser-check
-- Or repack specific index
pg_repack --index indexname -d dbname
```
---
## **Production Safety Checklist**
**Before starting:**
```bash
# 1. Check available disk space (need ~2x largest index)
df -h /var/lib/postgresql
# 2. Identify bloat
psql -d yourdb -f - <<'SQL'
SELECT
schemaname || '.' || tablename AS table,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
ROUND(100 * (pg_relation_size(indexrelid) -
pg_relation_size(indexrelid, 'main')) /
NULLIF(pg_relation_size(indexrelid), 0), 1) AS bloat_pct
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
SQL
# 3. Schedule during low-traffic window
# 4. Set up monitoring for locks and replication lag
```
**During execution:**
```sql
-- Monitor progress (in another session)
SELECT
now() - query_start AS duration,
state,
query
FROM pg_stat_activity
WHERE query LIKE '%REINDEX%' OR query LIKE '%pg_repack%';
-- Check for blocking
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocker.pid AS blocker_pid,
blocker.query AS blocker_query
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocker_locks ON
blocked_locks.relation = blocker_locks.relation
AND blocked_locks.pid != blocker_locks.pid
JOIN pg_stat_activity blocker ON blocker.pid = blocker_locks.pid
WHERE NOT blocked_locks.granted;
```
---
## **My Recommendation**
Given your **30%+ bloat on high-traffic tables:**
1. **Start with REINDEX CONCURRENTLY** for your most critical indexes:
- Easier to test and validate
- More predictable behavior
- No extension dependencies
2. **Switch to pg_repack if:**
- You're running out of disk space (it's more efficient)
- You need to repack entire tables, not just indexes
- You have many indexes per table (repack is faster for bulk operations)
3. **Execution strategy:**
- Do 1-2 indexes per day initially to observe impact
- Monitor replication lag if you have replicas
- Run during lowest traffic periods (even though concurrent)
- Keep old statistics: `REINDEX ... CONCURRENTLY` preserves them
4. **Preventive measures:**
- Set up autovacuum more aggressively: `autovacuum_vacuum_scale_factor = 0.05`
- Monitor bloat weekly with a script
- Consider partitioning extremely high-traffic tables
Would you like me to help you generate a specific maintenance script for your indexes, or do you need help identifying which indexes should be prioritized first?