Concurrent Index Build Monitoring

infoProactive Health

Monitoring PostgreSQL CREATE INDEX CONCURRENTLY operations to track progress, manage locks, and ensure successful completion without blocking production traffic.

Prompt: I'm running CREATE INDEX CONCURRENTLY on a 200GB production table in PostgreSQL. It's been running for 2 hours and I need to know if it's progressing normally or stuck. How do I monitor the operation and what could cause it to fail?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens
1,531vs1,203
Latency
30.7smsvs24.3sms
Turns
2vs2
Tool Calls
0vs0
Schema Calls
0vs0
Response
4,600 charsvs3,431 chars
Evaluation
Without Schema
A-
vs
With Schema
A
Both responses provide comprehensive, technically accurate SQL queries and cover all critical monitoring aspects. The response without Schema offers more extensive coverage including a detailed blocking locks query and invalid index checks, but presents information in a more linear fashion. The response with Schema provides better organization through numbered queries and explicitly prioritizes the most critical issue (long-running transactions marked as '#1 killer') with clear visual markers on the idle transaction check. Both would reliably guide the operator to resolution, but the response with Schema's clearer structure and explicit prioritization makes it slightly more actionable for an SRE troubleshooting under time pressure.
Without Schema
I'll help you monitor your `CREATE INDEX CONCURRENTLY` operation and identify potential issues. ## Check Index Creation Progress **1. Check if the backend is still active:** ```sql SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event, query FROM pg_stat_activity WHERE query ILIKE '%CREATE INDEX%CONCURRENTLY%' AND query NOT ILIKE '%pg_stat_activity%'; ``` **2. Monitor the index status:** ```sql SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_progress_create_index; ``` **3. Detailed progress view (PostgreSQL 12+):** ```sql SELECT p.pid, now() - a.xact_start AS duration, p.datname, p.relid::regclass AS table_name, p.index_relid::regclass AS index_name, p.phase, p.lockers_total, p.lockers_done, p.blocks_total, p.blocks_done, p.tuples_total, p.tuples_done, round(100.0 * p.tuples_done / NULLIF(p.tuples_total, 0), 2) AS pct_done FROM pg_stat_progress_create_index p JOIN pg_stat_activity a ON p.pid = a.pid; ``` ## Key Phases to Watch The operation goes through these phases: - **initializing** - Just started - **waiting for writers before build** - Waiting for transactions to complete - **building index** - Actually creating the index (longest phase) - **waiting for writers before validation** - Waiting again - **validation** - Validating the index - **waiting for old snapshots** - Final wait - **final validation** - Almost done ## Common Causes of Failure/Slowness **1. Long-running transactions blocking progress:** ```sql SELECT pid, now() - xact_start AS duration, state, query FROM pg_stat_activity WHERE xact_start IS NOT NULL ORDER BY xact_start LIMIT 10; ``` **2. Check for blocking locks:** ```sql SELECT blocked_locks.pid AS blocked_pid, blocked_activity.usename AS blocked_user, blocking_locks.pid AS blocking_pid, blocking_activity.usename AS blocking_user, blocked_activity.query AS blocked_statement, blocking_activity.query AS blocking_statement FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid WHERE NOT blocked_locks.granted; ``` **3. Check for invalid indexes (if a previous attempt failed):** ```sql SELECT schemaname, tablename, indexname FROM pg_indexes WHERE schemaname NOT IN ('pg_catalog', 'information_schema') AND indexrelid IN ( SELECT indexrelid FROM pg_index WHERE NOT indisvalid ); ``` ## What Could Cause Failure - **Old transactions**: Long-running transactions prevent the index from being finalized - **Idle in transaction**: Connections sitting idle in transaction state - **Statement timeout**: If `statement_timeout` is set too low - **Disk space**: Running out of disk space (monitor with `df -h`) - **OOM**: Running out of memory - **Connection killed**: Network issues or manual termination ## If It's Stuck **Don't kill it unless necessary** - if you kill it: - The index will be marked INVALID - You'll need to drop it and start over - 2 hours of work is lost **Instead:** 1. Check if it's in a wait phase - this is normal 2. Look for and terminate blocking idle transactions 3. Monitor `pg_stat_progress_create_index` - if `tuples_done` is increasing, it's working **To safely cancel if needed:** ```sql -- First, get the PID from pg_stat_activity SELECT pg_cancel_backend(pid); -- Try gentle cancel first -- If that doesn't work after 30 seconds: SELECT pg_terminate_backend(pid); -- Forceful termination ``` For a 200GB table, 2 hours is not unusual depending on hardware, index type, and current system load. Keep monitoring - if the tuples_done is increasing, it's progressing normally.
With Schema
I'll help you monitor your `CREATE INDEX CONCURRENTLY` operation and identify potential issues. ## Check Current Progress Run these queries to monitor the operation: ```sql -- 1. Check if the index build is active and see its phase SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state, wait_event_type, wait_event FROM pg_stat_activity WHERE query LIKE '%CREATE INDEX CONCURRENTLY%' AND query NOT LIKE '%pg_stat_activity%'; -- 2. Check pg_stat_progress_create_index (PostgreSQL 12+) SELECT p.pid, now() - a.query_start AS duration, p.phase, p.blocks_total, p.blocks_done, ROUND(100.0 * p.blocks_done / NULLIF(p.blocks_total, 0), 2) AS percent_done, p.tuples_total, p.tuples_done FROM pg_stat_progress_create_index p JOIN pg_stat_activity a ON p.pid = a.pid; -- 3. Check for blocking transactions (critical!) SELECT pid, usename, state, query_start, state_change, query FROM pg_stat_activity WHERE state = 'idle in transaction' AND query_start < now() - interval '5 minutes' ORDER BY query_start; -- 4. Check pg_locks for the index creation SELECT l.locktype, l.mode, l.granted, a.query, a.state FROM pg_locks l JOIN pg_stat_activity a ON l.pid = a.pid WHERE a.query LIKE '%CREATE INDEX CONCURRENTLY%'; ``` ## Key Phases to Watch `CREATE INDEX CONCURRENTLY` goes through these phases: 1. **initializing** - Quick 2. **waiting for old snapshots** - Can be slow if there are long-running transactions 3. **building index** - Main work, should show steady progress in `blocks_done` 4. **waiting for readers before marking valid** - Can hang if there are long transactions ## Common Causes of Failure/Stalling 1. **Long-running transactions** - The #1 killer - CICC must wait for all transactions that started before it to complete - Even `idle in transaction` sessions will block it - **Action**: Find and kill blocking transactions (check query #3 above) 2. **Deadlocks** - The operation might silently fail - Check `pg_stat_activity` - if your CREATE INDEX disappeared, it failed - Check logs: `grep "CREATE INDEX" /var/log/postgresql/*.log` 3. **Disk space** - Index needs additional space during build ```sql SELECT pg_size_pretty(pg_total_relation_size('your_table_name')); ``` 4. **Statement timeout** - Check if statement_timeout would kill it ```sql SHOW statement_timeout; ``` 5. **autovacuum conflicts** - Less common but possible ## What's Normal for 200GB? - **2 hours** could be normal depending on: - Number of columns in the index - Hardware (disk I/O, CPU) - Server load - Whether you're indexing expressions or simple columns - If `percent_done` is progressing steadily, you're fine - If stuck at "waiting for old snapshots" phase → kill long transactions ## If You Need to Cancel ```sql -- Find the PID from queries above, then: SELECT pg_cancel_backend(PID); -- Graceful -- or if that doesn't work after 30 seconds: SELECT pg_terminate_backend(PID); -- Forceful ``` **Note**: Cancelled CICC operations leave behind an INVALID index that you should drop: ```sql SELECT indexname FROM pg_indexes WHERE tablename = 'your_table' AND schemaname = 'public'; -- Then DROP INDEX CONCURRENTLY if invalid ``` Run the progress queries above and let me know what phase it's in and if you see any long-running transactions blocking it!

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When monitoring CREATE INDEX CONCURRENTLY, start by confirming the operation is actually progressing through block and tuple metrics, then calculate how far along it is. The most common causes of stalls are long-running or idle transactions that prevent the concurrent build from advancing through its phases, so check for blocking sessions before investigating lock contention.

1Verify the index build is making forward progress
Check `block-done` and `postgresql-create-index-tuples-done` in pg_stat_progress_create_index and compare values over a 5-10 minute window. If these numbers aren't increasing, the operation is stuck, not just slow. On a 200GB table you should see thousands of tuples processed per second under normal conditions. No progress means you're blocked waiting for something—move immediately to checking transactions and locks.
2Calculate completion percentage and assess if timing is normal
Divide `block-done` by `postgresql-create-index-blocks-total` to get percent complete. A 200GB table is roughly 25 million 8KB blocks—at 50K blocks/minute (typical for concurrent builds with moderate write traffic), you'd expect 8-9 hours total. If you're 50% done after 2 hours, you're ahead of schedule. Under 20% suggests the operation is being slowed by contention or hardware constraints.
3Hunt for long-running transactions that started before the index build
Query pg_stat_activity for transactions with xact_start or query_start older than when your CREATE INDEX CONCURRENTLY began. The `long-transactions-hold-locks` and `long-running-transactions-preventing-cleanup` insights explain that concurrent index builds must wait for all pre-existing transactions to complete before advancing to later phases. Even a single forgotten read-only transaction from 2 hours ago will block phase transitions. Kill or commit these transactions to unblock progress.
4Check for idle-in-transaction sessions holding snapshots
Look in pg_stat_activity for sessions with state='idle in transaction', especially those with state_change timestamps before your index build started. The `idle-transaction-locks` insight highlights that these sessions hold transaction snapshots that prevent concurrent index operations from progressing, and with default idle_in_transaction_session_timeout=0 they'll block indefinitely. Terminate these sessions or set a timeout to prevent recurrence.
5Monitor lock contention through locker metrics
Compare `postgresql-create-index-lockers-done` to `postgresql-create-index-lockers-total`—if these are far apart, the index build is waiting for locks on the table. Also check `postgresql-locks` for ACCESS EXCLUSIVE or SHARE UPDATE EXCLUSIVE locks on your table. High lock counts indicate write-heavy workloads competing with the concurrent build; this is expected but if locks aren't clearing, you may have a stuck transaction (go back to step 3).
6If partitioned, assess per-partition progress distribution
Check if `partitions` is greater than 1, indicating a partitioned table. Compare `postgresql-create-index-partitions-done` to total partitions—the index builds separately on each partition, so you might be 100% done on some and 0% on others. This is normal, but if one partition is taking much longer, it may have significantly more data or different write patterns affecting the concurrent build.

Technologies

Related Insights

Relevant Metrics

Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration