Concurrent Index Build Monitoring

infoProactive Health

Monitoring PostgreSQL CREATE INDEX CONCURRENTLY operations to track progress, manage locks, and ensure successful completion without blocking production traffic.

Prompt: “I'm running CREATE INDEX CONCURRENTLY on a 200GB production table in PostgreSQL. It's been running for 2 hours and I need to know if it's progressing normally or stuck. How do I monitor the operation and what could cause it to fail?”

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens

1,531vs1,203

Latency

30.7smsvs24.3sms

Turns

2vs2

Tool Calls

0vs0

Schema Calls

0vs0

Response

4,600 charsvs3,431 chars

Evaluation

Without Schema

A-

With Schema

Both responses provide comprehensive, technically accurate SQL queries and cover all critical monitoring aspects. The response without Schema offers more extensive coverage including a detailed blocking locks query and invalid index checks, but presents information in a more linear fashion. The response with Schema provides better organization through numbered queries and explicitly prioritizes the most critical issue (long-running transactions marked as '#1 killer') with clear visual markers on the idle transaction check. Both would reliably guide the operator to resolution, but the response with Schema's clearer structure and explicit prioritization makes it slightly more actionable for an SRE troubleshooting under time pressure.

Without Schema

I'll help you monitor your `CREATE INDEX CONCURRENTLY` operation and identify potential issues.

## Check Index Creation Progress

**1. Check if the backend is still active:**
```sql
SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE query ILIKE '%CREATE INDEX%CONCURRENTLY%'
  AND query NOT ILIKE '%pg_stat_activity%';
```

**2. Monitor the index status:**
```sql
SELECT schemaname, tablename, indexname, 
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_progress_create_index;
```

**3. Detailed progress view (PostgreSQL 12+):**
```sql
SELECT 
    p.pid,
    now() - a.xact_start AS duration,
    p.datname,
    p.relid::regclass AS table_name,
    p.index_relid::regclass AS index_name,
    p.phase,
    p.lockers_total,
    p.lockers_done,
    p.blocks_total,
    p.blocks_done,
    p.tuples_total,
    p.tuples_done,
    round(100.0 * p.tuples_done / NULLIF(p.tuples_total, 0), 2) AS pct_done
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid;
```

## Key Phases to Watch

The operation goes through these phases:
- **initializing** - Just started
- **waiting for writers before build** - Waiting for transactions to complete
- **building index** - Actually creating the index (longest phase)
- **waiting for writers before validation** - Waiting again
- **validation** - Validating the index
- **waiting for old snapshots** - Final wait
- **final validation** - Almost done

## Common Causes of Failure/Slowness

**1. Long-running transactions blocking progress:**
```sql
SELECT pid, now() - xact_start AS duration, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```

**2. Check for blocking locks:**
```sql
SELECT blocked_locks.pid AS blocked_pid,
       blocked_activity.usename AS blocked_user,
       blocking_locks.pid AS blocking_pid,
       blocking_activity.usename AS blocking_user,
       blocked_activity.query AS blocked_statement,
       blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks 
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```

**3. Check for invalid indexes (if a previous attempt failed):**
```sql
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
  AND indexrelid IN (
    SELECT indexrelid 
    FROM pg_index 
    WHERE NOT indisvalid
  );
```

## What Could Cause Failure

- **Old transactions**: Long-running transactions prevent the index from being finalized
- **Idle in transaction**: Connections sitting idle in transaction state
- **Statement timeout**: If `statement_timeout` is set too low
- **Disk space**: Running out of disk space (monitor with `df -h`)
- **OOM**: Running out of memory
- **Connection killed**: Network issues or manual termination

## If It's Stuck

**Don't kill it unless necessary** - if you kill it:
- The index will be marked INVALID
- You'll need to drop it and start over
- 2 hours of work is lost

**Instead:**
1. Check if it's in a wait phase - this is normal
2. Look for and terminate blocking idle transactions
3. Monitor `pg_stat_progress_create_index` - if `tuples_done` is increasing, it's working

**To safely cancel if needed:**
```sql
-- First, get the PID from pg_stat_activity
SELECT pg_cancel_backend(pid);  -- Try gentle cancel first

-- If that doesn't work after 30 seconds:
SELECT pg_terminate_backend(pid);  -- Forceful termination
```

For a 200GB table, 2 hours is not unusual depending on hardware, index type, and current system load. Keep monitoring - if the tuples_done is increasing, it's progressing normally.

With Schema

I'll help you monitor your `CREATE INDEX CONCURRENTLY` operation and identify potential issues.

## Check Current Progress

Run these queries to monitor the operation:

```sql
-- 1. Check if the index build is active and see its phase
SELECT 
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query,
    state,
    wait_event_type,
    wait_event
FROM pg_stat_activity
WHERE query LIKE '%CREATE INDEX CONCURRENTLY%'
  AND query NOT LIKE '%pg_stat_activity%';

-- 2. Check pg_stat_progress_create_index (PostgreSQL 12+)
SELECT 
    p.pid,
    now() - a.query_start AS duration,
    p.phase,
    p.blocks_total,
    p.blocks_done,
    ROUND(100.0 * p.blocks_done / NULLIF(p.blocks_total, 0), 2) AS percent_done,
    p.tuples_total,
    p.tuples_done
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid;

-- 3. Check for blocking transactions (critical!)
SELECT 
    pid,
    usename,
    state,
    query_start,
    state_change,
    query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND query_start < now() - interval '5 minutes'
ORDER BY query_start;

-- 4. Check pg_locks for the index creation
SELECT 
    l.locktype,
    l.mode,
    l.granted,
    a.query,
    a.state
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE a.query LIKE '%CREATE INDEX CONCURRENTLY%';
```

## Key Phases to Watch

`CREATE INDEX CONCURRENTLY` goes through these phases:
1. **initializing** - Quick
2. **waiting for old snapshots** - Can be slow if there are long-running transactions
3. **building index** - Main work, should show steady progress in `blocks_done`
4. **waiting for readers before marking valid** - Can hang if there are long transactions

## Common Causes of Failure/Stalling

1. **Long-running transactions** - The #1 killer
   - CICC must wait for all transactions that started before it to complete
   - Even `idle in transaction` sessions will block it
   - **Action**: Find and kill blocking transactions (check query #3 above)

2. **Deadlocks** - The operation might silently fail
   - Check `pg_stat_activity` - if your CREATE INDEX disappeared, it failed
   - Check logs: `grep "CREATE INDEX" /var/log/postgresql/*.log`

3. **Disk space** - Index needs additional space during build
   ```sql
   SELECT pg_size_pretty(pg_total_relation_size('your_table_name'));
   ```

4. **Statement timeout** - Check if statement_timeout would kill it
   ```sql
   SHOW statement_timeout;
   ```

5. **autovacuum conflicts** - Less common but possible

## What's Normal for 200GB?

- **2 hours** could be normal depending on:
  - Number of columns in the index
  - Hardware (disk I/O, CPU)
  - Server load
  - Whether you're indexing expressions or simple columns

- If `percent_done` is progressing steadily, you're fine
- If stuck at "waiting for old snapshots" phase → kill long transactions

## If You Need to Cancel

```sql
-- Find the PID from queries above, then:
SELECT pg_cancel_backend(PID);  -- Graceful
-- or if that doesn't work after 30 seconds:
SELECT pg_terminate_backend(PID);  -- Forceful
```

**Note**: Cancelled CICC operations leave behind an INVALID index that you should drop:
```sql
SELECT indexname FROM pg_indexes WHERE tablename = 'your_table' AND schemaname = 'public';
-- Then DROP INDEX CONCURRENTLY if invalid
```

Run the progress queries above and let me know what phase it's in and if you see any long-running transactions blocking it!

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When monitoring CREATE INDEX CONCURRENTLY, start by confirming the operation is actually progressing through block and tuple metrics, then calculate how far along it is. The most common causes of stalls are long-running or idle transactions that prevent the concurrent build from advancing through its phases, so check for blocking sessions before investigating lock contention.

1Verify the index build is making forward progress

Check `block-done` and `postgresql-create-index-tuples-done` in pg_stat_progress_create_index and compare values over a 5-10 minute window. If these numbers aren't increasing, the operation is stuck, not just slow. On a 200GB table you should see thousands of tuples processed per second under normal conditions. No progress means you're blocked waiting for something—move immediately to checking transactions and locks.

block_donepostgresql.create_index.tuples_done

2Calculate completion percentage and assess if timing is normal

Divide `block-done` by `postgresql-create-index-blocks-total` to get percent complete. A 200GB table is roughly 25 million 8KB blocks—at 50K blocks/minute (typical for concurrent builds with moderate write traffic), you'd expect 8-9 hours total. If you're 50% done after 2 hours, you're ahead of schedule. Under 20% suggests the operation is being slowed by contention or hardware constraints.

block_donepostgresql.create_index.blocks_total

3Hunt for long-running transactions that started before the index build

Query pg_stat_activity for transactions with xact_start or query_start older than when your CREATE INDEX CONCURRENTLY began. The `long-transactions-hold-locks` and `long-running-transactions-preventing-cleanup` insights explain that concurrent index builds must wait for all pre-existing transactions to complete before advancing to later phases. Even a single forgotten read-only transaction from 2 hours ago will block phase transitions. Kill or commit these transactions to unblock progress.

Long-running transactions hold locks and block other operations long-running transactions preventing cleanup

4Check for idle-in-transaction sessions holding snapshots

Look in pg_stat_activity for sessions with state='idle in transaction', especially those with state_change timestamps before your index build started. The `idle-transaction-locks` insight highlights that these sessions hold transaction snapshots that prevent concurrent index operations from progressing, and with default idle_in_transaction_session_timeout=0 they'll block indefinitely. Terminate these sessions or set a timeout to prevent recurrence.

Idle transactions hold locks and block operations

5Monitor lock contention through locker metrics

Compare `postgresql-create-index-lockers-done` to `postgresql-create-index-lockers-total`—if these are far apart, the index build is waiting for locks on the table. Also check `postgresql-locks` for ACCESS EXCLUSIVE or SHARE UPDATE EXCLUSIVE locks on your table. High lock counts indicate write-heavy workloads competing with the concurrent build; this is expected but if locks aren't clearing, you may have a stuck transaction (go back to step 3).

postgresql.create_index.lockers_donepostgresql.create_index.lockers_totalpostgresql.locks

6If partitioned, assess per-partition progress distribution

Check if `partitions` is greater than 1, indicating a partitioned table. Compare `postgresql-create-index-partitions-done` to total partitions—the index builds separately on each partition, so you might be 100% done on some and 0% on others. This is normal, but if one partition is taking much longer, it may have significantly more data or different write patterns affecting the concurrent build.

partitions_totalpostgresql.create_index.partitions_done