Slow Query Diagnosis

Incident Response (warning)

Application performance degradation requires identifying and optimizing slow-running queries.

Prompt: Our app response times have doubled in the last hour. How do I quickly find which PostgreSQL queries are slow and whether they need better indexes or query optimization?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

| Metric | Without Schema | With Schema |
|---|---|---|
| Tokens | 994 | 1,017 |
| Latency | 18.9 s | 20.9 s |
| Turns | 2 | 5 |
| Tool Calls | 0 | 1 |
| Schema Calls | 0 | 0 |
| Response Length | 2,374 chars | 2,303 chars |
| Evaluation | B | B+ |
Both responses provide comprehensive PostgreSQL diagnostic queries and are technically sound. The response with Schema has slightly better incident framing with its 'Immediate Steps' section and adds a helpful percent_total calculation to prioritize queries by load impact. The response without Schema includes a query to identify unused indexes (an optimization opportunity) that the other lacks. Neither response fully optimizes for incident response — both jump directly to database queries without suggesting checking recent changes or application-level metrics first. The content and depth are remarkably similar, with Schema providing marginal improvements in organization rather than fundamentally different guidance.
Without Schema
I'll help you diagnose slow PostgreSQL queries. Here's a step-by-step approach:

## 1. Check Currently Running Queries

Connect to your PostgreSQL database and run:

```sql
-- See what's running right now
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;
```

To kill a slow query if needed: `SELECT pg_terminate_backend(pid);`

## 2. Find Slow Queries (if pg_stat_statements is enabled)

```sql
-- Top 10 slowest queries by average time
SELECT round(mean_exec_time::numeric, 2) AS avg_ms,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Queries causing the most load (high total time)
SELECT round(total_exec_time::numeric, 2) AS total_ms,
       calls,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

## 3. Analyze Query Performance

For any slow query you find, run:

```sql
EXPLAIN (ANALYZE, BUFFERS) <your query here>;
```

**Look for these red flags:**
- **Seq Scan** on large tables → needs an index
- **High rows estimate vs actual** → needs VACUUM ANALYZE
- **Nested Loop** with a large outer table → might need a better join strategy
- **High buffer reads** → missing index or inefficient query

## 4. Find Missing Indexes

```sql
-- Tables with high sequential scans (may need indexes)
SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan,
       seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Unused indexes (candidates to drop)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
```

## 5. Quick Wins

```sql
-- Update statistics (often fixes bad query plans)
ANALYZE;

-- Check table sizes
SELECT tablename,
       pg_size_pretty(pg_total_relation_size(tablename::text))
FROM pg_tables
WHERE schemaname = 'public';
```

**What's your setup?** Can you run these queries, or would you like me to help you:
- Enable pg_stat_statements if it's not already on?
- Analyze your application logs for slow query patterns?
- Check your database connection pool settings?
With Schema
1 tool call
I'll help you quickly identify slow PostgreSQL queries and diagnose whether they need indexes or optimization. Let me check your current setup first. Here's a quick action plan to diagnose slow PostgreSQL queries:

## Immediate Steps

**1. Check currently running queries:**

```sql
-- Find queries running longer than 1 second
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '1 second'
ORDER BY duration DESC;
```

**2. Enable and query pg_stat_statements (most powerful tool):**

```sql
-- Check if the extension is available
SELECT * FROM pg_available_extensions WHERE name = 'pg_stat_statements';

-- If not enabled, add to postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
--   pg_stat_statements.track = all

-- Find the slowest queries
SELECT round(total_exec_time::numeric, 2) AS total_time_ms,
       calls,
       round(mean_exec_time::numeric, 2) AS avg_time_ms,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_total,
       substring(query, 1, 200) AS query_snippet
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

**3. Analyze query plans for slow queries:**

```sql
-- Use EXPLAIN ANALYZE on the slow query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT ... your slow query ...;
```

## Red Flags That Need Indexes

Look for these in EXPLAIN output:
- `Seq Scan` on large tables (should be Index Scan)
- High `cost` values
- `rows` estimate vs actual rows mismatch
- `Buffers: shared read` with high numbers

**Quick index check:**

```sql
-- Find tables with sequential scans
SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

## Quick Wins

**Missing indexes on WHERE/JOIN columns:**

```sql
CREATE INDEX CONCURRENTLY idx_name ON table_name(column_name);
```

**For queries with a recent slowdown, check:**
- Table bloat: `SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;`
- Need VACUUM/ANALYZE: `ANALYZE table_name;`
- Lock contention: check `pg_locks` joined with `pg_stat_activity`

Want me to help analyze specific queries or set up monitoring for your database?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When app response times suddenly double, start by identifying which queries are consuming the most total execution time, then check for the most common culprits: missing indexes causing sequential scans and stale statistics leading to bad query plans. If those aren't the issue, investigate lock contention and I/O pressure before looking at table bloat.

1. Identify the top queries by total execution time
Query pg_stat_statements ordered by total_exec_time (mean_exec_time × calls) to find your actual bottlenecks. A 20ms query called 50K times in the last hour is far more damaging than a 5-second query called twice. Look at `postgresql-queries-duration-max` for the longest individual queries and `postgresql-queries-count` to understand call frequency. The insight on `slow-queries-degrading-performance` shows that these top 3-5 queries typically account for 80% of your slowdown.
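A minimal sketch of this check, assuming the pg_stat_statements extension is installed and PostgreSQL 13+ column names (`total_exec_time`/`mean_exec_time`; older versions use `total_time`/`mean_time`):

```sql
-- Top queries by cumulative load, with each query's share of total time
SELECT round(total_exec_time::numeric, 2) AS total_ms,
       calls,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total,
       left(query, 120) AS query_snippet
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```

Sorting by `total_exec_time` rather than `mean_exec_time` is what surfaces the cheap-but-frequent queries described above.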
2. Check for missing indexes causing sequential scans
Compare `postgresql-sequential-scans` to index scans on tables larger than 100K rows. If `sequential-tuple-read` is high (millions of rows scanned per minute) on tables that should be using indexes, you've found your problem. Run EXPLAIN ANALYZE on your top queries from step 1—look for 'Seq Scan' nodes on large tables. The `missing-indexes-causing-sequential-scans` insight explains that even one missing index on a WHERE or JOIN clause can force full table scans, turning a 10ms query into a 2-second one.
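A sketch of the scan-ratio comparison using the standard `pg_stat_user_tables` view (the 100K-row cutoff is the heuristic from the step above, approximated with `n_live_tup`):

```sql
-- Large tables where sequential scans outnumber index scans
SELECT relname,
       seq_scan,
       idx_scan,
       seq_tup_read,
       n_live_tup
FROM pg_stat_user_tables
WHERE n_live_tup > 100000
  AND seq_scan > COALESCE(idx_scan, 0)
ORDER BY seq_tup_read DESC
LIMIT 10;
```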
3. Check for stale statistics and query plan regression
Query pg_stat_all_tables for n_mod_since_analyze—if critical tables show >10% of rows modified since last analyze, statistics are stale. The `stale-statistics-inefficient-query-plans` insight is crucial here: unlike bloat which degrades linearly, stale stats can cause the planner to choose exponentially worse plans (nested loops instead of hash joins, wrong join order). If you recently had a bulk load or heavy UPDATE activity and didn't run ANALYZE, this is likely your culprit. The `query-plan-regression` often correlates with the `dead-tuple` metric climbing rapidly.
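The >10% staleness threshold from this step can be expressed directly against `pg_stat_all_tables`:

```sql
-- Tables with more than 10% of rows modified since the last ANALYZE
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_analyze,
       last_autoanalyze
FROM pg_stat_all_tables
WHERE n_live_tup > 0
  AND n_mod_since_analyze > 0.10 * n_live_tup
ORDER BY n_mod_since_analyze DESC;
```

Running `ANALYZE` on any table this returns is cheap and often fixes plan regressions immediately.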
4. Look for lock contention blocking queries
Check `postgresql-locks` and query pg_stat_activity for queries in 'waiting' state. A single long-running transaction holding an AccessExclusiveLock can block dozens of queries, creating the appearance of widespread slowness. The `lock-contention-deadlocks-blocking` insight shows this is especially common after schema changes or VACUUM FULL operations. Query pg_locks joined with pg_stat_activity to identify blocking PIDs—if you see multiple queries waiting on the same lock, killing the blocker with pg_cancel_backend() often immediately resolves the issue.
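On PostgreSQL 9.6+, the blocker/blocked join described above can be done without touching `pg_locks` directly, via `pg_blocking_pids()`:

```sql
-- Map each waiting query to the PID(s) blocking it
SELECT blocked.pid      AS blocked_pid,
       blocked.query    AS blocked_query,
       blocking.pid     AS blocking_pid,
       blocking.query   AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));
```

If many rows share one `blocking_pid`, cancelling that backend (`pg_cancel_backend(pid)`) is usually the fastest mitigation.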
5. Check for memory pressure and excessive disk I/O
If `postgresql-temp-bytes` is non-zero and climbing, queries are spilling to disk because work_mem is too small for sorts and hash joins. Check `postgresql-queries-shared-blks-read` and `postgresql-queries-blk-write-time`—if block write time is high (>100ms per query), you're I/O bound. Low cache hit ratios (check the `hit` metric—should be >95%) mean insufficient shared_buffers. This ties to `inefficient-query-patterns` and often indicates queries that could be rewritten to avoid large sorts or hash operations.
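A sketch of the database-wide spill and cache-hit check using the standard `pg_stat_database` view (the >95% target is the threshold from the step above):

```sql
-- Cache hit ratio and temp-file spill volume for the current database
SELECT datname,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_spilled
FROM pg_stat_database
WHERE datname = current_database();
```

A climbing `temp_bytes` points at `work_mem` being too small for the sorts and hash joins in your top queries.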
6. Check for table bloat from dead tuples
Query pg_stat_user_tables for tables with high `dead-tuple` counts (n_dead_tup > 10% of n_live_tup). The `dead-tuples-exceed-threshold-bloat` insight explains that accumulated dead tuples force sequential scans to read more pages and degrade index performance. This is a slower-burning issue than the others—it causes gradual degradation over days, not sudden hourly spikes—but if autovacuum is disabled or can't keep up, bloat amplifies all the other issues. Run VACUUM ANALYZE on heavily bloated tables and check autovacuum settings.
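The dead-tuple threshold in this step translates to a simple filter on `pg_stat_user_tables`:

```sql
-- Tables whose dead tuples exceed 10% of live tuples
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 1) AS dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 0.10 * n_live_tup
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A NULL or very old `last_autovacuum` on a bloated table suggests autovacuum is disabled or can't keep up.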

Technologies

Related Insights

Lock contention and deadlocks block transaction progress (warning)
query plan regression
Stub insight created from scenario discovery hint: query plan regression
statistics out of date
Stub insight created from scenario discovery hint: statistics out of date
need for index creation
Stub insight created from scenario discovery hint: need for index creation
slow queries degrading performance (diagnostic_step)
Stub insight created from scenario discovery hint: slow queries degrading performance
missing indexes causing sequential scans (prerequisite)
Stub insight created from scenario discovery hint: missing indexes causing sequential scans
need for query optimization (prerequisite)
Stub insight created from scenario discovery hint: need for query optimization
Stale statistics cause inefficient query plans with exponentially worse performance (prerequisite, warning)
inefficient query patterns (prerequisite)
Stub insight created from scenario discovery hint: inefficient query patterns
Dead tuples exceed threshold causing table and index bloat (supporting_evidence, warning)

Relevant Metrics

Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL Prometheus
PostgreSQL CloudWatch Metrics
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration