Query Plan Regression Detection

Incident Response

Previously fast queries have become slow because the query planner is choosing suboptimal execution plans.

Prompt: A query that used to run in 100ms is now taking 30 seconds, but the data volume hasn't changed much. How do I check if the query plan changed and why the optimizer might be making a bad choice now?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens: 969 vs 1,078
Latency: 19.3s vs 21.3s
Turns: 2 vs 2
Tool Calls: 0 vs 0
Schema Calls: 0 vs 0
Response: 2,717 chars vs 2,853 chars
Evaluation
Without Schema: B+ vs With Schema: B+
Both responses provide comprehensive coverage of query plan regression diagnosis across PostgreSQL, MySQL, and SQL Server, with concrete SQL commands for capturing plans, checking statistics, and comparing execution history. The response without Schema offers more detailed explanations of plan differences to look for (sequential vs index scans, estimated vs actual row counts), while the response with Schema presents a more organized structure with numbered sections and a clearer diagnostic checklist. Neither response demonstrates unique insights that would indicate access to curated monitoring intelligence, and both defer to requesting more specifics (database system, actual query, EXPLAIN output) rather than providing a complete diagnostic methodology. The responses are substantively equivalent in technical accuracy and actionability.
Without Schema
Here's how to diagnose and fix query plan regression:

## Check Current vs Historical Query Plan

**PostgreSQL:**
```sql
-- Get current plan
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Check plan history (if pg_stat_statements enabled)
SELECT query, calls, mean_exec_time, stddev_exec_time
FROM pg_stat_statements
WHERE query LIKE '%your_query%';
```

**MySQL:**
```sql
-- Get current plan
EXPLAIN FORMAT=JSON SELECT ...;

-- Check execution stats
SELECT * FROM performance_schema.events_statements_history
WHERE sql_text LIKE '%your_query%';
```

**SQL Server:**
```sql
-- Get current plan with stats
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Your query here

-- Check plan cache
SELECT * FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle);
```

## Common Causes of Plan Regression

1. **Stale statistics** - Optimizer using outdated data distribution info
```sql
-- PostgreSQL
ANALYZE table_name;
-- MySQL
ANALYZE TABLE table_name;
-- SQL Server
UPDATE STATISTICS table_name WITH FULLSCAN;
```
2. **Missing/unused index** - Index was dropped, disabled, or optimizer ignores it
   - Check index usage stats
   - Look for "Seq Scan" instead of "Index Scan" in plan
3. **Data distribution skew** - New data patterns make cardinality estimates wrong
   - Check histograms
   - Consider filtered statistics for skewed columns
4. **Parameter sniffing** (SQL Server/prepared statements)
   - First execution cached a bad plan for atypical parameters
   - Fix: `OPTION (RECOMPILE)` or `OPTIMIZE FOR UNKNOWN`
5. **Configuration changes**
   - Cost parameters, work_mem, or optimizer settings changed

## Quick Diagnosis Steps

1. **Compare plans side by side** - Look for:
   - Seq Scan → Index Scan (or vice versa)
   - Hash Join → Nested Loop
   - Estimated rows vs actual rows (major mismatch = bad statistics)
2. **Check index health:**
```sql
-- PostgreSQL: bloat & missing indexes
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'your_table';
```
3. **Force good plan temporarily:**
   - PostgreSQL: Use hints or `SET enable_seqscan=off` for testing
   - MySQL: `USE INDEX (index_name)`
   - SQL Server: Query hints like `WITH (INDEX(idx_name))`

## Fixes

- **Update statistics** (most common fix)
- **Create missing indexes** based on WHERE/JOIN clauses
- **Increase statistics target** for columns with high cardinality
- **Pin a good plan** (last resort) using plan guides/hints
- **Rewrite query** to help optimizer (e.g., break into CTEs, remove functions on indexed columns)

What database system are you using? I can give more specific commands if you share the actual query.
With Schema
I'll help you diagnose this query plan regression. Here's how to investigate:

## 1. **Capture and Compare Query Plans**

### PostgreSQL:
```sql
-- Get current plan
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Check plan history (if pg_stat_statements enabled)
SELECT query, calls, mean_exec_time, stddev_exec_time
FROM pg_stat_statements
WHERE query LIKE '%your_table%'
ORDER BY mean_exec_time DESC;
```

### MySQL:
```sql
-- Get current plan
EXPLAIN FORMAT=JSON SELECT ...;

-- Check slow query log for historical plans
-- Or use Performance Schema:
SELECT * FROM performance_schema.events_statements_history
WHERE sql_text LIKE '%your_query%';
```

### SQL Server:
```sql
-- Get current plan
SET STATISTICS IO, TIME ON;
SELECT ...;

-- Check plan cache
SELECT * FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE st.text LIKE '%your_query%';
```

## 2. **Common Causes of Plan Regression**

Look for these issues:

**Stale Statistics:**
```sql
-- PostgreSQL
ANALYZE your_table;
SELECT schemaname, tablename, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE tablename = 'your_table';

-- MySQL
ANALYZE TABLE your_table;

-- SQL Server
UPDATE STATISTICS your_table WITH FULLSCAN;
```

**Missing/Dropped Indexes:**
```sql
-- Check indexes still exist
-- PostgreSQL: \d+ table_name
-- MySQL:
SHOW INDEXES FROM table_name;
```

**Parameter Sniffing** (SQL Server):
```sql
-- Force recompile
OPTION (RECOMPILE)
-- Or clear plan cache
DBCC FREEPROCCACHE;
```

**Data Distribution Changes:**
- Histogram skew
- Cardinality estimate is now way off

## 3. **Quick Diagnostic Checklist**

Run these checks:

1. **Statistics age** - when were they last updated?
2. **Index availability** - are all expected indexes present?
3. **Plan differences** - compare old vs new (look for scan vs seek, nested loop vs hash join)
4. **Execution metrics** - logical reads, actual vs estimated rows
5. **Locks/blocking** - could contention be the issue?
6. **Parameter changes** - did query parameters change?

## 4. **Immediate Fixes**

Try these in order:

```sql
-- 1. Update statistics (safe, fast)
ANALYZE table_name;            -- PostgreSQL
ANALYZE TABLE table_name;      -- MySQL
UPDATE STATISTICS table_name;  -- SQL Server

-- 2. Force index hint (temporary workaround)
-- PostgreSQL
SELECT /*+ IndexScan(table_name idx_name) */ ...
-- MySQL
SELECT ... FROM table_name FORCE INDEX (idx_name) ...
-- SQL Server
SELECT ... FROM table_name WITH (INDEX(idx_name)) ...

-- 3. Disable specific optimization (PostgreSQL example)
SET enable_seqscan = off;  -- Force index usage
```

Would you like me to help investigate a specific query? If you share:
- The database system you're using
- The query that's slow
- The EXPLAIN plan output

I can provide more targeted diagnostics.

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When a query suddenly slows down without data volume changes, the first thing to confirm is that the query plan actually changed by comparing EXPLAIN ANALYZE output. Most plan regressions are caused by stale statistics, so check mod_since_analyze and last_analyze timestamps next. After ruling out statistics issues, investigate whether indexes are being ignored, data distribution has shifted, or planner cost parameters need tuning.

1. Capture and compare the current query plan
Run EXPLAIN (ANALYZE, BUFFERS) on the slow query to see the actual execution plan, estimated vs. actual row counts, and I/O patterns. If you have the old plan (from pg_stat_statements or logs), compare them side by side. Look for changed join orders, a switch from index scan to sequential scan, or drastically wrong row estimates: these are smoking guns. The `postgresql.queries.shared_blks_read` metric will spike if the planner switched to sequential scans.
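As a sketch of this step, assuming PostgreSQL with the pg_stat_statements extension enabled (the `orders` table and `customer_id` filter are placeholder names, not from the scenario):

```sql
-- Capture the actual plan with row counts and buffer I/O.
-- 'orders' and the WHERE clause are illustrative placeholders.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- Pull historical timing for the same query text to confirm
-- the regression (requires pg_stat_statements).
SELECT query, calls, mean_exec_time, stddev_exec_time
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY mean_exec_time DESC;
```

A large gap between `rows=` estimates and `actual rows` in the EXPLAIN output, or a high `stddev_exec_time` relative to the mean, both point at an unstable or changed plan.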
2. Check for stale statistics on the involved tables
Query pg_stat_all_tables for the tables in your slow query and check `n_mod_since_analyze` and `last_analyze`. If you see tens of thousands of modifications since the last ANALYZE, or `last_analyze` is days or weeks old, the planner is working with outdated distribution data. This is the most common cause of plan regressions: the planner thinks a table has 1,000 rows when it now has 1M, or vice versa. Run ANALYZE manually on those tables and re-test the query.
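A minimal sketch of this check, assuming PostgreSQL (the 10,000-row threshold and the `orders` table name are illustrative assumptions):

```sql
-- Find tables whose statistics are likely stale:
-- many modifications since the last ANALYZE, or an old last_analyze.
SELECT schemaname, relname,
       n_mod_since_analyze,   -- rows inserted/updated/deleted since last ANALYZE
       last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE n_mod_since_analyze > 10000
ORDER BY n_mod_since_analyze DESC;

-- Refresh statistics on a suspect table, then re-test the query.
ANALYZE orders;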
3. Verify indexes are being used
Compare `postgresql.sequential_scans` vs `postgresql.index_rel_scans` for the tables in the query. If you see high sequential scan counts on tables with millions of rows, the planner is choosing table scans over indexes. Check whether the query's WHERE clause can actually use your indexes: functional expressions, type mismatches, or OR conditions can prevent index usage. Also check whether the index exists but statistics are stale (back to step 2).
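The same comparison can be made directly from PostgreSQL's statistics views when the monitoring metrics above are unavailable; this is a sketch, and the LIMIT is an arbitrary choice:

```sql
-- Sequential vs index scan counts per table since the last stats reset.
-- High seq_scan with low idx_scan on large tables suggests ignored indexes.
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 20;
```

Note that a predicate like `WHERE lower(email) = '...'` cannot use a plain index on `email`; it needs an expression index on `lower(email)`.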
4. Look for data distribution changes
Even with fresh statistics, if your data distribution has fundamentally changed (e.g., most rows now fall into one value after a bulk load, or you deleted 90% of a table), the planner's cost model might now prefer different plans. Check the actual row counts in EXPLAIN ANALYZE against the estimates; if estimates are off by 10x or more, the planner can't make good choices. Look at `postgresql.rows` trends over time to see if data volume spiked or dropped suddenly.
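One way to inspect what the planner believes about a skewed column, as a sketch for PostgreSQL (the `orders` table, `status` column, and the statistics target of 500 are hypothetical):

```sql
-- Inspect the planner's view of the column's value distribution.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

-- If one value now dominates and estimates are still off,
-- collect a larger sample for that column (default target is 100).
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
```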
5. Review planner cost parameters
If statistics are fresh and indexes exist but the planner still chooses sequential scans, check random_page_cost, effective_cache_size, and seq_page_cost. On SSDs, random_page_cost should often be lowered to 1.1-1.5 (the default is 4.0). If effective_cache_size is too low, the planner underestimates how much data fits in memory and favors sequential scans. Use EXPLAIN to see estimated costs, then tune parameters and re-run EXPLAIN to see whether the plan improves.
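A sketch of a safe, session-scoped tuning experiment in PostgreSQL (the value 1.1 assumes SSD storage; the example query is a placeholder):

```sql
-- Check current cost settings.
SHOW random_page_cost;
SHOW effective_cache_size;

-- Try an SSD-appropriate value for this session only, then re-plan.
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If the plan flips to the index and costs drop, persist the change
-- via ALTER SYSTEM or postgresql.conf; otherwise RESET random_page_cost;
```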
6. Monitor query I/O patterns over time
Track `postgresql.queries.shared_blks_read` for this specific query over time; a sudden jump indicates the planner switched from an index (reading few blocks) to a sequential scan (reading the entire table). Correlate this with the `hits` vs `reads` ratio in pg_stat_database to see if the regression coincides with buffer pool pressure or a stats reset. If `stats_reset` is recent, you may have lost historical statistics that helped the planner make better choices.
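The underlying data for that metric can be pulled directly, as a sketch assuming PostgreSQL 13+ (where the pg_stat_statements columns are named `shared_blks_read`/`shared_blks_hit`):

```sql
-- Per-query I/O profile: heavy readers with low hit ratios are the
-- queries most likely to have switched to sequential scans.
SELECT query, calls, shared_blks_read, shared_blks_hit,
       round(100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0), 1) AS hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;

-- Database-wide cache hit ratio and the last stats reset time.
SELECT datname, blks_hit, blks_read, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```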


Related Insights

- statistics out of date (stub insight created from scenario discovery hint)
- index not being used (stub insight created from scenario discovery hint)
- cost parameter tuning needed (stub insight created from scenario discovery hint)
- query plan regression [diagnostic_step] (stub insight created from scenario discovery hint)
- Stale statistics cause inefficient query plans with exponentially worse performance [prerequisite, warning]
- data distribution change affecting estimates [prerequisite] (stub insight created from scenario discovery hint)
- planner choosing suboptimal plan [outcome] (stub insight created from scenario discovery hint)
- need for ANALYZE [supporting_evidence] (stub insight created from scenario discovery hint)
- join order changed [supporting_evidence] (stub insight created from scenario discovery hint)


Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration