Cache hit ratio degradation

Warning · Proactive Health

PostgreSQL is serving too many queries from disk instead of the shared_buffers cache, indicating memory pressure or inefficient queries.

Prompt: My PostgreSQL cache hit ratio has dropped from 99% to 85% over the past week. Is this a sign I need to increase shared_buffers, or could it be caused by new queries doing full table scans?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

| Metric | Without Schema | With Schema |
|---|---|---|
| Tokens | 1,310 | 1,296 |
| Latency | 24.9s | 25.2s |
| Turns | 6 | 5 |
| Tool Calls | 2 | 1 |
| Schema Calls | 0 | 0 |
| Response | 2,728 chars | 2,996 chars |
Evaluation
Without Schema: B+ vs With Schema: A-
Both responses provide technically accurate diagnostic SQL queries to investigate the cache hit ratio drop. The response with Schema includes additional queries to compare working set size against shared_buffers configuration, which is a crucial diagnostic step missing from the response without Schema. More significantly, the response with Schema makes a specific judgment that the sudden drop is more likely caused by full table scans rather than insufficient memory, providing concrete reasoning based on the pattern of change. The response without Schema provides solid diagnostic queries and interpretation guidance but stops short of making this diagnostic judgment, requiring more interpretation from the operator. Overall, the response with Schema offers a more complete diagnostic path with clearer next steps.
Without Schema
2 tool calls
I'll help you diagnose this cache hit ratio drop. Let me first check what PostgreSQL-related files or data we have in your current directory.

A drop from 99% to 85% cache hit ratio is significant and worth investigating. This could be either issue, but there are specific ways to diagnose which. Here's how to investigate:

## Quick Diagnostic Queries

Run these queries to understand what's happening:

**1. Check for sequential scans on large tables:**

```sql
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_seq_tuples,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

**2. Check cache hit ratio by table:**

```sql
SELECT schemaname, relname, heap_blks_hit, heap_blks_read,
       CASE WHEN (heap_blks_hit + heap_blks_read) > 0
            THEN round(100.0 * heap_blks_hit / (heap_blks_hit + heap_blks_read), 2)
            ELSE NULL END AS cache_hit_ratio
FROM pg_statio_user_tables
WHERE (heap_blks_hit + heap_blks_read) > 0
ORDER BY (heap_blks_hit + heap_blks_read) DESC
LIMIT 20;
```

**3. Identify recent query patterns:**

```sql
SELECT query, calls, total_exec_time, mean_exec_time,
       shared_blks_hit, shared_blks_read,
       CASE WHEN (shared_blks_hit + shared_blks_read) > 0
            THEN round(100.0 * shared_blks_hit / (shared_blks_hit + shared_blks_read), 2)
            ELSE NULL END AS query_cache_hit_ratio
FROM pg_stat_statements
ORDER BY (shared_blks_hit + shared_blks_read) DESC
LIMIT 20;
```

*(Requires `pg_stat_statements` extension)*

## What to Look For

**Signs it's full table scans:**
- High `seq_scan` counts on large tables
- Queries with low cache hit ratios in `pg_stat_statements`
- Missing indexes on frequently queried columns
- New queries or data growth making indexes ineffective

**Signs you need more shared_buffers:**
- All queries show decreased cache hit ratios uniformly
- Working set size has grown beyond current `shared_buffers`
- No obvious problematic queries
- Database size increased significantly

## Recommendations

1. **If it's full table scans**: Adding indexes or optimizing queries will be more effective than increasing memory
2. **If it's memory**: Only increase `shared_buffers` if your working set actually needs it. Rule of thumb: 25% of RAM for dedicated DB servers, 10-15% for shared servers
3. **Check for data growth**: Has your database size increased recently? Compare with your current `shared_buffers` setting

Would you like me to help you analyze specific query results or create a monitoring script to track this over time?
With Schema
1 tool call
I'll help you diagnose this PostgreSQL cache hit ratio drop. Let me first check what files you have available that might give us context.

A drop from 99% to 85% cache hit ratio is significant and needs investigation. Both scenarios you mentioned are possible, but they have different indicators. Let me help you diagnose this:

## Key Diagnostic Queries

**1. Check current cache hit ratio by database:**

```sql
SELECT datname,
       blks_hit::float / (blks_hit + blks_read)::float * 100 AS cache_hit_ratio,
       blks_hit, blks_read
FROM pg_stat_database
WHERE blks_read > 0
ORDER BY cache_hit_ratio;
```

**2. Identify tables with low cache hit ratios:**

```sql
SELECT schemaname, relname,
       heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0)::float * 100 AS cache_hit_ratio,
       seq_scan, seq_tup_read, idx_scan,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS size
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0)::float
LIMIT 20;
```

**3. Find queries doing sequential scans on large tables:**

```sql
SELECT schemaname, relname, seq_scan, seq_tup_read,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_seq_tup_read,
       idx_scan,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

## Distinguishing the Two Scenarios

**Full table scans (more likely):**
- You'll see high `seq_scan` and `seq_tup_read` values
- Specific tables show up with poor cache ratios
- The problem is localized to certain tables/queries
- **Fix:** Add missing indexes, optimize queries

**Insufficient shared_buffers:**
- Cache hit ratio drops uniformly across all tables
- Working set size > shared_buffers
- You've recently added significant data volume
- **Fix:** Increase shared_buffers (but carefully)

## Quick Checks

**Compare your working set to shared_buffers:**

```sql
-- Current shared_buffers setting
SHOW shared_buffers;

-- Active data being accessed
SELECT pg_size_pretty(sum(pg_total_relation_size(schemaname||'.'||relname)))
FROM pg_stat_user_tables
WHERE last_seq_scan > NOW() - INTERVAL '7 days'
   OR last_idx_scan > NOW() - INTERVAL '7 days';
```

## Most Likely Cause

A sudden drop like this (99% → 85% over a week) is **more likely to be full table scans** from:
- New queries missing indexes
- Changed query patterns
- Reports/analytics queries added recently
- Data growth making existing indexes less effective

Increasing shared_buffers usually gives gradual improvements, not sudden drops when undersized.

## Next Steps

1. Run the sequential scan query above to identify culprits
2. Check `pg_stat_statements` for slow queries (if enabled)
3. Look at `EXPLAIN ANALYZE` for your most frequent queries
4. Consider: Did anything change in your application code this week?

Would you like me to create a diagnostic script to run these checks and analyze the results?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When investigating cache hit ratio degradation in PostgreSQL, start by quantifying the current hit ratio across all databases and identifying which specific tables are experiencing the most cache misses. Then determine whether the root cause is new query patterns doing full table scans (fixable with indexes) or genuine memory pressure requiring shared_buffers increases — these have very different remediation paths.

1. Calculate current cache hit ratio and trend
First, verify the exact cache hit ratio using `block-hit` (blks_hit) and `postgresql-blocks-read` (blks_read): calculate as blks_hit / (blks_hit + blks_read) * 100. The target is >99% — anything below 95% is concerning. Check this across all databases in your cluster using pg_stat_database. If you're seeing 85%, that's significant disk I/O overhead. The insight on cache-hit-ratio-below-99-percent explains that this indicates shared_buffers may be undersized for your working set, but we need to confirm whether it's a sizing issue or a query pattern issue.
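A minimal sketch of this check against `pg_stat_database` (the 99%/95% thresholds follow the guidance above; counters are cumulative since the last stats reset):

```sql
-- Cluster-wide cache hit ratio per database; target > 99%, investigate below 95%
SELECT datname,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_ratio
FROM pg_stat_database
WHERE blks_hit + blks_read > 0
ORDER BY cache_hit_ratio;
```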
2. Identify tables with poor cache performance
Query pg_statio_user_tables to calculate per-table cache hit ratios using `heap-block-hit` (heap_blks_hit) and `heap-block-read` (heap_blks_read). Tables with high heap_blks_read relative to heap_blks_hit are your primary offenders — these are forcing PostgreSQL to disk repeatedly. If you see 1-2 large tables with terrible hit ratios while others are >99%, you've likely identified new or changed queries hitting those tables. The low-buffer-hit-ratio insight recommends monitoring per-table hit ratios to pinpoint which tables are evicting cache.
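One way to surface those offenders, ordering by raw disk reads so large tables with heavy miss volume rise to the top (a sketch using standard `pg_statio_user_tables` columns):

```sql
-- Per-table heap cache hit ratio; high heap_blks_read marks tables forcing disk reads
SELECT schemaname, relname,
       round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_ratio,
       heap_blks_read
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY heap_blks_read DESC
LIMIT 20;
```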
3. Check for increased sequential scans on large tables
Compare `postgresql-heap-blocks-read` to `postgresql-index-blocks-read` on your problem tables. If heap blocks read is disproportionately high, you're likely doing full table scans instead of index scans — this is the "new queries doing full table scans" scenario you asked about. Query pg_stat_user_tables and check seq_scan vs idx_scan counts. If seq_scan is high on tables with >1M rows and those tables show poor cache hit ratios, missing indexes are your primary issue, not memory. The cache-hit-ratio-excessive-disk-io insight notes that cache misses often indicate queries reading from disk instead of using efficient index access patterns.
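The seq_scan vs idx_scan comparison can be sketched as follows; `avg_rows_per_seq_scan` helps separate harmless scans of tiny tables from full scans of large ones:

```sql
-- Scan-type breakdown; high seq_scan with many rows per scan suggests missing indexes
SELECT relname, seq_scan, idx_scan,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_rows_per_seq_scan,
       pg_size_pretty(pg_total_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```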
4. Analyze heap vs index cache hit patterns
Separately calculate cache hit ratios for heap blocks (`postgresql-heap-blocks-hit` / total heap blocks) versus index blocks (`postgresql-index-blocks-hit` / total index blocks). If heap cache is poor but index cache is fine, you're reading too much table data (possibly missing covering indexes). If index cache is poor, you might have missing indexes entirely or non-selective indexes being scanned. This distinction tells you whether to focus on query optimization (heap issues) or index creation (index issues).
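A sketch of this split, using the heap and index I/O counters that `pg_statio_user_tables` exposes side by side:

```sql
-- Heap vs index hit ratios per table: poor heap -> query/covering-index work,
-- poor index -> missing or non-selective indexes
SELECT relname,
       round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_ratio,
       round(100.0 * idx_blks_hit  / NULLIF(idx_blks_hit  + idx_blks_read,  0), 2) AS index_hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read + idx_blks_read > 0
ORDER BY heap_hit_ratio NULLS LAST
LIMIT 20;
```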
5. Review shared_buffers configuration against working set
Check your current shared_buffers setting in postgresql.conf — the default 128MB is far too low for production. The cache-hit-ratio-low-memory and insufficient-shared-buffers-cache insights both recommend setting shared_buffers to 25% of total RAM (up to 40% on dedicated database servers). If you have 16GB RAM but only 128MB shared_buffers, you're bottlenecked on memory regardless of query patterns. However, if you're already at 25% of RAM and still seeing high `postgresql-disk-read`, then query optimization is likely more important than increasing memory further.
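Two quick queries for this comparison (total user-table size is only a rough upper bound on the working set, since not all data is hot):

```sql
-- Current setting (the 128MB default is far too low for production)
SHOW shared_buffers;

-- Total size of user tables, a coarse proxy for working set size
SELECT pg_size_pretty(sum(pg_total_relation_size(relid)))
FROM pg_stat_user_tables;
```

If the gap is large and you are below 25% of RAM, adjust `shared_buffers` in postgresql.conf (e.g. `shared_buffers = 4GB` on a dedicated 16GB server) and restart.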
6. Investigate recent workload or data volume changes
Look at trends in `postgresql-blocks-read` and total `block-hit` over the past two weeks to correlate with when the hit ratio dropped. Did you deploy new features, run large data imports, or see organic growth that pushed your working set beyond available cache? Check pg_stat_database for changes in database size and transaction patterns. If your working set grew from 3GB to 6GB but shared_buffers is still 4GB, you've outgrown your cache configuration through natural growth. The buffer-cache-hit-ratio-low insight notes that working sets exceeding available cache cause progressive degradation.
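Since `pg_stat_database` counters are cumulative, trending requires periodic snapshots; a minimal sampling query to run on a schedule and diff between runs might look like:

```sql
-- Snapshot cumulative I/O counters; diff successive samples to see the read trend
SELECT now() AS sampled_at, datname, blks_hit, blks_read, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```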


Monitoring Interfaces

PostgreSQL Prometheus
PostgreSQL Datadog
PostgreSQL Native
PostgreSQL CloudWatch Metrics
PostgreSQL OpenTelemetry Receiver