Cache hit ratio degradation

Warning · Proactive Health

PostgreSQL is serving too many queries from disk instead of the shared_buffers cache, indicating memory pressure or inefficient queries.

Prompt: My PostgreSQL cache hit ratio has dropped from 99% to 85% over the past week. Is this a sign I need to increase shared_buffers, or could it be caused by new queries doing full table scans?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

| Metric | Without Schema | With Schema |
|---|---|---|
| Tokens | 1,310 | 1,296 |
| Latency | 24.9s | 25.2s |
| Turns | 6 | 5 |
| Tool Calls | 2 | 1 |
| Schema Calls | 0 | 0 |
| Response | 2,728 chars | 2,996 chars |
Evaluation
Without Schema: B+ vs With Schema: A-
Both responses provide technically accurate diagnostic SQL queries to investigate the cache hit ratio drop. The response with Schema includes additional queries to compare working set size against shared_buffers configuration, which is a crucial diagnostic step missing from the response without Schema. More significantly, the response with Schema makes a specific judgment that the sudden drop is more likely caused by full table scans rather than insufficient memory, providing concrete reasoning based on the pattern of change. The response without Schema provides solid diagnostic queries and interpretation guidance but stops short of making this diagnostic judgment, requiring more interpretation from the operator. Overall, the response with Schema offers a more complete diagnostic path with clearer next steps.
Without Schema
2 tool calls
I'll help you diagnose this cache hit ratio drop. Let me first check what PostgreSQL-related files or data we have in your current directory.

A drop from 99% to 85% cache hit ratio is significant and worth investigating. This could be either issue, but there are specific ways to diagnose which. Here's how to investigate:

## Quick Diagnostic Queries

Run these queries to understand what's happening:

**1. Check for sequential scans on large tables:**

```sql
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_seq_tuples,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

**2. Check cache hit ratio by table:**

```sql
SELECT schemaname, relname, heap_blks_hit, heap_blks_read,
       CASE WHEN (heap_blks_hit + heap_blks_read) > 0
            THEN round(100.0 * heap_blks_hit / (heap_blks_hit + heap_blks_read), 2)
            ELSE NULL END AS cache_hit_ratio
FROM pg_statio_user_tables
WHERE (heap_blks_hit + heap_blks_read) > 0
ORDER BY (heap_blks_hit + heap_blks_read) DESC
LIMIT 20;
```

**3. Identify recent query patterns:**

```sql
SELECT query, calls, total_exec_time, mean_exec_time,
       shared_blks_hit, shared_blks_read,
       CASE WHEN (shared_blks_hit + shared_blks_read) > 0
            THEN round(100.0 * shared_blks_hit / (shared_blks_hit + shared_blks_read), 2)
            ELSE NULL END AS query_cache_hit_ratio
FROM pg_stat_statements
ORDER BY (shared_blks_hit + shared_blks_read) DESC
LIMIT 20;
```

*(Requires `pg_stat_statements` extension)*

## What to Look For

**Signs it's full table scans:**
- High `seq_scan` counts on large tables
- Queries with low cache hit ratios in `pg_stat_statements`
- Missing indexes on frequently queried columns
- New queries or data growth making indexes ineffective

**Signs you need more shared_buffers:**
- All queries show decreased cache hit ratios uniformly
- Working set size has grown beyond current `shared_buffers`
- No obvious problematic queries
- Database size increased significantly

## Recommendations

1. **If it's full table scans**: Adding indexes or optimizing queries will be more effective than increasing memory
2. **If it's memory**: Only increase `shared_buffers` if your working set actually needs it. Rule of thumb: 25% of RAM for dedicated DB servers, 10-15% for shared servers
3. **Check for data growth**: Has your database size increased recently? Compare with your current `shared_buffers` setting

Would you like me to help you analyze specific query results or create a monitoring script to track this over time?
With Schema
1 tool call
I'll help you diagnose this PostgreSQL cache hit ratio drop. Let me first check what files you have available that might give us context.

A drop from 99% to 85% cache hit ratio is significant and needs investigation. Both scenarios you mentioned are possible, but they have different indicators. Let me help you diagnose this:

## Key Diagnostic Queries

**1. Check current cache hit ratio by database:**

```sql
SELECT datname,
       blks_hit::float / (blks_hit + blks_read)::float * 100 AS cache_hit_ratio,
       blks_hit, blks_read
FROM pg_stat_database
WHERE blks_read > 0
ORDER BY cache_hit_ratio;
```

**2. Identify tables with low cache hit ratios:**

```sql
SELECT schemaname, relname,
       heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0)::float * 100 AS cache_hit_ratio,
       seq_scan, seq_tup_read, idx_scan,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS size
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0)::float
LIMIT 20;
```

**3. Find queries doing sequential scans on large tables:**

```sql
SELECT schemaname, relname, seq_scan, seq_tup_read,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_seq_tup_read,
       idx_scan,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

## Distinguishing the Two Scenarios

**Full table scans (more likely):**
- You'll see high `seq_scan` and `seq_tup_read` values
- Specific tables show up with poor cache ratios
- The problem is localized to certain tables/queries
- **Fix:** Add missing indexes, optimize queries

**Insufficient shared_buffers:**
- Cache hit ratio drops uniformly across all tables
- Working set size > shared_buffers
- You've recently added significant data volume
- **Fix:** Increase shared_buffers (but carefully)

## Quick Checks

**Compare your working set to shared_buffers:**

```sql
-- Current shared_buffers setting
SHOW shared_buffers;

-- Active data being accessed
SELECT pg_size_pretty(sum(pg_total_relation_size(schemaname||'.'||relname)))
FROM pg_stat_user_tables
WHERE last_seq_scan > NOW() - INTERVAL '7 days'
   OR last_idx_scan > NOW() - INTERVAL '7 days';
```

## Most Likely Cause

A sudden drop like this (99% → 85% over a week) is **more likely to be full table scans** from:
- New queries missing indexes
- Changed query patterns
- Reports/analytics queries added recently
- Data growth making existing indexes less effective

Increasing shared_buffers usually gives gradual improvements, not sudden drops when undersized.

## Next Steps

1. Run the sequential scan query above to identify culprits
2. Check `pg_stat_statements` for slow queries (if enabled)
3. Look at `EXPLAIN ANALYZE` for your most frequent queries
4. Consider: Did anything change in your application code this week?

Would you like me to create a diagnostic script to run these checks and analyze the results?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When investigating cache hit ratio degradation in PostgreSQL, start by quantifying the current hit ratio across all databases and identifying which specific tables are experiencing the most cache misses. Then determine whether the root cause is new query patterns doing full table scans (fixable with indexes) or genuine memory pressure requiring shared_buffers increases — these have very different remediation paths.

1. Calculate current cache hit ratio and trend
First, verify the exact cache hit ratio using `block-hit` (blks_hit) and `postgresql-blocks-read` (blks_read): calculate as blks_hit / (blks_hit + blks_read) * 100. The target is >99% — anything below 95% is concerning. Check this across all databases in your cluster using pg_stat_database. If you're seeing 85%, that's significant disk I/O overhead. The insight on cache-hit-ratio-below-99-percent explains that this indicates shared_buffers may be undersized for your working set, but we need to confirm whether it's a sizing issue or a query pattern issue.
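A minimal sketch of this check against `pg_stat_database` (the 99%/95% thresholds follow the guidance above; counters are cumulative since the last stats reset):

```sql
-- Cluster-wide cache hit ratio per database; target > 99%, investigate below 95%
SELECT datname,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_ratio
FROM pg_stat_database
WHERE blks_hit + blks_read > 0
ORDER BY cache_hit_ratio;
```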
2. Identify tables with poor cache performance
Query pg_statio_user_tables to calculate per-table cache hit ratios using `heap-block-hit` (heap_blks_hit) and `heap-block-read` (heap_blks_read). Tables with high heap_blks_read relative to heap_blks_hit are your primary offenders — these are forcing PostgreSQL to disk repeatedly. If you see 1-2 large tables with terrible hit ratios while others are >99%, you've likely identified new or changed queries hitting those tables. The low-buffer-hit-ratio insight recommends monitoring per-table hit ratios to pinpoint which tables are evicting cache.
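One way to surface those offenders, ordering by raw disk reads so large tables with heavy miss volume rise to the top (a sketch using standard `pg_statio_user_tables` columns):

```sql
-- Per-table heap cache hit ratio; high heap_blks_read marks tables forcing disk reads
SELECT schemaname, relname,
       round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_ratio,
       heap_blks_read
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY heap_blks_read DESC
LIMIT 20;
```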
3. Check for increased sequential scans on large tables
Compare `postgresql-heap-blocks-read` to `postgresql-index-blocks-read` on your problem tables. If heap blocks read is disproportionately high, you're likely doing full table scans instead of index scans — this is the "new queries doing full table scans" scenario you asked about. Query pg_stat_user_tables and check seq_scan vs idx_scan counts. If seq_scan is high on tables with >1M rows and those tables show poor cache hit ratios, missing indexes are your primary issue, not memory. The cache-hit-ratio-excessive-disk-io insight notes that cache misses often indicate queries reading from disk instead of using efficient index access patterns.
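The seq_scan vs idx_scan comparison can be sketched as follows; `avg_rows_per_seq_scan` helps separate harmless scans of tiny tables from full scans of large ones:

```sql
-- Scan-type breakdown; high seq_scan with many rows per scan suggests missing indexes
SELECT relname, seq_scan, idx_scan,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_rows_per_seq_scan,
       pg_size_pretty(pg_total_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```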
4. Analyze heap vs index cache hit patterns
Separately calculate cache hit ratios for heap blocks (`postgresql-heap-blocks-hit` / total heap blocks) versus index blocks (`postgresql-index-blocks-hit` / total index blocks). If heap cache is poor but index cache is fine, you're reading too much table data (possibly missing covering indexes). If index cache is poor, you might have missing indexes entirely or non-selective indexes being scanned. This distinction tells you whether to focus on query optimization (heap issues) or index creation (index issues).
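A sketch of this split, using the heap and index I/O counters that `pg_statio_user_tables` exposes side by side:

```sql
-- Heap vs index hit ratios per table: poor heap -> query/covering-index work,
-- poor index -> missing or non-selective indexes
SELECT relname,
       round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_ratio,
       round(100.0 * idx_blks_hit  / NULLIF(idx_blks_hit  + idx_blks_read,  0), 2) AS index_hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read + idx_blks_read > 0
ORDER BY heap_hit_ratio NULLS LAST
LIMIT 20;
```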
5. Review shared_buffers configuration against working set
Check your current shared_buffers setting in postgresql.conf — the default 128MB is far too low for production. The cache-hit-ratio-low-memory and insufficient-shared-buffers-cache insights both recommend setting shared_buffers to 25% of total RAM (up to 40% on dedicated database servers). If you have 16GB RAM but only 128MB shared_buffers, you're bottlenecked on memory regardless of query patterns. However, if you're already at 25% of RAM and still seeing high `postgresql-disk-read`, then query optimization is likely more important than increasing memory further.
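Two quick queries for this comparison (total user-table size is only a rough upper bound on the working set, since not all data is hot):

```sql
-- Current setting (the 128MB default is far too low for production)
SHOW shared_buffers;

-- Total size of user tables, a coarse proxy for working set size
SELECT pg_size_pretty(sum(pg_total_relation_size(relid)))
FROM pg_stat_user_tables;
```

If the gap is large and you are below 25% of RAM, adjust `shared_buffers` in postgresql.conf (e.g. `shared_buffers = 4GB` on a dedicated 16GB server) and restart.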
6. Investigate recent workload or data volume changes
Look at trends in `postgresql-blocks-read` and total `block-hit` over the past two weeks to correlate with when the hit ratio dropped. Did you deploy new features, run large data imports, or see organic growth that pushed your working set beyond available cache? Check pg_stat_database for changes in database size and transaction patterns. If your working set grew from 3GB to 6GB but shared_buffers is still 4GB, you've outgrown your cache configuration through natural growth. The buffer-cache-hit-ratio-low insight notes that working sets exceeding available cache cause progressive degradation.
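Since `pg_stat_database` counters are cumulative, trending requires periodic snapshots; a minimal sampling query to run on a schedule and diff between runs might look like:

```sql
-- Snapshot cumulative I/O counters; diff successive samples to see the read trend
SELECT now() AS sampled_at, datname, blks_hit, blks_read, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```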


Monitoring Interfaces

PostgreSQL Prometheus
PostgreSQL Datadog
PostgreSQL Native
PostgreSQL CloudWatch Metrics
PostgreSQL OpenTelemetry Receiver