Checkpoint Tuning for Write Performance

Proactive Health (warning)

Frequent or long-running checkpoints are causing write performance issues and need tuning.

Prompt: My PostgreSQL instance is experiencing periodic write latency spikes. I see these correlate with checkpoints happening every few minutes. How do I tune checkpoint settings to smooth out performance?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

| Metric | Without Schema | With Schema |
| --- | --- | --- |
| Tokens | 724 | 2,512 |
| Latency | 16.2s | 1.1m |
| Turns | 2 | 23 |
| Tool Calls | 0 | 10 |
| Schema Calls | 0 | 5 |
| Response | 1,791 chars | 2,933 chars |
Evaluation
Without Schema: B+ vs With Schema: A-
The response without Schema provides solid, actionable guidance with specific SQL commands and covers the essential checkpoint tuning parameters. However, the response with Schema offers significantly more depth and structure, including additional parameters like min_wal_size, clearer explanations of trade-offs, more comprehensive monitoring guidance with pg_stat_bgwriter metrics, and explicit warnings about restart requirements. The with-Schema response also better contextualizes the problem by explaining the mechanism of checkpoint-related I/O spikes and provides clearer success criteria (timed vs requested checkpoints). Both responses are technically accurate, but the with-Schema version would more reliably guide an operator to a complete solution.
Without Schema
I'll help you tune PostgreSQL checkpoints to reduce those write latency spikes. The issue is that checkpoints cause a burst of writes to disk, and when they happen too frequently, you get periodic performance degradation.

## Key Parameters to Tune

**1. Increase `max_wal_size`** (most important)

```sql
-- Current setting (check yours first)
SHOW max_wal_size;

-- Recommended: increase significantly (e.g., 2-4GB or more)
ALTER SYSTEM SET max_wal_size = '4GB';
```

This lets PostgreSQL accumulate more WAL before triggering a checkpoint, reducing checkpoint frequency.

**2. Adjust `checkpoint_completion_target`**

```sql
-- Should be 0.9 to spread checkpoint I/O over 90% of interval
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
```

This spreads checkpoint writes over a longer period instead of a sharp spike.

**3. Increase `checkpoint_timeout`** (if needed)

```sql
-- Default is 5min, consider 10-15min for write-heavy workloads
ALTER SYSTEM SET checkpoint_timeout = '15min';
```

**4. Monitor with `checkpoint_warning`**

```sql
-- Warns if checkpoints happen more frequently than this
ALTER SYSTEM SET checkpoint_warning = '30s';
```

## Apply Changes

```sql
SELECT pg_reload_conf(); -- Reload config
-- Or restart PostgreSQL for all changes to take effect
```

## Verify Improvement

Check your PostgreSQL logs for checkpoint statistics:

```
LOG: checkpoint complete: wrote X buffers (Y%), removed Z WAL files, ...
```

You want to see:
- Fewer checkpoints per hour
- More WAL files removed per checkpoint
- Lower "avg write" time

The combination of larger `max_wal_size` + high `checkpoint_completion_target` smooths out the I/O pattern significantly. Start with these settings and monitor - you may need to increase `max_wal_size` further depending on your write volume.
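To make the sizing advice above concrete, here is a hypothetical worked example (an editor's sketch, not part of either response): `max_wal_size` should comfortably exceed the WAL generated during one `checkpoint_timeout` interval, with headroom so checkpoints stay time-driven. The function name, the 3x headroom factor, and the rounding rule are all assumptions for illustration, not an official PostgreSQL formula.

```python
# Hypothetical helper: size max_wal_size from an observed WAL generation rate.
# Rule of thumb (an assumption, not an official formula): allow roughly 3x the
# WAL produced in one checkpoint_timeout interval, so checkpoints are triggered
# by time rather than by WAL volume.

def suggest_max_wal_size_mb(wal_mb_per_min: float,
                            checkpoint_timeout_min: float = 15,
                            headroom: float = 3.0) -> int:
    """Return a suggested max_wal_size in MB, rounded up to a whole GB."""
    needed_mb = wal_mb_per_min * checkpoint_timeout_min * headroom
    whole_gb = -(-needed_mb // 1024)  # ceiling division to whole GB
    return int(whole_gb * 1024)

# Example: 50 MB/min of WAL with checkpoint_timeout = 15min needs
# 50 * 15 * 3 = 2250 MB, rounded up to 3 GB.
print(suggest_max_wal_size_mb(50))  # 3072
```

Measure the WAL rate over a representative busy window before trusting any single number; a quiet-hour sample will undersize the setting.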
With Schema
10 tool calls · 5 Schema calls
I'll help you tune PostgreSQL checkpoints to reduce those write latency spikes. Let me get the expert guidance on this.

Based on your description of write latency spikes correlating with frequent checkpoints, here's how to tune PostgreSQL checkpoint settings to smooth out performance:

## Understanding the Problem

Checkpoints cause write spikes because PostgreSQL flushes all dirty pages from shared buffers to disk. When checkpoints happen too frequently or complete too quickly, you get concentrated I/O bursts instead of smooth, distributed writes.

## Key Configuration Parameters

### 1. **Increase `max_wal_size`** (most important)

```sql
-- Default is often 1GB, increase based on your write volume
max_wal_size = 4GB  -- or higher (8GB, 16GB for heavy writes)
```

- Controls how much WAL can accumulate before triggering a checkpoint
- Larger values = less frequent checkpoints
- Trade-off: longer recovery time after a crash

### 2. **Increase `checkpoint_timeout`**

```sql
-- Default is 5 minutes
checkpoint_timeout = 15min  -- or 30min
```

- Maximum time between checkpoints
- Prevents checkpoints from happening too frequently even with low write volume
- Trade-off: more WAL to replay on recovery

### 3. **Tune `checkpoint_completion_target`**

```sql
-- Default is 0.9 (use 90% of interval)
checkpoint_completion_target = 0.9
```

- Spreads checkpoint writes over this fraction of the checkpoint interval
- Keep at 0.9 to maximize spread of I/O
- Lower values = more concentrated I/O burst

### 4. **Adjust WAL buffer settings**

```sql
min_wal_size = 2GB
wal_buffers = 16MB  -- usually auto-tuned well
```

## Monitor Checkpoint Activity

Check your PostgreSQL logs with:

```sql
-- Enable checkpoint logging
log_checkpoints = on
```

Look for messages like:

```
LOG: checkpoint complete: wrote 12500 buffers (6.1%); 0 WAL file(s) added, 0 removed, 3 recycled; write=45.123 s, sync=0.234 s, total=45.456 s
```

**Red flags:**
- Checkpoints happening every few minutes → increase `max_wal_size`
- "checkpoints are occurring too frequently" → increase `max_wal_size` or `checkpoint_timeout`
- Very short write times → increase `checkpoint_completion_target`

## Recommended Starting Point

For a write-heavy workload:

```sql
max_wal_size = 8GB
min_wal_size = 2GB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
log_checkpoints = on
```

## Verify Improvements

After tuning, monitor:

```sql
SELECT * FROM pg_stat_bgwriter;
```

Key metrics:
- `checkpoints_timed` vs `checkpoints_req` - you want more timed than requested
- `buffers_checkpoint` - total buffers written during checkpoints

The goal is to have checkpoints driven by `checkpoint_timeout` rather than `max_wal_size` being exceeded, indicating you've spread the I/O smoothly.

**Important**: Changes to these parameters require a PostgreSQL restart (except `log_checkpoints`). Test incrementally and monitor disk space for WAL files.
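The `checkpoints_timed` vs `checkpoints_req` check described in the response can be automated. A minimal sketch (editor's illustration, not from either response; the function names are mine, and the 10% threshold is a common rule of thumb rather than a PostgreSQL default):

```python
# Classify checkpoint behavior from pg_stat_bgwriter counters.
# checkpoints_timed: checkpoints triggered by checkpoint_timeout (desired).
# checkpoints_req:   checkpoints forced early, e.g. by hitting max_wal_size.
# The 10% threshold is a rule of thumb, not a PostgreSQL default.

def requested_fraction(checkpoints_timed: int, checkpoints_req: int) -> float:
    total = checkpoints_timed + checkpoints_req
    return checkpoints_req / total if total else 0.0

def checkpoints_healthy(checkpoints_timed: int, checkpoints_req: int,
                        max_requested_fraction: float = 0.10) -> bool:
    """True when checkpoints are mostly time-driven (the desired state)."""
    frac = requested_fraction(checkpoints_timed, checkpoints_req)
    return frac <= max_requested_fraction

print(checkpoints_healthy(950, 50))   # True  (5% requested)
print(checkpoints_healthy(200, 800))  # False (80% requested: raise max_wal_size)
```

Because these counters are cumulative since the last stats reset, compare deltas between two snapshots rather than raw totals when judging recent behavior.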

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When diagnosing checkpoint-related write latency spikes, start by checking whether checkpoints are being triggered by time or by WAL size — requested vs timed checkpoint ratios tell the story. Then measure checkpoint frequency and WAL generation rate to size max_wal_size appropriately, and finally tune checkpoint_completion_target to spread I/O evenly instead of creating bursts.

1. **Check requested vs timed checkpoint ratio**
   First thing I check is `stat-checkpointer-count-requested` vs `stat-checkpointer-count-timed` in pg_stat_bgwriter. If requested checkpoints significantly exceed timed checkpoints, your max_wal_size is too small and WAL growth is forcing checkpoints before checkpoint_timeout expires. This is the most common cause of frequent checkpoint spikes. A healthy ratio is under 10% requested.
2. **Measure actual checkpoint frequency**
   Calculate the time interval between checkpoints by tracking `checkpoint-timed` over a window. If checkpoints are happening every few minutes (under 5 minutes sustained), that is too frequent and explains your latency spikes. Enable log_checkpoints in postgresql.conf and watch the logs to see the exact interval and whether each checkpoint completes before the next one starts.
3. **Analyze WAL generation rate vs max_wal_size**
   Look at `wal-size` (WAL bytes generated) over your checkpoint interval. If you're generating more than max_wal_size worth of WAL between timed checkpoints, you'll trigger requested checkpoints. For write-heavy OLTP workloads, I typically set max_wal_size to 4GB; for bulk loading or OLAP, 16GB or higher. The WAL generation rate directly determines how large max_wal_size needs to be.
4. **Check checkpoint I/O duration and spreading**
   Review `checkpoint-write-time` and `checkpoint-sync-time` to see how long checkpoints take and whether I/O is concentrated in bursts. If these are high and spiky rather than smooth, checkpoint_completion_target is too low (the default was 0.5 through PostgreSQL 13; it has been 0.9 since PostgreSQL 14). Set it to 0.9 to spread checkpoint writes over 90% of the checkpoint interval, which prevents the sudden I/O storms behind your latency spikes.
5. **Review buffer writes during checkpoints**
   Check `buffer-checkpoint` to see the volume of buffers flushed per checkpoint. High buffer writes concentrated in short checkpoint windows create the I/O contention you're seeing. This confirms the need both to reduce checkpoint frequency (larger max_wal_size, longer checkpoint_timeout) and to spread the I/O (higher checkpoint_completion_target).
6. **Tune checkpoint parameters based on workload**
   Based on your findings, set checkpoint_timeout = 15min, max_wal_size = 4GB (OLTP) or 16GB (OLAP), and checkpoint_completion_target = 0.9 in postgresql.conf, then reload. This spreads checkpoint I/O over 13.5 minutes instead of concentrating it, and ensures checkpoints are triggered by time rather than WAL size exhaustion. Monitor the metrics above for 1-2 hours to verify the latency spikes are smoothed out.


Related Insights

- **checkpoint frequency too high**: Stub insight created from scenario discovery hint.
- **checkpoint causing write latency spikes** (diagnostic_step): Stub insight created from scenario discovery hint.
- **max_wal_size too small**: Stub insight created from scenario discovery hint.
- **WAL generation rate exceeding checkpoint capacity**: Stub insight created from scenario discovery hint.
- **need for checkpoint parameter optimization**: Stub insight created from scenario discovery hint.
- **Checkpoint frequency and duration impact write performance** (diagnostic_step, warning)
- **Checkpoint frequency exceeds sustainable rate** (prerequisite, warning): Frequent checkpoints cause I/O spikes and performance hiccups. When requested checkpoints (`checkpoints_req`) significantly exceed timed checkpoints (`checkpoints_timed`), it indicates max_wal_size is too low or the write workload is too high.
- **Checkpoint spikes cause I/O contention and query latency** (outcome, warning)
- **checkpoint_completion_target needs tuning** (prerequisite): Stub insight created from scenario discovery hint.
- **Undersized max_wal_size triggers frequent checkpoints and I/O spikes** (prerequisite, warning)
- **Frequent checkpoints cause performance spikes during write-heavy workloads** (outcome, warning)
- **checkpoint tuning needed** (diagnostic_step): Stub insight created from scenario discovery hint.


Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL CloudWatch Metrics
PostgreSQL Prometheus
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration