Redis Persistence Strategy Selection - RDB vs AOF

Proactive Health

Choosing the appropriate persistence mechanism (RDB snapshots, AOF logging, or hybrid) based on durability requirements and performance constraints.

Prompt: “We're currently using RDB snapshots every 5 minutes but just lost 3 minutes of writes during a crash - should we switch to AOF for better durability, enable hybrid persistence, or is there a way to tune RDB to reduce data loss without killing performance?”

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When evaluating Redis persistence strategies after data loss, start by understanding your current RDB performance baseline and actual data loss exposure, then assess whether your application and infrastructure can tolerate AOF's latency impact before making changes. Often, tuning RDB frequency or using hybrid persistence gives better durability without the synchronous write penalties that can block your application.

1. Check current RDB snapshot performance and health

Before making any changes, look at `redis-persistence-rdb-last-bgsave-time-sec` and `redis-persistence-rdb-last-bgsave-status` to establish your baseline. If your RDB saves are already taking 10+ seconds or failing intermittently, adding AOF overhead will make things worse. You need to know if your disk I/O can handle the current workload before adding more synchronous writes.

redis.persistence.rdb_last_bgsave_time_sec
redis.persistence.rdb_last_bgsave_status
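As a rough illustration of the baseline check, the sketch below parses the text that `INFO persistence` returns and flags a slow or failed snapshot. The helper names `parse_info` and `rdb_baseline` and the `slow_after_sec` parameter are hypothetical, and the sample text stands in for real server output.

```python
def parse_info(text):
    """Parse the `key:value` lines of a Redis INFO section into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Persistence"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def rdb_baseline(info, slow_after_sec=10):
    """Summarize RDB health; slow_after_sec follows the 10-second rule of thumb."""
    duration = int(info.get("rdb_last_bgsave_time_sec", -1))
    status = info.get("rdb_last_bgsave_status", "unknown")
    return {
        "status_ok": status == "ok",
        "duration_sec": duration,  # a missing field is treated as -1 here
        "slow": duration >= slow_after_sec,
    }


# Sample text standing in for the output of `redis-cli INFO persistence`.
sample = """# Persistence
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:12
"""
report = rdb_baseline(parse_info(sample))
print(report)
```

A report with `slow` set or `status_ok` unset suggests fixing disk I/O before adding any AOF overhead.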

2. Quantify actual data loss exposure between snapshots

Check `redis-rdb-changes-since-last-save` to see how many write operations are at risk in your 5-minute window. If you're only seeing 1K-10K changes between saves, losing 3 minutes means losing a few thousand writes—now you can have a business conversation about whether that's acceptable. If it's millions of changes, the problem is more severe and justifies the complexity of AOF.

redis.rdb.changes_since_last_save
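To turn the counter into a number for that business conversation, a back-of-the-envelope estimate may help. The sketch below assumes a roughly uniform write rate, which real traffic rarely has, and the function name `writes_at_risk` is hypothetical. The inputs correspond to `rdb_changes_since_last_save` and the time elapsed since `rdb_last_save_time` from `INFO persistence`.

```python
def writes_at_risk(changes_since_last_save, seconds_since_last_save, loss_window_sec):
    """Estimate how many writes a crash would lose over loss_window_sec.

    Assumes writes arrive at a steady rate, so treat the result as an
    order-of-magnitude figure rather than a precise count.
    """
    if seconds_since_last_save <= 0:
        return 0  # no elapsed time, so no rate can be derived
    rate_per_sec = changes_since_last_save / seconds_since_last_save
    return round(rate_per_sec * loss_window_sec)


# 6000 changes observed 120 seconds after the last save, 3-minute loss window.
estimate = writes_at_risk(6000, 120, 180)
print(estimate)
```

Sampling the counter at several points in the day gives a better picture than a single reading.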

3. Assess application latency tolerance for synchronous writes

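This is critical: AOF can block your application during disk writes. With `appendfsync always`, every write waits for an fsync before it is acknowledged. With `appendfsync everysec`, the fsync runs in a background thread once per second, but Redis can still stall incoming writes when that fsync falls behind on a slow disk. The insight on `redis-appendfsync-blocking-gunicorn-timeout` shows real cases where slow disk I/O caused 1+ second blocks, triggering worker timeouts and 502 errors. If you have strict latency SLAs (p99 < 100ms), even `appendfsync everysec` may violate them during disk pressure, and `appendfsync no` leaves flushing to the operating system, which largely defeats the durability purpose.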

Redis appendfsync blocking causes Gunicorn worker timeout
AOF Persistence Latency from Synchronous Disk Writes

4. Try tuning RDB frequency before switching persistence models

The simplest solution is often the best: change your RDB config from `save 300 1` (snapshot after 5 minutes if at least 1 write occurred) to `save 60 1` (1 minute) or even `save 30 1000` (30 seconds if 1000+ writes occurred). Monitor `redis-persistence-rdb-last-bgsave-time-sec` to ensure snapshots complete well within the interval. If your snapshots take 5 seconds and you save every minute, your exposure drops to roughly one minute of data plus the snapshot time, which is much better than 5 minutes and adds no AOF complexity.

redis.persistence.rdb_last_bgsave_time_sec
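A small calculation can show whether a proposed `save` interval leaves enough headroom. This is a sketch under stated assumptions: the function name `snapshot_budget` and the `max_fraction` threshold are hypothetical, and the loss estimate assumes the save timer restarts when the previous snapshot finishes and that a snapshot captures the dataset as of the moment it starts.

```python
def snapshot_budget(interval_sec, bgsave_sec, max_fraction=0.5):
    """Check how much of the save interval a snapshot consumes.

    max_fraction is an illustrative threshold, not a Redis setting.
    """
    fraction = bgsave_sec / interval_sec
    return {
        "fraction_of_interval": fraction,
        "fits": fraction <= max_fraction,
        # Assumption: a crash just before the next snapshot completes loses
        # everything since the previous snapshot started, i.e. one snapshot
        # duration + the interval + a second snapshot duration.
        "rough_worst_case_loss_sec": interval_sec + 2 * bgsave_sec,
    }


budget = snapshot_budget(interval_sec=60, bgsave_sec=5)
print(budget)
```

If `fits` is false, the interval is probably too short for the dataset size, and back-to-back forks are likely to add memory and latency pressure.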

5. Evaluate disk I/O capacity if considering AOF

Before enabling AOF, understand the I/O cost: every write command is appended to the AOF, and the file is fsynced to disk once per second with `appendfsync everysec` (or on every write with `appendfsync always`). The insight on `aof-persistence-latency-from-synchronous-disk-writes` warns that the impact of slow disk I/O grows as the AOF file grows between rewrites. Check whether your disk has spare IOPS capacity. If it is already saturated, AOF will cause write latency spikes and potentially trigger the worker timeout issues described in the blocking insight.

AOF Persistence Latency from Synchronous Disk Writes
redis.persistence.aof_last_rewrite_time_sec
redis.persistence.aof_current_size
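One quick way to get a feel for fsync cost on a given volume is a small append-and-fsync probe like the sketch below. It is not a substitute for a proper disk benchmark, the function name `fsync_latencies` is hypothetical, and the example runs against the system temp directory only so that it works anywhere. Pointing it at the directory that holds the Redis data files would be more representative.

```python
import os
import tempfile
import time


def fsync_latencies(directory, samples=50, payload=b"x" * 4096):
    """Append payload and fsync `samples` times; return sorted latencies in ms."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)  # append one 4 KiB block
            os.fsync(fd)           # force it to stable storage
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    return sorted(latencies)


# The temp directory may be memory-backed; use the Redis data dir for real numbers.
results = fsync_latencies(tempfile.gettempdir(), samples=20)
print(f"median={results[len(results) // 2]:.2f} ms  worst={results[-1]:.2f} ms")
```

Worst-case values approaching a second on an otherwise idle disk would be a warning sign for the blocking behavior described above.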

6. Consider hybrid persistence (RDB + AOF) as middle ground

Hybrid mode gives you RDB's fast restart times plus AOF's recent write durability. Enable AOF with `appendfsync everysec` and keep your RDB snapshots running: when AOF is enabled, Redis recovers from the AOF on restart, while the RDB snapshots remain available as a compact baseline. With `aof-use-rdb-preamble yes` (the default in recent Redis versions), each rewritten AOF begins with an RDB-format snapshot, which is what keeps restarts fast. Tune `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size` aggressively (e.g., 50% and 64mb) to keep the AOF small and minimize the disk I/O amplification described in the latency insight. Monitor `redis-persistence-aof-enabled` and `redis-persistence-rdb-last-bgsave-status` to confirm both mechanisms are healthy.

AOF Persistence Latency from Synchronous Disk Writes
redis.persistence.aof_enabled
redis.persistence.rdb_last_bgsave_status
redis.persistence.aof_current_size
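To see how the two rewrite settings interact, the sketch below approximates the automatic rewrite trigger as I understand it: a rewrite starts once the AOF exceeds the minimum size and has grown by at least the configured percentage since the last rewrite. The function name `aof_rewrite_due` is hypothetical, and the sizes correspond to `aof_current_size` and `aof_base_size` from `INFO persistence`.

```python
MB = 1024 * 1024


def aof_rewrite_due(current_size, base_size, percentage=50, min_size=64 * MB):
    """Approximate the auto-rewrite check for the given settings.

    percentage maps to auto-aof-rewrite-percentage (0 disables the check)
    and min_size to auto-aof-rewrite-min-size.
    """
    if percentage == 0 or current_size <= min_size:
        return False
    base = base_size or 1  # avoid dividing by zero before the first rewrite
    growth = current_size * 100 // base - 100
    return growth >= percentage


print(aof_rewrite_due(140 * MB, 100 * MB))  # 40% growth, below the 50% threshold
print(aof_rewrite_due(160 * MB, 100 * MB))  # 60% growth, rewrite expected
print(aof_rewrite_due(50 * MB, 10 * MB))    # large growth, but under min_size
```

Lowering the percentage makes rewrites more frequent, which keeps the file small at the cost of more background rewrite I/O.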

7. Validate recovery time objectives with your choice

Don't forget about recovery time—RDB-only gives you fast restarts (seconds to minutes), while AOF-only can take much longer to replay on large datasets. Monitor `redis-loading-dump-file` during test restores to measure actual recovery time. If your RTO is tight (< 5 minutes) and you have 10GB+ of data, pure AOF might miss your recovery window even if it prevents data loss. Hybrid mode often hits the sweet spot for both RPO and RTO.

redis_loading_dump_file
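A test restore can be timed by polling the `loading` field of `INFO persistence` until it returns to 0. The sketch below is one way to structure that loop. `measure_load_time` and `fetch_info` are hypothetical names: `fetch_info` is any callable that returns the INFO fields as a dict, and it is simulated here so the example runs without a server. The clock and sleep functions are injectable for the same reason.

```python
import time


def measure_load_time(fetch_info, poll_sec=1.0, timeout_sec=600.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until the server reports loading == 0; return elapsed seconds."""
    start = clock()
    while True:
        if str(fetch_info().get("loading", "0")) == "0":
            return clock() - start
        if clock() - start > timeout_sec:
            raise TimeoutError("dataset still loading after timeout")
        sleep(poll_sec)


# Simulated restore: two polls report loading, the third reports done.
_states = iter([{"loading": "1"}, {"loading": "1"}, {"loading": "0"}])
_now = [0.0]


def _fake_sleep(seconds):
    _now[0] += seconds  # advance the fake clock instead of really waiting


elapsed = measure_load_time(lambda: next(_states), poll_sec=2.0,
                            clock=lambda: _now[0], sleep=_fake_sleep)
print(elapsed)
```

The measurement starts when polling starts, so begin polling as the restart is issued. A real `fetch_info` would also need to tolerate connection errors while the server is not yet accepting connections.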