Redis Persistence Strategy Selection - RDB vs AOF

Proactive Health

Choosing an appropriate persistence mechanism (RDB snapshots, AOF logging, or hybrid) based on durability requirements and performance constraints.

Prompt: We're currently using RDB snapshots every 5 minutes but just lost 3 minutes of writes during a crash - should we switch to AOF for better durability, enable hybrid persistence, or is there a way to tune RDB to reduce data loss without killing performance?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When evaluating Redis persistence strategies after data loss, start by understanding your current RDB performance baseline and actual data loss exposure, then assess whether your application and infrastructure can tolerate AOF's latency impact before making changes. Often, tuning RDB frequency or using hybrid persistence gives better durability without the synchronous write penalties that can block your application.

1. Check current RDB snapshot performance and health
Before making any changes, look at `redis-persistence-rdb-last-bgsave-time-sec` and `redis-persistence-rdb-last-bgsave-status` to establish your baseline. If your RDB saves are already taking 10+ seconds or failing intermittently, adding AOF overhead will make things worse. You need to know if your disk I/O can handle the current workload before adding more synchronous writes.
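A minimal sketch of the baseline check, assuming the two metrics above correspond to the `rdb_last_bgsave_time_sec` and `rdb_last_bgsave_status` fields of `INFO persistence` (that mapping is an assumption). The helper names and the sample text are hypothetical; in practice the text would come from `redis-cli INFO persistence`.

```python
def parse_info(text: str) -> dict:
    """Parse the `key:value` lines of a Redis INFO reply into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Persistence"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def rdb_baseline(info: dict, slow_after_sec: int = 10) -> dict:
    """Summarise RDB health; slow_after_sec mirrors the 10-second rule of thumb."""
    last_duration = int(info.get("rdb_last_bgsave_time_sec", -1))
    return {
        "last_save_ok": info.get("rdb_last_bgsave_status") == "ok",
        "last_duration_sec": last_duration,  # -1 means no BGSAVE has completed yet
        "slow": last_duration >= slow_after_sec,
    }


# Hypothetical sample; real output has many more fields.
SAMPLE = """# Persistence
rdb_changes_since_last_save:4213
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:12
"""

print(rdb_baseline(parse_info(SAMPLE)))
```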
2. Quantify actual data loss exposure between snapshots
Check `redis-rdb-changes-since-last-save` to see how many write operations are at risk in your 5-minute window. If you're only seeing 1K-10K changes between saves, losing 3 minutes means losing a few thousand writes—now you can have a business conversation about whether that's acceptable. If it's millions of changes, the problem is more severe and justifies the complexity of AOF.
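The exposure estimate can be sketched as simple arithmetic. This assumes the change counter grows at a roughly steady rate, which bursty workloads will not satisfy; `writes_at_risk` is a hypothetical helper, and its inputs would come from the changes-since-last-save metric and the time elapsed since the last successful save.

```python
def writes_at_risk(changes_since_save: int, seconds_since_save: float,
                   window_sec: float) -> int:
    """Extrapolate the observed change rate over a loss window of window_sec seconds."""
    if seconds_since_save <= 0:
        raise ValueError("seconds_since_save must be positive")
    rate_per_sec = changes_since_save / seconds_since_save
    return round(rate_per_sec * window_sec)


# Example: 6,000 changes accumulated over 120 s, projected onto a 3-minute loss.
print(writes_at_risk(6000, 120, 180))
```

Sampling the counter at several points in the day gives a better picture than a single reading, since the rate at peak traffic is what determines the worst case.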
3. Assess application latency tolerance for synchronous writes
This is critical: AOF with `appendfsync everysec` or `always` can block your application during disk writes. The insight on `redis-appendfsync-blocking-gunicorn-timeout` shows real cases where slow disk I/O caused 1+ second blocks, triggering worker timeouts and 502 errors. If you have strict latency SLAs (p99 < 100ms), AOF with `appendfsync everysec` may violate them during disk pressure, and `appendfsync no` defeats the durability purpose.
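One signal worth watching on a test instance with AOF enabled is the `aof_delayed_fsync` counter in `INFO persistence`, which Redis increments when a write had to wait on a background fsync. The sketch below compares two samples; it assumes each sample is a dict such as the one redis-py's `info("persistence")` returns, and the function names and zero-tolerance default are illustrative choices.

```python
def delayed_fsync_delta(before: dict, after: dict) -> int:
    """Number of delayed fsyncs recorded between two INFO samples."""
    return int(after.get("aof_delayed_fsync", 0)) - int(before.get("aof_delayed_fsync", 0))


def aof_is_stalling(before: dict, after: dict, tolerated: int = 0) -> bool:
    """True if more delayed fsyncs occurred than the caller is willing to tolerate."""
    return delayed_fsync_delta(before, after) > tolerated


# Hypothetical samples taken a minute apart under load.
earlier = {"aof_enabled": 1, "aof_delayed_fsync": 3}
later = {"aof_enabled": 1, "aof_delayed_fsync": 7}
print(aof_is_stalling(earlier, later))
```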
4. Try tuning RDB frequency before switching persistence models
The simplest solution is often the best: change your RDB config from `save 300 1` (snapshot after 5 minutes if at least 1 key changed) to `save 60 1` (1 minute) or even `save 30 1000` (30 seconds if 1000+ writes). Monitor `redis-persistence-rdb-last-bgsave-time-sec` to ensure snapshots complete well within the interval. If your snapshots take 5 seconds and you save every minute, your worst-case loss drops to roughly 1 minute of data plus the snapshot time, which is much better than 5 minutes and adds no AOF complexity. Keep in mind that every snapshot forks the Redis process, so more frequent saves also mean more frequent fork and copy-on-write overhead.
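To reason about how several `save` rules combine, the trigger condition can be modelled as follows. This is a simplified model rather than Redis's actual code: a snapshot is due when any rule has both its time threshold and its change threshold met. `should_snapshot` and `RULES` are hypothetical names.

```python
# Each rule is (seconds, changes), as in `save 60 1` and `save 30 1000`.
RULES = [(60, 1), (30, 1000)]


def should_snapshot(rules, seconds_since_save: float, changes_since_save: int) -> bool:
    """True if any rule has both its time and its change threshold satisfied."""
    return any(
        seconds_since_save >= seconds and changes_since_save >= changes
        for seconds, changes in rules
    )


# 45 s after the last save with 1,500 changes: the (30, 1000) rule fires.
print(should_snapshot(RULES, 45, 1500))
# 45 s after the last save with only 10 changes: neither rule fires yet.
print(should_snapshot(RULES, 45, 10))
```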
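5. Evaluate disk I/O capacity if considering AOF
Before enabling AOF, understand the I/O cost: every write command is appended to the AOF, and the fsync policy decides how often that data is forced to disk. With `appendfsync always`, Redis fsyncs before acknowledging each write. With `appendfsync everysec`, the fsync runs about once per second in a background thread, but writes can still stall when that fsync falls behind on a slow disk. The insight on `aof-persistence-latency-from-synchronous-disk-writes` warns that slow disk I/O amplifies as the AOF file grows between rewrites. Check whether your disk has spare IOPS capacity. If it is already saturated, AOF will cause write latency spikes and potentially trigger the worker timeout issues described in the blocking insight.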
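A rough way to get a feel for fsync cost on the target disk is a small probe like the one below. It is only a sketch: it measures a handful of small appends, not Redis's real write pattern, and `probe_fsync_latency` is a hypothetical helper. Pass the directory that holds the Redis data files; the default temp directory may sit on a different (or in-memory) filesystem and give misleadingly low numbers.

```python
import os
import tempfile
import time


def probe_fsync_latency(directory=None, samples: int = 20,
                        payload: bytes = b"x" * 4096) -> list:
    """Append payload to a temp file `samples` times and time each fsync in ms."""
    latencies_ms = []
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        for _ in range(samples):
            handle.write(payload)
            handle.flush()  # push Python's buffer to the OS before timing fsync
            start = time.perf_counter()
            os.fsync(handle.fileno())
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


if __name__ == "__main__":
    results = probe_fsync_latency()
    print(f"max fsync latency: {max(results):.2f} ms over {len(results)} samples")
```

Running it while the disk is under realistic load matters more than the idle numbers, since the blocking described above shows up under disk pressure.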
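6. Consider hybrid persistence (RDB + AOF) as middle ground
Hybrid mode gives you RDB's fast restart times plus AOF's recent write durability. Enable AOF with `appendfsync everysec` and keep your RDB snapshots running. When both are enabled, Redis restores from the AOF on restart because it is the more complete record, while the RDB files remain useful as point-in-time backups. With `aof-use-rdb-preamble yes` (the default in recent Redis versions), a rewritten AOF starts with an RDB-format snapshot followed by the commands logged since, which is what keeps restarts fast. Tune `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size` aggressively (e.g., 50% and 64mb; the defaults are 100 and 64mb) to keep the AOF small and minimize the disk I/O amplification described in the latency insight. Monitor `redis-persistence-aof-enabled` and `redis-persistence-rdb-last-bgsave-status` to confirm both mechanisms are healthy.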
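To see how the two rewrite settings interact, here is a sketch of the trigger logic as commonly described: a rewrite becomes due once the AOF exceeds the minimum size and has grown by at least the configured percentage over its size after the last rewrite. It assumes a dict carrying the `aof_current_size` and `aof_base_size` fields from `INFO persistence`; `rewrite_due` is a hypothetical helper, not a Redis API.

```python
MB = 1024 * 1024


def rewrite_due(info: dict, percentage: int = 50, min_size: int = 64 * MB) -> bool:
    """Approximate the automatic AOF rewrite condition for the given thresholds."""
    current = int(info["aof_current_size"])
    base = int(info["aof_base_size"]) or 1  # avoid dividing by zero before any rewrite
    if percentage == 0 or current <= min_size:
        return False  # a percentage of 0 disables automatic rewrites
    growth = current * 100 // base - 100
    return growth >= percentage


# Hypothetical sample: AOF was 60 MB after the last rewrite and is 100 MB now.
sample = {"aof_current_size": 100 * MB, "aof_base_size": 60 * MB}
print(rewrite_due(sample, percentage=50))   # growth of about 66% exceeds 50%
print(rewrite_due(sample, percentage=100))  # the default threshold is not yet reached
```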
7. Validate recovery time objectives with your choice
Don't forget about recovery time—RDB-only gives you fast restarts (seconds to minutes), while AOF-only can take much longer to replay on large datasets. Monitor `redis-loading-dump-file` during test restores to measure actual recovery time. If your RTO is tight (< 5 minutes) and you have 10GB+ of data, pure AOF might miss your recovery window even if it prevents data loss. Hybrid mode often hits the sweet spot for both RPO and RTO.
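During a test restore, the `loading` and `loading_eta_seconds` fields of `INFO persistence` can be compared against the recovery objective. The sketch below assumes a dict of those fields (for example from redis-py's `info("persistence")`) and a caller-supplied elapsed time; `within_rto` and the 300-second default are illustrative, and the ETA Redis reports is itself only an estimate.

```python
def within_rto(info: dict, elapsed_sec: float, rto_sec: float = 300) -> bool:
    """Check whether a restore finished, or is projected to finish, inside the RTO."""
    if int(info.get("loading", 0)) == 0:
        # Loading has finished: only the elapsed time matters.
        return elapsed_sec <= rto_sec
    eta_sec = float(info.get("loading_eta_seconds", 0))
    return elapsed_sec + eta_sec <= rto_sec


# Hypothetical mid-restore sample: 150 s elapsed, Redis estimates 200 s remaining.
print(within_rto({"loading": 1, "loading_eta_seconds": 200}, elapsed_sec=150))
# Restore already complete after 42 s.
print(within_rto({"loading": 0}, elapsed_sec=42))
```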

Monitoring Interfaces

Redis Native Metrics
Redis Prometheus
Redis Datadog