High Latency Spikes from Slow Commands
warning
Incident Response
Redis is experiencing intermittent latency spikes caused by slow commands, blocking operations, or persistence overhead, degrading application performance.
Prompt: “We're seeing Redis latency spike to 500ms+ randomly throughout the day, causing timeouts in our application. Our P99 latency is normally under 10ms. I need to figure out what's causing these spikes - is it slow commands, memory issues, or something with persistence? What should I look at in SLOWLOG and what other metrics correlate with these latency events?”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When investigating Redis latency spikes jumping from <10ms to 500ms+, start by examining the SLOWLOG to identify which commands are blocking the event loop. Then determine if the culprits are expensive O(N) operations on large datasets, persistence overhead from AOF/RDB, or a combination creating cascading connection pool exhaustion. Correlate command frequency patterns with spike timing to understand whether it's occasional heavy operations or sustained high-frequency calls.
1. Check the Redis SLOWLOG for command patterns
Start by examining `redis-slowlog-length` and `redis-slowlog-micros-95percentile` to confirm that slowlog entries correlate with your 500ms+ latency spikes. If the 95th percentile is approaching or exceeding 500,000 microseconds during spike windows, you've confirmed slow commands are blocking Redis's single-threaded event loop. Query the actual slowlog entries (SLOWLOG GET 100) to see which specific commands, key patterns, and argument sizes are triggering the delays—this gives you the direct evidence of what's blocking.
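Once you have the raw entries from SLOWLOG GET, it helps to filter out the noise and keep only the commands slow enough to explain the spikes. A minimal sketch, assuming entries shaped like redis-py's `slowlog_get()` output (the sample data below is illustrative, not from a real instance):

```python
# Sketch: filter SLOWLOG entries that could explain 500ms+ spikes.
# Assumes entries shaped like redis-py's slowlog_get() output; the
# sample data below is illustrative, not from a real instance.

SPIKE_THRESHOLD_US = 500_000  # 500 ms, expressed in microseconds

def spike_candidates(entries, threshold_us=SPIKE_THRESHOLD_US):
    """Return (command, duration_us) pairs at or above the threshold,
    slowest first."""
    hits = [
        (b" ".join(e["command"]).decode(), e["duration"])
        for e in entries
        if e["duration"] >= threshold_us
    ]
    return sorted(hits, key=lambda h: h[1], reverse=True)

sample = [
    {"command": [b"HGETALL", b"session:123"], "duration": 612_000},
    {"command": [b"GET", b"user:42"], "duration": 180},
    {"command": [b"KEYS", b"cache:*"], "duration": 1_450_000},
]
print(spike_candidates(sample))
# → [('KEYS cache:*', 1450000), ('HGETALL session:123', 612000)]
```

Sorting by duration rather than recency surfaces the worst offenders first, which is usually what you want when correlating against spike windows.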
2. Identify expensive O(N) operations on large datasets
Examine `redis-commands-usec` and `redis-commands-usec-per-call` to find which commands are consuming the most CPU time. The `command-latency-spikes-from-expensive-operations` insight shows that KEYS, SMEMBERS, HGETALL, and unbounded LRANGE/ZRANGE are common culprits—these have O(N) complexity and block Redis when operating on large collections. If you see KEYS with thousands of microseconds per call or SMEMBERS on sets with 10K+ members, you've found your latency source. Check `redis-command-calls` to see if these expensive operations are being called frequently enough to explain your intermittent spikes.
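The per-call cost is what separates a genuinely expensive command from a merely busy one. A small sketch of that ranking, using illustrative INFO commandstats-style numbers (not real output):

```python
# Sketch: rank commands by average microseconds per call from
# INFO commandstats-style data (illustrative numbers, not real output).

stats = {
    "keys":     {"calls": 120,       "usec": 96_000_000},  # ~800 ms/call
    "smembers": {"calls": 9_000,     "usec": 45_000_000},  # ~5 ms/call
    "get":      {"calls": 2_000_000, "usec": 6_000_000},   # ~3 us/call
}

def hot_spots(stats, min_usec_per_call=1_000):
    """Commands averaging >= min_usec_per_call, worst offenders first."""
    ranked = [
        (cmd, s["usec"] // s["calls"])
        for cmd, s in stats.items()
        if s["usec"] / s["calls"] >= min_usec_per_call
    ]
    return sorted(ranked, key=lambda r: r[1], reverse=True)

print(hot_spots(stats))
# → [('keys', 800000), ('smembers', 5000)]
```

Note how GET dominates total call volume but disappears from the ranking: high frequency with microsecond latency is healthy, while a KEYS averaging hundreds of milliseconds per call is a direct match for the spike profile.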
3. Examine AOF and RDB persistence blocking
Check if `redis-persistence-aof-last-rewrite-time-sec` or `redis-persistence-rdb-last-bgsave-time-sec` show multi-second durations that align with your latency spike timing. The `aof-persistence-latency-from-synchronous-disk-writes` insight explains that appendfsync=everysec can cause write latency when disk I/O is slow, especially as the AOF file grows between rewrites. The `redis-appendfsync-blocking-gunicorn-timeout` pattern shows this can block writes for over 1 second on caches of several hundred MB. If persistence durations exceed 1-2 seconds and correlate with your spikes, consider switching from appendfsync=everysec to appendfsync=no (accepting weaker durability) or lowering auto-aof-rewrite-percentage to trigger more frequent AOF rewrites.
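If persistence timing does line up with the spikes, the tuning direction above looks roughly like this in redis.conf. The values are illustrative, not a drop-in recommendation, and `appendfsync no` trades durability for latency:

```conf
# Illustrative redis.conf settings, not a drop-in recommendation.
# everysec fsyncs once per second but can still block writes when the
# disk is slow; "no" delegates flushing to the OS (weakest durability).
appendfsync no

# Lowering the rewrite threshold triggers AOF rewrites more often,
# keeping the file (and each rewrite) smaller. The default is 100.
auto-aof-rewrite-percentage 50
auto-aof-rewrite-min-size 64mb
```

Verify the effect by watching whether `redis-persistence-aof-last-rewrite-time-sec` drops and whether spike timing decouples from rewrite timing.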
4. Monitor for connection pool exhaustion from blocked operations
When `redis-slowlog-length` trends upward during latency events, slow operations can hold client connections and exhaust your application's connection pool—even when Redis CPU appears healthy. The `slow-query-backlog-masks-redis-connection-pool-exhaustion` insight describes this cascade failure pattern. If you're seeing application-side connection timeout errors ("unable to acquire Redis connection") concurrent with Redis latency spikes, your slow operations are blocking application threads and starving the connection pool. This is a secondary effect that amplifies the impact of the slow commands you identified in steps 1-2.
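A back-of-the-envelope Little's-law estimate makes this cascade concrete: the average number of connections held at once is roughly request rate times latency, so a latency spike alone can blow past a fixed pool. A sketch with illustrative numbers:

```python
# Sketch: Little's-law estimate of how command latency consumes a
# connection pool (numbers are illustrative).

def connections_in_use(request_rate_per_s, avg_latency_s):
    """Average concurrently held connections = arrival rate x latency."""
    return request_rate_per_s * avg_latency_s

POOL_SIZE = 50

# Healthy: 2,000 req/s at 5 ms each -> ~10 connections busy.
healthy = connections_in_use(2_000, 0.005)

# During a spike: the same traffic at 500 ms each -> ~1,000 connections
# demanded, far beyond the pool; callers block waiting for a connection.
spike = connections_in_use(2_000, 0.5)

print(healthy, spike, spike > POOL_SIZE)
# → 10.0 1000.0 True
```

This is why the pool exhausts with no change in traffic: the 100x latency increase alone multiplies connection demand 100x.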
5. Correlate command frequency patterns with spike timing
Use `redis-command-calls` alongside `redis-commands-usec` to distinguish between occasionally-slow commands and high-frequency moderate-latency commands that saturate Redis. A command averaging 50ms but called 200 times per second contributes 10 seconds of blocking per second (impossible on single-threaded Redis, so it queues), versus a 500ms command called once per minute. Look for spikes in call frequency during your latency windows—if HGETALL calls jump 10x during certain application workflows, that explains the intermittent nature of your latency spikes.
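The arithmetic above can be sketched as a utilization calculation for Redis's single command-processing thread: anything over 1.0 means work arrives faster than it can be served, so requests queue.

```python
# Sketch: estimate what fraction of Redis's single thread a command
# pattern demands (utilization > 1.0 means requests queue).

def utilization(calls_per_s, avg_latency_s):
    return calls_per_s * avg_latency_s

# 50 ms command called 200x/s: demands 10 s of work per wall-clock
# second -> massive queueing on a single-threaded server.
frequent = utilization(200, 0.050)

# 500 ms command called once a minute: <1% utilization on average,
# but each call still blocks everything behind it for half a second.
rare = utilization(1 / 60, 0.5)

print(frequent, round(rare, 4))
# → 10.0 0.0083
```

The two failure modes look different on a latency graph: sustained saturation produces broad plateaus of elevated latency, while the rare heavy command produces narrow, isolated spikes.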
Related Insights
Command Latency Spikes from Expensive O(N) Operations
warning
Certain Redis commands have O(N) complexity (KEYS, SMEMBERS, HGETALL, LRANGE without a limit) and can cause latency spikes when executed on large data structures. Tracking redis.commands.usec per command identifies hot spots.
AOF Persistence Latency from Synchronous Disk Writes
warning
When AOF persistence is enabled with appendfsync always or everysec, slow disk I/O can cause write latency spikes. redis.persistence.aof_last_rewrite_time_sec increasing significantly indicates AOF file growth without rewrite, amplifying disk I/O overhead.
Slow Query Backlog Masks Redis Connection Pool Exhaustion
warning
Redis slowlog entries accumulating (redis.slowlog.length rising) can indicate operations blocking on network or disk I/O, exhausting connection pools and causing cascading failures in dependent services even when Redis CPU appears healthy.
Redis appendfsync blocking causes Gunicorn worker timeout
critical
Relevant Metrics
Monitoring Interfaces
Redis Native Metrics