Redis CPU Saturation and Slowlog Spike

Incident Response (critical)

Redis EngineCPU is pegged at 100% while slow queries accumulate in the slowlog, causing request latency spikes and timeouts.

Prompt: Redis EngineCPU is pegged at 100% and the slowlog is filling up with queries taking over 500ms - help me identify which commands are blocking the single-threaded engine and whether I need to optimize queries or scale vertically.

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When Redis CPU hits 100% with slowlog accumulation, start by examining the slowlog itself to identify which specific commands are blocking the single-threaded engine. Then trace those commands back to their CPU consumption patterns to distinguish between expensive O(N) operations, cache deletion table scans, and high-frequency moderate-cost commands. Finally, determine whether you need query optimization or vertical scaling by comparing per-command costs against overall throughput.

1. Examine slowlog entries for command patterns
Start with `redis.slowlog.length` and `redis.slowlog.micros.95percentile` to understand the severity of the backlog. If slowlog length is growing and p95 latency is above 500ms, query the slowlog directly to see which specific commands (KEYS, SMEMBERS, HGETALL, etc.) are appearing most frequently. This is your most direct view into what's blocking the single-threaded Redis engine right now.
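As a sketch of this step, the entries returned by redis-py's `slowlog_get()` (a list of dicts with `command` and `duration` fields, duration in microseconds) can be aggregated by command name to surface the worst offenders. The sample entries below are hypothetical:

```python
from collections import defaultdict

def top_slowlog_commands(entries, top_n=5):
    """Aggregate slowlog entries by command name and rank by total time."""
    totals = defaultdict(lambda: {"count": 0, "usec": 0})
    for entry in entries:
        # redis-py returns the command as a list of byte strings, e.g. [b'KEYS', b'user:*']
        name = entry["command"][0].decode().upper()
        totals[name]["count"] += 1
        totals[name]["usec"] += entry["duration"]
    return sorted(totals.items(), key=lambda kv: kv[1]["usec"], reverse=True)[:top_n]

# Hypothetical entries in the shape slowlog_get() returns
sample = [
    {"command": [b"KEYS", b"user:*"], "duration": 850_000},
    {"command": [b"HGETALL", b"session:42"], "duration": 520_000},
    {"command": [b"KEYS", b"cache:*"], "duration": 910_000},
]
print(top_slowlog_commands(sample))
```

Ranking by total microseconds rather than entry count matters here: a command that appears twice but burns 1.7 seconds is a bigger problem than one that appears often but cheaply.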
2. Identify O(N) operations consuming CPU time
Check `redis.commands.usec` per command to find total CPU time consumed, and compare it with `redis.commands.usec_per_call` to understand per-call cost. Commands like KEYS, SMEMBERS, HGETALL, and unbounded LRANGE operations have O(N) complexity and will show both high total microseconds and high per-call averages. These are your primary optimization targets—replace KEYS with SCAN, use SSCAN/HSCAN instead of full-set operations, and limit LRANGE with reasonable start/stop values.
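A minimal sketch of this comparison: the text form of `INFO commandstats` can be parsed into per-command `calls`, `usec`, and `usec_per_call`, then ranked by total CPU time. The sample numbers below are made up for illustration:

```python
def parse_commandstats(info_text):
    """Parse `INFO commandstats` text into {command: {calls, usec, usec_per_call}}."""
    stats = {}
    for line in info_text.strip().splitlines():
        if not line.startswith("cmdstat_"):
            continue
        name, fields = line.split(":", 1)
        parsed = dict(f.split("=") for f in fields.split(","))
        stats[name[len("cmdstat_"):]] = {
            "calls": int(parsed["calls"]),
            "usec": int(parsed["usec"]),
            "usec_per_call": float(parsed["usec_per_call"]),
        }
    return stats

# Hypothetical INFO commandstats output
raw = """\
cmdstat_get:calls=120000,usec=480000,usec_per_call=4.00
cmdstat_keys:calls=300,usec=255000000,usec_per_call=850000.00
cmdstat_hgetall:calls=9000,usec=54000000,usec_per_call=6000.00
"""
stats = parse_commandstats(raw)
worst = max(stats, key=lambda c: stats[c]["usec"])
print(worst, stats[worst])
```

In this made-up sample, KEYS dominates both total microseconds and per-call cost even though GET is called 400x more often, which is exactly the O(N) signature this step is looking for.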
3. Check for cache deletion operations causing table scans
Look for DEL or pattern-based deletion commands in `redis.commands.calls` that correlate with CPU spikes. Pattern-based cache invalidation (for example, `KEYS pattern` followed by `DEL`) forces Redis to scan the entire keyspace to find matching keys, which can push CPU utilization above 83% during otherwise normal traffic. If deletion operations are frequent, this is often the primary culprit and requires moving to expiration-based invalidation strategies instead.
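One common expiration-based alternative is to set TTLs with jitter so a large cohort of cache keys does not expire, and get rebuilt, all at once. A minimal sketch, where the key name, base TTL, and jitter fraction are purely illustrative:

```python
import random

def jittered_ttl(base_seconds, jitter_fraction=0.1):
    """Spread expirations so keys don't all lapse (and refill) simultaneously."""
    jitter = base_seconds * jitter_fraction
    return int(base_seconds + random.uniform(-jitter, jitter))

# With redis-py this would be used roughly as (connection details hypothetical):
#   r.set("cache:user:42", payload, ex=jittered_ttl(3600))
ttl = jittered_ttl(3600)
print(ttl)
```

Letting Redis expire keys lazily avoids the keyspace scans that pattern-based deletion triggers, at the cost of briefly serving stale data until the TTL lapses.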
4. Compare command frequency against per-call cost
Cross-reference `redis.commands.calls` with `redis.commands.usec` and `redis.commands.usec_per_call` to find high-frequency commands. A command taking 10ms per call but executed 1000 times per second (10 seconds of CPU time) is far more damaging than a 100ms command executed once per minute. Focus optimization efforts on commands with high total CPU time, not just high per-call latency.
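The arithmetic above can be made concrete; `cpu_seconds_per_second` is a hypothetical helper for illustration, not a Redis metric:

```python
def cpu_seconds_per_second(calls_per_sec, usec_per_call):
    """Engine CPU time consumed per wall-clock second by one command."""
    return calls_per_sec * usec_per_call / 1_000_000

# The frequent moderate-cost command dominates:
frequent = cpu_seconds_per_second(1000, 10_000)  # 10 ms per call, 1000 calls/s
rare = cpu_seconds_per_second(1 / 60, 100_000)   # 100 ms per call, once a minute
print(frequent, rare)  # frequent is 10.0 CPU-seconds/second; rare is ~0.0017
```

Note that 10 CPU-seconds demanded per wall-clock second is far beyond what a single-threaded engine can serve, which is why total CPU time, not per-call latency, should drive the optimization order.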
5. Assess whether connection pool exhaustion is amplifying the issue
If `redis.slowlog.length` is climbing while slow operations accumulate, check if these long-running commands are exhausting connection pools and blocking application threads. This creates a cascading failure where even healthy Redis operations appear slow because clients are queued waiting for connections. If this pattern exists alongside persistence operations (RDB snapshots), the issue may be I/O-bound rather than CPU-bound, requiring investigation into disk performance.
6. Evaluate overall throughput to determine scaling needs
Finally, check `redis.stats.instantaneous_ops_per_sec` to understand if the workload itself has simply outgrown your Redis instance. If you've optimized away expensive O(N) operations and cache deletions but CPU is still saturated at high but reasonable ops/sec, you likely need vertical scaling (larger instance) or horizontal scaling (sharding/clustering). Compare your current ops/sec against your instance's documented capacity to make this determination.
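The decision at the end of this step can be sketched as a rough triage rule. The thresholds here (80% CPU, 50%-of-capacity ops/sec) are illustrative assumptions, not Redis defaults; substitute your instance's documented capacity:

```python
def scaling_recommendation(ops_per_sec, capacity_ops_per_sec, cpu_pct):
    """Rough triage: saturated CPU at sane throughput points to scaling, not queries."""
    if cpu_pct < 80:
        return "healthy"
    if ops_per_sec < 0.5 * capacity_ops_per_sec:
        # CPU is hot but throughput is low: expensive commands are the problem
        return "optimize queries"
    # Throughput is near documented capacity: the workload has outgrown the instance
    return "scale up or shard"

print(scaling_recommendation(90_000, 100_000, 100))  # scale up or shard
```

The middle branch captures the core insight of this playbook: 100% CPU at modest ops/sec means the engine is burning time on a few expensive commands, and scaling up would only postpone the problem.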

Related Insights

Command Latency Spikes from Expensive O(N) Operations
warning
Certain Redis commands have O(N) complexity (KEYS, SMEMBERS, HGETALL, LRANGE without limits) and can cause latency spikes when executed on large data structures. Tracking redis.commands.usec per command identifies these hot spots.
Slow Query Backlog Masks Redis Connection Pool Exhaustion
warning
Redis slowlog entries accumulating (redis.slowlog.length rising) can indicate operations blocking on network or disk I/O, exhausting connection pools and causing cascading failures in dependent services even when Redis CPU appears healthy.
Redis CPU utilization spikes during cache key deletions
critical
Serial Execution Masking Redis Cache Effectiveness
warning
Event loop blocking creates false appearance of cache ineffectiveness - Redis cache hits are fast individually, but serial request processing prevents concurrent cache lookups from improving overall throughput during traffic bursts.

Monitoring Interfaces

Redis Datadog
Redis Prometheus
Redis Native Metrics