Replica Lag Hides Behind Stale Cache Hits
critical
A Redis replica that falls behind its master (redis.replication.master_last_io_seconds_ago increasing) keeps serving stale cached data at normal hit rates. The healthy-looking hit rate masks the synchronization problem until applications encounter data inconsistencies or a master failover fails.
Alert when redis.replication.master_last_io_seconds_ago on a replica node exceeds the expected replication delay threshold (e.g., >30 seconds). Treat a non-zero redis.replication.master_link_down_since_seconds as a stronger signal, since it means the replication link is currently down. Monitor how far redis.replication.offset diverges between master and replicas. Do not rely on redis.keyspace.hits as a health signal: a high hit count on a replica can coexist with stale data and mask this issue.
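The alert conditions above can be sketched as a small check over the fields Redis reports in INFO replication (role, master_link_status, master_last_io_seconds_ago, master_link_down_since_seconds, slave_repl_offset, master_repl_offset). This is a minimal sketch, not a definitive implementation: the function name replica_lag_alerts, the 30-second default, and the 1 MB offset-gap threshold are illustrative assumptions, and the dictionaries are assumed to be already parsed INFO output.

```python
from typing import Dict, List, Optional


def replica_lag_alerts(
    replica_info: Dict[str, object],
    master_info: Optional[Dict[str, object]] = None,
    max_last_io_seconds: int = 30,          # assumed threshold, tune per deployment
    max_offset_gap_bytes: int = 1_000_000,  # assumed threshold, tune per deployment
) -> List[str]:
    """Return alert messages for one replica, given parsed INFO replication fields."""
    alerts: List[str] = []

    # Only replicas report the fields checked below.
    if replica_info.get("role") != "slave":
        return alerts

    # Seconds since the replica last heard from the master.
    last_io = int(replica_info.get("master_last_io_seconds_ago", -1))
    if last_io > max_last_io_seconds:
        alerts.append(f"last master I/O {last_io}s ago (limit {max_last_io_seconds}s)")

    # master_link_down_since_seconds is only reported while the link is down.
    if replica_info.get("master_link_status") == "down":
        down_for = int(replica_info.get("master_link_down_since_seconds", 0))
        alerts.append(f"replication link down for {down_for}s")

    # Offset divergence needs the master's view as well.
    if master_info is not None and "slave_repl_offset" in replica_info:
        gap = int(master_info["master_repl_offset"]) - int(replica_info["slave_repl_offset"])
        if gap > max_offset_gap_bytes:
            alerts.append(f"replica is {gap} bytes behind master")

    return alerts
```

Note that the keyspace hit rate is deliberately not an input: a replica can fail every check here while its hit rate looks normal.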
Investigate network connectivity between master and replica. Check that redis.replication.repl_backlog_size is large enough to cover temporary disconnections, so that a reconnecting replica can perform a partial resynchronization instead of a full one. Verify that the master isn't overloaded (check redis.stats.instantaneous_ops_per_sec). Consider increasing the replication timeout (repl-timeout). Implement application-level read-after-write consistency checks when serving reads from replicas.
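Two of the remediation steps lend themselves to a short sketch: estimating a backlog size, and routing reads so that a client sees its own writes. Both functions below are hypothetical helpers under stated assumptions. The sizing rule (write rate times longest tolerated disconnection, times a safety factor) is a common rule of thumb rather than an official formula, and the routing check assumes the application records the master's master_repl_offset after a write and can read the replica's slave_repl_offset before serving from it.

```python
import math


def min_backlog_bytes(
    write_bytes_per_sec: float,
    max_disconnect_sec: float,
    safety_factor: float = 2.0,  # assumed headroom, not a Redis default
) -> int:
    """Estimate a replication backlog size that survives a short disconnection."""
    return math.ceil(write_bytes_per_sec * max_disconnect_sec * safety_factor)


def choose_read_target(write_offset: int, replica_offset: int) -> str:
    """Pick where to read after a write.

    write_offset:   master_repl_offset observed on the master after the write.
    replica_offset: slave_repl_offset currently reported by the replica.
    """
    # The replica has applied the write only once its offset has caught up.
    return "replica" if replica_offset >= write_offset else "master"
```

For example, a master writing about 1 MB/s that should tolerate a 60-second disconnection would get an estimate of roughly 120 MB with the default safety factor. Falling back to the master keeps reads correct during lag, at the cost of extra load on a master that may already be busy.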