ElastiCache Redis causes stale clouddriver account configuration and pipeline failures

critical

configuration

Updated Mar 24, 2026

Sources

Random failures in pipeline executions · Issue #2855 - GitHub (github.com)

Technologies:

Redis: the root cause of this issue originates in Redis

How to detect:

After migrating from the hal-created Redis pod to AWS ElastiCache Redis (r4.large with 1 read replica), deployManifest stages fail intermittently because clouddriver nodes hold inconsistent account configuration. In the reported case, the hal-created Redis running in Kubernetes performed better and kept account configuration consistent across nodes.
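One way to confirm the inconsistency is to compare the account names each clouddriver node reports. The sketch below is illustrative only: it assumes you have already collected one set of account names per node (for example from each node's credentials listing), and the function name, node names, and account names are all hypothetical.

```python
from typing import Dict, List, Set


def find_account_drift(accounts_by_node: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per node, the accounts that some other node reports but this node lacks."""
    # The union of all nodes' accounts is treated as the expected full set.
    all_accounts: Set[str] = set()
    for names in accounts_by_node.values():
        all_accounts |= names

    drift: Dict[str, List[str]] = {}
    for node, names in accounts_by_node.items():
        missing = sorted(all_accounts - names)
        if missing:
            drift[node] = missing
    return drift


if __name__ == "__main__":
    # Hypothetical snapshot; node and account names are made up for illustration.
    snapshot = {
        "clouddriver-0": {"k8s-prod", "k8s-staging"},
        "clouddriver-1": {"k8s-prod"},
    }
    for node, missing in find_account_drift(snapshot).items():
        print(f"{node} is missing accounts: {', '.join(missing)}")
```

An empty result means every node reported the same accounts at the time of the snapshot. Because the failures are intermittent, a single clean snapshot does not rule the problem out, so repeated sampling is more informative.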

Recommended action:

Revert to the hal-created Redis pod deployed in the Kubernetes cluster instead of using ElastiCache. If ElastiCache must be used, investigate its cache consistency settings, the replication lag between the primary and the read replica, and the network latency between the Spinnaker services and ElastiCache. Review the clouddriver logs for 'Unable to run agents' and 'internal server errors'. Consider using S3 for persistent storage, with in-cluster Redis for caching.
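To put a number on replication lag, one option is to capture the output of the Redis `INFO replication` command on the primary and compare offsets. The following is a minimal sketch, assuming the text follows the layout open-source Redis emits (a `master_repl_offset` line plus one `slaveN:` line per replica); whether your ElastiCache endpoint returns the same fields is an assumption to verify. The function name and sample values are hypothetical.

```python
from typing import Dict, Optional


def replica_offset_lag(info_replication: str) -> Dict[str, int]:
    """Map each replica entry (slave0, slave1, ...) to the bytes it trails the primary."""
    master_offset: Optional[int] = None
    replica_offsets: Dict[str, int] = {}

    for raw in info_replication.splitlines():
        line = raw.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            # Replica lines look like: ip=...,port=...,state=...,offset=...,lag=...
            fields = dict(part.split("=", 1) for part in value.split(","))
            replica_offsets[key] = int(fields["offset"])

    if master_offset is None:
        raise ValueError("master_repl_offset not found; was this captured on the primary?")
    return {name: master_offset - offset for name, offset in replica_offsets.items()}


if __name__ == "__main__":
    # Hypothetical sample; addresses and offsets are made up for illustration.
    sample = """# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.2,port=6379,state=online,offset=1180,lag=1
master_repl_offset:1200
"""
    print(replica_offset_lag(sample))
```

A difference that stays well above zero, or grows under load, suggests the replica is falling behind. This only matters for the failures described here if some Spinnaker service actually reads from the replica, which the original report does not establish.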