Premature caching agent rescheduling due to timeout
warning | performance | Updated Oct 6, 2025
How to detect:
Clouddriver reschedules a caching agent prematurely when its cache cycle runs longer than the default timeout of 300 seconds (5 minutes). The agent has not failed, but the early reschedule causes duplicate agent runs, extra load on cloud provider APIs, and wasted resources. The signal to look for is agents that complete successfully yet regularly take more than 300 seconds per cycle.
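The check described above can be sketched in a few lines of Python. This is a minimal illustration, not Clouddriver code: it assumes you have already exported per-run agent durations from your monitoring system as (agent name, duration in seconds) pairs. The function name, the sample format, and the min_fraction threshold are all hypothetical.

```python
from collections import defaultdict

# Default timeout stated in the description above.
DEFAULT_TIMEOUT_SECONDS = 300


def agents_exceeding_timeout(samples, timeout_seconds=DEFAULT_TIMEOUT_SECONDS,
                             min_fraction=0.5):
    """Return {agent: fraction of runs over the timeout}.

    `samples` is an iterable of (agent_name, duration_seconds) pairs.
    Only agents whose over-timeout fraction is at least `min_fraction`
    are returned, so one-off slow runs are ignored.
    """
    runs = defaultdict(list)
    for agent, duration in samples:
        runs[agent].append(duration)

    flagged = {}
    for agent, durations in runs.items():
        over = sum(1 for d in durations if d > timeout_seconds)
        fraction = over / len(durations)
        if fraction >= min_fraction:
            flagged[agent] = fraction
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data; agent names are placeholders.
    example = [
        ("agent-a", 310), ("agent-a", 420), ("agent-a", 200),
        ("agent-b", 100), ("agent-b", 120),
    ]
    print(agents_exceeding_timeout(example))
```

In this example agent-a is flagged because two of its three runs exceed 300 seconds, while agent-b is not. An agent that appears here consistently is a candidate for the timeout increase rather than evidence of a failure.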
Recommended action:
If monitoring shows agents consistently taking longer than 5 minutes to complete successfully, increase redis.poll.timeoutSeconds in ~/.hal/$DEPLOYMENT/profiles/clouddriver-local.yml. Set the timeout above typical agent completion times, but keep it low enough that truly hung agents are still detected.
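As a sketch, the override in clouddriver-local.yml might look like the fragment below. The value 600 is purely illustrative and is an assumption, not a recommendation; choose a number based on the completion times you actually observe.

```yaml
# ~/.hal/$DEPLOYMENT/profiles/clouddriver-local.yml
redis:
  poll:
    # Default is 300 seconds. 600 is an example value only.
    timeoutSeconds: 600
```

With a Halyard-managed deployment, the change should take effect after the configuration is redeployed (for example with hal deploy apply).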