Logs lost when Airflow worker dies without remote storage
critical | availability | Updated Feb 22, 2024 (via Exa)
Technologies:
How to detect:
When Airflow workers die or terminate unexpectedly, logs stored only as local files on the worker are permanently lost, preventing debugging of the failure that caused the worker to die.
Recommended action:
Configure remote log storage for Airflow workers. Set up S3 remote logging (or an equivalent cloud storage backend) so that logs persist even if a worker dies. Ensure task logs are written to files and persisted to remote storage, not just printed to stderr.
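
As a minimal sketch of the recommendation above, the following Python check inspects the environment-variable form of Airflow 2.x's [logging] options (remote_logging, remote_base_log_folder, remote_log_conn_id) and reports anything that looks incomplete. The helper name remote_logging_problems, the bucket s3://my-airflow-logs/logs, and the connection ID aws_default are illustrative assumptions, not values from this insight. The check covers only the S3 case and treats "true", "t", and "1" as enabled, which is assumed to match how Airflow reads boolean options.

```python
import os

# Environment-variable names for the [logging] options in airflow.cfg (Airflow 2.x).
REMOTE_LOGGING = "AIRFLOW__LOGGING__REMOTE_LOGGING"
REMOTE_BASE_LOG_FOLDER = "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER"
REMOTE_LOG_CONN_ID = "AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID"

# Assumption: values treated as "enabled" for a boolean option.
TRUTHY = {"true", "t", "1"}


def remote_logging_problems(env):
    """Return a list of problems; an empty list means the settings look complete.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    problems = []
    if env.get(REMOTE_LOGGING, "").strip().lower() not in TRUTHY:
        problems.append(f"{REMOTE_LOGGING} is not enabled")
    if not env.get(REMOTE_BASE_LOG_FOLDER, "").strip().startswith("s3://"):
        problems.append(f"{REMOTE_BASE_LOG_FOLDER} should be an s3:// URL")
    if not env.get(REMOTE_LOG_CONN_ID, "").strip():
        problems.append(f"{REMOTE_LOG_CONN_ID} is empty")
    return problems


# Placeholder values: the bucket name and connection ID are assumptions.
example_env = {
    REMOTE_LOGGING: "True",
    REMOTE_BASE_LOG_FOLDER: "s3://my-airflow-logs/logs",
    REMOTE_LOG_CONN_ID: "aws_default",
}

if __name__ == "__main__":
    # Check the environment of the current process, e.g. inside a worker container.
    for problem in remote_logging_problems(dict(os.environ)):
        print(problem)
```

Running this inside each worker's environment gives a quick signal that logs are still local-only. It does not verify that the bucket exists or that the connection's credentials can write to it.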