MemoryError in watch_logs thread causes worker to hang
Severity: critical | Category: Resource Contention | Updated Sep 13, 2023 (via Exa)
Technologies:
How to detect:
Dramatiq workers hang after running for several days (typically about 3). A MemoryError is raised in the watch_logs thread while it is receiving or writing log data. After that, the worker stops processing messages and no further log output is written. The issue occurs even when --verbose mode is enabled.
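Because the symptom includes logs going silent, one way to notice a hung worker is to check how long ago its log file was last modified. The following is a minimal sketch, not part of Dramatiq: it assumes the worker's output is redirected to a file, and the function name, path, and threshold are hypothetical.

```python
import os
import time


def log_is_stale(path, max_age_seconds, now=None):
    """Return True if `path` was last modified more than `max_age_seconds` ago.

    `now` can be passed explicitly (seconds since the epoch) for testing;
    by default the current time is used.
    """
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > max_age_seconds


if __name__ == "__main__":
    # Hypothetical path and threshold; adjust to your deployment.
    log_path = "/var/log/dramatiq/worker.log"
    if os.path.exists(log_path) and log_is_stale(log_path, 15 * 60):
        print("worker log has not been written for 15 minutes; worker may be hung")
```

A quiet log is only a hint on a lightly loaded queue, so the threshold should be longer than the longest expected idle period.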
Recommended action:
Restart the worker process to restore functionality temporarily. For longer-term mitigation: (1) reduce concurrency with --processes 1 --threads 1; (2) monitor memory consumption in the log-watching thread; (3) consider switching to a thread-safe broker implementation such as dramatiq-kombu-broker; or (4) disable verbose logging if it is not needed.
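For action (2), a rough sketch of memory monitoring using only the Python standard library is shown below. It makes several assumptions: it measures the whole process rather than a single thread (the operating system does not report resident memory per thread), it relies on the Unix-only resource module, and MemoryWatchdog, limit_mib, and on_exceed are hypothetical names, not Dramatiq APIs. It would need to run inside whichever process hosts the watch_logs thread.

```python
import resource
import sys
import threading


def peak_rss_mib():
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


class MemoryWatchdog(threading.Thread):
    """Hypothetical helper: polls peak RSS and calls on_exceed once it passes the limit."""

    def __init__(self, limit_mib, on_exceed, interval=30.0):
        super().__init__(daemon=True, name="memory-watchdog")
        self.limit_mib = limit_mib
        self.on_exceed = on_exceed
        self.interval = interval
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            usage = peak_rss_mib()
            if usage > self.limit_mib:
                # Report once, then exit the thread.
                self.on_exceed(usage)
                return
            # Sleep until the next poll, waking early if stop() is called.
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()


if __name__ == "__main__":
    watchdog = MemoryWatchdog(
        limit_mib=512,  # hypothetical threshold
        on_exceed=lambda mib: print(f"peak RSS {mib:.0f} MiB exceeds limit"),
    )
    watchdog.start()
```

The callback is where an alert or a controlled restart could be triggered before a MemoryError occurs. Note that ru_maxrss is a high-water mark, so the value never decreases over the life of the process.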