Dramatiq, Apache Kafka

Worker pool exhaustion from blocking retry delays prevents message processing

critical
Resource Contention
Updated Sep 26, 2024 (via Exa)
How to detect:

When workers block while performing retries with long delays (especially delays implemented with time.sleep), every worker in the pool can end up occupied, leaving none free to pick up new messages. The symptom is that messages are pushed onto the work queue but never processed, and the 'Received message' log entry never appears for them.
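The exhaustion described above can be reproduced with a small standard-library simulation. This is only a sketch: it uses `concurrent.futures.ThreadPoolExecutor` as a stand-in for a worker pool, not Dramatiq itself, and `POOL_SIZE`, `RETRY_DELAY`, `blocking_retry`, and `new_message` are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2      # hypothetical worker pool size
RETRY_DELAY = 0.3  # seconds; stands in for a long retry delay


def blocking_retry():
    """Simulates a handler that sleeps inside the worker before retrying."""
    time.sleep(RETRY_DELAY)
    return "retried"


def new_message(submitted_at):
    """Returns how long the message sat in the queue before a worker ran it."""
    return time.monotonic() - submitted_at


def queue_wait_for_new_message():
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        # Occupy every worker with a sleeping retry.
        for _ in range(POOL_SIZE):
            pool.submit(blocking_retry)
        # This message is queued, but it cannot start until a sleeper wakes up.
        future = pool.submit(new_message, time.monotonic())
        return future.result()


wait = queue_wait_for_new_message()
print(f"new message waited {wait:.2f}s for a free worker")
```

The new message does no work of its own, yet it waits for roughly the full retry delay because both workers are asleep. With delays of minutes rather than fractions of a second, the queue would appear to stall in the same way.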

Recommended action:

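Compare the worker pool size against the incoming message rate. Monitor the dramatiq.worker.busy and dramatiq.worker.idle metrics: a busy count that stays at the pool size while idle stays at zero indicates full utilization. Review the retry configuration for long delay settings, and investigate blocking operations such as time.sleep in retry logic. Where possible, have the retry re-enqueued with a delay rather than sleeping in the worker. If the pool is still consistently at capacity, increase the worker pool size.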
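For the comparison of pool size against message rate, a rough estimate can be made with Little's law (average busy workers = arrival rate x time each message holds a worker). The sketch below is a back-of-the-envelope helper, not part of Dramatiq; the function name `min_workers`, the `headroom` factor, and the example numbers are all assumptions for illustration.

```python
import math


def min_workers(arrival_rate_per_s, mean_busy_seconds, headroom=1.25):
    """Estimate the pool size needed to keep up with incoming messages.

    arrival_rate_per_s: average messages arriving per second
    mean_busy_seconds:  average time one message occupies a worker,
                        including any time spent sleeping in retry logic
    headroom:           multiplier (>= 1) so the pool is not sized at 100% busy
    """
    if arrival_rate_per_s < 0 or mean_busy_seconds < 0:
        raise ValueError("rate and busy time must be non-negative")
    if headroom < 1:
        raise ValueError("headroom must be at least 1")
    average_busy_workers = arrival_rate_per_s * mean_busy_seconds
    return math.ceil(average_busy_workers * headroom)


# Hypothetical workload: 10 messages/s, 0.5 s of real work per message.
without_sleep = min_workers(10, 0.5)

# Same workload, but 25% of messages also sleep 4 s in the worker before retrying.
with_sleep = min_workers(10, 0.5 + 0.25 * 4)

print(without_sleep, with_sleep)
```

Under these assumed numbers the estimate rises from 7 to 19 workers once the in-worker sleep is counted, which suggests that removing the blocking delay is usually cheaper than growing the pool to absorb it.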