Dramatiq, Apache Kafka

Millisecond to second conversion bug causes 1000x longer retry delays

critical
configuration
Updated Sep 26, 2024 (via Exa)
How to detect:

The Kafka provider's retry delay logic passes a millisecond value to time.sleep(), which expects seconds, without converting it. Retry delays are therefore 1000 times longer than intended: a 5,000 ms delay blocks for 5,000 seconds (about 83 minutes). Workers stay blocked for the whole delay, which leads to worker pool exhaustion.
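The unit mismatch described above can be sketched in a few lines of Python. This is a minimal illustration, not the provider's actual code: the function names `sleep_before_retry_buggy`, `sleep_before_retry_fixed`, and `ms_to_seconds` are hypothetical, and the `sleep` parameter exists only so the behaviour can be observed without really waiting.

```python
import time

MS_PER_SECOND = 1000


def ms_to_seconds(delay_ms: float) -> float:
    """Convert a millisecond delay into the seconds that time.sleep() expects."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return delay_ms / MS_PER_SECOND


def sleep_before_retry_buggy(delay_ms: float, sleep=time.sleep) -> None:
    # Bug pattern (hypothetical reconstruction): the millisecond value is
    # passed straight through, so time.sleep() treats it as seconds.
    sleep(delay_ms)


def sleep_before_retry_fixed(delay_ms: float, sleep=time.sleep) -> None:
    # Fix: convert to seconds before sleeping.
    sleep(ms_to_seconds(delay_ms))


if __name__ == "__main__":
    requested = []  # records what each variant asks time.sleep() for
    sleep_before_retry_buggy(5000, sleep=requested.append)
    sleep_before_retry_fixed(5000, sleep=requested.append)
    print(requested)
```

Keeping the conversion in one named helper makes the unit explicit at the call site, which is easier to review than a bare division scattered through the retry path.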

Recommended action:

Search the Kafka provider code for time.sleep() calls that take a delay parameter, and verify that each one converts milliseconds to seconds first. Where the conversion is missing, divide the millisecond value by 1000 before passing it to time.sleep(). Review the retry delay configuration to confirm that values are in the units the code expects. Monitor the dramatiq.messages.retried metric for abnormal patterns, such as retries arriving far later than the configured delay.
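One way to verify the conversion during review is a small test that patches time.sleep() and inspects the value the retry path asks it to sleep for. The sketch below assumes the retry wait can be called as a plain function taking milliseconds; `retry_sleep` is a hypothetical stand-in for the provider's own function, and `observed_sleep_seconds` is an illustrative helper, not part of Dramatiq or any Kafka client.

```python
import time
from unittest import mock


def retry_sleep(delay_ms: int) -> None:
    """Hypothetical stand-in for the provider's retry wait (already fixed)."""
    time.sleep(delay_ms / 1000)


def observed_sleep_seconds(func, delay_ms: int) -> float:
    """Call func(delay_ms) with time.sleep patched out and return the
    number of seconds it requested, without actually sleeping."""
    with mock.patch("time.sleep") as fake_sleep:
        func(delay_ms)
    fake_sleep.assert_called_once()
    return fake_sleep.call_args.args[0]


if __name__ == "__main__":
    # A 15,000 ms delay should request a 15 second sleep, not 15,000.
    seconds = observed_sleep_seconds(retry_sleep, 15000)
    assert seconds == 15000 / 1000, f"unexpected sleep of {seconds} seconds"
```

Pointing the same helper at the real retry function, if it can be imported and called in isolation, would catch a regression of this bug before it blocks workers in production.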