Confluent Platform, Apache Kafka

Poison pill messages cause deterministic partition processing failure

critical
availability
Updated Mar 5, 2026 (via Exa)
How to detect:

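A specific message payload triggers heavy processing (for example a slow database query or a long GC pause), so the consumer goes longer than max.poll.interval.ms between poll() calls. The consumer leaves the group and rejoins, CooperativeStickyAssignor hands back the same partitions, and the consumer re-consumes the same message from the last committed offset and fails again. With static membership (group.instance.id set), this becomes a guaranteed loop: the failing node reclaims the partition holding the poison pill every time, so the committed offset on that partition never advances. The visible symptoms are repeated rebalances, a committed offset that stays flat on one partition, and growing lag on that partition.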
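The stuck-offset pattern described above can be turned into a simple check. The following is a minimal sketch, not part of Kafka or Confluent Platform: `StallDetector`, its method names, and the sample count are hypothetical, and it assumes the committed offset, log end offset, and a rebalance counter are already collected per partition by your own monitoring.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: flags a partition whose committed offset is stuck while rebalances continue. */
final class StallDetector {
    private final int minSamples;
    // Each entry is {committedOffset, logEndOffset, rebalanceCount}.
    private final Deque<long[]> window = new ArrayDeque<>();

    StallDetector(int minSamples) {
        if (minSamples < 2) {
            throw new IllegalArgumentException("need at least 2 samples");
        }
        this.minSamples = minSamples;
    }

    /** Records one observation of the partition and drops the oldest one beyond the window. */
    void record(long committedOffset, long logEndOffset, long rebalanceCount) {
        window.addLast(new long[] {committedOffset, logEndOffset, rebalanceCount});
        if (window.size() > minSamples) {
            window.pollFirst();
        }
    }

    /** True when the offset never moved across the window, a backlog exists, and rebalances occurred. */
    boolean suspectedPoisonPill() {
        if (window.size() < minSamples) {
            return false;
        }
        long[] first = window.peekFirst();
        long[] last = window.peekLast();
        for (long[] sample : window) {
            if (sample[0] != first[0]) {
                return false; // committed offset advanced at some point
            }
        }
        boolean hasBacklog = last[1] > last[0];
        boolean rebalanced = last[2] > first[2];
        return hasBacklog && rebalanced;
    }
}
```

An idle partition (no backlog) or a slow but stable consumer (no new rebalances) does not trip the check; only the combination of all three signals does. The right window size depends on how often samples are taken and is left as a tuning decision.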

Recommended action:

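Track re-deliveries in a custom ConsumerInterceptor, or in a guard at the top of the processing loop, using a local LRU cache keyed by message ID or payload hash. A Kafka Streams DeserializationExceptionHandler covers only payloads that fail to deserialize; it does not see a payload that deserializes cleanly but is slow to process. If the same message ID is seen more than 3 times within 10 minutes (indicating re-delivery after a rebalance or crash), publish the payload to a Dead Letter Queue (DLQ), return null or a sentinel value so downstream code skips it, and commit the offset past it. This prevents a single bad message from blocking the entire partition. Alternatively, offload heavy processing to a separate worker thread pool outside the poll() loop, so that poll() keeps being called within max.poll.interval.ms.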
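The re-delivery check can be sketched with the JDK alone. This is an illustrative sketch under stated assumptions rather than a Kafka API: `RedeliveryTracker` and its methods are hypothetical names, the caller supplies the message ID (a string built from topic, partition, and offset is one option), and publishing to the DLQ and committing the offset are left to the calling code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: counts sightings of a message ID inside a time window, with LRU eviction. */
final class RedeliveryTracker {
    private final int maxDeliveries;  // e.g. 3
    private final long windowMillis;  // e.g. 10 minutes = 600_000 ms
    private final Map<String, Deque<Long>> seen;

    RedeliveryTracker(int capacity, int maxDeliveries, long windowMillis) {
        this.maxDeliveries = maxDeliveries;
        this.windowMillis = windowMillis;
        // accessOrder = true makes iteration order least-recently-used first.
        this.seen = new LinkedHashMap<String, Deque<Long>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Deque<Long>> eldest) {
                return size() > capacity; // evict the least recently used ID
            }
        };
    }

    /** Records one delivery; returns true if the ID was seen more than maxDeliveries times in the window. */
    boolean recordAndCheck(String messageId, long nowMillis) {
        Deque<Long> times = seen.computeIfAbsent(messageId, k -> new ArrayDeque<>());
        // Drop sightings that fell out of the time window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() > windowMillis) {
            times.pollFirst();
        }
        times.addLast(nowMillis);
        return times.size() > maxDeliveries;
    }

    /** Call after the message was processed or quarantined successfully. */
    void forget(String messageId) {
        seen.remove(messageId);
    }
}
```

A consumer loop would call `recordAndCheck(id, System.currentTimeMillis())` before processing each record and route the record to the DLQ when it returns true. Because the cache lives in memory, it survives a rebalance but not a process restart; if the poison pill kills the process outright, the counts would need to be kept somewhere durable.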