Apache Kafka

Log Flush Latency Spikes Causing Write Stalls

warning
latency
Updated Mar 2, 2026

When log flush operations take too long, produce requests stall while Kafka waits for data to be flushed to disk, which raises producer latency and reduces throughput.

How to detect:

Watch for kafka.log.flush_rate dropping while kafka.request.produce_time_99p rises. Also check whether kafka.log.LogFlushStats.LogFlushRateAndTimeMs.Percentile95th (the 95th-percentile log flush time, in milliseconds) consistently exceeds 100 ms.
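The detection rule can be sketched as a small check over recent metric samples. This is a minimal illustration, not part of any monitoring product: the function name, the input shape (equal-window lists ordered oldest to newest), and the half-window trend comparison are all assumptions; only the 100 ms threshold comes from the rule above.

```python
from statistics import mean

FLUSH_P95_THRESHOLD_MS = 100.0  # threshold taken from the detection rule


def _trend(samples):
    """Mean of the newer half minus mean of the older half (assumed heuristic)."""
    half = len(samples) // 2
    return mean(samples[half:]) - mean(samples[:half])


def flush_stall_suspected(flush_rate, produce_p99_ms, flush_p95_ms,
                          threshold_ms=FLUSH_P95_THRESHOLD_MS):
    """Return True when all three signals point at slow log flushes.

    Each argument is a list of samples, oldest first (hypothetical input
    shape; adapt it to whatever your metrics backend returns).
    """
    if min(len(flush_rate), len(produce_p99_ms), len(flush_p95_ms)) < 2:
        return False  # not enough data to judge a trend

    flush_rate_dropping = _trend(flush_rate) < 0
    produce_time_rising = _trend(produce_p99_ms) > 0
    # "Consistently" is read here as: every sample in the window is above
    # the threshold.
    flush_consistently_slow = all(v > threshold_ms for v in flush_p95_ms)

    return flush_rate_dropping and produce_time_rising and flush_consistently_slow


if __name__ == "__main__":
    print(flush_stall_suspected(
        flush_rate=[50, 48, 30, 25],
        produce_p99_ms=[20, 22, 80, 120],
        flush_p95_ms=[110, 130, 150, 170],
    ))
```

Requiring all three conditions keeps a single noisy metric from raising the alert on its own; a looser rule could fire on the flush-time threshold alone.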

Recommended action:

1. Check disk I/O: Monitor write latency and throughput on the disks that hold the log directories.
2. Tune log.flush.interval.ms: Increase the interval to reduce flush frequency. This trades durability for performance.
3. Use faster storage: Consider SSD or NVMe drives for the log directories.
4. Review RAID configuration: Ensure the RAID controller has its write cache enabled.
5. Monitor the filesystem: Check for filesystem errors or fragmentation.
6. Adjust the OS I/O scheduler: Use the deadline or noop scheduler (mq-deadline or none on multi-queue kernels) for lower write latency.
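Steps 1 and 6 above can be checked on a Linux broker host without extra tooling. The sketch below assumes the standard Linux layouts of /proc/diskstats (writes completed and milliseconds spent writing are the 8th and 11th whitespace-separated fields) and /sys/block/<device>/queue/scheduler (active scheduler shown in square brackets). The helper names are hypothetical, and the functions take file contents as strings so they can be tried on sample text.

```python
def parse_diskstats(text):
    """Map device name -> (writes_completed, ms_spent_writing)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def avg_write_latency_ms(before, after, device):
    """Average time per completed write between two diskstats snapshots.

    Returns None when no writes completed in the interval.
    """
    writes_0, ms_0 = before[device]
    writes_1, ms_1 = after[device]
    writes = writes_1 - writes_0
    if writes <= 0:
        return None
    return (ms_1 - ms_0) / writes


def active_scheduler(text):
    """Return the bracketed entry from a queue/scheduler file, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    # Sample snapshots for a made-up device "sda", taken some seconds apart.
    first = parse_diskstats("8 0 sda 100 0 800 50 1000 0 8000 2000 0 500 2050")
    second = parse_diskstats("8 0 sda 120 0 960 60 1500 0 12000 9500 0 900 9560")
    print(avg_write_latency_ms(first, second, "sda"))
    print(active_scheduler("noop [deadline] cfq"))
```

On a real host, read /proc/diskstats twice a few seconds apart and pass both contents to parse_diskstats; the device name to look up is whichever disk backs the Kafka log directories.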