Hadoop HDFS

HDFS Storage Capacity Exhaustion Blocking Checkpoints

critical
Resource Contention · Updated Jan 27, 2026

Checkpoint writes to HDFS fail when the cluster's storage capacity is exhausted, causing checkpoint timeouts and leaving the job without a recent persisted state to recover from, which undermines fault tolerance.

How to detect:

Monitor HDFS capacity metrics for low free space, typically less than 10% remaining (for example, `DFS Remaining` in the `hdfs dfsadmin -report` summary, or the NameNode's `CapacityRemaining` metric). Watch checkpoint failure logs for storage-related errors (e.g. out-of-space or disk-quota exceptions) and for an increasing rate of checkpoint timeouts. Track the growth trend of the checkpoint directory's size so exhaustion can be predicted before it occurs.
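As a minimal detection sketch, the cluster summary printed by `hdfs dfsadmin -report` can be parsed to compute the remaining-capacity fraction and raise an alert below the 10% threshold. The sample report text and the 10% threshold here are illustrative assumptions; the `Configured Capacity` / `DFS Remaining` line format matches typical Hadoop 2.x/3.x output, but verify against your version.

```python
import re

LOW_SPACE_THRESHOLD = 0.10  # alert when under 10% of capacity remains (assumed policy)


def remaining_fraction(report: str) -> float:
    """Parse the first (cluster-summary) 'Configured Capacity' and
    'DFS Remaining' byte counts from `hdfs dfsadmin -report` output."""
    capacity = int(re.search(r"Configured Capacity:\s*(\d+)", report).group(1))
    remaining = int(re.search(r"DFS Remaining:\s*(\d+)", report).group(1))
    return remaining / capacity


def low_space(report: str) -> bool:
    """True when free HDFS capacity has dropped below the alert threshold."""
    return remaining_fraction(report) < LOW_SPACE_THRESHOLD


# Hypothetical summary block as emitted by `hdfs dfsadmin -report`:
sample = """Configured Capacity: 1000000000000 (931.32 GB)
Present Capacity: 950000000000 (884.75 GB)
DFS Remaining: 80000000000 (74.51 GB)
DFS Used: 870000000000 (810.24 GB)
DFS Used%: 91.58%"""

print(low_space(sample))  # 8% remaining, below 10% -> True
```

In practice the report text would come from `subprocess.run(["hdfs", "dfsadmin", "-report"], ...)` on a node with the Hadoop client configured, or the same numbers can be scraped from the NameNode JMX endpoint.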

Recommended action:

Delete unnecessary data from checkpoint directories, i.e. old checkpoints that fall outside the retention policy (be careful not to remove checkpoints a running job may still restore from). Archive historical data to cheaper storage tiers. Add DataNode capacity by expanding the cluster. Review and lower Flink's `state.checkpoints.num-retained` setting to reduce the checkpoint storage footprint.
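The retention-based cleanup above can be sketched as a small pruning helper: given the `chk-N` checkpoint directories under a job's checkpoint root, keep the newest N and flag the rest for deletion. The checkpoint root path, the retained count, and the listing input are all illustrative assumptions; actual deletion would be something like `hdfs dfs -rm -r <path>`, and it should only target checkpoints of jobs that are no longer running, since Flink itself enforces `state.checkpoints.num-retained` for live jobs.

```python
NUM_RETAINED = 3  # mirrors the assumed state.checkpoints.num-retained value


def prune_candidates(paths: list[str], num_retained: int = NUM_RETAINED) -> list[str]:
    """Given Flink checkpoint directory paths ending in 'chk-<id>',
    return the ones beyond the newest `num_retained`, oldest first."""
    by_id = sorted(paths, key=lambda p: int(p.rsplit("chk-", 1)[1]))
    return by_id[:-num_retained] if num_retained else by_id


# Hypothetical listing of a cancelled job's checkpoint root:
dirs = [
    "/flink/checkpoints/job-a/chk-10",
    "/flink/checkpoints/job-a/chk-2",
    "/flink/checkpoints/job-a/chk-7",
    "/flink/checkpoints/job-a/chk-5",
]

for path in prune_candidates(dirs, num_retained=2):
    # In a real script: subprocess.run(["hdfs", "dfs", "-rm", "-r", path], check=True)
    print("would delete:", path)
```

Sorting by the numeric checkpoint id rather than lexicographically matters here: a string sort would order `chk-10` before `chk-2` and delete the wrong directories.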