Hidden Network Partition Detected via Cassandra Hints Growth
critical · reliability · Updated Jan 10, 2026
A rising cassandra.storage.total_hints.count while all nodes appear healthy indicates that coordinators are storing hints for replicas they cannot reach. Likely causes are a network partition, a misconfiguration, or a 'zombie' node that appears up but is not accepting writes, so coordinators keep queuing hints for it with no sign of an outage.
How to detect:
Alert when cassandra.storage.total_hints.count grows continuously over a sustained window while every node reports as healthy and no outage is known. Cross-check against node liveness metrics and network connectivity between nodes.
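The alert condition above can be sketched as a small check over recent metric samples. This is only an illustration under stated assumptions: the function name, its parameters, and the five-sample window are hypothetical, and the samples are assumed to come from whatever monitoring system already collects the hints counter and a per-interval count of nodes reporting up.

```python
from typing import Sequence


def hints_growing_while_healthy(
    hint_counts: Sequence[float],
    nodes_up: Sequence[int],
    expected_nodes: int,
    min_samples: int = 5,
) -> bool:
    """Return True when the hints counter rose on every interval of the
    most recent window AND every expected node reported up throughout.

    hint_counts and nodes_up are parallel series, oldest sample first.
    All names and the default window size are illustrative assumptions.
    """
    # Refuse to decide on too little data or on misaligned series.
    if len(hint_counts) < min_samples or len(hint_counts) != len(nodes_up):
        return False

    window = hint_counts[-min_samples:]
    up_window = nodes_up[-min_samples:]

    # "Grows continuously": each sample is strictly larger than the previous one.
    strictly_rising = all(later > earlier for earlier, later in zip(window, window[1:]))
    # "All nodes report as healthy": no sample in the window shows a missing node.
    all_up = all(count == expected_nodes for count in up_window)

    return strictly_rising and all_up
```

Requiring a strict rise on every interval keeps the check conservative. A real alert rule would probably tolerate an occasional flat interval or compare a rate against a threshold instead.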
Recommended action:
Investigate network connectivity between nodes, and check for partial node failures or misconfigurations that prevent replicas from accepting writes. Verify gossip status and the inter-node communication paths.
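One way to verify gossip status is to run `nodetool status` on several nodes and compare what each one reports about its peers. A peer that one observer sees as up (`UN`) and another sees as down (`DN`) points to a partial partition. The sketch below assumes the output has already been collected from each node by some other means. The function names and the shape of the `views` mapping are hypothetical, and the parser reads only the leading status/state code and the address from each node row.

```python
import re
from typing import Dict

# Node rows in `nodetool status` start with a two-letter code:
# status (U = up, D = down) followed by state (N, L, J, M), then the address.
_NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def parse_status(output: str) -> Dict[str, str]:
    """Map each peer address to its two-letter code, e.g. {'10.0.0.2': 'UN'}."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        match = _NODE_ROW.match(line.strip())
        if match:  # header and separator lines do not match and are skipped
            states[match.group(3)] = match.group(1) + match.group(2)
    return states


def find_disagreements(views: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """views maps an observer node to the raw `nodetool status` text it printed.

    Returns only the peers whose reported code differs between observers,
    as {peer: {observer: code}}.
    """
    per_peer: Dict[str, Dict[str, str]] = {}
    for observer, output in views.items():
        for peer, state in parse_status(output).items():
            per_peer.setdefault(peer, {})[observer] = state
    return {
        peer: seen
        for peer, seen in per_peer.items()
        if len(set(seen.values())) > 1
    }
```

An empty result does not rule out a problem. A 'zombie' node can look up from every observer, so it is still worth checking connectivity on the inter-node storage port (7000 by default) from each node.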