Hinted Handoff Backlog Indicating Network Partition

critical

Replication

Updated Jan 10, 2026

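A rising cassandra_storage_count_hints while all nodes appear up in nodetool status indicates a silent network partition or zombie nodes that still participate in gossip but fail write requests. Writes meant for the affected replicas are stored as hints on the coordinators instead of being applied, so replicas drift out of sync until the hints are replayed or a repair runs.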

Sources

Troubleshooting Common Issues in Apache Cassandra - Nextbrick (nextbrick.com)

Apache Cassandra Monitoring: Tools, Challenges & Best Practices (last9.io)

Cassandra troubleshooting guide - Site24x7 (www.site24x7.com)

Apache Cassandra Monitoring with OpenTelemetry [including dashboards and alerts] | SigNoz (signoz.io)

Technologies:

Cassandra: the root cause of this issue originates in Cassandra.

How to detect:

Monitor for cassandra_storage_count_hints growing continuously while all nodes show 'UN' (Up/Normal) status. Cross-reference with cassandra_storage_count_hints_in_progress to verify that hints are accumulating faster than they are replayed. Check whether cassandra_client_request_error shows UnavailableException from specific coordinator nodes.
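
The first check above can be sketched in Python. This is a minimal illustration, not a production alert rule: it assumes you have already collected a window of cassandra_storage_count_hints samples (oldest first) and a mapping of node address to its nodetool status code. The function name, the input shapes, and the min_increase threshold are all hypothetical.

```python
from typing import Mapping, Sequence


def hints_backlog_growing(
    hint_counts: Sequence[float],
    node_states: Mapping[str, str],
    min_increase: float = 1.0,
) -> bool:
    """Return True when hints rise across the window while every node reports 'UN'.

    hint_counts: samples of cassandra_storage_count_hints, oldest first (assumed shape).
    node_states: node address -> two-letter status code such as 'UN' or 'DN'.
    min_increase: hypothetical threshold for the net rise over the window.
    """
    if len(hint_counts) < 2 or not node_states:
        return False  # not enough data to judge a trend

    # Every node must look healthy; otherwise a growing backlog is expected.
    all_up = all(state == "UN" for state in node_states.values())

    # The backlog must never shrink between consecutive samples...
    non_decreasing = all(b >= a for a, b in zip(hint_counts, hint_counts[1:]))

    # ...and must have grown by at least min_increase overall.
    net_increase = hint_counts[-1] - hint_counts[0]

    return all_up and non_decreasing and net_increase >= min_increase
```

The sketch deliberately returns False as soon as any node is not 'UN', since a backlog that builds while a node is visibly down is ordinary hinted handoff rather than the silent failure described here.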

Recommended action:

Run nodetool status on each node and compare the output to identify discrepancies in the cluster view. Check network connectivity between data centers, and verify the listen_address and rpc_address settings. Inspect firewall rules and security groups. If a node was truly unreachable, run nodetool repair once connectivity is restored. Consider adjusting phi_convict_threshold if network latency is causing false positives in failure detection.
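
Comparing the cluster view across nodes can be sketched as follows. This is a rough illustration that assumes you have already captured the text output of nodetool status from each node (how you collect it is left open) and that node rows begin with the usual two-letter status/state code followed by the address. The helper names parse_status and find_discrepancies are hypothetical.

```python
import re
from typing import Dict

# Node rows in `nodetool status` output start with a two-letter code:
# status (U = Up, D = Down) then state (N/L/J/M), followed by the address.
_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(output: str) -> Dict[str, str]:
    """Extract {address: status_code} from one node's `nodetool status` text."""
    view: Dict[str, str] = {}
    for line in output.splitlines():
        match = _ROW.match(line.strip())
        if match:
            view[match.group(2)] = match.group(1)
    return view


def find_discrepancies(views: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return addresses whose reported state differs between observers.

    views maps an observer node to the view parsed from that node's output.
    An address absent from an observer's view is reported as 'missing'.
    """
    addresses = set()
    for view in views.values():
        addresses.update(view)

    disagreements: Dict[str, Dict[str, str]] = {}
    for address in sorted(addresses):
        seen = {obs: view.get(address, "missing") for obs, view in views.items()}
        if len(set(seen.values())) > 1:
            disagreements[address] = seen
    return disagreements
```

An empty result means every observer agrees on every address. Any entry names a node that at least one peer sees differently, which is a reasonable place to start the connectivity and firewall checks.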