Cassandra

Hinted Handoff Backlog Indicating Network Partition

critical
Replication
Updated Jan 10, 2026

A rising cassandra_storage_count_hints while every node appears up in nodetool status points to a silent network partition or a zombie node: one that still participates in gossip but fails write requests. Coordinators store hints for the writes that replica misses, so replicas drift out of sync until the hints are replayed or a repair runs.

How to detect:

Watch for cassandra_storage_count_hints growing continuously while all nodes show 'UN' (Up/Normal) status. Cross-reference with cassandra_storage_count_hints_in_progress to verify that hints are accumulating faster than they are replayed. Check cassandra_client_request_error for UnavailableException errors coming from specific coordinator nodes.
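
The detection rule above can be sketched as a small check over samples you have already collected. This is a minimal illustration, not part of any Cassandra tooling: the function names, the min_points threshold, and the input shapes (a list of cassandra_storage_count_hints samples ordered oldest to newest, and a mapping of node address to its nodetool status code) are all assumptions made for the example.

```python
from typing import Mapping, Sequence


def hints_backlog_growing(samples: Sequence[float], min_points: int = 4) -> bool:
    """Return True if the hint count rose between every pair of consecutive samples.

    `samples` is assumed to be ordered oldest to newest. `min_points` is a
    hypothetical guard so that two or three samples do not trigger an alert.
    """
    if len(samples) < min_points:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def silent_partition_suspected(
    hint_samples: Sequence[float],
    node_states: Mapping[str, str],
) -> bool:
    """Flag the case described above: hints keep growing although all nodes are 'UN'.

    `node_states` maps a node address to the two-letter code reported by
    nodetool status (for example 'UN' or 'DN').
    """
    all_up = bool(node_states) and all(state == "UN" for state in node_states.values())
    return all_up and hints_backlog_growing(hint_samples)
```

If a node is already reported as 'DN', a growing hint backlog is expected behavior rather than a silent partition, which is why the check requires every node to be 'UN'.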

Recommended action:

Run nodetool status on each node and compare the results to identify discrepancies in the cluster view. Check network connectivity between data centers, and verify the listen_address and rpc_address settings. Inspect firewall rules and security groups. If a node was truly unreachable, run nodetool repair after restoring connectivity, because hints are only stored for a limited window (3 hours by default) and writes missed beyond it are not replayed. Consider raising phi_convict_threshold if network latency is causing the failure detector to mark healthy nodes as down.
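
Comparing cluster views by hand gets tedious beyond a few nodes, so the first step above could be scripted roughly as follows. This is a sketch under assumptions: it expects that you have already captured the text output of nodetool status from each node (how you collect it, for example over SSH, is left out), and the function names and the "missing" placeholder are hypothetical. The parser relies only on the two-letter status/state code and the address that begin each node row.

```python
import re
from typing import Dict

# A node row starts with a status letter (U/D) and a state letter (N/L/J/M),
# followed by whitespace and the node address.
STATUS_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_nodetool_status(output: str) -> Dict[str, str]:
    """Extract {address: state code} from the text of one `nodetool status` run."""
    view: Dict[str, str] = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            state, address = match.groups()
            view[address] = state
    return view


def find_view_discrepancies(views: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return the peers whose reported state differs between observers.

    `views` maps an observer name to its parsed view. The result maps each
    disputed address to {observer: state}, using "missing" when an observer
    does not list that peer at all.
    """
    addresses = set()
    for view in views.values():
        addresses.update(view)

    discrepancies: Dict[str, Dict[str, str]] = {}
    for address in sorted(addresses):
        seen = {observer: view.get(address, "missing") for observer, view in views.items()}
        if len(set(seen.values())) > 1:
            discrepancies[address] = seen
    return discrepancies
```

Any address in the result is a peer that at least two nodes disagree about, which is a reasonable place to start checking connectivity and firewall rules.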