OpenSearch

Network Partition Detection via TCP Metrics

critical
reliability
Updated Feb 18, 2026

A rising Net_TCP_Lost count together with sustained high Net_TCP_RxQ and Net_TCP_TxQ values indicates network congestion or a partial partition between nodes. Either condition destabilizes the cluster, and when combined with cluster manager election failures it can lead to a split-brain scenario.

How to detect:

Watch for Net_TCP_Lost rising while the Net_TCP_TxQ and Net_TCP_RxQ queues build up. Correlate these with the LeaderCheck_Failure and FollowerCheck_Failure metrics to confirm that cluster manager communication is affected. Monitor Net_PacketDropRate4 and Net_PacketDropRate6 for general IPv4 and IPv6 packet loss.
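
The correlation described above can be sketched as a small classifier over two consecutive metric readings. This is a minimal illustration, not part of OpenSearch: the Sample container, the classify function, and the queue threshold of 100 are all hypothetical, and the sketch assumes the failure metrics are reported as per-interval counts.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of the relevant metrics for a single node (hypothetical container)."""
    tcp_lost: float                 # Net_TCP_Lost
    tcp_txq: float                  # Net_TCP_TxQ
    tcp_rxq: float                  # Net_TCP_RxQ
    leader_check_failures: float    # LeaderCheck_Failure
    follower_check_failures: float  # FollowerCheck_Failure


def classify(prev: Sample, curr: Sample, queue_threshold: float = 100.0) -> str:
    """Return 'suspected_partition', 'congestion', or 'no_tcp_signal'.

    queue_threshold is an illustrative value, not an OpenSearch default;
    tune it against the baseline of your own cluster.
    """
    # Loss must be increasing between the two readings.
    loss_rising = curr.tcp_lost > prev.tcp_lost
    # Either the send or the receive queue has built up.
    queues_high = max(curr.tcp_txq, curr.tcp_rxq) >= queue_threshold
    # Any leader or follower check failure in the current interval.
    checks_failing = (
        curr.leader_check_failures > 0 or curr.follower_check_failures > 0
    )

    if loss_rising and queues_high and checks_failing:
        return "suspected_partition"
    if loss_rising and queues_high:
        return "congestion"
    # Only the TCP correlation is evaluated here; check failures without
    # TCP symptoms need to be investigated separately.
    return "no_tcp_signal"
```

Requiring all three signals before reporting a suspected partition keeps a busy but healthy network from triggering the critical path; loss plus queue buildup alone is reported as congestion.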

Recommended action:

Investigate the network infrastructure for congestion, check for a misconfigured MTU that causes fragmentation, verify node-to-node connectivity, and review firewall rules. Consider network segmentation or dedicated cluster manager nodes. For more detail, enable transport layer logging by raising the log level of the org.opensearch.transport logger.
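
One way to raise the transport log level without a restart is through the cluster settings API, assuming the dynamic `logger.<name>` setting is available in your version. The sketch below only builds the request and does not send it; the function name, the localhost URL, and the choice of a transient setting are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Optional


def build_logging_request(
    base_url: str, level: Optional[str]
) -> urllib.request.Request:
    """Build a PUT _cluster/settings request for the transport logger.

    Passing level=None sends a JSON null, which resets the logger to its default.
    """
    body = {"transient": {"logger.org.opensearch.transport": level}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical endpoint; add authentication and TLS settings as your cluster requires.
request = build_logging_request("http://localhost:9200", "DEBUG")
# To actually apply it: urllib.request.urlopen(request)
```

Debug-level transport logging is verbose, so reset it by sending the same request with a level of None once the investigation is finished.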