crdb_cluster_node
Number of stores in the cluster

Knowledge Base (8 documents, 0 chunks)

Technical Annotations (16)

Configuration Parameters (1)
- num_replicas (recommended: 3)

Error Signatures (1)
- out-of-memory panics (exception)

CLI Commands (3)
- cockroach node status --certs-dir=certs --host=node1.example.com:26257 (diagnostic)
- allocsim (diagnostic)
- zerosum (diagnostic)

Technical References (11)
- liveness_livenodes (component)
- replica checksum comparisons (concept)
- multi-region distribution (component)
- automated failover (component)
- multi-cloud deployment (concept)
- Raft-based replication (protocol)
- US-East-1 (component)
- ranges.unavailable (component)
- quorum (concept)
- liveness range (component)
- node liveness (concept)

Related Insights (12)
Ranges fall below target replication factor, creating availability risk. Often caused by node failures, decommissioning, or insufficient cluster capacity during rebalancing.
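Under-replication can be watched for by summing the `ranges_underreplicated` gauge that CockroachDB exposes on its Prometheus endpoint. A minimal sketch, assuming a scraped text payload; the sample values below are illustrative, not captured from a live cluster:

```python
# Sketch: sum a per-store gauge from a Prometheus text-format payload and
# alert when any ranges sit below the target replication factor.
# The payload here is a hand-written illustration of the format.
SAMPLE_METRICS = """\
# HELP ranges_underreplicated Ranges with fewer live replicas than the target
# TYPE ranges_underreplicated gauge
ranges_underreplicated{store="1"} 0
ranges_underreplicated{store="2"} 4
"""

def underreplicated_total(payload: str, metric: str = "ranges_underreplicated") -> int:
    """Sum a gauge across all labeled series in a Prometheus text payload."""
    total = 0
    for line in payload.splitlines():
        # Data lines start with the metric name; comment lines start with '#'.
        if line.startswith(metric):
            total += int(float(line.rsplit(" ", 1)[1]))
    return total

if underreplicated_total(SAMPLE_METRICS) > 0:
    print("ALERT: ranges below target replication factor")  # prints for this sample
```

In production the same check would normally live in a Prometheus alerting rule rather than a script, but the parsing logic is identical.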
Elevated crdb_cluster_liveness_heartbeat_time indicates network issues, CPU starvation, or node health problems that could trigger false node failure detection and unnecessary rebalancing.
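One way to act on elevated heartbeat times is to flag latencies that consume a large share of the liveness record's expiration window, since those are the heartbeats at risk of arriving too late and triggering false failure detection. A sketch under assumptions: the 9-second expiration and the 0.5 warning fraction are illustrative defaults, not values read from a cluster.

```python
# Sketch: flag heartbeat latency samples that approach the liveness record's
# expiration window. Constants below are assumptions for illustration.
LIVENESS_EXPIRATION_S = 9.0   # assumed liveness record TTL
WARN_FRACTION = 0.5           # warn when latency eats half the window

def risky_heartbeats(latencies_s, expiration=LIVENESS_EXPIRATION_S, frac=WARN_FRACTION):
    """Return latency samples close enough to expiration to risk false failure detection."""
    threshold = expiration * frac
    return [latency for latency in latencies_s if latency >= threshold]

samples = [0.02, 0.05, 4.8, 0.03, 6.1]   # seconds; illustrative scrape
print(risky_heartbeats(samples))          # -> [4.8, 6.1]
```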
When the cluster loses quorum, ranges become unavailable and queries fail, yet the DB Console and Prometheus endpoint may remain accessible, served from the unavailable node's cache. Operators can be misled by accessible monitoring showing stale data while the cluster is actually down, delaying incident response.
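The insight above suggests a health check that never equates "endpoint reachable" with "cluster healthy". A minimal sketch: `liveness_livenodes` and `ranges.unavailable` are the real metrics named in this entry, but the function and values below are illustrative assumptions.

```python
# Sketch: treat monitoring data as trustworthy only when the live-node count
# matches expectations AND no ranges are unavailable, regardless of whether
# the DB Console or Prometheus endpoint answered the request.
def cluster_healthy(live_nodes: int, expected_nodes: int, unavailable_ranges: int) -> bool:
    """Health requires a full liveness signal and zero unavailable ranges."""
    return live_nodes == expected_nodes and unavailable_ranges == 0

# The endpoint answered, but only 1 of 3 nodes is live and ranges are unavailable:
print(cluster_healthy(live_nodes=1, expected_nodes=3, unavailable_ranges=12))  # -> False
```

Alerting on the combination, rather than on endpoint reachability, avoids the stale-cache trap described above.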
A declining crdb_cluster_gossip_infos_received rate indicates gossip network issues that can affect cluster coordination, liveness detection, and metadata propagation. This often precedes more serious coordination failures.
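A declining gossip rate can be detected by differencing two scrapes of the cumulative counter. A sketch under assumptions: the counter values and the "dropped by half" threshold are illustrative, and counter resets (e.g. a node restart) must be treated as unknown rather than as a negative rate.

```python
# Sketch: per-second rate of a cumulative counter between two scrapes,
# flagging a sharp decline in gossip traffic. Values are illustrative.
def counter_rate(prev_value, curr_value, interval_s):
    """Per-second rate between two scrapes; a reset (curr < prev) yields None."""
    if curr_value < prev_value:
        return None  # counter reset, e.g. node restart
    return (curr_value - prev_value) / interval_s

rate_old = counter_rate(10_000, 16_000, 60)   # 100 msgs/s earlier
rate_new = counter_rate(16_000, 17_200, 60)   # 20 msgs/s now
if rate_old is not None and rate_new is not None and rate_new < 0.5 * rate_old:
    print("WARN: gossip rate dropped by more than half")  # prints for this sample
```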