Physical cluster replication lag escalation
Warning: Growing replication lag in CockroachDB physical cluster replication (PCR) indicates that the standby cluster cannot keep pace with writes on the primary. Unchecked lag degrades failover readiness and widens the potential data-loss window.
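SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS reports a replication_lag that exceeds the SLO threshold (e.g., 60s) or shows a sustained upward trend. In Prometheus, the gap between the current time and the physical_replication.replicated_time_seconds metric (the replicated timestamp, in seconds since the Unix epoch) keeps widening.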
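As a rough illustration of the detection rule above, the following Python sketch derives lag from scraped metric values and applies both conditions (SLO breach, sustained rise). It assumes physical_replication.replicated_time_seconds is a Unix timestamp in seconds. The function names, the 60s default, and the three-sample minimum for a "sustained" trend are hypothetical choices, not part of CockroachDB.

```python
import time
from typing import Optional, Sequence


def lag_seconds(replicated_time_seconds: float, now: Optional[float] = None) -> float:
    """Lag = wall-clock time minus the replicated timestamp (both Unix seconds)."""
    if now is None:
        now = time.time()
    # Clamp at zero so small clock skew never yields a negative lag.
    return max(0.0, now - replicated_time_seconds)


def lag_breaches(lags: Sequence[float], slo_seconds: float = 60.0) -> bool:
    """True if the latest lag exceeds the SLO, or if at least three samples
    are available and each one is larger than the one before it."""
    if not lags:
        return False
    over_slo = lags[-1] > slo_seconds
    rising = len(lags) >= 3 and all(b > a for a, b in zip(lags, lags[1:]))
    return over_slo or rising
```

For example, lag_breaches([10, 20, 30]) flags a steady rise even though every sample is still under a 60s SLO, which may be useful as an early warning before the threshold is crossed.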
When replication lag grows: (1) check resource utilization on the standby cluster (CPU, disk I/O) for bottlenecks; (2) verify that network bandwidth between the clusters is sufficient; (3) inspect physical_replication.logical_bytes for sudden spikes in data volume; (4) consider scaling the standby cluster if lag persists under normal load. Also review the status of the replication job.
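Step (3) can be approximated offline with a small Python sketch like the one below. It assumes physical_replication.logical_bytes behaves as a cumulative counter sampled as (unix_time, bytes) pairs. The helper names and the 3x-median spike factor are hypothetical and would need tuning for a real workload.

```python
from statistics import median
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (unix_time_seconds, cumulative_logical_bytes)


def ingest_rates(samples: Sequence[Sample]) -> List[float]:
    """Convert cumulative samples into bytes/second for each interval."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        # Skip non-increasing timestamps and counter resets.
        if t1 > t0 and b1 >= b0:
            rates.append((b1 - b0) / (t1 - t0))
    return rates


def is_volume_spike(samples: Sequence[Sample], factor: float = 3.0) -> bool:
    """True if the latest interval's rate exceeds `factor` times the
    median rate of the earlier intervals."""
    rates = ingest_rates(samples)
    if len(rates) < 3:
        return False  # too little history to form a baseline
    baseline = median(rates[:-1])
    return baseline > 0 and rates[-1] > factor * baseline
```

A spike here points toward a write-volume cause (step 3) rather than a standby resource or network bottleneck (steps 1 and 2). In practice the same comparison would more likely be expressed as a Prometheus rate query than computed by hand.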