Replication Rate Mismatch Signals Cross-Region Lag or Failure
critical
Divergence between pulsar_replication_rate_in and pulsar_replication_rate_out, or a sustained nonzero pulsar_replication_rate_expired, indicates that geo-replication is failing or falling behind, which risks data loss in disaster recovery scenarios.
Compare pulsar_replication_rate_in at the target cluster with pulsar_replication_rate_out at the source cluster over the same time window; a persistent gap indicates replication lag. Monitor pulsar_replication_rate_expired for messages that expire before replication completes. Check pulsar_replication_connected for broken replication links. Correlate with pulsar_replication_throughput_out to assess bandwidth utilization.
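The checks above can be sketched as a small evaluation over a window of samples. This is a minimal illustration, not a reference implementation: the ReplicationSample type, the finding labels, and the 10% mismatch threshold are assumptions chosen for the example, and the samples are assumed to have been scraped already and aligned in time between the two clusters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicationSample:
    """One aligned observation of the replication metrics (hypothetical type)."""
    rate_out: float      # pulsar_replication_rate_out at the source, msg/s
    rate_in: float       # pulsar_replication_rate_in at the target, msg/s
    rate_expired: float  # pulsar_replication_rate_expired, msg/s
    connected: bool      # derived from pulsar_replication_connected


def mismatch_ratio(sample: ReplicationSample) -> float:
    """Fraction of the outbound rate that is not observed inbound."""
    if sample.rate_out <= 0:
        return 0.0  # nothing is being sent, so there is nothing to compare
    return max(0.0, (sample.rate_out - sample.rate_in) / sample.rate_out)


def findings(samples: List[ReplicationSample], max_mismatch: float = 0.1) -> List[str]:
    """Return the conditions that hold for every sample in the window.

    Requiring every sample to agree is a simple stand-in for "sustained";
    the threshold and labels are assumptions, not Pulsar defaults.
    """
    if not samples:
        return []
    result = []
    if all(not s.connected for s in samples):
        result.append("link_down")
    if all(s.rate_expired > 0 for s in samples):
        result.append("messages_expiring")
    if all(mismatch_ratio(s) > max_mismatch for s in samples):
        result.append("rate_mismatch")
    return result
```

Requiring the condition on every sample in the window avoids alerting on a single noisy scrape; a real alert rule would more likely use a rate or average over a fixed duration.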
Investigate network connectivity between regions and check for bandwidth saturation. Verify replication policies and ensure the target cluster has sufficient resources (bandwidth, broker capacity). Check for authentication or authorization failures that block replication. If some replication lag is expected and acceptable, consider increasing the message TTL so that messages do not expire before they are replicated. Scale broker and BookKeeper capacity at the target cluster if inbound replication is the bottleneck.
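When raising the message TTL, a rough lower bound can be estimated from how long the current backlog takes to drain. The sketch below assumes a replication backlog count is available (for example from a backlog metric, which is not named in this entry) and that the outbound replication rate stays roughly constant. The function name and the safety factor are hypothetical.

```python
import math
from typing import Optional


def min_ttl_seconds(backlog_msgs: int,
                    replication_rate_out: float,
                    safety_factor: float = 2.0) -> Optional[int]:
    """Estimate a TTL long enough for the newest backlogged message to replicate.

    A message queued behind `backlog_msgs` others waits about
    backlog_msgs / replication_rate_out seconds, so the TTL should exceed
    that by a margin. Returns None when the link is stalled, because no
    finite TTL helps in that case.
    """
    if backlog_msgs < 0:
        raise ValueError("backlog_msgs must be non-negative")
    if replication_rate_out <= 0:
        return None
    return math.ceil(safety_factor * backlog_msgs / replication_rate_out)
```

For instance, a backlog of 600,000 messages draining at 500 msg/s takes about 1,200 seconds, so with the default safety factor the estimate is 2,400 seconds. This is only a starting point: if the backlog keeps growing, increasing the TTL delays expiry without fixing the underlying bottleneck.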