Replication Lag Accumulation
warning
Bucket or site replication can fall behind because of network issues, target unavailability, or too few replication workers. The resulting backlog keeps growing and puts recovery point objectives (RPO) at risk.
Watch for any of the following: rising replication worker queue depth; objects_pending_replication growing over time; replication_active_workers pinned at its maximum while the backlog increases; or replication_throughput_bytes dropping below baseline under normal load.
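The signals above can be combined into a simple check over a window of metric samples. The following is a minimal sketch, not an official integration: the Sample type, the lag_signals function, and the drop_ratio threshold are hypothetical names and values chosen for illustration, and the fields simply mirror the metric names mentioned in the text. It does not verify that load was normal during the window; that is left to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One scrape of the replication metrics (hypothetical container type)."""
    pending: int             # objects_pending_replication
    active_workers: int      # replication_active_workers
    throughput_bytes: float  # replication_throughput_bytes


def lag_signals(samples: List[Sample], max_workers: int,
                baseline_throughput: float, drop_ratio: float = 0.5) -> List[str]:
    """Return the lag signals present in a window of samples (oldest first)."""
    if len(samples) < 2:
        return []
    signals = []
    pending = [s.pending for s in samples]
    # Backlog never shrinks across the window and ends higher than it started.
    growing = (all(b >= a for a, b in zip(pending, pending[1:]))
               and pending[-1] > pending[0])
    if growing:
        signals.append("backlog_growing")
    # Workers stay at the configured maximum while the backlog still grows.
    if growing and all(s.active_workers >= max_workers for s in samples):
        signals.append("workers_saturated")
    # Latest throughput is well under baseline (drop_ratio is an assumed cutoff).
    if samples[-1].throughput_bytes < baseline_throughput * drop_ratio:
        signals.append("throughput_below_baseline")
    return signals
```

A window such as three consecutive scrapes with a rising pending count and all workers busy would return both backlog_growing and workers_saturated, which suggests the workers, rather than a transient burst, are the bottleneck.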
Use 'mc admin replicate resync status' to inspect replication backlog details. Verify target cluster health, and test network connectivity between sites with hperf. Check whether replication workers are saturated (CPU or network bound). Use 'mc admin replicate resync start' to manually trigger replication recovery. Alert if the backlog exceeds RPO tolerance (for example, more than 1 hour of writes).
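One way to express the RPO alert threshold is to convert the pending backlog into hours of incoming writes. The sketch below assumes you can obtain the pending backlog in bytes and an average write rate from your own metrics; backlog_hours and exceeds_rpo are hypothetical helper names, and the 1-hour default only mirrors the example tolerance above.

```python
def backlog_hours(pending_bytes: float, write_rate_bytes_per_sec: float) -> float:
    """Express the replication backlog as hours of writes at the given rate."""
    if write_rate_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return pending_bytes / write_rate_bytes_per_sec / 3600.0


def exceeds_rpo(pending_bytes: float, write_rate_bytes_per_sec: float,
                rpo_hours: float = 1.0) -> bool:
    """True when the backlog represents more writes than the RPO tolerates."""
    return backlog_hours(pending_bytes, write_rate_bytes_per_sec) > rpo_hours
```

For example, 7.2 GB pending at an average write rate of 1 MB/s corresponds to 2 hours of writes, which would breach a 1-hour tolerance.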