Replication lag triggered by CPU saturation from bad query plans
criticalReplicationUpdated Nov 6, 2025
Technologies:
How to detect:
Replication lag exceeds 18 minutes when primary node experiences CPU saturation from inefficient query plans. The WAL application on replicas cannot keep pace with primary write activity, triggering failover alarms. Occurs as a cascading effect when stale statistics cause primary to execute queries inefficiently.
Recommended action:
Address root cause by refreshing statistics on primary using ANALYZE. Monitor replication lag using pg_stat_replication view. If lag persists after statistics refresh, check replica CPU and disk I/O capacity. Consider temporarily reducing application write load or scaling replica resources until lag recovers below 5 minutes.