Replication lag triggered by CPU saturation from bad query plans

critical

ReplicationUpdated Nov 6, 2025

Sources

PostgreSQL 100% CPU with Normal Query Load | Fixing Stale Statistics and Planner Lieshaiderzdbre.substack.com

Technologies:

PostgreSQLsubject

How to detect:

Replication lag exceeds 18 minutes when primary node experiences CPU saturation from inefficient query plans. The WAL application on replicas cannot keep pace with primary write activity, triggering failover alarms. Occurs as a cascading effect when stale statistics cause primary to execute queries inefficiently.

Recommended action:

Address root cause by refreshing statistics on primary using ANALYZE. Monitor replication lag using pg_stat_replication view. If lag persists after statistics refresh, check replica CPU and disk I/O capacity. Consider temporarily reducing application write load or scaling replica resources until lag recovers below 5 minutes.