When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When diagnosing standby query conflicts on PostgreSQL read replicas, start by quantifying which conflict types are occurring most frequently, then identify the affected queries and evaluate your hot_standby settings. The key is finding the right balance between max_standby_streaming_delay (how long replay will wait for queries) and hot_standby_feedback (preventing conflicts but potentially causing primary bloat).
1Quantify the conflict types and frequency
Check `conflict-bufferpin` and `conflict-snapshot` in pg_stat_database_conflicts to understand which conflict type dominates. Buffer pin conflicts mean queries are holding buffers that recovery needs to modify, while snapshot conflicts mean the query's snapshot is too old for recovery to proceed. If you're seeing hundreds of conflicts per hour, this is severely impacting your analytical workload. The conflict type determines whether you need to adjust max_standby delays or address long-running transactions on the primary.
2Identify the canceled queries and their duration
Look at `postgresql-transactions-open` on the replica to see which queries are running when conflicts occur. If your analytical queries consistently run 30+ minutes, they'll conflict with the default max_standby_streaming_delay of 30 seconds. Check pg_stat_activity for query patterns — if these are scheduled ETL or reporting jobs, you have predictable conflict windows. Understanding query duration helps you decide whether to increase standby delays or optimize the queries themselves.
3Review current hot_standby configuration parameters
Check max_standby_streaming_delay and max_standby_archive_delay settings — the default 30 seconds is way too low for analytical workloads. For read replicas serving long-running queries, consider increasing to 300 seconds (5 minutes) or even higher based on your RPO requirements. Also check if hot_standby_feedback is enabled — this prevents conflicts entirely but has important tradeoffs we'll evaluate next. These settings directly control how long recovery waits before canceling conflicting queries.
4Evaluate hot_standby_feedback tradeoffs
Enabling hot_standby_feedback prevents query cancellations but can cause serious problems on the primary. The `hot-standby-feedback-cpu-spikes` insight shows that 30+ minute queries on the replica will block VACUUM on the primary, causing bloat and CPU spikes. The `hot-standby-feedback-causing-bloat` issue means dead tuples accumulate on frequently updated tables. Only enable this if your replica queries are relatively short (under 10 minutes) or if you can add indexes to speed them up significantly. Monitor replication lag with `stream` after enabling.
5Check for long-running transactions on the primary
Snapshot conflicts often originate from long-running transactions on the primary that haven't committed. Check `postgresql-transactions-open` on the primary database — idle transactions or forgotten BEGIN statements will cause the replica's snapshot to become invalid during recovery. Even if the primary transaction isn't actively running queries, an open transaction prevents VACUUM and causes snapshot conflicts on replicas. Find and terminate these transactions on the primary before adjusting replica settings.
6Implement a balanced approach based on workload
For most analytical workloads, start by increasing max_standby_streaming_delay to 300-600 seconds rather than enabling hot_standby_feedback. This allows queries to complete without causing primary bloat. If conflicts persist and you must use hot_standby_feedback, add indexes to replica-queried tables (per the `hot-standby-feedback-cpu-spikes` insight) to reduce query duration below 10 minutes. Monitor `conflict` metrics after changes — you should see them drop to near zero. If you're running without replicas due to these issues, the `no-replica-eliminates-fast-recovery` insight reminds us that losing your replica eliminates failover capability, so fixing this is critical.