PostgreSQL | Autovacuum Falling Behind: My Postgres tables are showing millions of dead tuples and autovacuum doesn't seem to be keeping up. How do I know if I need to tune autovacuum settings or if there's something blocking it? | Incident Response | warning | A- | A- | 2,380 vs 1,591 | 41.2s vs 31.8s | 9 vs 5 | 3 vs 1 | 0 vs 0 | 3,065 chars vs 3,506 chars |
PostgreSQL | Background Writer and Checkpoint Bottleneck: I'm seeing high bgwriter_buffers_backend and low bgwriter_buffers_clean in PostgreSQL metrics. My write performance has degraded and I think backends are writing their own dirty buffers instead of the background writer handling it. How do I tune this? | performance | warning | — |
PostgreSQL | Backup and Recovery Performance: I need to verify our PostgreSQL backup strategy is working well. How do I assess backup performance, storage costs, and whether we can actually meet our recovery time objectives? | Proactive Health | info | — |
PostgreSQL | Cache Hit Ratio Degradation: I noticed our PostgreSQL cache hit ratio dropped from 99% to 85% over the past week. Is this a problem and should I increase shared_buffers or is something else going on? | Proactive Health | warning | B+ | A- | 1,310 vs 1,296 | 24.9s vs 25.2s | 6 vs 5 | 2 vs 1 | 0 vs 0 | 2,728 chars vs 2,996 chars |
PostgreSQL | Checkpoint Tuning for Write Performance: My PostgreSQL instance is experiencing periodic write latency spikes. I see these correlate with checkpoints happening every few minutes. How do I tune checkpoint settings to smooth out performance? | Proactive Health | warning | — |
PostgreSQL | Comprehensive Operational Readiness Review: I set up PostgreSQL a while ago and the workload has evolved a lot since then. How do I make sure it's well configured and provisioned? I want a comprehensive health check covering performance, capacity, reliability, and cost. | Proactive Health | info | — |
PostgreSQL | Concurrent Index Build Monitoring: I'm running CREATE INDEX CONCURRENTLY on a 200GB production table in PostgreSQL. It's been running for 2 hours and I need to know if it's progressing normally or stuck. How do I monitor the operation and what could cause it to fail? | Proactive Health | info | — |
PostgreSQL | Connection Pool Exhaustion: We're getting 'FATAL: sorry, too many clients already' errors in our app logs. Our Cloud SQL PostgreSQL instance is at 95% of max_connections. What do I do right now and how do I prevent this? | Incident Response | critical | B+ | A- | 601 vs 852 | 15.9s vs 19.1s | 2 vs 2 | 0 vs 0 | 0 vs 0 | 1,490 chars vs 2,053 chars |
PostgreSQL | Connection Timeout and Network Issues: Our app is getting intermittent 'could not connect to server' and connection timeout errors from PostgreSQL. How do I diagnose whether this is a network issue, database overload, or configuration problem? | Incident Response | critical | — |
PostgreSQL | Cost Optimization for Over-Provisioned Instance: Our monthly PostgreSQL bill on RDS is $5000 but CPU utilization averages 15% and memory is at 40%. How do I safely downsize to save money without risking performance issues? | Cost Optimization | info | — |
PostgreSQL | Cost optimization through performance tuning: My monthly AWS RDS PostgreSQL bill is $8,000 and I need to reduce costs by 30% without degrading performance. Help me identify whether I should optimize slow queries to downgrade my instance, tune autovacuum to reduce IOPS costs, or consolidate databases. | Cost Optimization | info | C+ | B- | 1,437 vs 1,648 | 30.0s vs 36.2s | 7 vs 7 | 2 vs 2 | 0 vs 0 | 1,558 chars vs 1,310 chars |
PostgreSQL | Cross-platform metric mapping for PostgreSQL: I'm migrating my PostgreSQL monitoring from Datadog to Prometheus. How do Datadog's postgresql.connections metric and AWS CloudWatch's DatabaseConnections metric map to the postgres_exporter metrics in Prometheus? | Cross Platform | info | A- | B+ | 868 vs 1,109 | 18.1s vs 44.1s | 2 vs 8 | 0 vs 3 | 0 vs 2 | 2,078 chars vs 1,818 chars |
PostgreSQL | Data Type Selection and Schema Design Problems: I'm reviewing our PostgreSQL schema and I see we're using timestamp without time zone in many places, char(20) for some text fields, and the money type for currency. Are these bad choices and what problems might they cause in production? | Proactive Health | warning | — |
PostgreSQL | Deadlock Troubleshooting: We're seeing deadlock errors in our application logs several times per hour. How do I find what queries are causing deadlocks and how to prevent them? | Incident Response | warning | — |
PostgreSQL | Disk space exhaustion from WAL files: My PostgreSQL pg_wal directory is consuming 80% of available disk space and growing. I have replication configured—how do I safely clean this up without breaking replication or causing data loss? | Incident Response | critical | B+ | A- | 1,273 vs 1,355 | 25.4s vs 24.6s | 2 vs 2 | 0 vs 0 | 0 vs 0 | 3,427 chars vs 3,755 chars |
PostgreSQL | Extension Management and Compatibility: We're using several PostgreSQL extensions like PostGIS and pg_stat_statements. How do I check which extensions are installed, their performance impact, and whether they'll work if we upgrade from PostgreSQL 13 to 15? | Migration | info | — |
PostgreSQL | Fast Path Lock Exhaustion from Partitioning: Our PostgreSQL queries on a partitioned table suddenly got 10x slower. I see locks showing up in pg_locks but not blocking each other. Someone mentioned fast path lock exhaustion - how do I diagnose this and what's the fix? | performance | warning | — |
PostgreSQL | Index bloat requiring maintenance: My PostgreSQL indexes are showing significant bloat (>30%) on several high-traffic tables. Should I run REINDEX CONCURRENTLY or use pg_repack, and how do I avoid locking out my production workload? | Proactive Health | warning | B+ | A- | 1,348 vs 1,673 | 26.7s vs 32.6s | 2 vs 2 | 0 vs 0 | 0 vs 0 | 3,842 chars vs 4,874 chars |
PostgreSQL | Index Health and Usage Analysis: I want to audit my PostgreSQL indexes. Which ones are never used and safe to drop, and where might I be missing indexes that would improve query performance? | Proactive Health | info | — |
PostgreSQL | Lock contention blocking transactions: My PostgreSQL database has queries backing up because they're waiting on locks. Help me identify which queries are holding locks, which are blocked, and whether I have a deadlock situation or just long-running DDL. | Incident Response | warning | B | B+ | 1,386 vs 2,677 | 21.2s vs 39.9s | 2 vs 5 | 0 vs 1 | 0 vs 0 | 4,012 chars vs 1,745 chars |
PostgreSQL | Logical Replication Conflict Resolution: My PostgreSQL logical replication subscription stopped working with unique constraint violation errors. The replication worker keeps failing and I'm seeing apply errors in pg_stat_subscription. How do I diagnose what's causing the conflict and safely resume replication without losing data? | Incident Response | critical | — |
PostgreSQL | Memory Configuration Optimization: I inherited a PostgreSQL database and the memory settings seem to be defaults from years ago. How do I determine the right values for shared_buffers, work_mem, and maintenance_work_mem based on our current instance size and workload? | Proactive Health | info | — |
PostgreSQL | Memory Exhaustion and OOM Killer Prevention: My PostgreSQL server keeps crashing and the Linux OOM killer is terminating the postgres process. I have work_mem set to 256MB but we have hundreds of concurrent connections. How do I calculate safe memory limits and prevent these crashes? | Incident Response | critical | — |
PostgreSQL | Migration from RDS to Cloud SQL: We're planning to migrate our PostgreSQL database from AWS RDS to Google Cloud SQL. What are the key differences I need to understand in terms of performance metrics, configuration, and operational procedures? | Migration | info | — |
PostgreSQL | MultiXact Member Space Exhaustion: My PostgreSQL database is throwing errors about MultiXact member space exhaustion and all writes are failing. I see messages about emergency autovacuum in the logs. What is MultiXact member space and how do I fix this before we have a complete outage? | Incident Response | critical | — |
PostgreSQL | N+1 Query Pattern Detection and Resolution: Our PostgreSQL connection count spikes to 200+ during busy periods and API response times are terrible. The logs show thousands of nearly identical simple SELECT queries for related records. I think we have an N+1 query problem - how do I find and fix these patterns? | performance | warning | — |
PostgreSQL | Operational readiness review: I set up Postgres a while ago. The workload has evolved a lot since then. How do I make sure it's well configured and provisioned? | Proactive Health | info | B+ | A- | 1,123 vs 2,515 | 23.4s vs 49.3s | 7 vs 23 | 3 vs 13 | 0 vs 3 | 2,386 chars vs 3,590 chars |
PostgreSQL | PostgreSQL migration from AWS RDS to Google Cloud SQL: I'm planning to migrate our PostgreSQL database from AWS RDS to Google Cloud SQL. What are the key differences in monitoring capabilities, performance characteristics, and operational features I need to account for during and after the migration? | Migration | info | B+ | A- | 1,196 vs 1,278 | 26.9s vs 47.1s | 2 vs 2 | 0 vs 0 | 0 vs 0 | 4,007 chars vs 4,545 chars |
PostgreSQL | Query Plan Regression Detection: A query that used to run in 100ms is now taking 30 seconds, but the data volume hasn't changed much. How do I check if the query plan changed and why the optimizer might be making a bad choice now? | Incident Response | warning | — |
PostgreSQL | Replication Lag Crisis: My PostgreSQL read replica on RDS is showing 45 seconds of replication lag and climbing. What's causing this and how do I fix it before it impacts users? | Incident Response | critical | — |
PostgreSQL | Replication lag threatening data consistency: My PostgreSQL read replicas are falling behind the primary by 30 seconds and climbing. Help me diagnose if this is a resource bottleneck, network issue, or replication slot problem before it causes an outage. | Incident Response | critical | B+ | A- | 7,911 vs 6,796 | 12.2m vs 1.8m | 13 vs 12 | 6 vs 6 | 0 vs 0 | 2,654 chars vs 2,942 chars |
PostgreSQL | Replication Slot Bloat and Cleanup: My PostgreSQL disk is filling up with WAL files and I see an inactive replication slot that hasn't been used in days. It's preventing vacuum from cleaning up old data. Can I safely drop this slot or will it break replication? | Incident Response | warning | — |
PostgreSQL | Right-Sizing for Workload Growth: Help me determine whether my Postgres deployment on Google Cloud SQL should be provisioned up or down based on my current workload. We started on this instance size 6 months ago and traffic has doubled. | Capacity Planning | warning | — |
PostgreSQL | Right-sizing PostgreSQL cloud instances: Help me determine whether my Postgres deployment on Google Cloud SQL should be provisioned up or down based on my current workload. I'm seeing 45% CPU utilization on average but occasional spikes to 85%. | Capacity Planning | info | D+ | B+ | 1,162 vs 813 | 25.1s vs 18.8s | 6 vs 2 | 2 vs 0 | 0 vs 0 | 128 chars vs 1,648 chars |
PostgreSQL | Slow Query Diagnosis: Our app response times have doubled in the last hour. How do I quickly find which PostgreSQL queries are slow and whether they need better indexes or query optimization? | Incident Response | warning | — |
PostgreSQL | Slow query performance investigation: Several of my PostgreSQL queries have gotten significantly slower over the past month. Help me identify the top offenders and determine whether this needs new indexes, query rewrites, or updated table statistics. | Proactive Health | warning | A- | C+ | 6,431 vs 1,553 | 1.8m vs 12.4m | 16 vs 18 | 9 vs 9 | 0 vs 2 | 3,872 chars vs 1,679 chars |
PostgreSQL | Slow Replica Performance Investigation: Our PostgreSQL read replica has zero replication lag but queries are running 3x slower than on the primary. What could cause this performance difference and how do I diagnose it? | Incident Response | warning | — |
PostgreSQL | Standby Query Conflict Resolution: Our PostgreSQL read replica is canceling long-running queries with errors about conflicting recovery processes. I see bufferpin and snapshot conflicts in pg_stat_database_conflicts. How do I balance replication consistency with allowing analytical queries to complete? | Incident Response | warning | — |
PostgreSQL | Storage Performance Bottleneck: My PostgreSQL queries are slow and I see high disk I/O wait times. How do I determine if I need faster storage, more IOPS, or if there's a configuration issue I can fix? | Incident Response | warning | — |
PostgreSQL | Table Bloat Diagnosis and Remediation: One of my PostgreSQL tables is taking up 50GB of disk space but only has 10GB of actual data when I dump it. How do I measure bloat and safely reclaim this space? | Proactive Health | warning | — |
PostgreSQL | Temp Table and Temporary File Management: I'm seeing hundreds of GB of temp_bytes in pg_stat_database and queries are slowing down. How do I determine if I should increase work_mem or if these queries need optimization? | Proactive Health | warning | — |
PostgreSQL | Transaction ID Wraparound Emergency Response: I'm getting FATAL errors about transaction ID wraparound in PostgreSQL and the database is warning it will shut down in 1 million transactions. The age of the oldest transaction is over 2 billion. What do I need to do right now to prevent an outage? | Incident Response | critical | — |
PostgreSQL | Transaction ID Wraparound Risk: I'm getting warnings about transaction ID wraparound in my PostgreSQL logs. The age of the oldest transaction is over 1.5 billion. How urgent is this and what do I need to do? | Incident Response | critical | B+ | A- | 1,098 vs 814 | 24.3s vs 18.7s | 2 vs 2 | 0 vs 0 | 0 vs 0 | 2,971 chars vs 2,097 chars |
PostgreSQL | VACUUM FULL Execution Planning: I have a 500GB PostgreSQL table with 60% bloat that needs VACUUM FULL to reclaim space. How do I plan this operation for production - how long will it take, how much extra disk space do I need, and what's the impact on running queries? | Proactive Health | warning | — |
PostgreSQL | Version Upgrade Planning and Validation: We need to upgrade from PostgreSQL 12 to PostgreSQL 15. What should I check for compatibility issues, and how do I validate performance won't regress after the upgrade? | Migration | info | — |
PostgreSQL | WAL Accumulation and Disk Space Crisis: My PostgreSQL instance on RDS is running out of disk space and I see the WAL directory has grown to 200GB. Why isn't WAL archiving or cleanup happening and how do I fix this urgently? | Incident Response | critical | — |