Comprehensive Operational Readiness Review
Proactive Health
Baseline assessment of PostgreSQL instance health, configuration, and operational maturity across all dimensions.
Prompt: “I set up PostgreSQL a while ago and the workload has evolved a lot since then. How do I make sure it's well configured and provisioned? I want a comprehensive health check covering performance, capacity, reliability, and cost.”
With vs Without Schema
Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.
Tokens: 16,126 vs 1,195
Latency: 3.9 min vs 26.7 s
Turns: 21 vs 16
Tool Calls: 13 vs 7
Schema Calls: 0 vs 1
Response: 2,218 chars vs 799 chars
Evaluation
Without Schema: A- vs With Schema: D+
The response without Schema delivers a complete, immediately actionable health check toolkit with specific SQL queries covering cache hit ratios, index bloat, transaction wraparound risk, replication status, and table growth metrics, plus ready-to-run shell scripts and detailed interpretation guides with thresholds. It comprehensively addresses all four requested areas (performance, capacity, reliability, cost) and provides concrete files the operator can execute right away. The response with Schema, however, provides no actionable guidance at all—it only requests permission to access the Schema.ai catalog without delivering any concrete health check steps, metrics, or commands. The operator would need to continue the conversation to receive the kind of comprehensive, immediately usable guidance that the non-Schema response provided from the start.
Without Schema
13 tool calls
I'll help you perform a comprehensive PostgreSQL health check. Let me start by understanding your setup.
I'll create a comprehensive PostgreSQL health check toolkit for you. Since I don't see an active PostgreSQL instance in this directory, I'll build you a complete assessment framework you can run against your database.
I've created a comprehensive PostgreSQL health check toolkit for you. Here's what's included:
## 📋 Files Created:
1. **postgres_health_check.sql** - Comprehensive SQL queries covering:
- **Performance**: Cache hit ratios, slow queries, query statistics, missing/unused indexes, index bloat
- **Capacity**: Table growth rates, disk space, connection usage
- **Reliability**: Replication status, WAL archiving, transaction wraparound risk, table bloat, vacuum stats, locks, checkpoints
- **Configuration**: Critical settings review, memory configuration, extensions, security
2. **postgres_health_recommendations.md** - Detailed interpretation guide with:
- What to look for in each metric
- Threshold values (healthy/warning/critical)
- Specific action items for each issue
- Configuration optimization templates
- Cost reduction strategies
- Cloud-specific recommendations (AWS/GCP/Azure)
- Quick action checklist
- Monitoring setup guide
3. **run_health_check.sh** - Interactive script to:
- Connect to your PostgreSQL database
- Run all health check queries
- Generate timestamped reports
- Check for required extensions
4. **cloud_health_check.sh** - Cloud provider-specific checks for:
- AWS RDS/Aurora (CloudWatch metrics, instance details)
- Google Cloud SQL (instance info, operations)
- Azure Database for PostgreSQL (server details)
- Cost optimization recommendations per provider
## 🚀 How to Use:
```bash
# For database-level analysis
./run_health_check.sh
# For cloud infrastructure analysis
./cloud_health_check.sh
# Or run SQL directly
psql -U username -d database -f postgres_health_check.sql > report.txt
```
The toolkit covers all four areas you requested: **performance, capacity, reliability, and cost** - giving you a complete picture of your PostgreSQL health and actionable optimization opportunities.
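The generated SQL file itself is not shown in the transcript. As an illustration only (a sketch of mine, not the toolkit's actual contents), here is the classic cache-hit-ratio check from the standard `pg_stat_database` view, with a small helper applying the common 99% guideline:

```bash
# Sketch only: one representative check such a toolkit would contain.
# The query reads the standard pg_stat_database view; run it with e.g.:
#   psql -At -c "$HIT_SQL"
HIT_SQL="SELECT round(100.0 * sum(blks_hit) / NULLIF(sum(blks_hit) + sum(blks_read), 0), 2)
         FROM pg_stat_database;"

# Interpret the percentage (99% is the usual healthy floor for OLTP).
rate_cache_hit() {
  awk -v pct="$1" 'BEGIN { if (pct >= 99) print "healthy"; else print "investigate shared_buffers / working set" }'
}
rate_cache_hit 99.7   # prints healthy
```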
With Schema
7 tool calls, 1 Schema call
I'll help you perform a comprehensive PostgreSQL health check. Let me start by exploring your setup to understand your current configuration.
I see you have a Schema.ai MCP server configured. This is perfect for PostgreSQL health checks! Schema.ai specializes in database observability and can provide comprehensive insights. Let me access it to help with your PostgreSQL assessment.
Perfect! Let me search the Schema.ai catalog for PostgreSQL health check guidance covering all your areas of concern.
I need your permission to access the Schema.ai catalog to provide you with comprehensive PostgreSQL health check guidance. This will give us access to industry best practices, key metrics to monitor, and specific insights about PostgreSQL performance, capacity, reliability, and cost optimization.
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
For a comprehensive operational readiness review, start by understanding how your workload has evolved since initial setup, then ensure your reliability fundamentals (backups, replication, critical risks) are sound before diving into resource constraints, autovacuum effectiveness, and configuration tuning opportunities. This prioritizes preventing outages over optimizing performance.
1. Establish baseline health and understand workload evolution
Start with `overall-instance-health-status` and `workload-characteristics-and-trends` to understand what's actually changed. Check `postgresql-backends` and `postgresql-connections-by-process` to see current connection patterns, and `postgresql-database-size` to understand growth. If your database has grown 10x but your config hasn't changed, you'll find issues in every other area. This baseline tells you what to prioritize in subsequent steps.
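A minimal sketch of that baseline, assuming `psql` connectivity; the SQL uses only standard catalogs, and the helper just illustrates the "grown 10x" heuristic from this step:

```bash
# Sketch: baseline connection count and per-database size (standard catalogs).
#   psql -At -c "$BASELINE_SQL"
BASELINE_SQL="SELECT count(*) FROM pg_stat_activity;
SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;"

# Compare today's size (e.g. in GB) against the size at initial setup: a
# large factor means later steps (autovacuum, memory, checkpoints) need retuning.
growth_factor() {
  awk -v then="$1" -v now="$2" 'BEGIN { printf "%.1fx\n", now / then }'
}
growth_factor 12 130   # prints 10.8x
```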
2. Verify backup and disaster recovery readiness
Check `backup-and-recovery-readiness` and `replication-health` before anything else—if you can't recover from failure, performance tuning is meaningless. If you have replicas, verify `replication-lag` is consistently under 10 seconds; sustained lag over 60s means your replica can't keep up. Test your backup restoration process if you haven't in the last quarter; I've seen too many teams discover broken backups during an outage.
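A hedged sketch of the lag check, run on the replica; the SQL is standard PostgreSQL 10+, and the thresholds are the ones named in this step:

```bash
# Sketch: replay lag in seconds, measured on the replica (PostgreSQL 10+).
#   psql -At -c "$LAG_SQL"
LAG_SQL="SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0);"

# Apply this step's thresholds: under 10s healthy, over 60s critical.
rate_lag() {
  awk -v s="$1" 'BEGIN {
    if (s > 60)      print "CRITICAL: replica cannot keep up"
    else if (s > 10) print "WARN: sustained lag"
    else             print "OK"
  }'
}
rate_lag 4    # prints OK
rate_lag 95   # prints CRITICAL: replica cannot keep up
```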
3. Assess critical reliability risks that cause outages
Check `reliability-risk-assessment` for silent killers like transaction ID wraparound. Look at `postgresql-before-xid-wraparound`—if this is below 100 million, you're at risk of forced shutdown. Check `long-running-transaction-oldest-timestamp`—transactions open for hours block autovacuum and can cascade into performance emergencies. These risks tend to grow slowly as workload evolves, then bite you suddenly.
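The wraparound check can be sketched as below; the catalog query is standard, the 2^31 figure is the approximate hard limit, and the 100-million threshold comes from this step:

```bash
# Sketch: remaining XIDs before each database hits the (approximate) 2^31
# wraparound limit.
#   psql -At -c "$WRAP_SQL"
WRAP_SQL="SELECT datname, 2147483648 - age(datfrozenxid) AS xids_left
FROM pg_database ORDER BY xids_left;"

# This step's threshold: under 100 million XIDs of headroom is critical.
rate_xid_headroom() {
  awk -v left="$1" 'BEGIN { if (left < 100000000) print "CRITICAL: vacuum aggressively now"; else print "OK" }'
}
rate_xid_headroom 1900000000   # prints OK
rate_xid_headroom 80000000     # prints CRITICAL: vacuum aggressively now
```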
4. Identify resource bottlenecks and capacity headroom
Check `resource-bottleneck-identification` and `capacity-headroom-adequacy` to see if you're running out of headroom. Look at `postgresql-percent-usage-connections`—if you're consistently above 80%, you'll start seeing connection errors. High `postgresql-temp-bytes` means `work_mem` is too low for your current workload. Review `postgresql-locks` for contention patterns that didn't exist when the workload was smaller.
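A sketch of the connection-utilization check under the same assumptions (standard catalogs, `psql` connectivity), using the 80% threshold from this step:

```bash
# Sketch: current connection utilization against max_connections.
#   psql -At -c "$PCT_SQL"
PCT_SQL="SELECT round(100.0 * count(*) / current_setting('max_connections')::int, 1)
FROM pg_stat_activity;"

# This step's threshold: consistently above 80% means connection errors loom.
rate_conn_usage() {
  awk -v pct="$1" 'BEGIN { if (pct > 80) print "WARN: add pooling or raise max_connections"; else print "OK" }'
}
rate_conn_usage 85   # prints WARN: add pooling or raise max_connections
```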
5. Evaluate autovacuum effectiveness and table bloat
Check `autovacuum-effectiveness` because configs that worked at low volume often fall behind as workload grows. Look at `dead-tuple` counts and `postgresql-table-bloat`—bloat consistently above 20% means autovacuum can't keep up with your update/delete rate. This degrades performance slowly over time, so it's easy to miss until queries that used to take milliseconds now take seconds. Tuning autovacuum is often the highest-impact fix for evolved workloads.
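The dead-tuple side of this check can be sketched from `pg_stat_user_tables` (standard view); the 20% threshold is the one this step names:

```bash
# Sketch: per-table dead-tuple share from pg_stat_user_tables.
#   psql -At -c "$DEAD_SQL"
DEAD_SQL="SELECT relname,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 20;"

# This step's threshold: sustained bloat above 20% means autovacuum is behind.
rate_dead_pct() {
  awk -v pct="$1" 'BEGIN { if (pct > 20) print "tune autovacuum (lower scale factor, more workers)"; else print "OK" }'
}
rate_dead_pct 35   # prints tune autovacuum (lower scale factor, more workers)
```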
6. Review configuration tuning and performance optimization opportunities
Check `configuration-tuning-needed` and `performance-optimization-opportunities` to align settings with your current workload. Look at `postgresql-buffer-hit` or `hit`—if this is below 99%, your `shared_buffers` may be undersized for current working set. Review `checkpoint-timed` frequency—if checkpoints happen too often, you're creating unnecessary I/O; too rarely, and recovery time suffers. Initial configurations rarely scale optimally with workload evolution.
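A sketch of the checkpoint-frequency part; note the counters shown live in `pg_stat_bgwriter` through PostgreSQL 16 and moved to `pg_stat_checkpointer` in 17:

```bash
# Sketch: share of checkpoints that ran on schedule (pg_stat_bgwriter on
# PG 16 and earlier; the counters moved to pg_stat_checkpointer in PG 17).
#   psql -At -c "$CKPT_SQL"
CKPT_SQL="SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"

# Mostly-requested checkpoints usually mean max_wal_size is too small,
# creating the extra I/O this step warns about.
timed_checkpoint_pct() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "%.0f%%\n", 100 * t / (t + r) }'
}
timed_checkpoint_pct 90 10   # prints 90%
```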
7. Identify cost optimization opportunities
Finally, check `cost-optimization-opportunities` now that you understand reliability and performance needs. Look for over-provisioned resources—if `postgresql-percent-usage-connections` is consistently below 30%, you may have room to reduce connection pooling overhead or consolidate instances. Review if monitoring and alerting gaps from `monitoring-and-alerting-gaps` mean you're over-provisioned "just in case." Cost optimization comes last because you need reliability and performance baselines first.
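The "consistently below 30%" signal can be sketched over a series of sampled utilization percentages (one per line on stdin); the threshold is this step's, the helper is illustrative:

```bash
# Sketch: flag over-provisioning when every sampled connection-utilization
# percentage stays below this step's 30% threshold.
rate_provisioning() {
  awk 'BEGIN { below = 0; n = 0 }
       { n++; if ($1 < 30) below++ }
       END { if (n > 0 && below == n) print "review for consolidation/downsizing"; else print "keep current sizing" }'
}
printf '18\n22\n27\n' | rate_provisioning   # prints review for consolidation/downsizing
printf '18\n55\n27\n' | rate_provisioning   # prints keep current sizing
```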
Related Insights
All entries below are stub insights created from scenario discovery hints.

- overall instance health status (diagnostic_step)
- backup and recovery readiness
- replication health
- cost optimization opportunities (diagnostic_step)
- version upgrade considerations
- monitoring and alerting gaps
- operational best practice adherence
- workload characteristics and trends (diagnostic_step)
- resource bottleneck identification
- configuration tuning needed (diagnostic_step)
- autovacuum effectiveness (diagnostic_step)
- capacity headroom adequacy (diagnostic_step)
- reliability risk assessment (diagnostic_step)
- performance optimization opportunities (supporting_evidence)
Relevant Metrics
- hits
- postgresql.backends (primary_diagnostic)
- percent_usage_connections (threshold_indicator)
- n_dead_tup (primary_diagnostic)
- buffer_hit (primary_diagnostic)
- checkpoint_timed (primary_diagnostic)
- postgresql.database.size (primary_diagnostic)
- before_xid_wraparound (threshold_indicator)
- replication_lag_seconds (primary_diagnostic)
- table_bloat (secondary_diagnostic)
- postgresql.locks (secondary_diagnostic)
- postgresql.temp_bytes (secondary_diagnostic)
- long_running_transaction_oldest_timestamp (secondary_diagnostic)
- uptime (monitor)
- connections_by_process (primary_diagnostic)
- transactions_open (monitor)

Monitoring Interfaces
PostgreSQL Native