Operational readiness review

Proactive Health

Reviewing whether a long-running PostgreSQL deployment is well configured and properly provisioned as workloads have evolved over time.

Prompt: I set up Postgres a while ago. The workload has evolved a lot since then. How do I make sure it's well configured and provisioned?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens: 1,123 vs 2,515
Latency: 23.4s vs 49.3s
Turns: 7 vs 23
Tool Calls: 3 vs 13
Schema Calls: 0 vs 3
Response: 2,386 chars vs 3,590 chars
Evaluation
Without Schema: B+
With Schema: A-
Both responses provide technically sound PostgreSQL tuning guidance with specific configuration parameters and diagnostic queries. The response without Schema is more immediately actionable, jumping straight into diagnostics and offering to create scripts, but lacks some modern best practices. The response with Schema is more comprehensive and structured, including important modern tuning parameters like checkpoint_completion_target (0.9) and SSD-specific settings (random_page_cost: 1.1, effective_io_concurrency: 200) that the baseline missed. While the Schema response spends time exploring MCP tools upfront, its additional technical depth and coverage of contemporary infrastructure patterns make it more likely to fully address a modern PostgreSQL workload.
Without Schema
3 tool calls
I'll help you assess your Postgres configuration and provisioning. Let me first check what's in your current directory to see if there are any relevant configuration files.

I don't see any Postgres configuration files in your current directory. Let me provide you with a comprehensive approach to assess and optimize your Postgres setup:

## Key Areas to Check

### 1. **Analyze Current Workload**

First, understand what's changed:

```sql
-- Check current connection usage
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Find slow queries
SELECT query, calls, total_exec_time, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Check database size and growth
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Identify most accessed tables
SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan + idx_scan DESC
LIMIT 20;
```

### 2. **Memory Configuration** (most critical)

These should scale with your RAM and workload:

- **`shared_buffers`**: 25% of RAM (for dedicated DB server)
- **`effective_cache_size`**: 50-75% of RAM
- **`work_mem`**: RAM / (max_connections × 2-3) - tune per query complexity
- **`maintenance_work_mem`**: 5-10% of RAM (for VACUUM, CREATE INDEX)

### 3. **Connection & Worker Settings**

- **`max_connections`**: Based on actual concurrent connections + headroom
- **`max_worker_processes`**, **`max_parallel_workers`**: Use available CPU cores
- Consider connection pooling (PgBouncer) if connection count is high

### 4. **Write Performance**

- **`checkpoint_timeout`** & **`max_wal_size`**: Reduce checkpoint frequency for write-heavy workloads
- **`wal_buffers`**: 16MB for high-write workloads
- **`synchronous_commit`**: Consider `off` for less critical writes (with trade-offs)

### 5. **Provisioning Check**

```bash
# Check system resources
free -h                    # Available RAM
df -h                      # Disk space
iostat -x 1 10             # I/O utilization
top -bn1 | grep postgres   # CPU usage
```

Would you like me to:
1. Create a script to collect all these diagnostics?
2. Help you connect to your Postgres instance to run these checks?
3. Analyze specific performance issues you're experiencing?

Let me know your Postgres version, hardware specs, and primary workload type (OLTP/OLAP/mixed) for more specific recommendations.
With Schema
13 tool calls · 3 Schema calls
I'll help you assess your PostgreSQL configuration and provisioning. Let me start by exploring your setup to understand the current state.

I see you have an MCP server configured for schema.ai. Let me check what tools are available to help assess your PostgreSQL setup.

Perfect! Let me use these tools to help you assess and optimize your PostgreSQL setup. I can provide you with a comprehensive framework for assessing your PostgreSQL configuration and provisioning. Here are the key areas to review:

## **1. Workload Analysis**

First, understand your current workload:

- **Query patterns**: OLTP (many small transactions) vs OLAP (analytical queries)?
- **Read/write ratio**: Heavy reads, writes, or balanced?
- **Connection patterns**: Short-lived or long-lived connections?
- **Data size**: How much has your database grown?

## **2. Key Configuration Areas**

**Memory Settings:**
- `shared_buffers`: Should be 25-40% of total RAM (start with 25%)
- `effective_cache_size`: 50-75% of total RAM
- `work_mem`: Per-operation memory; calculate as (Total RAM - shared_buffers) / (max_connections * 2-3)
- `maintenance_work_mem`: For VACUUM, index creation (256MB-2GB)

**Connection Settings:**
- `max_connections`: Match your actual concurrency needs (+ headroom)
- Consider using connection pooling (PgBouncer, pgpool-II) if you have many connections

**Write Performance:**
- `wal_buffers`: 16MB is usually sufficient
- `checkpoint_completion_target`: 0.9 for smoother I/O
- `max_wal_size`: Increase if you have write-heavy workloads (2-4GB+)

**Query Planning:**
- `random_page_cost`: Lower to 1.1 if using SSD storage
- `effective_io_concurrency`: 200 for SSDs, higher for NVMe

## **3. Monitoring Metrics**

Check these regularly:

- **CPU & Memory**: Is Postgres using resources efficiently?
- **Disk I/O**: IOPS, throughput, latency
- **Cache hit ratio**: Should be >99% for reads
- **Connection usage**: Are you hitting max_connections?
- **Replication lag**: If using replicas
- **Slow queries**: Use `pg_stat_statements`
- **Table/index bloat**: Regular VACUUM needed?

## **4. Provisioning Assessment**

**CPU:**
- Complex queries or high concurrency → more cores
- Check `pg_stat_activity` for active queries

**Memory:**
- More RAM = better cache performance
- Monitor buffer cache hit ratio

**Storage:**
- IOPS requirements based on workload
- SSD/NVMe for production workloads
- Separate WAL from data directory if possible

**Network:**
- Bandwidth for replication and backups

## **5. Quick Health Checks**

Run these queries to assess your current state:

```sql
-- Cache hit ratio (should be >99%)
SELECT sum(heap_blks_read) as heap_read,
       sum(heap_blks_hit) as heap_hit,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM pg_statio_user_tables;

-- Index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Slow queries (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Database size
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Connection usage
SELECT count(*) FROM pg_stat_activity;
```

**Would you like me to help you:**
1. Review your specific configuration file?
2. Analyze your current metrics?
3. Set up monitoring for these key indicators?
4. Create a tuning plan based on your workload type?

Let me know your setup details (cloud provider, instance size, workload type) and I can provide more targeted recommendations.

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When reviewing a long-running PostgreSQL deployment, start with the critical stability checks — connection capacity and memory efficiency — then assess maintenance health through autovacuum performance and table bloat. Finally, verify that query planner statistics are fresh enough for the evolved workload. This progression catches the most likely provisioning issues first before diving into optimization details.

1. Check connection capacity headroom
Compare `postgresql-backends` (active connections) to `postgresql-connection-max` to see how close you are to exhaustion. If you're regularly using >70% of max_connections, you're at risk — PostgreSQL will reject new connections when the limit is hit. This is the most common provisioning issue as workloads scale. The `connection-limit-approaching` insight will flag this, and the fix is either implementing connection pooling (pgBouncer) or increasing max_connections (though pooling is almost always better).
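On a live instance, the same headroom check can be run directly against the catalog — a minimal sketch, assuming the exporter's `postgresql-backends` metric is sourced from `pg_stat_activity` (the 70% threshold is the heuristic above, not a PostgreSQL default):

```sql
-- Connection headroom: how close current backends are to max_connections
SELECT count(*) AS backends,
       current_setting('max_connections')::int AS max_connections,
       round(100.0 * count(*) / current_setting('max_connections')::int, 1)
         AS pct_used  -- worry above ~70%
FROM pg_stat_activity;
```

Note that `max_connections` includes `superuser_reserved_connections`, so effective headroom for application connections is slightly lower than this query suggests.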
2. Assess memory efficiency via cache hit ratio
Calculate cache hit ratio using `postgresql-blocks-hit` divided by (`postgresql-blocks-hit` + `postgresql-blocks-read`). If this ratio is below 99%, your shared_buffers is undersized for the current working set, forcing excessive disk I/O. The `cache-hit-ratio-below-99-percent` insight identifies this condition. For evolved workloads with more data or different access patterns, you may need to increase shared_buffers — start with 25% of total RAM as a baseline.
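The same ratio can be verified at the database itself via `pg_stat_database` — a sketch, assuming `postgresql-blocks-hit`/`postgresql-blocks-read` map to the `blks_hit`/`blks_read` columns of that view:

```sql
-- Buffer cache hit ratio for the current database; below 99% suggests
-- shared_buffers is undersized for the working set
SELECT round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
         AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```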
3. Verify autovacuum is keeping up
Check `postgresql-last-vacuum` and `postgresql-autovacuumed` timestamps alongside `postgresql-dead-rows` counts. If tables haven't been autovacuumed recently (>24 hours) and dead row counts are climbing, autovacuum can't keep pace with your write rate. The `autovacuum-falling-behind-bloat` insight tracks this. Increase autovacuum_max_workers from the default 3 to 5-6, and consider lowering autovacuum_vacuum_scale_factor from 0.2 to 0.05 for high-churn tables.
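A quick way to surface tables matching this condition — a sketch against `pg_stat_user_tables`; the 10,000-dead-row floor is an arbitrary noise filter, not a PostgreSQL threshold:

```sql
-- Tables with climbing dead rows that autovacuum hasn't visited in 24h
SELECT schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
  AND coalesce(last_autovacuum, 'epoch') < now() - interval '24 hours'
ORDER BY n_dead_tup DESC;
```

Per-table `autovacuum_vacuum_scale_factor` overrides (via `ALTER TABLE ... SET (...)`) let you tighten the high-churn tables without changing the global default.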
4. Measure accumulated table bloat
Review `postgresql-table-bloat` and the dead tuple ratio (`postgresql-dead-rows` relative to total rows). The `high-dead-tuple-ratio` insight flags tables where dead tuples exceed 10% of total rows, which causes significant performance degradation and wasted disk space. If bloat is >20%, you'll need manual VACUUM or even pg_repack. This commonly happens when autovacuum settings weren't adjusted as the workload evolved from small to large tables.
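The dead-tuple ratio described above can be approximated from the statistics collector — a sketch; note `n_live_tup`/`n_dead_tup` are estimates, so treat the percentages as indicative rather than exact:

```sql
-- Dead-tuple ratio per table; >10% flags bloat, >20% may warrant
-- manual VACUUM or pg_repack
SELECT schemaname, relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1)
         AS dead_pct
FROM pg_stat_user_tables
ORDER BY dead_pct DESC NULLS LAST
LIMIT 20;
```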
5. Check for stale query planner statistics
Look at last_analyze timestamps in pg_stat_user_tables (via `postgresql-last-vacuum` as a proxy metric). If you've had major data changes — bulk loads, large deletes, significant workload shifts — and statistics haven't been updated in days, the query planner is using outdated cardinality estimates. The `stale-statistics` insight identifies this as a cause of suddenly slow queries. Run ANALYZE manually on high-traffic tables and review autoanalyze settings to ensure they trigger frequently enough for your current write volume.
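The underlying timestamps can be read directly rather than via the proxy metric — a sketch; the 3-day staleness window is an illustrative cutoff, tune it to your write volume:

```sql
-- Tables whose planner statistics haven't been refreshed recently,
-- ordered by how many rows changed since the last ANALYZE
SELECT schemaname, relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE greatest(coalesce(last_analyze, 'epoch'),
               coalesce(last_autoanalyze, 'epoch')) < now() - interval '3 days'
ORDER BY n_mod_since_analyze DESC;
```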
6. Review growth trends and long-running queries
Check `postgresql-database-size` growth rate and `postgresql-queries-duration-max` to spot capacity planning needs. If database size is growing linearly but you haven't adjusted disk provisioning, you'll hit storage limits. If max query duration has crept up from seconds to minutes as data volume increased, you may need new indexes or query rewrites. These are slower-burning issues than the previous steps but signal when your original provisioning assumptions no longer match reality.
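A point-in-time spot check for both signals — a sketch; trending requires sampling these over time (which is what the metrics above give you), but a single snapshot catches queries that are slow right now:

```sql
-- Current database size (sample periodically to establish a growth rate)
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;

-- Longest-running active queries right now
SELECT pid, now() - query_start AS duration, left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active' AND query_start IS NOT NULL
ORDER BY duration DESC
LIMIT 5;
```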


Monitoring Interfaces

PostgreSQL Prometheus
PostgreSQL Datadog
PostgreSQL Native