Right-Sizing for Workload Growth
Capacity Planning
Determine whether the current instance size is still appropriate now that the workload has grown and performance may be degrading.
Prompt: “Help me determine whether my Postgres deployment on Google Cloud SQL should be provisioned up or down based on my current workload. We started on this instance size 6 months ago and traffic has doubled.”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When determining whether you need to scale your PostgreSQL instance, start by checking buffer cache hit ratios and memory pressure indicators—these reveal the most common bottleneck first. Then examine I/O patterns, checkpoint frequency, and workload characteristics over your 6-month growth period to understand whether you're hitting sustained resource limits or just peak spikes. Finally, assess headroom to decide whether you need vertical scaling or config tuning, or whether you're actually over-provisioned.
1. Check buffer cache hit ratio for memory adequacy
Start with the `postgresql-buffer-hit` and `hit` metrics—your buffer cache hit ratio should be above 95% on a healthy system. If it has dropped below 90% as traffic doubled, you're likely memory-constrained and forcing too many disk reads. Also check `buffercache-summary-buffer-used` to confirm you're actually using all available shared_buffers—if buffers are maxed out and the hit ratio is low, that's a clear signal the `memory-bottleneck` insight applies and you need more RAM.
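The ratio check above can be sketched as a small helper. This is a minimal sketch, not the product's implementation: the function names and thresholds are illustrative, and the commented SQL shows the standard `pg_stat_database` counters that buffer-hit metrics are typically derived from.

```python
# Illustrative query behind buffer-hit metrics (run via any client):
#   SELECT sum(blks_hit) AS hits, sum(blks_read) AS reads
#   FROM pg_stat_database;

def buffer_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from shared_buffers."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0

def memory_signal(ratio: float) -> str:
    """Map the ratio onto the >95% healthy / <90% constrained bands."""
    if ratio >= 0.95:
        return "healthy"
    if ratio >= 0.90:
        return "watch"
    return "likely memory-constrained"

# 9.5M hits vs 1.2M disk reads -> ratio ~0.89, below the 90% floor
print(memory_signal(buffer_hit_ratio(9_500_000, 1_200_000)))
```

Note that `pg_stat_database` counters are cumulative since the last stats reset, so compare deltas between samples rather than lifetime totals when judging recent behavior.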
2. Look for memory pressure in temp file usage
Check `postgresql-temp-bytes` for temporary file usage—this is one of the clearest signs you're memory-constrained. If you're seeing sustained temp file writes (especially multi-GB per hour), your work_mem or shared_buffers are too small for your query workload. Temp file usage directly translates to slower queries and higher I/O, and often means you need to scale up instance size to get more RAM rather than just tuning config.
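To judge whether temp-file writes are "sustained multi-GB per hour," convert two samples of the cumulative `temp_bytes` counter into a rate. A minimal sketch, with an illustrative function name:

```python
def temp_gb_per_hour(temp_bytes_delta: int, interval_seconds: float) -> float:
    """Temp-file write rate in GB/hour from two samples of the
    cumulative temp_bytes counter in pg_stat_database."""
    return (temp_bytes_delta / 1e9) / (interval_seconds / 3600)

# 5 GB of temp spill observed over one hour -> clear memory pressure
rate = temp_gb_per_hour(temp_bytes_delta=5_000_000_000, interval_seconds=3600)
print(rate)  # 5.0
```

If the rate stays in the multi-GB range across normal business hours (not just during one batch job), that supports scaling up for RAM rather than only raising `work_mem`.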
3. Analyze disk read patterns and I/O pressure
Compare `postgresql-disk-read` trends over the past 6 months—if disk reads have more than doubled while traffic only doubled, you're likely hitting a `memory-bottleneck` that's spilling to disk. Cross-reference this with `checkpoint-timed` frequency; if checkpoints are happening more frequently than every 5-10 minutes, you may have I/O pressure from writes as well. The `i-o-bottleneck` insight helps determine if this is a disk throughput limit or just a symptom of insufficient RAM.
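The checkpoint-frequency check can be expressed as a quick calculation over the cumulative checkpoint counters (exposed in `pg_stat_bgwriter` on older Postgres versions, `pg_stat_checkpointer` on newer ones). A sketch with illustrative names; the 5-10 minute band comes from the step above:

```python
def avg_checkpoint_interval_min(checkpoints_timed: int,
                                checkpoints_req: int,
                                uptime_seconds: float) -> float:
    """Average minutes between checkpoints from cumulative counters."""
    total = checkpoints_timed + checkpoints_req
    return (uptime_seconds / 60) / total if total else float("inf")

# 144 checkpoints over 6 hours -> one every 2.5 minutes, well under
# the 5-10 minute band, suggesting write/I/O pressure
interval = avg_checkpoint_interval_min(100, 44, uptime_seconds=6 * 3600)
print(interval)  # 2.5
```

A high share of requested (non-timed) checkpoints alongside a short interval usually points at `max_wal_size` being too small for the write volume, which is a config fix before it is a hardware fix.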
4. Examine workload growth and connection patterns
Use `postgresql-backends` and `postgresql-connections-by-process` to understand your `workload-growth-pattern`—has connection count doubled linearly with traffic, or are you seeing connection spikes? Check if the growth follows `peak-vs-sustained-load-characteristics`; if you're only hitting limits during brief peak hours, you might benefit from connection pooling or auto-scaling rather than a larger instance. The `workload-characteristics-and-trends` insight helps determine if this is organic growth or inefficient connection management.
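Distinguishing peak spikes from sustained growth can be done by comparing the peak backend count to the average over periodic samples. This is a rough heuristic sketch; the `spike_factor` cutoff is an assumption, not a Postgres or Cloud SQL constant:

```python
from statistics import mean

def load_shape(connection_samples: list[int], spike_factor: float = 2.0) -> str:
    """Classify periodic backend-count samples as spiky vs. sustained load."""
    avg = mean(connection_samples)
    peak = max(connection_samples)
    return "spiky (consider pooling)" if peak > spike_factor * avg else "sustained"

# One 180-connection burst against a ~45-connection baseline
print(load_shape([40, 45, 42, 50, 180]))
```

Spiky shapes favor a pooler (e.g. PgBouncer) or autoscaling over a bigger instance; a sustained shape that tracks the traffic doubling points toward organic growth and a genuine capacity need.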
5. Assess current capacity headroom and utilization thresholds
Look at whether you're consistently exceeding 70-80% utilization on key resources—that's the threshold where `resource-utilization-exceeding-headroom-threshold` becomes a concern. The `capacity-headroom-adequacy` insight considers whether you have enough buffer for traffic spikes or continued growth. If you're routinely hitting 85%+ on memory or seeing degraded buffer hit ratios during normal load, you've already outgrown your current tier and the `need-for-vertical-scaling` is clear.
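The headroom test above reduces to flagging any resource whose sustained utilization sits over the threshold. A minimal sketch; the resource names and the 80% default are illustrative:

```python
def over_headroom(utilization: dict[str, float], threshold: float = 0.80) -> list[str]:
    """Return resources whose sustained utilization exceeds the
    headroom threshold (fractions, e.g. 0.87 = 87%)."""
    return [name for name, u in utilization.items() if u > threshold]

hot = over_headroom({"cpu": 0.55, "memory": 0.87, "disk_iops": 0.62})
print(hot)  # ['memory']
```

Feed this with averages over representative windows (e.g. weekday business hours), not instantaneous samples, so a momentary spike doesn't masquerade as exhausted headroom.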
6. Make a scaling decision based on bottleneck patterns
If you've identified a clear `cpu-bottleneck`, `memory-bottleneck`, or `i-o-bottleneck` with sustained utilization above 80%, scale up. If metrics show you're only using 40-60% of resources even during peak traffic, consider scaling down to save costs. In my experience with Cloud SQL, if buffer cache hit ratio is >98%, temp files are minimal, and you have <60% resource utilization, you're likely over-provisioned—but if any single resource is consistently maxed out, that's your scaling signal regardless of what the other metrics show.
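The decision rule in this step can be sketched as one function combining the earlier signals. The thresholds mirror the text (any resource sustained above 80% means scale up; under 60% everywhere with a >98% hit ratio and negligible temp spill suggests over-provisioning); everything else here is an illustrative assumption:

```python
def scaling_decision(utilization: dict[str, float],
                     hit_ratio: float,
                     temp_gb_per_hour: float) -> str:
    """Combine bottleneck signals into an up/down/stay recommendation."""
    # Any single maxed-out resource is a scaling signal regardless of the rest.
    if any(u > 0.80 for u in utilization.values()):
        return "scale up"
    # Comfortable on every axis -> likely over-provisioned.
    if (max(utilization.values()) < 0.60
            and hit_ratio > 0.98
            and temp_gb_per_hour < 0.1):
        return "consider scaling down"
    return "stay on current tier; revisit config tuning"

print(scaling_decision({"cpu": 0.45, "memory": 0.55}, hit_ratio=0.99,
                       temp_gb_per_hour=0.0))  # consider scaling down
```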
Related Insights
CPU bottleneck
Memory bottleneck
I/O bottleneck
Workload growth pattern (prerequisite)
Resource utilization exceeding headroom threshold (threshold indicator)
Need for vertical scaling (supporting evidence)
Workload characteristics and trends (supporting evidence)
Peak vs. sustained load characteristics (supporting evidence)
Capacity headroom adequacy (diagnostic step)