Cost Optimization for Over-Provisioned Instance


The database instance appears over-provisioned, with low resource utilization presenting a cost-saving opportunity.

Prompt: Our monthly PostgreSQL bill on RDS is $5000 but CPU utilization averages 15% and memory is at 40%. How do I safely downsize to save money without risking performance issues?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens
1,395 vs 1,408
Latency
30.4s vs 32.1s
Turns
6 vs 6
Tool Calls
2 vs 2
Schema Calls
0 vs 0
Response
3,105 chars vs 3,200 chars
Evaluation
Without Schema
B+
vs
With Schema
A-
Both responses provide actionable downsizing guidance. The response without Schema is extremely comprehensive with detailed AWS CLI commands, specific metrics, and thorough coverage of cost optimizations, but doesn't mention Blue/Green deployments. The response with Schema takes a more structured, safety-first approach by leading with Blue/Green deployment testing (zero-downtime, safest method) and clear risk mitigation phases. While slightly less encyclopedic, the Schema-enhanced response better aligns with the question's emphasis on "safely" downsizing by prioritizing testing strategies over jumping directly to execution.
Without Schema
2 tool calls
I'll help you safely downsize your RDS instance. Let me first check if you have any infrastructure code or configuration files in your project.

Based on your metrics (15% CPU, 40% memory), you have significant room to downsize. Here's a safe approach:

## Immediate Analysis Steps

**1. Identify current instance details:**

```bash
# Get instance type, storage, and IOPS
aws rds describe-db-instances --db-instance-identifier <your-db-name> \
  --query 'DBInstances[0].{Type:DBInstanceClass,Storage:AllocatedStorage,IOPS:Iops,MultiAZ:MultiAZ}'
```

**2. Analyze peak usage patterns:**

```bash
# Check CloudWatch metrics for the last 30 days
# Look for CPU/Memory peaks, not just averages
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=<your-db-name> \
  --start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Maximum,Average
```

## Safe Downsizing Strategy

**Target sizing based on your metrics:**
- 15% CPU → safe to go 2-3 instance sizes down
- 40% memory → aim for 60-70% utilization on new instance

**Example sizing path:**
- If current: `db.r6g.2xlarge` (8 vCPU, 64GB) ~$1,200/mo
- Target: `db.r6g.xlarge` (4 vCPU, 32GB) ~$600/mo
- Savings: ~50%

## Step-by-Step Migration Plan

**1. Enable Enhanced Monitoring (if not already):**

```bash
aws rds modify-db-instance \
  --db-instance-identifier <your-db-name> \
  --monitoring-interval 60 \
  --monitoring-role-arn <monitoring-role-arn> \
  --apply-immediately
```

**2. Create a test window:**
- Take a snapshot before changes
- Plan during low-traffic period
- Have rollback plan ready

**3. Modify instance (causes brief downtime):**

```bash
aws rds modify-db-instance \
  --db-instance-identifier <your-db-name> \
  --db-instance-class db.r6g.xlarge \
  --apply-immediately
```

**4. Monitor for 48-72 hours:**
- Watch for CPU spikes > 80%
- Memory pressure indicators
- Query performance changes
- Connection pool saturation

## Critical Checks Before Downsizing

1. **Connection count** - Smaller instances have lower max_connections
2. **Burst workloads** - Check 99th percentile, not averages
3. **Background jobs** - VACUUM, backups, analytics queries
4. **Read replicas** - Can offload reporting queries
5. **Storage IOPS** - Separate from compute, may need adjustment

## Red Flags to Watch

- Regular CPU spikes > 70% (even if average is low)
- Swap usage > 0
- High checkpoint wait times
- Increasing query latency
- Connection errors

## Additional Cost Optimizations

1. **Reserved Instances** - 40-60% discount for 1-year commitment
2. **Graviton instances** (r6g/t4g) - 20% cheaper than Intel (r6i/t3)
3. **Storage optimization** - Move to gp3 from gp2 (10-20% cheaper)
4. **Connection pooling** - PgBouncer to reduce connection overhead

Would you like me to help you:
1. Analyze your specific RDS metrics?
2. Create a monitoring dashboard?
3. Write a migration script with automatic rollback?
4. Set up CloudWatch alarms for the new instance size?
With Schema
2 tool calls
I'll help you safely downsize your RDS PostgreSQL instance. Let me first check if you have any infrastructure configuration files in your working directory that might be relevant.

## Safe RDS PostgreSQL Downsizing Strategy

With 15% CPU and 40% memory utilization, you have significant room to downsize. Here's how to do it safely:

### 1. **Gather Baseline Metrics** (1-2 weeks)

```bash
# Key metrics to analyze in CloudWatch:
# - CPU Utilization (you have this)
# - FreeableMemory & SwapUsage
# - DatabaseConnections (peak and average)
# - ReadIOPS & WriteIOPS
# - ReadLatency & WriteLatency
# - NetworkReceiveThroughput & NetworkTransmitThroughput
```

**Critical**: Look at **peak** usage during:
- Business hours vs off-hours
- Month-end processing
- Backup windows
- Any batch jobs

### 2. **Calculate Target Instance Size**

Rule of thumb for safety margins:
- **CPU**: Peak should stay under 60-70%
- **Memory**: Keep 20-30% headroom
- **IOPS**: Maintain current performance tier

If your peaks are CPU 25% and memory 55%, you can likely go down **2 instance sizes**.

### 3. **Test Plan**

**Option A: Blue/Green Deployment** (safest, zero downtime)

```bash
# AWS RDS Blue/Green deployments allow testing with real traffic
aws rds create-blue-green-deployment \
  --blue-green-deployment-name downsize-test \
  --source-arn arn:aws:rds:region:account:db:your-db \
  --target-engine-version <current-version>
```

Then modify the green environment to smaller instance type and test.

**Option B: Read Replica Test**
1. Create read replica with target instance size
2. Route read-only queries to replica for 1 week
3. Monitor performance metrics

### 4. **Downsize Execution**

```bash
# Modify instance (requires brief downtime during maintenance window)
aws rds modify-db-instance \
  --db-instance-identifier your-db \
  --db-instance-class db.r6g.xlarge \   # example target
  --apply-immediately   # or schedule for maintenance window
```

**Downtime**: Typically 5-15 minutes depending on instance size.

### 5. **Post-Change Monitoring** (2-4 weeks)

Watch for:
- Increased query latency
- Connection pool saturation
- Memory pressure (SwapUsage > 0 is a red flag)
- CPU spikes during peak times

### 6. **Quick Rollback Plan**

Keep this ready:

```bash
# Can scale back up in ~5-15 minutes
aws rds modify-db-instance \
  --db-instance-identifier your-db \
  --db-instance-class <previous-size> \
  --apply-immediately
```

### Instance Type Recommendations

If you're on:
- **db.r5.2xlarge** → Try **db.r6g.xlarge** (ARM-based, better price/performance)
- **db.r5.xlarge** → Try **db.r6g.large**
- **db.m5.2xlarge** → Try **db.r6g.xlarge**

**Graviton2 (r6g/m6g)** instances offer ~20% better price-performance than r5/m5.

### Red Flags to Watch

❌ Don't downsize if:
- SwapUsage is already > 0
- Connection count is near max_connections
- IOPS are frequently throttled
- You have unpredictable traffic spikes

---

**Expected Savings**: Going down 1 instance size typically saves 40-50%. For example:
- db.r5.2xlarge (~$700/mo) → db.r6g.xlarge (~$350/mo) = **$4,200/year saved**

Would you like help analyzing your specific CloudWatch metrics or setting up a blue/green deployment test?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When evaluating an over-provisioned RDS PostgreSQL instance for downsizing, start by validating buffer cache efficiency and connection headroom against current resource allocation. Then review workload patterns for peaks and growth trends to ensure the downsized instance maintains adequate headroom. Finally, layer in storage optimizations and reserved instance purchasing for maximum cost savings.

1. Validate buffer cache utilization and hit ratio
Check `buffercache-summary-buffer-used` against your current shared_buffers allocation — if you're using less than 60% of allocated buffers at peak times, there's clear over-provisioning. Pair this with `postgresql-buffer-hit` ratio: if you're maintaining >95% buffer hits with low buffer utilization, you can safely reduce memory without degrading performance. This is the first check because buffer cache is often the largest memory consumer and the best indicator of whether you're truly over-provisioned.
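The two thresholds in this step (under 60% of buffers used at peak, over 95% hit ratio) can be captured in a small check. This is an illustrative sketch only; the function name and inputs are hypothetical, not part of any monitoring API:

```python
def memory_overprovisioned(peak_buffer_used_pct: float, buffer_hit_ratio: float) -> bool:
    """Step-1 heuristic: low peak buffer-cache usage combined with a
    healthy hit ratio suggests memory can shrink without hurting reads.
    Thresholds mirror the guidance above (<60% used, >95% hits)."""
    return peak_buffer_used_pct < 60.0 and buffer_hit_ratio > 0.95
```

Feed it the peak of `buffercache-summary-buffer-used` expressed as a percentage of shared_buffers, and the `postgresql-buffer-hit` ratio.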
2. Assess connection utilization and headroom
Review `postgresql-percent-usage-connections` and `postgresql-backends` to understand connection pressure. If connection utilization is consistently below 50% and `postgresql-backends` shows stable, low counts relative to your max_connections limit, you're safely under-utilizing connection capacity. Connection limits scale with instance class, so this validates whether a smaller instance's reduced connection limit would still serve your workload comfortably.
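As a sketch of this check (hypothetical helper, not an RDS API): a downsize is connection-safe when current utilization sits below the 50% mark from the text and the observed backend peak still fits comfortably under the smaller instance's max_connections. The half-of-limit margin is an illustrative choice:

```python
def connection_headroom_ok(peak_backends: int, target_max_connections: int,
                           current_utilization_pct: float) -> bool:
    """Step-2 heuristic: require <50% connection utilization today and
    keep the observed backend peak under half the smaller instance's limit."""
    return (current_utilization_pct < 50.0
            and peak_backends < 0.5 * target_max_connections)
```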
3. Analyze workload patterns for peak usage and variability
Look at historical trends in `postgresql-backends`, `checkpoint-timed`, and `postgresql-disk-read` over at least 30 days to identify peak load periods — are your peaks 2x your average or 10x? If your 95th percentile CPU is still below 40% and your memory peaks stay under 60%, downsizing is safe. The 15% average is meaningless if you have weekly batch jobs that spike to 90%; always size for peaks plus 20-30% headroom, not averages.
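The "size for peaks, not averages" rule can be made concrete with a percentile check. A minimal sketch using the standard library, with the 40%/60% cutoffs taken from the step above:

```python
import statistics

def safe_to_downsize(cpu_samples: list, mem_samples: list) -> bool:
    """Step-3 heuristic: compare 95th-percentile utilization (not the
    average) against the safety cutoffs quoted above."""
    cpu_p95 = statistics.quantiles(cpu_samples, n=20)[18]   # 95th percentile
    mem_p95 = statistics.quantiles(mem_samples, n=20)[18]
    return cpu_p95 < 40.0 and mem_p95 < 60.0
```

A 15% CPU average with weekly batch spikes to 90% fails this check, which is exactly the point.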
4. Evaluate storage tier optimization opportunities
Check the ratio of `postgresql-disk-read` to your IOPS allocation and review `postgresql-database-size` to assess if you're paying for high-performance storage (io2, gp3 with high IOPS) that you're not using. If disk reads are low relative to provisioned IOPS, you can often downgrade storage tier independently of instance class for immediate savings. This is particularly valuable because storage costs can represent 30-40% of total RDS spend.
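A rough sketch of this ratio test; the 30% cutoff is an illustrative default, not an AWS recommendation:

```python
def storage_tier_downgrade_candidate(peak_disk_iops: float,
                                     provisioned_iops: float,
                                     usage_threshold: float = 0.3) -> bool:
    """Step-4 heuristic: if peak disk reads consume well under the
    provisioned IOPS (default cutoff 30%, an illustrative value),
    a cheaper storage tier likely sustains the workload."""
    return peak_disk_iops < usage_threshold * provisioned_iops
```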
5. Project growth trends to maintain future headroom
Calculate the growth rate of `postgresql-database-size` and trend `postgresql-backends` over the past 6 months to forecast resource needs 12-18 months out. If your database is growing at 10% monthly or you're onboarding new applications, factor that into your downsizing decision — you don't want to right-size only to need an upgrade in 6 months. The insight here is that downsizing should leave enough headroom for predictable growth without requiring another migration soon.
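The forecast in this step amounts to compound growth. A small helper (hypothetical name) estimates how many months of headroom a candidate size leaves:

```python
import math

def months_until_capacity(current_size_gb: float, capacity_gb: float,
                          monthly_growth_rate: float) -> float:
    """Step-5 projection: months until the database outgrows the
    candidate capacity, assuming compound monthly growth."""
    if monthly_growth_rate <= 0:
        return math.inf
    return math.log(capacity_gb / current_size_gb) / math.log(1 + monthly_growth_rate)
```

At the 10% monthly growth cited above, a database at half its target capacity crosses it in roughly 7 months, well short of a 12-18 month horizon, so such a workload should not be downsized that far.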
6. Consider reserved instance purchasing after right-sizing
Once you've identified the correctly-sized instance class and confirmed stable workload patterns with `postgresql-backends` and other metrics, evaluate 1-year or 3-year reserved instance commitments for additional 30-60% cost savings. Don't buy reserved instances before downsizing — right-size first on on-demand instances, validate performance for 2-4 weeks, then commit. This two-step approach maximizes savings while minimizing risk of under-provisioning.
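The two-step economics above can be sketched as a simple comparison; the 40% default is an illustrative point inside the 30-60% discount range quoted, and the function name is hypothetical:

```python
def rightsize_then_reserve_savings(current_monthly_usd: float,
                                   downsized_monthly_usd: float,
                                   ri_discount: float = 0.40) -> float:
    """Annual savings of right-sizing first and then buying a reserved
    instance, versus reserving at the current (oversized) class."""
    reserve_now = current_monthly_usd * (1 - ri_discount) * 12
    rightsize_then_reserve = downsized_monthly_usd * (1 - ri_discount) * 12
    return reserve_now - rightsize_then_reserve
```

For the $5,000/month bill in the prompt, halving the instance before committing saves an extra $18,000/year compared with reserving the oversized class as-is.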


Related Insights

- storage tier optimization possible (diagnostic_step): stub insight created from scenario discovery hint
- instance over-provisioned for workload (diagnostic_step): stub insight created from scenario discovery hint
- cost optimization opportunity (diagnostic_step): stub insight created from scenario discovery hint
- safe to downsize instance class (supporting_evidence): stub insight created from scenario discovery hint
- maintain headroom for growth (supporting_evidence): stub insight created from scenario discovery hint
- reserved instance purchase opportunity (supporting_evidence): stub insight created from scenario discovery hint


Monitoring Interfaces

PostgreSQL CloudWatch Metrics
PostgreSQL Datadog
PostgreSQL Datadog Integration
PostgreSQL Prometheus Exporter (pg_exporter)