Cost optimization through performance tuning

Cost Optimization

Reducing cloud database spending by optimizing queries and right-sizing instances without sacrificing performance

Prompt: My monthly AWS RDS PostgreSQL bill is $8,000 and I need to reduce costs by 30% without degrading performance. Help me identify whether I should optimize slow queries to downgrade my instance, tune autovacuum to reduce IOPS costs, or consolidate databases.

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

| Metric | Without Schema | With Schema |
| --- | --- | --- |
| Tokens | 3,465 | 2,695 |
| Latency | 1.0 min | 58.8 s |
| Turns | 5 | 19 |
| Tool Calls | 1 | 10 |
| Schema Calls | 0 | 8 |
| Response Length | 2,644 chars | 3,757 chars |
Evaluation
Without Schema: B- vs With Schema: B+
The response without Schema provides a reasonable phased approach with concrete SQL queries and autovacuum parameters, but lacks technical depth in explaining prioritization logic and ends with multiple clarifying questions rather than directive guidance. The response with Schema demonstrates stronger technical expertise through detailed reasoning about write amplification (unused indexes causing 11x writes), cost economics (query optimization enabling instance downsizing for recurring savings), and RDS-specific constraints. It includes an authoritative source link for autovacuum tuning and provides clearer diagnostic prioritization, making it notably more actionable and directive for an SRE to execute immediately.
Without Schema
1 tool call
I'll help you create a comprehensive cost analysis to identify the best optimization strategy. Let me build a framework to analyze your RDS costs and performance metrics. I'll provide you with a comprehensive cost optimization analysis directly.

## Quick Assessment Framework

To reduce your $8,000/month RDS bill by 30% ($2,400 savings), here's my recommendation:

### **Best Approach: Hybrid Strategy**

**Start with IOPS optimization + Storage changes** (quickest wins), then pursue instance rightsizing:

### Phase 1: IOPS & Storage (Week 1) - Low Risk, High Impact

**Expected savings: $1,000-$1,500/month**

1. **Switch from gp2 to gp3 storage** (if using gp2)
   - 20% cheaper with same/better performance
   - Can be done with minimal downtime
2. **Tune autovacuum aggressively**:
   ```sql
   ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
   ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
   ALTER SYSTEM SET autovacuum_max_workers = 4;
   SELECT pg_reload_conf();
   ```
3. **Clean up snapshots** - check retention policy

### Phase 2: Query Optimization (Weeks 2-3)

**Expected savings: $600-$1,000/month** (enables instance downgrade)

Run these diagnostic queries:

```sql
-- Find your worst offenders
SELECT substring(query, 1, 100) AS query,
       calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Find missing indexes
SELECT schemaname, relname, seq_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 100 AND seq_tup_read > 10000
ORDER BY seq_tup_read DESC;
```

### Phase 3: Instance Rightsizing (Week 4)

**Expected savings: $800-$1,200/month**

After optimizations, test downgrading instance size.

---

## Decision Criteria

**Choose IOPS optimization if:**
- ✓ You're using io1/io2 or provisioned IOPS
- ✓ IOPS costs are >30% of your bill
- ✓ Table bloat exists (check `pg_stat_user_tables`)

**Choose instance rightsizing if:**
- ✓ CPU < 40% and memory < 60% consistently
- ✓ Instance costs are >50% of bill
- ✓ After query optimization

**Choose consolidation if:**
- ✓ You have 3+ separate RDS instances
- ✓ Each instance has <50% utilization
- ✓ Acceptable complexity increase

---

## Next Steps

I can help you:

1. **Generate diagnostic scripts** to collect your current metrics
2. **Analyze your CloudWatch metrics** to identify cost drivers
3. **Create a specific optimization plan** based on your actual data

**What information do you have available?**
- Your current instance type(s)?
- Storage type (gp2/gp3/io1)?
- Can you run SQL queries on your database?
- Access to AWS Cost Explorer or billing details?
With Schema
10 tool calls (8 Schema)
I'll help you develop a data-driven approach to reduce your AWS RDS PostgreSQL costs by 30%. Let me consult Schema.ai's expert guidance on RDS cost optimization. Based on Schema.ai's expert guidance, here's a data-driven approach to achieve your 30% cost reduction ($2,400/month):

## Diagnostic Priority Framework

**Run these diagnostics in parallel** to identify your biggest cost drivers:

### 1. **Query Performance Analysis** (Highest ROI potential)

```sql
-- Enable timing if not already on (RDS parameter: track_io_timing = on)
-- Check current slow queries
SELECT query, calls, total_exec_time, mean_exec_time,
       blk_read_time, shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Identify sequential scans causing excessive disk I/O
EXPLAIN (ANALYZE, BUFFERS) <your_slow_query>;
```

**If you see:**
- Queries with `Seq Scan` and high `Buffers: shared read` (28K+ pages)
- `blk_read_time` elevated during scans
- **Action**: Create composite indexes → can enable 100x performance improvement → downgrade instance size (biggest cost lever)

### 2. **Unused Index Detection** (Quick win for IOPS)

```sql
-- Find indexes never used
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

**Impact**: Each unused index requires write I/O for every INSERT/UPDATE/DELETE. A table with 10 indexes = 11 physical writes per INSERT. Dropping these reduces IOPS **without** any query optimization work.

### 3. **Autovacuum IOPS Analysis**

Check current RDS parameter group settings:

```bash
# Via AWS CLI
aws rds describe-db-parameters \
  --db-parameter-group-name <your-group> \
  --query 'Parameters[?ParameterName==`autovacuum_vacuum_cost_limit` || ParameterName==`autovacuum_vacuum_cost_delay`]'
```

**Key tuning parameters** ([source](https://www.citusdata.com/blog/2022/07/28/debugging-postgres-autovacuum-problems-13-tips)):
- `autovacuum_vacuum_cost_limit`: Increase to 10000 (from default 200) → reduces sleep frequency
- `autovacuum_vacuum_cost_delay`: Decrease to 2ms (from default 20ms) → reduces sleep time
- `autovacuum_work_mem`: Increase (currently max 1GB per worker) → faster vacuums

## Recommended Decision Tree

**Start here:**

1. **Week 1**: Run query diagnostics above. If you find sequential scans with high disk I/O:
   - Create composite B-tree indexes on filtered/sorted columns
   - Measure query improvement with EXPLAIN ANALYZE
   - **This unlocks instance downsizing** (e.g., db.r5.4xlarge → db.r5.2xlarge = ~$1,200-1,500/month savings)
2. **Week 1-2**: Drop unused indexes (safe, reversible)
   - Reduces write amplification immediately
   - Lower IOPS costs (typically 10-20% reduction if you have many unused indexes)
3. **Week 2**: Only if above don't hit 30% target:
   - Tune autovacuum parameters to reduce cost limiting
   - This is lower ROI than fixing queries because autovacuum runs periodically, while query optimizations affect every request

**Database consolidation** should be your last resort - it adds operational complexity and doesn't address root performance issues.

## Next Steps

Run the diagnostic queries above and share:
- Top 5 slowest queries with their EXPLAIN (ANALYZE, BUFFERS) output
- Count of unused indexes and their total size
- Current autovacuum parameter settings

This will tell us which lever gives you the biggest bang for your optimization effort.
**Key insight**: Query optimization → instance downsizing is almost always the highest ROI path for RDS cost reduction, because it's a recurring monthly cost cut, versus IOPS optimization, which only reduces variable costs.

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When optimizing RDS PostgreSQL costs, start by eliminating obvious waste (unused indexes, missing indexes) that drive unnecessary IOPS and CPU usage, then assess whether your instance is right-sized by checking cache hit ratios and query patterns. Only after fixing these inefficiencies can you confidently determine if downsizing, storage optimization, or autovacuum tuning will deliver the needed 30% cost reduction.

1. Identify and drop unused indexes to reduce write IOPS costs
Start here because it's the lowest-risk, highest-impact quick win. Query `postgresql.index.scans` and look for indexes with zero scans—these waste write IOPS on every INSERT/UPDATE/DELETE without providing any read benefit. A table with 10 unused indexes does 11x the write I/O it needs. Dropping unused indexes immediately reduces provisioned IOPS costs and is a prerequisite for accurately assessing remaining workload. Focus on large indexes (`postgresql.index.size` > 100MB) first for maximum impact.
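A minimal SQL sketch of this check, assuming Schema's `postgresql.index.scans` metric corresponds to `idx_scan` in PostgreSQL's `pg_stat_user_indexes` view; the index name in the `DROP` example is hypothetical:

```sql
-- Unused indexes over ~100 MB, largest first (counters accumulate since the last
-- stats reset, so confirm the window covers representative traffic before dropping)
SELECT schemaname,
       relname                                      AS table_name,
       indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND pg_relation_size(indexrelid) > 100 * 1024 * 1024
  AND indexrelid NOT IN (SELECT conindid FROM pg_constraint)  -- keep constraint-backed indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop without blocking writes; must run outside a transaction block
-- DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status;  -- hypothetical name
```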
2. Find missing indexes causing expensive sequential scans
Check `postgresql.sequential_scans` on tables with high row counts (`postgresql.table-size` > 1GB) and correlate with `postgresql.blocks_read` to find tables doing millions of sequential scans. These sequential scans waste both read IOPS and CPU time, often driving oversized instance selection. Adding the right indexes can reduce IOPS by 10-100x and CPU usage enough to enable a 1-2 instance size downgrade. Use `postgresql.queries-duration-max` to identify which slow queries are triggering these scans.
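As a sketch, assuming `postgresql.sequential_scans` and related metrics surface the `seq_scan`/`seq_tup_read` counters from `pg_stat_user_tables`; the index in the trailing comment is a hypothetical example, not a recommendation for your schema:

```sql
-- Tables over ~1 GB ranked by rows read via sequential scans
SELECT schemaname,
       relname,
       seq_scan,
       seq_tup_read,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE pg_total_relation_size(relid) > 1073741824   -- ~1 GB
ORDER BY seq_tup_read DESC
LIMIT 15;

-- Confirm the plan with EXPLAIN (ANALYZE, BUFFERS) first, then e.g.:
-- CREATE INDEX CONCURRENTLY idx_orders_customer_created  -- hypothetical
--   ON orders (customer_id, created_at);
```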
3. Check cache hit ratio to determine if instance memory is right-sized
Calculate cache hit ratio using `postgresql.blocks_hit / (postgresql.blocks_hit + postgresql.blocks_read)`. If this is consistently >99%, your shared_buffers (and thus instance memory) may be oversized for your working set—you could potentially downgrade to a smaller instance. If it's <95%, you have memory pressure causing excessive disk I/O, and downsizing would make things worse. This metric is critical for deciding whether instance downsizing is even viable as a cost optimization strategy.
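A minimal sketch of that calculation using PostgreSQL's `pg_stat_database` view, which carries the same `blks_hit`/`blks_read` counters (cumulative since the last stats reset, so sample over a steady-state period):

```sql
-- Buffer cache hit ratio per database; >99% suggests memory headroom, <95% pressure
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0) * 100, 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY blks_read DESC;
```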
4. Assess table bloat for storage and IOPS waste
Check `postgresql.table-bloat` percentage and correlate with `dead-tuple` counts to identify tables with excessive bloat. Bloat directly increases storage costs (EBS charges) and indirectly increases IOPS costs because queries scan more blocks than necessary. If bloat is >20% on large tables, you're likely wasting 20%+ of your storage and IOPS budget. Calculate potential savings by checking total `postgresql.database-size` and identifying how much is reclaimable bloat.
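Exact bloat figures need an extension such as `pgstattuple`; as a cheap first pass, the dead-tuple counters in `pg_stat_user_tables` give a rough proxy, sketched here:

```sql
-- Dead-tuple ratio as a rough bloat proxy (estimate only; pgstattuple is exact)
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 15;
```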
5. Analyze top queries to identify CPU optimization opportunities
Query `postgresql.queries-count` and `postgresql.queries-duration-max` to find queries with high total execution time (frequency × duration). A query running 100ms called 10K times/hour is burning more CPU than a 10s query called once. If you can optimize the top 5 queries by 50% through better indexing or query rewrites, you might reduce CPU usage enough to downgrade from a db.r5.4xlarge to db.r5.2xlarge—saving $2K+/month immediately.
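A sketch of the frequency × duration ranking via `pg_stat_statements` (assumes the extension is enabled on the instance and PostgreSQL 13+ column names):

```sql
-- Queries ranked by total time consumed, with each one's share of all query time
SELECT substring(query, 1, 80)           AS query_start,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       round(total_exec_time::numeric)   AS total_ms,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```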
6. Check temp file usage to find work_mem tuning opportunities
Monitor `postgresql.temp-files` and `postgresql.temp-bytes` to identify queries spilling to disk due to inadequate work_mem. High temp file creation (>100 files/min) wastes IOPS and indicates either work_mem is too low or queries need optimization. Check `postgresql.blk-read-time` and `postgresql.blk-write-time` during temp file creation spikes—if these correlate, you're paying for expensive disk I/O that could be eliminated with proper memory tuning. This can reduce IOPS costs without requiring instance changes.
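A sketch against `pg_stat_database`, whose `temp_files`/`temp_bytes` counters are the likely source of these metrics; note that on RDS, `log_temp_files` is changed through the parameter group rather than `ALTER SYSTEM`:

```sql
-- Temp-file spill per database since the last stats reset
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_spilled
FROM pg_stat_database
WHERE temp_files > 0
ORDER BY temp_bytes DESC;

-- To attribute spills to specific queries, set log_temp_files = 0 (log every file)
-- in the parameter group and watch the PostgreSQL log for "temporary file" lines.
```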
7. Evaluate autovacuum effectiveness vs IOPS cost tradeoff
Check `postgresql-autovacuumed` frequency and correlate with bloat growth rate from step 4. If autovacuum is running frequently but bloat is still growing, you need more aggressive settings—but this increases IOPS costs. Conversely, if bloat is low and autovacuum runs constantly, you might be over-vacuuming and wasting IOPS. Review `postgresql.io-reads` and `postgresql.io-writes` during autovacuum windows to quantify the IOPS cost, then balance this against the cost of bloat-related storage and query performance degradation.
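One way to line these up, assuming the autovacuum counters come from `pg_stat_user_tables`: compare each table's autovacuum recency and run count against its dead-tuple backlog.

```sql
-- Autovacuum recency/frequency vs dead-tuple backlog per table
SELECT schemaname,
       relname,
       last_autovacuum,
       autovacuum_count,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 15;
```

If `n_dead_tup` keeps growing on tables that show recent `last_autovacuum` timestamps, vacuum is running but being throttled; if `autovacuum_count` is high while dead tuples stay near zero, the settings may be more aggressive than the workload needs.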

Related Insights

- Unused indexes consume space and write I/O without providing benefit (info)
- Unused indexes waste write performance and storage (info)
- Unused PostgreSQL indices consuming write IO and disk space (warning)
- Missing or ineffective indexes cause sequential scans on large tables (warning)
- Inefficient query plans cause production slowdowns despite dev performance (warning)
- High sequential scan count may indicate missing indexes requiring query optimization (warning)
- Missing database indexes cause 30+ second query times in production (warning)
- Autovacuum cost limits throttle vacuum during peak I/O periods (info)
- Cache hit ratio below 99% indicates insufficient shared_buffers memory (warning)
- Cost optimization opportunities (diagnostic_step; stub insight from scenario discovery hint)
- Slow queries degrading performance (prerequisite; stub insight from scenario discovery hint)
- Autovacuum effectiveness (prerequisite; stub insight from scenario discovery hint)
- Storage cost impact from bloat (supporting_evidence; stub insight from scenario discovery hint)
- Need for query optimization (prerequisite; stub insight from scenario discovery hint)
- Need for index creation (prerequisite; stub insight from scenario discovery hint)
- Table bloat consuming excessive disk space (prerequisite; warning)
- Inadequate work_mem causes query spill to disk (prerequisite; warning)

Monitoring Interfaces

- PostgreSQL CloudWatch Metrics
- PostgreSQL Datadog Integration
- PostgreSQL Prometheus Exporter (pg_exporter)
- PostgreSQL Prometheus
- PostgreSQL Datadog
- PostgreSQL Native