Cost optimization through performance tuning

Cost Optimization

Reducing cloud database spending by optimizing queries and right-sizing instances without sacrificing performance

Prompt: My monthly AWS RDS PostgreSQL bill is $8,000 and I need to reduce costs by 30% without degrading performance. Help me identify whether I should optimize slow queries to downgrade my instance, tune autovacuum to reduce IOPS costs, or consolidate databases.

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

| Metric | Without Schema | With Schema |
| --- | --- | --- |
| Tokens | 3,465 | 2,695 |
| Latency | 1.0 min | 58.8 s |
| Turns | 5 | 19 |
| Tool Calls | 1 | 10 |
| Schema Calls | 0 | 8 |
| Response Length | 2,644 chars | 3,757 chars |
Evaluation
Without Schema: B- vs With Schema: B+
The response without Schema provides a reasonable phased approach with concrete SQL queries and autovacuum parameters, but lacks technical depth in explaining prioritization logic and ends with multiple clarifying questions rather than directive guidance. The response with Schema demonstrates stronger technical expertise through detailed reasoning about write amplification (unused indexes causing 11x writes), cost economics (query optimization enabling instance downsizing for recurring savings), and RDS-specific constraints. It includes an authoritative source link for autovacuum tuning and provides clearer diagnostic prioritization, making it notably more actionable and directive for an SRE to execute immediately.
Without Schema
1 tool call
I'll help you create a comprehensive cost analysis to identify the best optimization strategy. Let me build a framework to analyze your RDS costs and performance metrics. I'll provide you with a comprehensive cost optimization analysis directly.

## Quick Assessment Framework

To reduce your $8,000/month RDS bill by 30% ($2,400 savings), here's my recommendation:

### **Best Approach: Hybrid Strategy**

**Start with IOPS optimization + Storage changes** (quickest wins), then pursue instance rightsizing:

### Phase 1: IOPS & Storage (Week 1) - Low Risk, High Impact

**Expected savings: $1,000-$1,500/month**

1. **Switch from gp2 to gp3 storage** (if using gp2)
   - 20% cheaper with same/better performance
   - Can be done with minimal downtime
2. **Tune autovacuum aggressively**:
   ```sql
   ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
   ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
   ALTER SYSTEM SET autovacuum_max_workers = 4;
   SELECT pg_reload_conf();
   ```
3. **Clean up snapshots** - check retention policy

### Phase 2: Query Optimization (Weeks 2-3)

**Expected savings: $600-$1,000/month** (enables instance downgrade)

Run these diagnostic queries:

```sql
-- Find your worst offenders
SELECT substring(query, 1, 100) AS query,
       calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Find missing indexes
SELECT schemaname, relname, seq_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 100 AND seq_tup_read > 10000
ORDER BY seq_tup_read DESC;
```

### Phase 3: Instance Rightsizing (Week 4)

**Expected savings: $800-$1,200/month**

After optimizations, test downgrading instance size.

---

## Decision Criteria

**Choose IOPS optimization if:**
- ✓ You're using io1/io2 or provisioned IOPS
- ✓ IOPS costs are >30% of your bill
- ✓ Table bloat exists (check `pg_stat_user_tables`)

**Choose instance rightsizing if:**
- ✓ CPU < 40% and memory < 60% consistently
- ✓ Instance costs are >50% of bill
- ✓ After query optimization

**Choose consolidation if:**
- ✓ You have 3+ separate RDS instances
- ✓ Each instance has <50% utilization
- ✓ Acceptable complexity increase

---

## Next Steps

I can help you:

1. **Generate diagnostic scripts** to collect your current metrics
2. **Analyze your CloudWatch metrics** to identify cost drivers
3. **Create a specific optimization plan** based on your actual data

**What information do you have available?**
- Your current instance type(s)?
- Storage type (gp2/gp3/io1)?
- Can you run SQL queries on your database?
- Access to AWS Cost Explorer or billing details?
With Schema
10 tool calls (8 Schema)
I'll help you develop a data-driven approach to reduce your AWS RDS PostgreSQL costs by 30%. Let me consult Schema.ai's expert guidance on RDS cost optimization. Based on Schema.ai's expert guidance, here's a data-driven approach to achieve your 30% cost reduction ($2,400/month):

## Diagnostic Priority Framework

**Run these diagnostics in parallel** to identify your biggest cost drivers:

### 1. **Query Performance Analysis** (Highest ROI potential)

```sql
-- Enable timing if not already on (RDS parameter: track_io_timing = on)
-- Check current slow queries
SELECT query, calls, total_exec_time, mean_exec_time,
       blk_read_time, shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Identify sequential scans causing excessive disk I/O
EXPLAIN (ANALYZE, BUFFERS) <your_slow_query>;
```

**If you see:**
- Queries with `Seq Scan` and high `Buffers: shared read` (28K+ pages)
- `blk_read_time` elevated during scans
- **Action**: Create composite indexes → can enable 100x performance improvement → downgrade instance size (biggest cost lever)

### 2. **Unused Index Detection** (Quick win for IOPS)

```sql
-- Find indexes never used
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

**Impact**: Each unused index requires write I/O for every INSERT/UPDATE/DELETE. A table with 10 indexes = 11 physical writes per INSERT. Dropping these reduces IOPS **without** any query optimization work.

### 3. **Autovacuum IOPS Analysis**

Check current RDS parameter group settings:

```bash
# Via AWS CLI
aws rds describe-db-parameters \
  --db-parameter-group-name <your-group> \
  --query 'Parameters[?ParameterName==`autovacuum_vacuum_cost_limit` || ParameterName==`autovacuum_vacuum_cost_delay`]'
```

**Key tuning parameters** ([source](https://www.citusdata.com/blog/2022/07/28/debugging-postgres-autovacuum-problems-13-tips)):
- `autovacuum_vacuum_cost_limit`: Increase to 10000 (from default 200) → reduces sleep frequency
- `autovacuum_vacuum_cost_delay`: Decrease to 2ms (from default 20ms) → reduces sleep time
- `autovacuum_work_mem`: Increase (currently max 1GB per worker) → faster vacuums

## Recommended Decision Tree

**Start here:**

1. **Week 1**: Run query diagnostics above. If you find sequential scans with high disk I/O:
   - Create composite B-tree indexes on filtered/sorted columns
   - Measure query improvement with EXPLAIN ANALYZE
   - **This unlocks instance downsizing** (e.g., db.r5.4xlarge → db.r5.2xlarge = ~$1,200-1,500/month savings)
2. **Week 1-2**: Drop unused indexes (safe, reversible)
   - Reduces write amplification immediately
   - Lower IOPS costs (typically 10-20% reduction if you have many unused indexes)
3. **Week 2**: Only if above don't hit 30% target:
   - Tune autovacuum parameters to reduce cost limiting
   - This is lower ROI than fixing queries because autovacuum runs periodically, while query optimizations affect every request

**Database consolidation** should be your last resort - it adds operational complexity and doesn't address root performance issues.

## Next Steps

Run the diagnostic queries above and share:
- Top 5 slowest queries with their EXPLAIN (ANALYZE, BUFFERS) output
- Count of unused indexes and their total size
- Current autovacuum parameter settings

This will tell us which lever gives you the biggest bang for your optimization effort.
**Key insight**: Query optimization → instance downsizing is almost always the highest ROI path for RDS cost reduction, because it's a recurring monthly cost cut, versus IOPS optimization, which only reduces variable costs.

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When optimizing RDS PostgreSQL costs, start by eliminating obvious waste (unused indexes, missing indexes) that drive unnecessary IOPS and CPU usage, then assess whether your instance is right-sized by checking cache hit ratios and query patterns. Only after fixing these inefficiencies can you confidently determine if downsizing, storage optimization, or autovacuum tuning will deliver the needed 30% cost reduction.

1. Identify and drop unused indexes to reduce write IOPS costs
Start here because it's the lowest-risk, highest-impact quick win. Query `postgresql.index.scans` and look for indexes with zero scans—these waste write IOPS on every INSERT/UPDATE/DELETE without providing any read benefit. A table with 10 unused indexes does 11x the write I/O it needs. Dropping unused indexes immediately reduces provisioned IOPS costs and is a prerequisite for accurately assessing remaining workload. Focus on large indexes (`postgresql.index.size` > 100MB) first for maximum impact.
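A minimal SQL sketch of this check, assuming Schema's `postgresql.index.scans` metric corresponds to `idx_scan` in PostgreSQL's `pg_stat_user_indexes` view; the index name in the `DROP` example is hypothetical:

```sql
-- Unused indexes over ~100 MB, largest first (counters accumulate since the last
-- stats reset, so confirm the window covers representative traffic before dropping)
SELECT schemaname,
       relname                                      AS table_name,
       indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND pg_relation_size(indexrelid) > 100 * 1024 * 1024
  AND indexrelid NOT IN (SELECT conindid FROM pg_constraint)  -- keep constraint-backed indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop without blocking writes; must run outside a transaction block
-- DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status;  -- hypothetical name
```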
2. Find missing indexes causing expensive sequential scans
Check `postgresql.sequential_scans` on tables with high row counts (`postgresql.table-size` > 1GB) and correlate with `postgresql.blocks_read` to find tables doing millions of sequential scans. These sequential scans waste both read IOPS and CPU time, often driving oversized instance selection. Adding the right indexes can reduce IOPS by 10-100x and CPU usage enough to enable a 1-2 instance size downgrade. Use `postgresql.queries-duration-max` to identify which slow queries are triggering these scans.
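As a sketch, assuming `postgresql.sequential_scans` and related metrics surface the `seq_scan`/`seq_tup_read` counters from `pg_stat_user_tables`; the index in the trailing comment is a hypothetical example, not a recommendation for your schema:

```sql
-- Tables over ~1 GB ranked by rows read via sequential scans
SELECT schemaname,
       relname,
       seq_scan,
       seq_tup_read,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE pg_total_relation_size(relid) > 1073741824   -- ~1 GB
ORDER BY seq_tup_read DESC
LIMIT 15;

-- Confirm the plan with EXPLAIN (ANALYZE, BUFFERS) first, then e.g.:
-- CREATE INDEX CONCURRENTLY idx_orders_customer_created  -- hypothetical
--   ON orders (customer_id, created_at);
```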
3. Check cache hit ratio to determine if instance memory is right-sized
Calculate cache hit ratio using `postgresql.blocks_hit / (postgresql.blocks_hit + postgresql.blocks_read)`. If this is consistently >99%, your shared_buffers (and thus instance memory) may be oversized for your working set—you could potentially downgrade to a smaller instance. If it's <95%, you have memory pressure causing excessive disk I/O, and downsizing would make things worse. This metric is critical for deciding whether instance downsizing is even viable as a cost optimization strategy.
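A minimal sketch of that calculation using PostgreSQL's `pg_stat_database` view, which carries the same `blks_hit`/`blks_read` counters (cumulative since the last stats reset, so sample over a steady-state period):

```sql
-- Buffer cache hit ratio per database; >99% suggests memory headroom, <95% pressure
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0) * 100, 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY blks_read DESC;
```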
4. Assess table bloat for storage and IOPS waste
Check `postgresql.table-bloat` percentage and correlate with `dead-tuple` counts to identify tables with excessive bloat. Bloat directly increases storage costs (EBS charges) and indirectly increases IOPS costs because queries scan more blocks than necessary. If bloat is >20% on large tables, you're likely wasting 20%+ of your storage and IOPS budget. Calculate potential savings by checking total `postgresql.database-size` and identifying how much is reclaimable bloat.
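Exact bloat figures need an extension such as `pgstattuple`; as a cheap first pass, the dead-tuple counters in `pg_stat_user_tables` give a rough proxy, sketched here:

```sql
-- Dead-tuple ratio as a rough bloat proxy (estimate only; pgstattuple is exact)
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 15;
```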
5. Analyze top queries to identify CPU optimization opportunities
Query `postgresql.queries-count` and `postgresql.queries-duration-max` to find queries with high total execution time (frequency × duration). A query running 100ms called 10K times/hour is burning more CPU than a 10s query called once. If you can optimize the top 5 queries by 50% through better indexing or query rewrites, you might reduce CPU usage enough to downgrade from a db.r5.4xlarge to db.r5.2xlarge—saving $2K+/month immediately.
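A sketch of the frequency × duration ranking via `pg_stat_statements` (assumes the extension is enabled on the instance and PostgreSQL 13+ column names):

```sql
-- Queries ranked by total time consumed, with each one's share of all query time
SELECT substring(query, 1, 80)           AS query_start,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       round(total_exec_time::numeric)   AS total_ms,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```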
6. Check temp file usage to find work_mem tuning opportunities
Monitor `postgresql.temp-files` and `postgresql.temp-bytes` to identify queries spilling to disk due to inadequate work_mem. High temp file creation (>100 files/min) wastes IOPS and indicates either work_mem is too low or queries need optimization. Check `postgresql.blk-read-time` and `postgresql.blk-write-time` during temp file creation spikes—if these correlate, you're paying for expensive disk I/O that could be eliminated with proper memory tuning. This can reduce IOPS costs without requiring instance changes.
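A sketch against `pg_stat_database`, whose `temp_files`/`temp_bytes` counters are the likely source of these metrics; note that on RDS, `log_temp_files` is changed through the parameter group rather than `ALTER SYSTEM`:

```sql
-- Temp-file spill per database since the last stats reset
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_spilled
FROM pg_stat_database
WHERE temp_files > 0
ORDER BY temp_bytes DESC;

-- To attribute spills to specific queries, set log_temp_files = 0 (log every file)
-- in the parameter group and watch the PostgreSQL log for "temporary file" lines.
```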
7. Evaluate autovacuum effectiveness vs IOPS cost tradeoff
Check `postgresql-autovacuumed` frequency and correlate with bloat growth rate from step 4. If autovacuum is running frequently but bloat is still growing, you need more aggressive settings—but this increases IOPS costs. Conversely, if bloat is low and autovacuum runs constantly, you might be over-vacuuming and wasting IOPS. Review `postgresql.io-reads` and `postgresql.io-writes` during autovacuum windows to quantify the IOPS cost, then balance this against the cost of bloat-related storage and query performance degradation.
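One way to line these up, assuming the autovacuum counters come from `pg_stat_user_tables`: compare each table's autovacuum recency and run count against its dead-tuple backlog.

```sql
-- Autovacuum recency/frequency vs dead-tuple backlog per table
SELECT schemaname,
       relname,
       last_autovacuum,
       autovacuum_count,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 15;
```

If `n_dead_tup` keeps growing on tables that show recent `last_autovacuum` timestamps, vacuum is running but being throttled; if `autovacuum_count` is high while dead tuples stay near zero, the settings may be more aggressive than the workload needs.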

Related Insights

- Unused indexes consume space and write I/O without providing benefit (info)
- Unused indexes waste write performance and storage (info)
- Unused PostgreSQL indices consuming write IO and disk space (warning)
- Missing or ineffective indexes cause sequential scans on large tables (warning)
- Inefficient query plans cause production slowdowns despite dev performance (warning)
- High sequential scan count may indicate missing indexes requiring query optimization (warning)
- Missing database indexes cause 30+ second query times in production (warning)
- Autovacuum cost limits throttle vacuum during peak I/O periods (info)
- Cache hit ratio below 99% indicates insufficient shared_buffers memory (warning)
- Cost optimization opportunities (diagnostic_step; stub insight from scenario discovery hint)
- Slow queries degrading performance (prerequisite; stub insight from scenario discovery hint)
- Autovacuum effectiveness (prerequisite; stub insight from scenario discovery hint)
- Storage cost impact from bloat (supporting_evidence; stub insight from scenario discovery hint)
- Need for query optimization (prerequisite; stub insight from scenario discovery hint)
- Need for index creation (prerequisite; stub insight from scenario discovery hint)
- Table bloat consuming excessive disk space (prerequisite; warning)
- Inadequate work_mem causes query spill to disk (prerequisite; warning)

Monitoring Interfaces

- PostgreSQL CloudWatch Metrics
- PostgreSQL Datadog Integration
- PostgreSQL Prometheus Exporter (pg_exporter)
- PostgreSQL Prometheus
- PostgreSQL Datadog
- PostgreSQL Native