Connection pool exhaustion under load
critical
Incident Response
Application connection pool hitting max_connections limit, causing new requests to fail
Prompt: “My PostgreSQL database is rejecting new connections with 'too many connections' errors. Current connections are at 95 out of 100. Should I increase max_connections or is this masking a connection leak in my application?”
With vs Without Schema
Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.
Tokens
601 vs 852
Latency
15.9s vs 19.1s
Turns
2 vs 2
Tool Calls
0 vs 0
Schema Calls
0 vs 0
Response
1,490 chars vs 2,053 chars
Evaluation
Without Schema
B+
vs
With Schema
A-
Both responses correctly identify this as a connection leak rather than a capacity issue and provide similar diagnostic SQL queries and connection pooling recommendations. The response with Schema has slightly better organization with explicit 'red flags' and 'common causes' sections that make the guidance more scannable and actionable for an SRE under pressure. The response without Schema provides more specific pool sizing guidance (10-20 connections per instance, 100-200 pooled to 20-40 actual), which is valuable quantitative advice. Overall, both responses would reliably get an operator on the right track, with the Schema version edging ahead due to clearer structure and the helpful 'red flags' checklist for interpreting query results.
Without Schema
This is likely masking a connection leak or poor connection management. Here's why:
## Don't just increase max_connections
**Why it's usually wrong:**
- Each PostgreSQL connection consumes ~10MB RAM plus overhead
- 95/100 usage suggests connections aren't being released properly
- Increasing the limit just delays the same problem
**What to check first:**
1. **Identify connection sources:**
```sql
SELECT client_addr, state, count(*)
FROM pg_stat_activity
GROUP BY client_addr, state
ORDER BY count DESC;
```
2. **Find long-running idle connections:**
```sql
SELECT pid, usename, application_name, state, state_change, now() - state_change AS idle_duration
FROM pg_stat_activity
WHERE state = 'idle'
ORDER BY state_change;
```
3. **Check for leaked connections in your app:**
- Are you closing connections in error handlers?
- Using connection pools without limits?
- Missing `finally` blocks or defer statements?
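The leak pattern in the checklist above can be shown with a minimal sketch. The `FakePool` class is a hypothetical stand-in for a real driver pool (e.g. `psycopg2.pool`); the point is that an early `raise` or `return` before the connection is released permanently loses a slot, while a `finally` block does not.

```python
class FakePool:
    """Hypothetical stand-in for a real connection pool, to show the pattern."""
    def __init__(self, size):
        self.available = size

    def getconn(self):
        if self.available == 0:
            raise RuntimeError("too many connections")
        self.available -= 1
        return object()

    def putconn(self, conn):
        self.available += 1

pool = FakePool(size=5)

def leaky_query():
    conn = pool.getconn()
    raise ValueError("query failed")   # connection never returned
    pool.putconn(conn)                 # unreachable

def safe_query():
    conn = pool.getconn()
    try:
        raise ValueError("query failed")
    finally:
        pool.putconn(conn)             # always runs, even on error

for fn in (leaky_query, safe_query):
    try:
        fn()
    except ValueError:
        pass

print(pool.available)  # 4: the leaky call lost a slot, the safe call did not
```

Under sustained traffic, every code path like `leaky_query` drains one slot per error until the pool (and eventually `max_connections`) is exhausted.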
## The right fix
**Use connection pooling** (if not already):
- **Application-level:** PgBouncer or your framework's pool (set max pool size)
- **Database-level:** Keep `max_connections` at 100, let pooler multiplex
- **Typical setup:** 100-200 pooled connections → ~20-40 actual DB connections
**For your app:**
- Set connection timeout/max lifetime
- Limit pool size to 10-20 per instance (not hundreds)
- Always close connections in defer/finally blocks
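A bounded pool with an acquisition timeout can be sketched with only the standard library; real pools (HikariCP, SQLAlchemy, pg-pool) do the same bookkeeping. The `connect` callable and the size/timeout values here are illustrative placeholders.

```python
import queue

class BoundedPool:
    """Minimal sketch of a bounded pool with an acquisition timeout."""
    def __init__(self, connect, max_size=10, acquire_timeout=5.0):
        self._q = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._q.put(connect())
        self._timeout = acquire_timeout

    def acquire(self):
        # Block at most acquire_timeout seconds; fail fast in the app
        # instead of opening yet another database connection.
        try:
            return self._q.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("no free connection in pool")

    def release(self, conn):
        self._q.put(conn)

pool = BoundedPool(connect=lambda: object(), max_size=2, acquire_timeout=0.01)
a = pool.acquire()
b = pool.acquire()
try:
    pool.acquire()          # pool exhausted: times out rather than growing
except TimeoutError as e:
    print(e)                # no free connection in pool
pool.release(a)
pool.release(b)
```

The key property is that the pool, not the database, absorbs the pressure: at the limit, callers wait briefly and then fail with a clear error instead of pushing PostgreSQL past `max_connections`.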
What's your current setup? Are you using a connection pool, and if so, what are its settings?
With Schema
This is almost certainly a **connection leak** in your application rather than a genuine need for more connections. Here's why:
## Why not to just increase max_connections
1. **100 connections is plenty** for most applications - even high-traffic sites typically need far fewer active connections than you'd expect
2. **PostgreSQL memory overhead** - each connection consumes ~10MB of RAM, so scaling connections has real costs
3. **Masks the real problem** - you'll just hit the new limit eventually if there's a leak
## Diagnose the issue first
Run these queries on your database:
```sql
-- See what connections are doing
SELECT
state,
COUNT(*)
FROM pg_stat_activity
WHERE datname = 'your_database_name'
GROUP BY state;
-- Find idle connections
SELECT
pid,
usename,
application_name,
client_addr,
state,
state_change,
query
FROM pg_stat_activity
WHERE state = 'idle'
AND state_change < NOW() - INTERVAL '5 minutes'
ORDER BY state_change;
```
**Red flags:**
- Lots of `idle` connections (not `idle in transaction`)
- Connections from the same application piling up
- Very old `state_change` timestamps
## Common causes & solutions
1. **No connection pooling** → Use PgBouncer or a connection pool in your app (like HikariCP, pg-pool, SQLAlchemy pooling)
2. **Connections not closed** → Ensure `conn.close()` in finally blocks or use context managers
3. **Poor pool configuration** → Pool size too large or connections never timing out
4. **Long-running transactions** → Transactions holding connections open indefinitely
## The right fix
Instead of increasing max_connections:
1. **Add connection pooling** (PgBouncer recommended - can handle 1000s of clients with 100 DB connections)
2. **Set connection timeouts** in your app
3. **Use connection lifecycle management** (context managers, try-finally blocks)
4. **Monitor connection age** and kill leaked connections as a temporary measure
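Step 4 (monitoring connection age) can be sketched as pure logic over rows shaped like the `pg_stat_activity` output from the queries above. This is an illustrative helper, not a library API; the 5-minute cutoff is an assumption, and in production the returned pids would be fed to `pg_terminate_backend(pid)`.

```python
from datetime import datetime, timedelta

def leaked_pids(rows, now, max_idle=timedelta(minutes=5)):
    """Pick connections that have sat idle longer than max_idle (assumed cutoff)."""
    return [r["pid"] for r in rows
            if r["state"] == "idle" and now - r["state_change"] > max_idle]

now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"pid": 101, "state": "idle",   "state_change": now - timedelta(hours=3)},
    {"pid": 102, "state": "active", "state_change": now - timedelta(hours=3)},
    {"pid": 103, "state": "idle",   "state_change": now - timedelta(seconds=30)},
]
print(leaked_pids(rows, now))  # [101]
```

Note that only long-idle connections qualify: an old `state_change` on an `active` connection just means a long-running query, which is a different problem.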
Would you like me to help investigate your application code to find where connections might be leaking?
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When diagnosing PostgreSQL connection exhaustion, first distinguish between genuine concurrent load and connection leaks by examining connection state distribution. Check for idle or leaked connections and long-running queries before considering configuration changes—connection pooling with PgBouncer is usually the right solution over blindly increasing max_connections, which can actually degrade performance.
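The leak-vs-load distinction above reduces to a simple heuristic over the connection state distribution. This sketch assumes a list of `state` values from `pg_stat_activity`; the 50% idle threshold is an assumption to tune for your workload.

```python
from collections import Counter

def classify(states, idle_leak_threshold=0.5):
    """Mostly-idle pools point to a leak; mostly-active pools point to
    genuine concurrent load. Threshold is an assumed tunable."""
    counts = Counter(states)
    total = sum(counts.values())
    idle = counts["idle"] + counts["idle in transaction"]
    return "connection leak" if idle / total > idle_leak_threshold else "concurrent load"

print(classify(["idle"] * 80 + ["active"] * 15 + ["idle in transaction"] * 5))
# connection leak
```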
1. Check connection utilization and state distribution
Start by examining `postgresql.backends` against `postgresql.connection.max` and checking `postgresql.percent-usage-connections` to confirm you're actually at the limit. Query pg_stat_activity to break down connections by state (active, idle, idle in transaction). If most connections are 'idle' or 'idle in transaction' for extended periods, you have a connection leak, not a capacity problem. The `connection-exhaustion-max-connections` insight emphasizes monitoring connection state distribution to identify idle or long-running connections before adjusting limits.
2. Identify leaked connections by application or process
Use `postgresql-connections-by-process` to see which applications or processes are holding connections. Look for processes with a high count of idle connections that aren't being released. The `connection-limit-approaching` insight specifically mentions checking for connections in 'idle' state for hours in pg_stat_activity—these are classic connection leaks indicating the application isn't properly closing connections or has misconfigured timeout settings.
3. Check for long-running queries holding connections
Examine `postgresql-active-queries` to see if connections are being held by long-running operations. Query pg_stat_activity for queries with long query_start times. Even if individual queries aren't slow, connection pool exhaustion can occur when legitimate concurrent operations all need connections simultaneously. This helps distinguish between connection leaks (idle) and genuine concurrent load (active).
4. Evaluate application connection pool configuration
Before touching PostgreSQL settings, review your application's connection pool configuration. Check pool size limits, connection timeout settings, and whether connections are being properly returned to the pool after use. The `connection-limit-exhaustion` insight notes that each PostgreSQL connection consumes 5-10MB RAM via the fork model before queries even run, so throwing more connections at the problem without fixing application-side leaks will waste memory and degrade performance through excessive context-switching.
5. Implement PgBouncer connection pooling instead of raising max_connections
For most applications hitting 100 connections, the solution is PgBouncer in transaction mode, not increasing max_connections. The `connection-limit-exhaustion` insight specifically recommends configuring PgBouncer with default_pool_size=20 actual database connections while supporting max_client_conn=1000 application connections. This multiplexes many app connections onto few database connections, avoiding the context-switching overhead that makes PostgreSQL unresponsive at 500+ connections. Only increase max_connections to 200-300 if you have genuine concurrent transaction needs and sufficient RAM (each connection adds 5-10MB overhead).
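The numbers in the step above map onto a short PgBouncer config. This is a sketch only: the database name, host, and port are placeholders, and auth settings are omitted.

```ini
; Sketch of pgbouncer.ini matching the recommended sizing; names are placeholders.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; release the server connection at transaction end
default_pool_size = 20       ; actual PostgreSQL connections per database/user
max_client_conn = 1000       ; application-side connections multiplexed onto them
server_idle_timeout = 60     ; close idle server connections after 60s
```

Applications then connect to port 6432 instead of 5432; PostgreSQL itself never sees more than the pool's worth of connections.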
Technologies
Related Insights
Connection exhaustion prevents new client connections
critical
Each PostgreSQL connection consumes 5-10MB RAM via fork model. At 200 connections: 1-2GB RAM overhead before queries run. At 500+ connections: excessive context-switching dominates query execution time. Database becomes unresponsive under concurrent load.
Connection failure when requests exceed default 100 connection limit
critical
Connection count approaching max_connections limit
critical
Connection limit approaching maximum causes application timeouts
critical
Connection count approaching max_connections causes connection failures
critical
Connection exhaustion blocks new client connections
critical
Slow Query Backlog Masks Redis Connection Pool Exhaustion
warning
Redis slowlog entries accumulating (redis.slowlog.length rising) can indicate operations blocking on network or disk I/O, exhausting connection pools and causing cascading failures in dependent services even when Redis CPU appears healthy.
Connection Rejection Cascade From Maxclient Saturation
critical
When Redis reaches maximum client connections (redis.connections.rejected increasing), new connection attempts fail silently while existing connections continue working normally, creating intermittent failures that are difficult to diagnose from application metrics alone.
Redis Connection Pool Starvation from Blocking Patterns
warning
When async endpoints make synchronous Redis calls, they hold connections longer than necessary while blocking the event loop, causing artificial connection pool exhaustion even when Redis server capacity is available.
Redis Connection Saturation Stalls Async Event Loop
critical
When Redis connection pool exhausts under high concurrency, blocking Redis operations (even from async endpoints) stall the FastAPI event loop, causing serial-like request processing and tail latency spikes despite low CPU utilization.
PostgreSQL connection count exceeds limits causing memory exhaustion
warning
Relevant Metrics
Monitoring Interfaces
PostgreSQL Native
Redis Native Metrics