Connection Timeout and Network Issues

Incident Response (critical)

Application experiencing connection timeouts or network-related PostgreSQL errors.

Prompt: Our app is getting intermittent 'could not connect to server' and connection timeout errors from PostgreSQL. How do I diagnose whether this is a network issue, database overload, or configuration problem?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When troubleshooting PostgreSQL connection timeouts, start by confirming the server is running and accessible, then check for connection pool exhaustion—the most common culprit. Only after ruling out pool issues should you investigate network connectivity, database-side resource constraints, and timeout configurations. The key is distinguishing between client-side pool exhaustion, network problems, and server overload.

1. Verify PostgreSQL server is running and accessible
First things first: check that PostgreSQL is actually running with `sudo systemctl status postgresql` or equivalent. Look at the `uptime` metric—if it's recently reset or shows zero, the server crashed. Verify you're connecting to the correct port (default 5432) by checking postgresql.conf. If the server is down or the port is mismatched, you'll get immediate connection refusals rather than intermittent timeouts, but it's the quickest thing to rule out.
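As a sketch, the checks above can be scripted; paths assume a Debian-style systemd install, so adjust for your distro:

```shell
# Hedged sketch: verify the server is up and listening (Debian-style paths assumed).
sudo systemctl status postgresql --no-pager      # is the service running?
pg_isready -h localhost -p 5432                  # does it accept connections right now?
sudo -u postgres psql -Atc "SHOW port;"          # the port the server is actually using
grep -E '^\s*(port|listen_addresses)' /etc/postgresql/*/main/postgresql.conf
```

`pg_isready` is useful here because it distinguishes "server rejecting connections" from "server unreachable" via its exit status.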
2. Check for connection pool exhaustion
In my experience, this is the #1 cause of intermittent connection errors. Check `postgresql.backends` against max_connections—if you're consistently at 95%+ capacity, new connection attempts will time out or fail. Use `connections_by_process` to identify which applications are hogging connections. Run `SELECT * FROM pg_stat_activity WHERE state = 'idle in transaction'` to find stuck connections that aren't being released. If using Django, verify CONN_MAX_AGE is set appropriately (0 for dev, 60-300 seconds for production)—values above 600 almost always cause pool exhaustion.
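A minimal sketch of these pool checks via `psql` (connection parameters assumed to be in the environment):

```shell
# Hedged sketch: pool-exhaustion checks (column names are standard pg_stat_activity).
psql -Atc "SELECT count(*)::text || ' / ' ||
                  (SELECT setting FROM pg_settings WHERE name = 'max_connections')
           FROM pg_stat_activity;"               # current backends vs. max_connections
psql -c "SELECT pid, usename, application_name,
                now() - xact_start AS xact_age
         FROM pg_stat_activity
         WHERE state = 'idle in transaction'
         ORDER BY xact_age DESC
         LIMIT 20;"                              # stuck transactions holding slots open
```

Sorting by `xact_age` surfaces the oldest idle-in-transaction sessions first, which are the most likely leak candidates.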
3. Test basic network connectivity and DNS resolution
From your application server, test raw connectivity with `telnet <db-host> 5432` or `nc -zv <db-host> 5432`. If using a hostname, verify DNS with `nslookup` or `dig`—DNS caching issues or round-robin DNS can cause intermittent failures when one IP in the pool is bad. Check firewall rules and security groups to confirm port 5432 is open bidirectionally. Intermittent failures that resolve after 30-60 seconds often point to security group rule propagation delays or misconfigured health checks.
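The connectivity and DNS checks above, sketched as commands to run from the application host (`db.example.com` is a placeholder for your database host):

```shell
# Hedged sketch: raw reachability and DNS checks from the app server.
nc -zv -w 3 db.example.com 5432                  # TCP reachability with a 3s timeout
dig +short db.example.com                        # run twice: flapping answers suggest
dig +short db.example.com                        # a bad member in round-robin DNS
for i in $(seq 1 20); do                         # repeated probes catch intermittent
  nc -z -w 2 db.example.com 5432 || echo "fail attempt $i"
done
```

Running the probe in a loop matters because a single successful `nc` proves nothing about intermittent failures.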
4. Assess database-side resource constraints
Check if PostgreSQL is overwhelmed and slow to accept new connections. Look at `waiting_queries`—if consistently above 10, the server is backed up. Examine `postgresql.locks` for blocking operations; high lock counts can cause connection establishment to time out as the server struggles to allocate resources. If `postgresql.backends` is near max_connections and `checkpoint_timed` shows frequent checkpoints with high I/O, the server may be rejecting connections or taking 5+ seconds to accept them, triggering client timeouts.
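These server-side pressure checks can be sketched as queries against the standard statistics views (the `pg_stat_bgwriter` columns shown are pre-PostgreSQL-17 names):

```shell
# Hedged sketch: server-side pressure checks via psql.
psql -c "SELECT count(*) AS waiting
         FROM pg_stat_activity
         WHERE wait_event_type = 'Lock';"        # queries currently blocked on locks
psql -c "SELECT locktype, mode, count(*)
         FROM pg_locks
         WHERE NOT granted
         GROUP BY locktype, mode;"               # ungranted lock requests by type
psql -c "SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time
         FROM pg_stat_bgwriter;"                 # checkpoint frequency and I/O cost
```

A high ratio of `checkpoints_req` to `checkpoints_timed` suggests checkpoints are being forced by WAL volume rather than the timer, which correlates with the I/O pressure described above.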
5. Investigate network latency and packet loss
Run `mtr` tests from app to database server for at least 100 packets to measure latency distribution and identify packet loss. Even 1-2% packet loss can cause TCP connection establishment to retry and time out. Check monitoring for network-related log flooding—sudden spikes in network error logs often indicate hardware failures, saturated links, or routing issues. If latency percentiles show p95 > 100ms on what should be a local network, investigate network path and switch configurations.
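A minimal sketch of the measurement, again with `db.example.com` as a placeholder host:

```shell
# Hedged sketch: latency/loss measurement from the app host to the DB host.
mtr --report --report-cycles 100 db.example.com  # per-hop loss and latency over 100 probes
ping -c 100 -i 0.2 db.example.com | tail -2      # end-to-end loss % and rtt min/avg/max
```

The per-hop `mtr` report tells you where loss starts; the `ping` summary confirms whether it actually reaches the endpoint (loss shown only at intermediate hops is often just ICMP rate limiting, not real loss).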
6. Review connection timeout configurations
Verify your application's connection timeout isn't too aggressive for your network conditions. If you're seeing timeouts on a 50ms latency network but your timeout is 1-2 seconds, you have almost no margin for TCP handshake retries. Check PostgreSQL's `tcp_keepalives_idle`, `tcp_keepalives_interval`, and `tcp_keepalives_count` settings in postgresql.conf, and verify client-side TCP keepalive is enabled. A timeout that's too short manifests as intermittent failures during normal load, while a genuine network issue shows 100% failure during problem windows.
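The settings above can be inspected on both sides with something like the following (the connection-URI parameters are standard libpq options; the example URI itself is hypothetical):

```shell
# Hedged sketch: inspect keepalive/timeout settings on server and OS.
psql -c "SELECT name, setting FROM pg_settings
         WHERE name LIKE 'tcp_keepalives%';"     # 0 means 'use the OS default'
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl  # Linux OS defaults
# Client side (libpq): give the handshake room to retry, e.g.
#   postgresql://app@db.example.com/appdb?connect_timeout=10&keepalives=1
```

A `connect_timeout` of 10 seconds leaves margin for at least two TCP SYN retransmissions, which is the margin the step above argues a 1-2 second timeout lacks.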

Technologies

Related Insights

- Database availability problem
- Database connection pool exhaustion causes "Connection refused" errors (critical)
- Timeout settings too aggressive
- Network issues cause storage overflow with logs (diagnostic_step, warning)
- Connectivity issues impacting application (diagnostic_step)
- DNS resolution issues (prerequisite)
- Firewall or security group misconfiguration (prerequisite)
- Network latency or packet loss (prerequisite)
- PostgreSQL server process not running causes connection refusal (prerequisite, critical)

Relevant Metrics

Monitoring Interfaces

PostgreSQL CloudWatch Metrics
PostgreSQL Datadog
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)