Deadlock Troubleshooting

Incident Response

The application is experiencing deadlocks that are causing transaction failures and need investigation.

Prompt: We're seeing deadlock errors in our application logs several times per hour. How do I find what queries are causing deadlocks and how to prevent them?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens: 2,198 vs 2,274
Latency: 39.4s vs 43.7s
Turns: 8 vs 14
Tool Calls: 3 vs 6
Schema Calls: 0 vs 1
Response: 5,901 chars vs 4,810 chars
Evaluation
Without Schema: B+
With Schema: A-
Both responses provide comprehensive, technically accurate guidance with specific SQL queries and prevention strategies. The response with Schema covers an additional database system (SQL Server), includes Python retry logic code, and provides a dedicated monitoring strategy section, making it slightly more complete. The response without Schema focuses more deeply on PostgreSQL and MySQL with detailed lock queries. Neither could tailor guidance to the specific database or framework in use since no application code was found. The Schema.ai MCP server mention in the second response didn't add value as it wasn't actually used.
Without Schema
3 tool calls
I'll help you identify and fix the deadlock issues. Let me first explore your codebase to understand your database setup and application structure.

Since I don't see application code in the current directory, I'll provide you with comprehensive guidance on identifying and preventing database deadlocks. The approach varies slightly by database system, so I'll cover the most common ones.

## Finding Deadlock-Causing Queries

### PostgreSQL

**1. Enable deadlock logging:**

```sql
-- Check current settings
SHOW deadlock_timeout;
SHOW log_lock_waits;

-- Set in postgresql.conf or via ALTER SYSTEM
ALTER SYSTEM SET deadlock_timeout = '1s';
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();
```

**2. Query deadlock information:**

```sql
-- View current locks and blocking queries
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```

**3. Check PostgreSQL logs** for detailed deadlock information (usually includes the full query and lock details).

### MySQL/MariaDB

**1. Check InnoDB status:**

```sql
SHOW ENGINE INNODB STATUS\G
```

Look for the "LATEST DETECTED DEADLOCK" section.

**2. Enable deadlock logging:**

```sql
SET GLOBAL innodb_print_all_deadlocks = ON;
```

This logs all deadlocks to the MySQL error log.

**3. Query current locks:**

```sql
-- MySQL 8.0+
SELECT * FROM performance_schema.data_locks;
SELECT * FROM performance_schema.data_lock_waits;

-- Older versions
SHOW OPEN TABLES WHERE In_use > 0;
SHOW PROCESSLIST;
```

## Common Deadlock Patterns & Prevention

### 1. Inconsistent Lock Ordering

**Problem:** Two transactions acquire locks in different orders.

```sql
-- Transaction 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Transaction 2 (runs simultaneously)
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;  -- Locks row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1;  -- Waits for row 1
COMMIT;
```

**Solution:** Always acquire locks in the same order (e.g., by ID).

```sql
-- Both transactions should order by ID
UPDATE accounts SET balance = balance - 100 WHERE id IN (1, 2) ORDER BY id;
```

### 2. Missing or Wrong Indexes

**Problem:** Range scans lock more rows than necessary.

**Solution:**
- Add proper indexes so queries lock specific rows
- Use `EXPLAIN` to verify query plans
- Avoid full table scans in transactions

### 3. Long-Running Transactions

**Problem:** Transactions hold locks too long, increasing collision probability.

**Solution:**
- Keep transactions short
- Move non-critical work outside transactions
- Use appropriate isolation levels (READ COMMITTED instead of SERIALIZABLE when possible)

### 4. Multiple Updates in Different Order

**Solution:** Use batch operations with consistent ordering:

```sql
-- Instead of multiple updates
UPDATE table SET ... WHERE id = 5;
UPDATE table SET ... WHERE id = 3;
UPDATE table SET ... WHERE id = 7;

-- Do this
UPDATE table SET ... WHERE id IN (3, 5, 7) ORDER BY id;
```

### 5. Foreign Key Deadlocks

**Problem:** Updates to parent/child tables in different orders.

**Solution:**
- Ensure proper indexes on foreign keys
- Consider using `SELECT ... FOR UPDATE` to explicitly lock rows
- Update in consistent order (parent → child or vice versa)

## Debugging Steps

1. **Parse your application logs** to identify the queries mentioned in deadlock errors
2. **Reproduce in a test environment** with concurrent load
3. **Use database monitoring tools:**
   - PostgreSQL: pg_stat_statements extension
   - MySQL: Performance Schema
   - Third-party: Datadog, New Relic, etc.
4. **Add application-level logging** around transactions:

```python
# Example pseudo-code
logger.info(f"Starting transaction: {operation_name}")
logger.info(f"Acquiring lock on resource: {resource_id}")
# ... transaction code
logger.info(f"Transaction completed: {operation_name}")
```

## Quick Fixes to Try

1. **Reduce transaction scope:**
   - Only include necessary operations
   - Release locks earlier
2. **Add explicit locking:**

```sql
-- PostgreSQL
SELECT * FROM table WHERE id = ? FOR UPDATE;

-- MySQL
SELECT * FROM table WHERE id = ? FOR UPDATE;
```

3. **Use optimistic locking** where appropriate (version columns)
4. **Increase deadlock detection timeout** (temporary workaround):

```sql
-- PostgreSQL
SET deadlock_timeout = '5s';

-- MySQL
SET innodb_lock_wait_timeout = 10;
```

Would you like help analyzing specific queries from your logs, or setting up monitoring for your particular database system?
With Schema
6 tool calls · 1 Schema call
I'll help you investigate the deadlock errors. Let me first check your application structure to understand what database and technologies you're using.

I see you have access to the Schema.ai MCP server. Let me search for documentation on database deadlocks and how to diagnose them. While you grant permission for that tool, let me provide you with comprehensive guidance on finding and preventing deadlocks:

## 🔍 **Finding Deadlock-Causing Queries**

### **1. Check Database Deadlock Logs**

**PostgreSQL:**

```sql
-- Enable deadlock logging (in postgresql.conf)
-- log_lock_waits = on
-- deadlock_timeout = 1s

-- Query recent deadlocks from logs
SELECT * FROM pg_stat_database_conflicts;
```

**MySQL/MariaDB:**

```sql
-- Show latest detected deadlock
SHOW ENGINE INNODB STATUS\G
-- Look for the "LATEST DETECTED DEADLOCK" section
```

**SQL Server:**

```sql
-- Enable trace flags for deadlock capture
DBCC TRACEON(1222, -1);

-- Query deadlock graph from system health session
SELECT XEvent.query('(event/data/value/deadlock)[1]') AS DeadlockGraph
FROM (
    SELECT XEvent.query('.') AS XEvent
    FROM (
        SELECT CAST(target_data AS XML) AS TargetData
        FROM sys.dm_xe_session_targets st
        JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
        WHERE name = 'system_health'
    ) AS Data
    CROSS APPLY TargetData.nodes('//RingBufferTarget/event') AS XEventData(XEvent)
) AS src;
```

### **2. Monitor Active Locks**

**PostgreSQL:**

```sql
-- See blocking queries
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
JOIN pg_catalog.pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```

### **3. Application-Level Logging**

Add detailed logging around database operations:
- Log query execution with timestamps
- Track transaction isolation levels
- Monitor lock wait times

## 🛡️ **Preventing Deadlocks**

### **1. Consistent Lock Ordering**

Always access tables/rows in the same order across all transactions:

```sql
-- GOOD: Always update users before orders
BEGIN;
UPDATE users SET ... WHERE id = 1;
UPDATE orders SET ... WHERE user_id = 1;
COMMIT;

-- BAD: Inconsistent ordering leads to deadlocks
-- Transaction A: users → orders
-- Transaction B: orders → users (DEADLOCK!)
```

### **2. Keep Transactions Short**

- Minimize time between BEGIN and COMMIT
- Avoid user input or external API calls inside transactions
- Move heavy computations outside transaction boundaries

### **3. Use Appropriate Isolation Levels**

```sql
-- Consider READ COMMITTED instead of SERIALIZABLE
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

### **4. Use Row-Level Locking Explicitly**

```sql
-- PostgreSQL: Lock specific rows to prevent conflicts
SELECT * FROM users WHERE id = 1 FOR UPDATE NOWAIT;

-- Or with a timeout
SELECT * FROM users WHERE id = 1 FOR UPDATE SKIP LOCKED;
```

### **5. Add Indexes**

Poor indexes cause table scans and excessive locking:

```sql
-- Create indexes on frequently queried columns
CREATE INDEX idx_orders_user_id ON orders(user_id);
```

### **6. Batch Updates Carefully**

```sql
-- BAD: Large batch updates hold locks too long
UPDATE orders SET status = 'processed' WHERE created_at < '2026-01-01';

-- GOOD: Process in smaller batches
UPDATE orders SET status = 'processed'
WHERE id IN (
    SELECT id FROM orders WHERE created_at < '2026-01-01' LIMIT 1000
);
```

### **7. Implement Retry Logic**

```python
import time

def execute_with_retry(query, max_retries=3):
    for attempt in range(max_retries):
        try:
            return execute_query(query)
        except DeadlockError:
            if attempt == max_retries - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
```

## 📊 **Monitoring Strategy**

1. **Set up alerts** for deadlock frequency spikes
2. **Log deadlock details** automatically to a monitoring system
3. **Track query patterns** that correlate with deadlocks
4. **Review slow query logs** - slow queries often contribute to deadlocks

Would you like me to search the Schema.ai documentation for specific database system guidance once you grant the MCP tool permission?

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When troubleshooting deadlocks in PostgreSQL, start by confirming frequency and trend, then immediately dive into the logs to identify the specific queries involved and their lock graph. From there, analyze whether it's a lock ordering problem (most common) or a transaction scope issue, then implement consistent ordering and timeouts to prevent recurrence.

1. Confirm deadlock frequency and establish a baseline
First, verify the problem is real and measure its severity. Check the `postgresql.deadlocks` metric to see current rate, then query `SELECT deadlocks FROM pg_stat_database WHERE datname = 'your_db'` for the cumulative count since stats reset. If you're seeing several per hour and the count is trending up, you have a real problem. The `deadlocks-not-monitored` insight reminds us this isn't always tracked — add it to your dashboards now so you can measure improvement.
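The baseline check from this step can be run directly; `your_db` is a placeholder for your database name:

```sql
-- Cumulative deadlock count since the last stats reset
-- (replace 'your_db' with your database name)
SELECT datname, deadlocks, stats_reset
FROM pg_stat_database
WHERE datname = 'your_db';
```

Re-run this periodically (or chart the `postgresql.deadlocks` metric) to confirm the rate is actually trending up rather than reflecting a single incident.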
2. Extract the actual deadlock details from logs
The most critical step: grep your PostgreSQL logs for 'deadlock detected' and examine the lock graph output that follows. PostgreSQL prints the exact queries, process IDs, and lock wait chains for each deadlock. This tells you which tables and rows are involved and in what order transactions acquired locks. The `deadlocks-cause-transaction-rollback` insight emphasizes this — you need the lock graph to understand the circular dependency pattern.
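As a starting point, the statements involved can be pulled out of the log with a short script. This is a minimal sketch: the sample log lines are illustrative, and the regexes assume the default stderr log format, so adjust them to match your `log_line_prefix`:

```python
import re

# Match the error line and the STATEMENT line that follows it.
# These patterns assume the default PostgreSQL stderr log format;
# adjust for your log_line_prefix.
DEADLOCK_RE = re.compile(r"deadlock detected")
STATEMENT_RE = re.compile(r"STATEMENT:\s+(.*)")

def extract_deadlock_statements(lines):
    """Return the STATEMENT entries that follow each 'deadlock detected' error."""
    statements = []
    in_deadlock = False
    for line in lines:
        if DEADLOCK_RE.search(line):
            in_deadlock = True
        elif in_deadlock:
            m = STATEMENT_RE.search(line)
            if m:
                statements.append(m.group(1))
                in_deadlock = False
    return statements

# Illustrative log excerpt (not real output)
sample = [
    "2026-01-10 12:00:01 UTC ERROR:  deadlock detected",
    "2026-01-10 12:00:01 UTC DETAIL:  Process 1201 waits for ShareLock ...",
    "2026-01-10 12:00:01 UTC STATEMENT:  UPDATE accounts SET balance = balance - 100 WHERE id = 1",
]
print(extract_deadlock_statements(sample))
```

Counting the extracted statements by normalized shape quickly shows which one or two queries dominate the deadlocks; the DETAIL lines carry the full lock graph and are worth reading by hand for those.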
3. Analyze current lock contention in real time
Join `pg_locks` and `pg_stat_activity` to see what's blocking what right now: look for rows where `blocked_locks.granted = false` and the blocking PID differs from the blocked PID on the same resource. Check the `postgresql.locks` and `waiting_queries` metrics — if waiting queries are consistently high, you have lock contention even outside of deadlocks. The `deadlocks-transaction-failures` insight gives the specific query pattern to identify these blocked sessions.
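On PostgreSQL 9.6+, `pg_blocking_pids()` gives a more compact view of the same contention without the full `pg_locks` join:

```sql
-- Sessions that are currently blocked, and who is blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```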
4. Identify inconsistent lock ordering patterns
From your log analysis in step 2, check if the same tables or rows are being accessed in different orders by different transactions. This is the #1 cause of deadlocks. For example, Transaction A locks row 1 then row 2, while Transaction B locks row 2 then row 1. The `deadlock-inconsistent-lock-order` insight is critical here — if you see this pattern, you must standardize the order. Always lock rows sorted by primary key (use `ORDER BY id` in your `SELECT ... FOR UPDATE` or in CTEs).
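For example, a transfer touching two rows can take both locks in primary-key order up front (the `accounts` table here is illustrative):

```sql
BEGIN;
-- Lock both rows in a deterministic (primary-key) order first
SELECT id FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- With the locks held, the updates can run in any order
-- without risking a circular wait
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```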
5. Check for overly large transaction scopes
Look at `long_running_transaction_oldest_timestamp` and `transactions_open` — if you have transactions staying open for many seconds or minutes, they're holding locks for too long and increasing deadlock probability. The `transaction-scope-too-large` insight points to this: large transactions that do extensive work while holding locks make the collision window bigger. Break up large transactions or move non-critical work outside the transaction boundary.
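To find the offending sessions directly, something like the following works against `pg_stat_activity`:

```sql
-- Transactions open the longest, oldest first
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;
```

Pay particular attention to rows in the `idle in transaction` state: they hold locks while doing no work at all.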
6. Implement consistent ordering and protective timeouts
Fix the root cause by enforcing consistent lock acquisition order across all code paths — sort by primary key before any batch updates or use `ORDER BY` in queries. Additionally, set `statement_timeout` to prevent locks from being held indefinitely if something hangs. The `deadlocks-transaction-failures` and `need-for-explicit-lock-ordering` insights both emphasize this: ordering prevents the circular wait, and timeouts provide a safety net.
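The timeouts can be set per role so the safety net applies only to application sessions; the role name and values below are placeholders to tune for your workload:

```sql
-- Abort any statement running longer than 5s, and give up on a lock
-- after 2s instead of queueing indefinitely (values are illustrative)
ALTER ROLE app_user SET statement_timeout = '5s';
ALTER ROLE app_user SET lock_timeout = '2s';
-- Also reap sessions left sitting idle mid-transaction
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '30s';
```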

Technologies

Related Insights

deadlock frequency increasing
Stub insight created from scenario discovery hint: deadlock frequency increasing
concurrent update conflicts
Stub insight created from scenario discovery hint: concurrent update conflicts
transaction scope too large
Stub insight created from scenario discovery hint: transaction scope too large
application logic causing deadlocks
Stub insight created from scenario discovery hint: application logic causing deadlocks
Deadlocks cause transaction rollback and data operation failure
diagnostic_step · critical
Deadlocks from inconsistent row lock ordering
prerequisite · critical
Deadlocks causing transaction failures and timeouts
outcome · warning
lock ordering issue
prerequisite
Stub insight created from scenario discovery hint: lock ordering issue
need for explicit lock ordering
supporting_evidence
Stub insight created from scenario discovery hint: need for explicit lock ordering
Deadlocks occurring but not tracked in metrics
supporting_evidence · info

Relevant Metrics

Monitoring Interfaces

PostgreSQL Datadog
PostgreSQL CloudWatch Metrics
PostgreSQL Prometheus
PostgreSQL Native
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration