Locking and Deadlocks

How PostgreSQL's lock manager works: the eight table lock modes and their conflict matrix, the four row-lock modes and why FOR NO KEY UPDATE is the right default, SKIP LOCKED and NOWAIT for queue patterns, advisory locks as application mutexes, lock_timeout and friends, deadlock cycle detection with a worked example, ALTER TABLE as the silent killer with its fast/slow classification and add-then-validate workaround, and the pg_locks/pg_blocking_pids diagnostics to find what is blocking what right now.

advanced 29 min read in Concurrency And Maintenance

postgresqllockingdeadlocksconcurrencyalter-tableadvisory-locksskip-lockeddiagnostics

Learning outcomes

Snapshots solve the easy half of concurrency. The hard half, the half where two transactions actually want to change the same row, or where one wants to rewrite a whole table while another is reading it, is solved by locks. Locks are also where most outages live: a ten-second ALTER TABLE that freezes the whole application, a queue worker stack that deadlocks under load, an advisory lock that never gets released because the connection died.

After studying this page, you can:

Name each of the eight table-level lock modes, what statement takes it, and what it blocks.
Choose the right row-level lock (FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR KEY SHARE) for an update path and explain why FOR NO KEY UPDATE is the right default.
Build a queue with SELECT FOR UPDATE SKIP LOCKED and a fail-fast path with NOWAIT, and explain how each avoids contention.
Set lock_timeout, statement_timeout, and idle_in_transaction_session_timeout to the values production should use, and say what each one bounds.
Walk through how PostgreSQL detects a deadlock cycle after deadlock_timeout, and rewrite a deadlocking pair of transactions to remove the cycle.
Run a migration as ALTER TABLE without taking down the application, by classifying which variants are metadata-only and using the add-validate split for the slow ones.
Use pg_locks, pg_blocking_pids, pg_stat_activity, and pg_stat_database.deadlocks to find what is blocking what right now.

Before we dive in

You should be comfortable with PostgreSQL transactions and the snapshot model from mvcc-and-tuple-visibility and isolation-levels-and-mvcc-anomalies. The first explains why a SELECT and an UPDATE of the same row can run at the same time without either waiting. This page is about what happens when they cannot: when two writers fight over the same row, when DDL needs to rewrite a table, or when the application itself needs a mutex across processes.

A few words you will need. A lock mode is a named level of access to an object (a table or a row); modes are partially ordered by strength. A conflict is when two requested modes are not allowed to coexist: the second waiter blocks until the first holder releases. The lock graph is the directed graph of “transaction A waits for transaction B.” A deadlock is a cycle in that graph. Hold those four. Everything below is the conflict matrix of one kind of lock or another, plus the rules for how PostgreSQL hands them out and breaks deadlocks.

Mental Model

The wrong model, comforting and common, is that PostgreSQL has “a lock” the way a Java synchronized block does: you grab it, you do work, you release it. Under that model SELECT FOR UPDATE and LOCK TABLE look the same, and you reason about contention by imagining a single mutex per row or per table.

The real model is a multi-mode lock manager with a conflict matrix. Every lock is held in a named mode (ACCESS SHARE, ROW EXCLUSIVE, ACCESS EXCLUSIVE, and the rest), and what blocks what is decided by a small table that says “mode X requested while mode Y is held: wait or proceed?”. Two transactions can hold weak locks on the same table at the same time without any contention, because their modes do not conflict. The same two transactions deadlock the moment one of them upgrades to a mode that does conflict.

Keep this picture. A lock is not a binary “free or taken.” It is a mode, and conflict is a property of pairs of modes. Once that lands, the eight table modes stop looking like an arbitrary list and start looking like a deliberately graded scale: each step up forbids one more thing.

Breaking it down

1. Where snapshots stop and locks take over

PostgreSQL’s headline rule, the one mvcc-and-tuple-visibility teaches, is that readers never block writers and writers never block readers. That rule lets a thirty-second report and a flood of concurrent writers coexist without either side waiting. It is also, on its own, not enough.

Two situations break out of MVCC. The first is write-write conflict on the same row: two transactions both UPDATE the same row, and the database has to pick a serial order. The second is structural change to a shared object: CREATE INDEX rewrites a table’s index, ALTER TABLE may rewrite the heap, DROP TABLE removes it. You cannot solve either with version-stamped tuples alone. You need locks.

So PostgreSQL keeps both. Most reads and writes ride on snapshots and never take more than the lightest table lock. The moment two writers target the same row, or the moment any DDL touches an object, the lock manager takes over and enforces the conflict matrix below. The rest of this page is that matrix and its consequences.

2. The eight table-level lock modes and what they fight

Every statement that touches a table acquires a lock on that table at one of eight modes. The modes are ordered from weakest (acquired by plain SELECT) to strongest (acquired by ALTER TABLE), and each mode conflicts with a wider set of modes than the one before it. The full conflict matrix is in the official docs; the practical view is the one you use every day.

flowchart TB
    A["ACCESS SHARE (SELECT)"] --> B["ROW SHARE (SELECT FOR UPDATE)"]
    B --> C["ROW EXCLUSIVE (INSERT, UPDATE, DELETE)"]
    C --> D["SHARE UPDATE EXCLUSIVE (CREATE INDEX CONCURRENTLY, ANALYZE)"]
    D --> E["SHARE (CREATE INDEX)"]
    E --> F["SHARE ROW EXCLUSIVE"]
    F --> G["EXCLUSIVE"]
    G --> H["ACCESS EXCLUSIVE (ALTER TABLE, DROP, REINDEX, VACUUM FULL)"]

Two rules carry most of the everyday intuition. Readers and writers do not block each other at the table level: ACCESS SHARE (taken by SELECT) and ROW EXCLUSIVE (taken by INSERT, UPDATE, DELETE) do not conflict. Two writers do not block each other at the table level either: two UPDATEs of different rows in the same table both hold ROW EXCLUSIVE and proceed in parallel. Row-level conflict only happens when they target the same row, and that is handled by the row-level lock below.

The Tabs lay out each mode at the level you need to reach for it.

The eight table lock modes

Taken by a plain SELECT. Conflicts only with ACCESS EXCLUSIVE. So a SELECT can run alongside any other read or write, but freezes the moment a DROP or an ALTER TABLE wants the table. This is the lock every read holds, briefly.

The pattern to internalize: the lower modes are designed to coexist with normal traffic, and ACCESS EXCLUSIVE is designed not to. Almost every “the database stopped responding” incident traces to ACCESS EXCLUSIVE being held longer than expected.

3. Row-level locks: FOR UPDATE and its quieter siblings

Table-level locks decide who can use the table. Row-level locks decide who can change a specific row. They live on the tuple itself (in its xmax field, the same one MVCC uses for delete) and only conflict when two transactions target the same row. A row writer never blocks anyone on a different row.

There are four row-lock modes, and choosing the right one is one of the most underrated correctness levers in PostgreSQL.

Mode	Taken by	What it blocks on the same row	When to use
`FOR KEY SHARE`	foreign-key check from a child table	FOR UPDATE, FOR NO KEY UPDATE	Held automatically by FK checks; rarely written by hand.
`FOR SHARE`	`SELECT ... FOR SHARE`	FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE on key cols	Read-then-write workflows where the row must not change under you, but you do not intend to update.
`FOR NO KEY UPDATE`	`SELECT ... FOR NO KEY UPDATE`, plain `UPDATE` of non-key columns	FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE	The right default when you will UPDATE non-key columns. It lets concurrent FK checks (which need FOR KEY SHARE) proceed.
`FOR UPDATE`	`SELECT ... FOR UPDATE`, `DELETE`, `UPDATE` of key columns	All four row-lock modes	The strongest. Use when you will delete the row or change a key column, or you explicitly want to block FK checks.

The subtle one is FOR NO KEY UPDATE. A plain UPDATE to a non-key column (say, balance) takes it automatically. The reason it exists is to let foreign-key validation keep working: when a child row is inserted, the database needs to check that the parent row still exists, and that check takes FOR KEY SHARE. If the parent’s update had taken FOR UPDATE, the child insert would block; with FOR NO KEY UPDATE, the two coexist. So when you write SELECT ... FOR UPDATE by hand on a row whose key you will not touch, you are stronger than you need to be and you are blocking unrelated work.

The rule is: write FOR NO KEY UPDATE if you will only change non-key columns, and FOR UPDATE only if you will change the key or delete the row.

4. Queue patterns: SKIP LOCKED and NOWAIT

The classic mistake when building a job queue in PostgreSQL is to write the obvious thing and watch it serialize.

-- Naive: every worker tries to lock the oldest pending job.
-- All but one worker wait. Throughput collapses to one job at a time.
begin;
select id from jobs where status = 'pending'
  order by created_at limit 1 for update;
-- ... do the work ...
update jobs set status = 'done' where id = $1;
commit;

Each worker’s SELECT ... FOR UPDATE blocks behind the previous worker’s lock on the same row. Concurrency is zero, no matter how many workers you start.

SKIP LOCKED is the fix. It tells PostgreSQL to skip rows that are already locked by another transaction, returning the next unlocked row instead. Each worker grabs a different job; they all proceed in parallel.

-- Worker pattern: each worker claims a disjoint row.
begin;
select id from jobs where status = 'pending'
  order by created_at
  for update skip locked
  limit 1;
-- ... do the work ...
update jobs set status = 'done' where id = $1;
commit;

SKIP LOCKED is the right tool for queues, batch dispatchers, and any “many workers pulling from a shared inbox” pattern. It trades fairness (a heavily contended row near the top of the queue may be skipped repeatedly by waves of workers) for throughput.

The other modifier is NOWAIT. Instead of waiting on a locked row, it aborts the statement immediately with error 55P03 (lock_not_available). Use it when you would rather fail and retry than queue: in an interactive endpoint where a five-second wait is worse than a clean error, or in a migration where one stuck connection should not freeze the deploy.

-- Fail fast if someone else holds the row.
select * from inventory where sku = 'X' for update nowait;

The two modifiers compose: FOR UPDATE SKIP LOCKED LIMIT 1 for queue workers, FOR UPDATE NOWAIT for interactive fail-fast.

5. Advisory locks: application mutexes on a 64-bit key

Sometimes the thing you want to serialize is not a row at all. It is a piece of application work: “only one process at a time may run the daily reconciliation,” or “do not let two replicas of the cron job fire on the same minute.” For this PostgreSQL offers advisory locks: locks the database does not interpret at all, identified by a 64-bit integer key (or a pair of 32-bit keys) that you choose. They are perfect application-level mutexes that ride along with your existing database connection.

There are two scopes, and the difference matters operationally.

Session-scope locks (pg_advisory_lock(key)) are held until you call pg_advisory_unlock(key) or the session ends. They survive across transactions, which makes them useful for “hold this for the whole worker run.”
Transaction-scope locks (pg_advisory_xact_lock(key)) are released automatically when the transaction commits or rolls back. There is no unlock call. They are the safer default, because there is no leak path if your code forgets to release.

A typical use looks like this:

-- Only one nightly job at a time. Key is a hash of the job name.
begin;
select pg_advisory_xact_lock(hashtext('nightly-reconciliation'));
-- ... do the work; lock released automatically on commit ...
commit;

The variants pg_try_advisory_lock and pg_try_advisory_xact_lock return false instead of waiting, the advisory-lock equivalent of NOWAIT. Use them when the right behaviour is “another worker is already doing it, so I do nothing.”

The trap is session-scope advisory locks plus connection pooling. A session-scope lock survives the transaction, so if your connection pool returns the connection to the pool without releasing the lock, the lock outlives the worker that took it, and the next worker on the same connection inherits a locked state it does not understand. Worse, if the connection drops mid-work (network blip, OOM kill), the lock dies with it and any retry finds the resource unlocked even though the original worker may still be running on another connection. Advisory locks do not solve “exactly once” by themselves; they only serialize. If you take session-scope locks, you must own the connection for the lock’s full lifetime, or use transaction-scope locks and accept that the lock window is just the transaction.

6. Waiting on locks: lock_timeout and its cousins

By default, a PostgreSQL statement that hits a lock waits forever. That is a fine answer for a long batch job. It is a disaster for a web request, a deployment, or anything attached to an SLO. Production runs lean on four timeout knobs.

Setting	Default	What it bounds	Where to set it
`lock_timeout`	`0` (none)	Time spent waiting to ACQUIRE a lock; not the time the lock is held	Per-session for migrations; per-pool for web traffic.
`statement_timeout`	`0` (none)	Total runtime of any statement, including planning, execution, and lock waits	Per-role for analyst queries, per-pool for app traffic.
`idle_in_transaction_session_timeout`	`0` (none)	Time a transaction may sit IDLE between statements before the server aborts it	Globally, almost always.
`idle_session_timeout` (PG14+)	`0` (none)	Time an idle connection (no open transaction) may sit before the server closes it	When a pool’s idle reaper is not enough.

lock_timeout is the unsung hero of safe migrations. The default of zero means a single ALTER TABLE that cannot acquire its ACCESS EXCLUSIVE lock will sit in the lock queue forever, and every subsequent query on that table will queue behind it. With SET lock_timeout = '3s' at the top of the migration, the same ALTER aborts cleanly when it cannot get in, and the application keeps running. Set it on every migration; treat its absence as a bug.

idle_in_transaction_session_timeout is the production fix for the long-transaction trap that mvcc-and-tuple-visibility teaches: a connection left in idle in transaction holds the xmin horizon back and stalls vacuum database-wide. Setting this to a small value (think 60 seconds, or as low as your pool tolerates) means a misbehaving connection ends itself instead of bloating every table you own. Set it globally in postgresql.conf.

# postgresql.conf
lock_timeout = 0
statement_timeout = 0
idle_in_transaction_session_timeout = 60s
idle_session_timeout = 0

The pattern in real systems: leave lock_timeout and statement_timeout at zero in postgresql.conf, then override them per-pool, per-role, or per-session, depending on the workload. Always set idle_in_transaction_session_timeout.

7. Deadlocks: cycle detection and the worked example

A deadlock is a cycle in the wait-for graph: transaction A waits on a lock held by B, and B waits on a lock held by A. Neither can proceed. PostgreSQL detects this automatically. After a transaction has been waiting for deadlock_timeout (default 1s), the lock manager walks the wait-for graph from that transaction. If it finds a cycle, it picks one transaction in the cycle and aborts it with SQLSTATE 40P01 (deadlock_detected). The other transaction proceeds.

Deadlocks are correctness-safe: the database never deadlocks silently, never corrupts data, and always recovers. They are throughput-killing: the aborted transaction has to be retried, the detector itself is not free, and at scale a deadlock storm can wipe out a queue.

The canonical example is two transfers in opposite order. Transaction A locks account 1 and tries to lock account 2; transaction B has already locked account 2 and tries to lock account 1.

A two-transaction deadlock and how the detector breaks it

Initial stateAccount 1 and account 2 both hold $100. Transaction A wants to transfer $10 from account 1 to account 2. Transaction B wants to transfer $5 from account 2 to account 1. Both start at the same instant.

Step 1 of 7

The fix is a discipline, not a setting: lock acquisition in a consistent order across every code path. The order can be by primary key (lowest id first), by name (alphabetical), or anything else, as long as every transaction in the system follows the same rule.

-- The fixed version: lock the lower id first, always.
begin;
update accounts set balance = balance - 10 where id = least(1, 2);
update accounts set balance = balance + 10 where id = greatest(1, 2);
-- Reverse the sign per direction; the LOCK order is fixed.
commit;

When the order is naturally hard (the application sees account names, not ids), an explicit SELECT ... FOR UPDATE ... ORDER BY id before the writes works just as well: the lock order is then a property of the SELECT, not of the UPDATE order. Always pair a multi-row FOR UPDATE with an ORDER BY, or the planner is free to lock rows in any order, and a different query plan tomorrow becomes today’s deadlock.

A deadlock here looks like this in the server log:

ERROR:  deadlock detected
DETAIL:  Process 18432 waits for ShareLock on transaction 998877; blocked by 18433
         Process 18433 waits for ShareLock on transaction 998876; blocked by 18432
         Process 18432: UPDATE accounts SET balance = balance - 10 WHERE id = 2
         Process 18433: UPDATE accounts SET balance = balance - 5  WHERE id = 1
HINT:  See server log for query details.

Read it as a cycle: 18432 waits on the transaction held by 18433, and 18433 waits on the transaction held by 18432. The two queries at the bottom tell you exactly which two paths in your code disagree on lock order.

8. ALTER TABLE: the silent killer and its safer patterns

Most ALTER TABLE variants take ACCESS EXCLUSIVE on the table, which conflicts with even a plain SELECT. While the ALTER holds that lock, the entire table is unreadable. If your application sends so much as a health-check query against the table during that window, that health-check waits, then a request behind it waits, then your connection pool fills up, then your app falls over.

The fix is not to avoid ALTER TABLE. It is to know which variants are metadata-only (the lock is held briefly, no data is rewritten) and which actually rewrite the table.

Fast (metadata-only on PG 11 and later):

ALTER TABLE ... ADD COLUMN col TYPE with no default, or with a non-volatile default (constant or stable expression).
ALTER TABLE ... DROP COLUMN col. The column is hidden; storage is reclaimed later by VACUUM.
ALTER TABLE ... SET NOT NULL when an existing CHECK (col IS NOT NULL) NOT VALID constraint has already been validated, so the planner can prove the column has no nulls without a table scan.
ALTER TABLE ... ATTACH PARTITION ... CONCURRENTLY (PG 14+ on hash-partitioned, available in narrower forms earlier).
Adding or dropping an unvalidated CHECK constraint: ADD CONSTRAINT ... NOT VALID.

Slow (rewrites the table or does a full scan):

ALTER COLUMN col TYPE new_type where the type change is not binary-compatible. This rewrites every row.
ADD COLUMN col WITH VOLATILE DEFAULT (for example, default now() in a way the planner cannot prove constant): rewrites every row.
ADD FOREIGN KEY validated synchronously: full scan to check every existing row.
ADD CONSTRAINT ... CHECK (...) (without NOT VALID): full scan.
SET NOT NULL without a pre-validated NOT VALID check: full scan to prove the column is null-free.

The workaround pattern for the slow ones is the add-then-validate split. You take the ACCESS EXCLUSIVE briefly for the structural change, then do the slow work without holding the heavy lock.

-- BAD: adds the FK and scans the whole table under ACCESS EXCLUSIVE.
alter table orders add constraint fk_customer
  foreign key (customer_id) references customers(id);

-- GOOD: two steps, second step uses SHARE UPDATE EXCLUSIVE (online).
alter table orders add constraint fk_customer
  foreign key (customer_id) references customers(id) not valid;
alter table orders validate constraint fk_customer;

The same pattern applies to NOT NULL: add CHECK (col IS NOT NULL) NOT VALID, run VALIDATE CONSTRAINT, then SET NOT NULL is cheap because the proof already exists. And for column-type changes: add a new column, backfill it in batches, drop the old column, rename. None of those steps hold ACCESS EXCLUSIVE for more than a moment.

The non-negotiable habit, even for “fast” migrations: always set a short lock_timeout at the top of every migration file.

-- Migration prelude. Fail in 3 seconds rather than freeze the table.
set lock_timeout = '3s';
alter table orders add column shipped_at timestamptz;

Why this matters: even a metadata-only ALTER still has to acquire ACCESS EXCLUSIVE, which conflicts with every other lock mode. If a slow SELECT is mid-scan, your fast ALTER queues behind it, AND every subsequent SELECT queues behind your ALTER. Within seconds the whole table is frozen by one stuck transaction. With lock_timeout = '3s', the ALTER aborts after three seconds, the queue clears, and the deploy fails loudly rather than the application failing silently.

9. Diagnosing lock waits in production

When the symptom is “the app is slow on table X right now,” the lock view to know is pg_locks joined to pg_stat_activity. Each row in pg_locks is a lock request: granted ones have granted = true, queued ones (the waiters) have granted = false. The granted ones holding the table’s ACCESS EXCLUSIVE or whatever heavy mode is in play are the ones you need to find.

-- Who holds what right now on which table.
select l.relation::regclass as table, l.mode, l.granted,
       a.pid, a.state, a.query_start, left(a.query, 80) as query
from pg_locks l
join pg_stat_activity a using (pid)
where l.relation is not null
order by l.granted desc, l.relation, a.query_start;

For the “who is blocking whom right now” question, pg_blocking_pids(pid) is the single best function. It returns an array of every PID currently blocking the given PID. The canonical blocker query is:

-- Who is blocking whom, and on what query.
select waiting.pid as waiting_pid,
       waiting.query as waiting_query,
       blocking.pid as blocking_pid,
       blocking.query as blocking_query,
       blocking.state as blocking_state,
       now() - blocking.xact_start as blocking_age
from pg_stat_activity waiting
join lateral unnest(pg_blocking_pids(waiting.pid)) as blocker(pid) on true
join pg_stat_activity blocking on blocking.pid = blocker.pid
where waiting.wait_event_type = 'Lock';

pg_stat_activity.wait_event_type = 'Lock' and wait_event like relation, tuple, or transactionid tell you what KIND of lock the waiter is on. wait_event_type = 'Lock' with wait_event = 'relation' is “blocked on a table-level lock”; 'tuple' is “blocked on a row-level lock”; 'transactionid' is “waiting for another transaction to commit or roll back” (the row-lock wait often shows up here, since you wait for the transaction that holds the row’s xmax).

For aggregate health, pg_stat_database.deadlocks is a cumulative counter of deadlocks detected per database. Graph it. A flat-line database that suddenly grows deadlocks per minute has a new bug.

select datname, deadlocks, stats_reset from pg_stat_database
where datname = current_database();

Finally, set log_lock_waits = on in postgresql.conf. With deadlock_timeout = 1s, any wait longer than one second gets logged with the blocking process and query. This is how slow-lock incidents show up in your logs without you needing to be watching pg_locks at the moment they happen.

# postgresql.conf
log_lock_waits = on
deadlock_timeout = 1s

Lock failure modes you will meet in production

Mastery Questions

A teammate writes SELECT ... FOR UPDATE on a users row because they intend to update the user’s last_seen_at column. Reviews are quiet for weeks, then foreign-key inserts on the sessions table start blocking on this user lock. Why, and what should the SELECT have been?

Answer. FOR UPDATE is the strongest row-lock mode and conflicts with FOR KEY SHARE, which is exactly what a foreign-key validation needs: when a new row is inserted into sessions(user_id), PostgreSQL takes FOR KEY SHARE on the parent users row to verify it still exists. The teammate did not change the user’s primary key, so the lock should have been FOR NO KEY UPDATE, the lock a plain UPDATE of non-key columns takes automatically. FOR NO KEY UPDATE conflicts with other writers on the same row (so the update is still serialized) but does not conflict with FOR KEY SHARE, so FK checks on child tables proceed in parallel. The rule to internalize: use FOR UPDATE only when you will change a key column or delete the row, and prefer FOR NO KEY UPDATE everywhere else. The same logic explains why the unmodified UPDATE of balance never had this problem: PostgreSQL was already taking the weaker mode for it.
Your queue has eight workers and one wide row at the head that is taking a long time. With FOR UPDATE SKIP LOCKED, the workers correctly grab disjoint rows, but you notice that the head row is being repeatedly skipped by every wave of workers and is now an hour old. What is happening at the lock level, and how would you make the queue fair without giving up parallelism?

Answer. SKIP LOCKED is doing exactly what its name says: it skips any row currently locked, returning the next unlocked row instead. So while worker 1 is grinding through the head row, the other seven workers each SELECT ... FOR UPDATE SKIP LOCKED ORDER BY created_at LIMIT 1, see row 1 is locked, skip it, take rows 2 through 8. When they finish, they take rows 9 through 15, and so on. The head row is locked the entire time, so it is always skipped. SKIP LOCKED gives you throughput, not fairness. The fix is to bound the head row’s lock with a clear policy and let the queue claim, not run, in the lock window. Two common patterns: claim rows by UPDATE jobs SET claimed_by = $worker, claimed_at = now() WHERE id IN (SELECT id FROM jobs WHERE status = 'pending' ORDER BY created_at FOR UPDATE SKIP LOCKED LIMIT 1) RETURNING id; so the row’s lock is released as soon as the claim commits, and the long work happens with no lock held; or, if the work must hold a lock, set a watchdog that bounds runtime and re-queues stalled claims. Either way, the lock window is no longer the work window.
A migration that runs ALTER TABLE orders ADD COLUMN priority int DEFAULT 1 NOT NULL worked fine in staging and brings down production for forty seconds. Both environments are PostgreSQL 16. What is the most probable cause at the lock level, and what would you change?

Answer. The lock the ALTER takes is ACCESS EXCLUSIVE, which conflicts with every other mode including ACCESS SHARE, so for as long as the ALTER holds it, every SELECT on orders waits. Two things are different about production. First, production has live traffic, so even a metadata-only ALTER (the default is a non-volatile constant, which on PG 11+ does not rewrite the table) has to wait for an existing SELECT or write to finish before it can acquire ACCESS EXCLUSIVE, and every subsequent query queues behind the ALTER while it waits, so the freeze is not the ALTER’s own runtime, it is the queue that forms behind it. Second, production has no lock_timeout, so the ALTER waits indefinitely, the queue grows indefinitely, and the application’s pool fills up. The fix is twofold. Set lock_timeout = '3s' at the top of every migration (the staging mistake was that nothing was concurrent, so the lock was instant). And design the migration to fail fast: if lock_timeout fires, the script aborts and the deploy retries during a quieter moment, instead of holding the table hostage. For column changes that genuinely need to rewrite or validate (a CHECK, a foreign key, a type change), use the add-then-validate split (NOT VALID, then VALIDATE CONSTRAINT online), so the heavy work runs under SHARE UPDATE EXCLUSIVE rather than ACCESS EXCLUSIVE.

Recommended next

Table Bloat and HOT Updates
Builds directly on this page: Table Bloat and HOT Updates is the next step in the PostgreSQL performance ladder.

Sources & evidence15 claims · 4 cited

Lock modes (table and row), the conflict rules, FOR UPDATE/FOR NO KEY UPDATE/FOR SHARE/FOR KEY SHARE semantics, SKIP LOCKED/NOWAIT, advisory locks, deadlock detection after deadlock_timeout with SQLSTATE 40P01, and lock_timeout/statement_timeout/idle_in_transaction_session_timeout are grounded in the official PostgreSQL documentation (Explicit Locking, Concurrency Control, Transaction Isolation, CREATE INDEX). The fast/slow ALTER TABLE classification, the log_lock_waits setting, pg_blocking_pids, and idle_session_timeout (PG14+) are stable common knowledge not specific to any one cited section.

PostgreSQL defines exactly eight table-level lock modes, from ACCESS SHARE (taken by SELECT) to ACCESS EXCLUSIVE (taken by ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM FULL), each with strictly more conflicts than the one below it.verified
ACCESS SHARE conflicts only with ACCESS EXCLUSIVE, and ROW EXCLUSIVE (taken by INSERT/UPDATE/DELETE) does not conflict with ACCESS SHARE, so plain reads and plain writes never block each other at the table level.verified
CREATE INDEX CONCURRENTLY takes SHARE UPDATE EXCLUSIVE (which self-conflicts but does not block ordinary reads or writes), while CREATE INDEX without CONCURRENTLY takes SHARE and blocks writers for the duration of the build.verified
Row-level locks have four modes (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); FOR NO KEY UPDATE is what a plain UPDATE of non-key columns takes automatically, and unlike FOR UPDATE it does not conflict with the FOR KEY SHARE that foreign-key validation takes, so child-table FK checks proceed in parallel with non-key updates on the parent.verified
SELECT FOR UPDATE SKIP LOCKED skips any rows that cannot be locked immediately rather than waiting, and SELECT FOR UPDATE NOWAIT aborts with SQLSTATE 55P03 (lock_not_available) instead of waiting; the two compose so a queue worker can claim disjoint rows in parallel.verified
Advisory locks are identified by a 64-bit integer key (or a pair of 32-bit keys) and come in two scopes: pg_advisory_lock holds until pg_advisory_unlock or end of session, while pg_advisory_xact_lock is released automatically at transaction end.verified
lock_timeout bounds the time a statement will wait to ACQUIRE any lock (default 0, no limit), statement_timeout bounds total statement runtime, and idle_in_transaction_session_timeout aborts a transaction that has been idle between statements past the limit, which is the production fix for the long-transaction trap that pins the xmin horizon.verified
PostgreSQL 14 added idle_session_timeout, which closes connections that have been idle (with no open transaction) beyond the configured duration.stable common knowledge
PostgreSQL's lock manager runs deadlock-cycle detection only after a transaction has been waiting on a lock for deadlock_timeout (default 1 second); if a cycle is found, one transaction in the cycle is aborted with SQLSTATE 40P01 (deadlock_detected) and the others proceed.verified
Deadlocks are prevented by acquiring locks in a consistent order across all code paths; a multi-row SELECT FOR UPDATE without ORDER BY is unsafe because the planner may return rows in different orders for different queries, creating cycle potential under load.verified
Since PostgreSQL 11, ALTER TABLE ADD COLUMN with a non-volatile default no longer rewrites the table; it stores the default in catalog metadata and synthesises it for old rows on read, so the operation is metadata-only and the ACCESS EXCLUSIVE lock is held only briefly.stable common knowledge
Adding a foreign key or a CHECK constraint without NOT VALID requires a full table scan under ACCESS EXCLUSIVE; the safe pattern is to ADD CONSTRAINT ... NOT VALID first (cheap, ACCESS EXCLUSIVE for a moment) and then VALIDATE CONSTRAINT (slow, but under SHARE UPDATE EXCLUSIVE which does not block readers or writers).stable common knowledge
pg_locks shows each granted or pending lock request with its mode and the relation it targets, joined to pg_stat_activity by pid; rows with granted = false are the queue of waiters.verified
pg_blocking_pids(pid) returns the array of PIDs whose locks are blocking the given PID, which is the canonical way to answer 'who is blocking whom right now' without manually joining pg_locks to itself.stable common knowledge
log_lock_waits causes any lock wait exceeding deadlock_timeout (default 1 second) to be logged, so slow lock incidents leave a trace in postgres logs even when nobody is watching pg_locks at the moment they happen.stable common knowledge

Cited sources

PostgreSQL Documentation: Explicit Locking · PostgreSQL Global Development Group
PostgreSQL Documentation: CREATE INDEX · PostgreSQL Global Development Group
PostgreSQL Documentation: Concurrency Control (MVCC) · PostgreSQL Global Development Group
PostgreSQL Documentation: Transaction Isolation · PostgreSQL Global Development Group