Isolation Levels and MVCC Anomalies

How PostgreSQL implements the three available isolation levels on top of MVCC, which anomalies each prevents (including write skew, which only Serializable catches), the SSI predicate-lock mechanism and its tuning knobs, the syntax for setting levels and using DEFERRABLE, the application retry-loop pattern for SQLSTATE 40001, how Oracle, SQL Server, and MySQL/InnoDB differ, and how to diagnose the cost in production.

Learning outcomes

Isolation is the contract your database gives your application about what concurrent transactions can and cannot see of each other. Get the contract wrong and you ship a double-spend, a missing audit row, or a phantom inventory unit; pay too much for it and your write throughput collapses under serialization aborts. PostgreSQL gives you three real isolation levels built on top of the snapshot rules from the mvcc-and-tuple-visibility page, and the choice between them is one of the highest-leverage decisions you make in any transactional system.

After studying this page, you can:

  • Name the four ANSI anomalies plus write skew, and decide which PostgreSQL level prevents each.
  • Explain why PostgreSQL silently behaves as Read Committed when you ask for Read Uncommitted, and what changes between Read Committed and Repeatable Read at the snapshot level.
  • Read and write a retry loop that handles SQLSTATE 40001 correctly, with backoff.
  • Use Serializable Snapshot Isolation (SSI) to fix a write-skew bug, and reason about its predicate-lock costs.
  • Diagnose serialization failures and deadlocks from pg_stat_database, and tune the SSI predicate-lock budget.

Before we dive in

You should already be comfortable with the mvcc-and-tuple-visibility page: row versions stamped with xmin and xmax, snapshots that pick exactly one visible version, and the fact that PostgreSQL never updates in place. This page promotes those snapshot rules from a mechanism into a contract, and shows how each contract changes which anomalies a workload can still produce.

A few words you will need, defined here. An anomaly is an outcome that a single serial execution of your transactions could never have produced. An isolation level is a named set of guarantees about which anomalies cannot happen. A snapshot is the rule that decides which committed data your transaction sees (the mvcc-and-tuple-visibility page builds this in detail). Predicate locking, used only by Serializable, is PostgreSQL tracking which rows your transaction read by predicate so it can detect a dangerous dependency with a concurrent writer. Keep these four. Everything below is built from them.

Mental Model

The wrong model, and a common one for engineers coming from MySQL or SQL Server defaults, is that higher isolation just means more locks and slower writes. Under that model Read Committed is the cheap permissive option, Serializable is the expensive paranoid one, and you trade safety for throughput on a sliding scale of locking.

PostgreSQL does not work that way. All three levels run on the same MVCC engine, so reads never block writers and writers never block readers at any level. What changes between the levels is the snapshot rule (when a snapshot is taken and how long it lives) and, at Serializable, an extra detector that watches for read-write dependency cycles and aborts one transaction to break them. The cost of higher isolation is not blocking; it is aborts. Your application sees more transactions fail with a serialization error and has to retry them.

Keep that picture. Choosing an isolation level in PostgreSQL is choosing how often a transaction is allowed to be wrong about what it read, traded against how often the database must throw it away and ask the application to try again. Once you read it that way, every rule that follows, the 40001 error code, the retry loop, the predicate-lock budget, falls into place.

Breaking it down

1. The four ANSI anomalies, and where PostgreSQL stands

Start with the anomalies, because every isolation level is defined by which of them it forbids. The SQL standard names four, and the literature adds a fifth that the standard missed.

A dirty read sees a row another transaction wrote but has not committed. If the writer rolls back, you read data that never existed.

A non-repeatable read happens when the same SELECT for a specific row returns different values within one transaction, because another transaction committed an update between your two reads.

A phantom read is the same idea at the set level: a SELECT with a predicate returns a different set of rows the second time, because a concurrent transaction inserted or deleted rows that match your predicate.

A lost update happens when two transactions both read a value, both compute a new value from it, and both write back, so one of the two updates is silently overwritten.

The fifth, missing from the standard but real in any snapshot database, is write skew. Two transactions each read overlapping sets, each see a state that justifies a write, each write to a disjoint row, and both commit. No row was overwritten, but the combined post-state would have been impossible under any serial ordering. The on-call doctors example in rung 5 is the canonical case.

PostgreSQL’s relationship to the four standard levels is unusual and worth stating bluntly. It accepts READ UNCOMMITTED syntactically, then silently behaves as Read Committed, because MVCC never exposes an uncommitted tuple to anyone. Dirty reads simply cannot happen in PostgreSQL at any level. Repeatable Read in PostgreSQL is stricter than the SQL standard demands: it prevents phantoms as well as non-repeatable reads, because the whole transaction reads from one snapshot. Only write skew separates Repeatable Read from Serializable.

AnomalyPostgreSQL Read CommittedRepeatable ReadSerializable
Dirty readimpossibleimpossibleimpossible
Non-repeatable readpossibleimpossibleimpossible
Phantom readpossibleimpossible (stricter than ANSI)impossible
Lost updatepossible (without FOR UPDATE)impossible (aborts with 40001)impossible
Write skewpossiblepossibleimpossible

That table is the whole map. The rest of this page is how each level achieves its column, what it costs, and when to pick it.

2. Read Committed: a fresh snapshot per statement

Read Committed is the default and it is the level your application runs at unless you say otherwise. Its rule is simple: every individual statement takes its own fresh snapshot at the moment it begins, and that snapshot sees every transaction that had committed by then.

The consequence is that two SELECTs of the same row inside one transaction can correctly return different values, because another transaction committed between them. That is not a bug; it is the contract. Within one statement, however, you do get a consistent view: a single SELECT scanning a million rows reads one snapshot through to the last row.

sequenceDiagram
    participant A as Txn A (Read Committed)
    participant B as Txn B (writer)
    A->>A: BEGIN
    A->>A: SELECT balance FROM accounts WHERE id=7 -> 100
    B->>B: UPDATE accounts SET balance=90 WHERE id=7
    B->>B: COMMIT
    A->>A: SELECT balance FROM accounts WHERE id=7 -> 90
    Note over A: same row, two values, no error

The interesting part is what happens to writes. When your UPDATE finds a row to change, it tries to lock that row. If a concurrent transaction has already updated the same row since your statement began, your statement does not see a serialization failure: it pauses until the other commits, then re-evaluates its WHERE clause against the new version. This re-evaluation is called EvalPlanQual, and it is the small piece of magic that lets Read Committed avoid lost updates on the simple case of UPDATE accounts SET balance = balance - 10 WHERE id = 7. Two concurrent decrements both succeed and both apply, because each re-reads the latest balance before subtracting.

EvalPlanQual saves you only when the update can be re-evaluated against the new row. The classic lost-update bug is the read-modify-write you split across two statements:

-- Application code, NOT safe at Read Committed
select balance from accounts where id = 7;          -- reads 100
-- application computes new_balance = 100 - 10 in code
update accounts set balance = 90 where id = 7;      -- writes 90

If a concurrent transaction also reads 100 and writes 90, one of the two reductions is lost. There is no re-evaluation here, because the UPDATE does not depend on the current row state in its WHERE or SET. Use Read Committed when individual statements are atomic enough (every check and write fits in one SQL statement), and reach for Repeatable Read, Serializable, or explicit SELECT ... FOR UPDATE (covered in the locking-and-deadlocks page) when they are not.

3. Repeatable Read: one snapshot for the whole transaction

Repeatable Read changes one thing: the snapshot is taken at the first statement of the transaction and held for every statement after it. Every read inside the transaction sees the database as it stood at that one instant, no matter how many other transactions commit in between. Non-repeatable reads and phantom reads both become impossible by construction. PostgreSQL’s Repeatable Read is sometimes called snapshot isolation in the literature, and it is stricter than the SQL standard requires.

sequenceDiagram
    participant A as Txn A (Repeatable Read)
    participant B as Txn B (writer)
    A->>A: BEGIN (snapshot frozen)
    A->>A: SELECT balance FROM accounts WHERE id=7 -> 100
    B->>B: UPDATE accounts SET balance=90 WHERE id=7
    B->>B: COMMIT
    A->>A: SELECT balance FROM accounts WHERE id=7 -> 100
    Note over A: same value both reads, the snapshot is fixed

Writes pay for that stability. If your transaction tries to UPDATE a row that another transaction has already updated and committed since your snapshot, PostgreSQL cannot re-evaluate the update (your view of the world is locked at the snapshot) and cannot let you overwrite a value your transaction never saw. So it aborts your transaction with SQLSTATE 40001, serialization_failure, and the message could not serialize access due to concurrent update. Your application must catch that error, roll back, and retry the transaction from the start.

This is the abort-and-retry pattern, and it is the price of Repeatable Read on contended rows:

-- Pseudocode in the application; SQL inside the transaction
begin isolation level repeatable read;

select balance from accounts where id = 7;          -- reads 100 from snapshot
-- ... business logic ...
update accounts set balance = balance - 10 where id = 7;
-- If a concurrent commit modified row 7 after my snapshot,
-- this UPDATE fails with SQLSTATE 40001 and rolls back.
commit;

Repeatable Read prevents lost updates on the read-modify-write case from rung 2: instead of silently overwriting, the second transaction aborts and the application retries against the new value. It does NOT prevent write skew, because two transactions touching disjoint rows never see each other’s writes and never trigger the serialization-failure check.

4. Serializable: SSI catches the dependency cycle

Serializable goes one step further. It takes the same snapshot as Repeatable Read, runs at the same speed for reads and writes, and adds one extra mechanism: PostgreSQL tracks the read-write dependencies between concurrent transactions and aborts one transaction if the combination would create a cycle that no serial ordering could produce. The technique is called Serializable Snapshot Isolation (SSI), introduced in PostgreSQL 9.1, and it is the only mainstream implementation of true serializability that does not fall back to locking on reads.

The tracking is predicate locking with SIREAD (serializable-isolation read) locks. When your serializable transaction reads rows by predicate (an index range, a sequential scan, a hash bucket), PostgreSQL records a fine-grained note saying “this transaction read this region of this relation.” It is not a blocking lock: another transaction can write into that region freely. But if the cycle detector later sees that your read range and that write would close a dangerous read-write dependency among a set of concurrent transactions, it aborts one of them with SQLSTATE 40001.

flowchart LR
    A[Txn A reads X] --> B[Txn B writes X]
    B --> C[Txn B reads Y]
    C --> D[Txn A writes Y]
    D --> E{Cycle?}
    E -->|yes| F[SSI aborts one with 40001]
    E -->|no| G[both commit]

The cost lives in two places. Predicate locks consume shared memory, and PostgreSQL caps how many a single transaction or relation or page may hold. Three settings tune this budget, and you should know all three by name.

  • max_pred_locks_per_transaction (default 64) sets the average locks per active transaction in the shared pool. The total pool size is this times max_connections times max_prepared_transactions plus an internal factor; raising it lets long-running serializable transactions track more read regions before the system escalates their granularity from row to page, then page to relation.
  • max_pred_locks_per_relation (default -2, meaning negative-half the per-transaction setting) and max_pred_locks_per_page (default 2) control when PostgreSQL escalates a fine-grained predicate lock to a coarser one. Escalation does not cause aborts directly, but a relation-level predicate lock makes false-positive cycles much more likely, and you will see your 40001 rate climb.
  • A read-only transaction can be marked READ ONLY DEFERRABLE. PostgreSQL then waits until it can take a snapshot that is guaranteed not to participate in any cycle, so the transaction is exempt from SSI aborts entirely. This is the right level for a long analytical report that must not be retried.

Serializable is the only level that prevents write skew, and it is the only one that earns the literal SQL-standard guarantee: any committed serializable transaction produced a result equivalent to some serial execution of all the transactions involved.

5. Write skew, the anomaly only Serializable catches

Now the worked example, because write skew is the one that experienced engineers still get wrong. A hospital rota says at least one doctor must be on call at any time. Two doctors, Alice and Bob, are both currently on call. Each opens the rota at the same time to take themselves off.

-- Schema: a small on-call table
create table on_call (
  doctor text primary key,
  is_on_call boolean not null
);
insert into on_call values ('alice', true), ('bob', true);

Under Repeatable Read, the bug is silent. Each transaction takes a snapshot that shows two doctors on call, the check passes, and each commits its own update on a different row.

Write skew under Repeatable Read
Both beginAlice and Bob each BEGIN ISOLATION LEVEL REPEATABLE READ. Both snapshots see alice = true, bob = true.
Step 1 of 5

The Serializable fix is the one-word change shown in the last step. No application logic changes, no extra FOR UPDATE annotations, no rewritten queries. The cost is that under heavy contention the application sees more 40001 errors and must retry. That is the trade.

6. Syntax: BEGIN, SET TRANSACTION, DEFERRABLE

There are three places you can pin an isolation level, and they have different scopes. Be deliberate about which one you use.

-- 1. Per transaction, on BEGIN. Effective only for this transaction.
begin isolation level serializable read only deferrable;
-- ...
commit;

-- 2. Per transaction, after BEGIN, before the first statement.
begin;
set transaction isolation level repeatable read;
-- ...
commit;

-- 3. Per session (or per transaction with LOCAL).
set session characteristics as transaction isolation level read committed; -- session
set local default_transaction_isolation = 'repeatable read';               -- this txn only

The READ ONLY clause tells PostgreSQL no INSERT, UPDATE, DELETE, or DDL will run; the server refuses such statements with an error, which is a cheap safety check on long reports. The DEFERRABLE clause matters only at Serializable: it lets a read-only transaction wait for a clean snapshot and never be aborted, at the cost of a delayed start. Use it for batch reads where a retry would be expensive.

Constraints have a related but distinct knob. A constraint declared INITIALLY DEFERRED (or normally IMMEDIATE but flipped with SET CONSTRAINTS ... DEFERRED) is checked at commit time rather than after each statement. This is the right tool for cyclic foreign keys or any pair of writes that violate a constraint mid-transaction but satisfy it at commit. It is not an isolation feature, but it is the next syntax engineers ask about, and it composes cleanly with any level.

-- A foreign-key check deferred to commit
alter table orders
  add constraint fk_customer foreign key (customer_id)
  references customers (id) deferrable initially deferred;

7. Application patterns: retry loops and pessimistic locks

Repeatable Read and Serializable shift the cost of correctness onto the application, and the cost is a retry loop. Get the loop right once and you can use either level safely.

The pattern has four pieces: catch SQLSTATE 40001 (and the related 40P01 for deadlocks), bound the retry count, back off between attempts to avoid thrashing on a hot row, and never wrap the retry around a transaction that has external side effects (do not send an email twice because a database transaction was retried twice).

const (
    serializationFailure = "40001"
    deadlockDetected     = "40P01"
    maxAttempts          = 5
    baseBackoff          = 10 * time.Millisecond
)

func withRetry(ctx context.Context, db *sql.DB, work func(*sql.Tx) error) error {
    var lastErr error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
        if err != nil {
            return err
        }
        if err := work(tx); err != nil {
            tx.Rollback()
            if isRetryable(err) {
                lastErr = err
                jitter := time.Duration(rand.Int63n(int64(baseBackoff)))
                time.Sleep(baseBackoff*(1<<attempt) + jitter)
                continue
            }
            return err
        }
        if err := tx.Commit(); err != nil {
            if isRetryable(err) {
                lastErr = err
                jitter := time.Duration(rand.Int63n(int64(baseBackoff)))
                time.Sleep(baseBackoff*(1<<attempt) + jitter)
                continue
            }
            return err
        }
        return nil
    }
    return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

func isRetryable(err error) bool {
    var pgErr *pgconn.PgError
    if errors.As(err, &pgErr) {
        return pgErr.SQLState() == serializationFailure ||
               pgErr.SQLState() == deadlockDetected
    }
    return false
}

A few non-obvious points. The retryable check belongs at the Commit step too, because SSI can detect a cycle at commit time, not only at the offending statement. Exponential backoff with jitter avoids two retrying transactions colliding on the same retry tick. And the work closure must be pure: any non-database side effect inside it runs once per attempt. Side effects belong outside the retry, after the transaction commits.

Sometimes the cheaper answer is pessimistic locking with SELECT ... FOR UPDATE (or FOR NO KEY UPDATE for the milder lock that allows referencing foreign keys to land). The lock-and-deadlocks page covers the lock modes in detail; the rule for now is that FOR UPDATE is the right tool when you know which exact row your transaction will write and contention on that row is high. It blocks competing writers explicitly, so you trade serialization aborts for waits. Wrong rows or too many rows locked turns it into a deadlock factory; use it deliberately.

Picking an isolation level
Use it when each statement is its own atomic decision. Good for: simple CRUD, single-row updates expressed as UPDATE ... SET col = expr WHERE id = ?, web APIs that read fresh data on every request. Watch out for: read-modify-write split across two statements, which silently loses updates.

8. How other databases differ from PostgreSQL

If you are coming from another database, the same level name means a different thing. Knowing the deltas saves a class of bugs the day you switch.

Oracle runs every statement with statement-level read consistency by default, much like PostgreSQL Read Committed. Its SERIALIZABLE level is actually snapshot isolation (closer to PostgreSQL Repeatable Read), and Oracle has no true serializable mode. Write skew is possible on Oracle SERIALIZABLE; it is not on PostgreSQL Serializable.

SQL Server defaults to Read Committed using shared read locks (readers block writers and vice versa, unlike PostgreSQL). It offers READ_COMMITTED_SNAPSHOT (statement-level snapshots, like PostgreSQL Read Committed) and SNAPSHOT (transaction-level snapshots, like PostgreSQL Repeatable Read) as opt-ins. Its SERIALIZABLE level uses range locking, blocking writes that would change predicate results, rather than detecting cycles after the fact.

MySQL with InnoDB defaults to REPEATABLE READ, but its implementation uses semi-consistent reads and gap locks that catch some but not all phantom cases. Crucially, an UPDATE at InnoDB Repeatable Read reads the latest committed version (not the snapshot version) of the row it is updating, which is closer to PostgreSQL Read Committed semantics for writes. InnoDB Serializable converts plain SELECT into SELECT ... LOCK IN SHARE MODE, blocking concurrent writers.

The summary you can carry into a design review:

DatabaseDefault levelSnapshot isolation available asTrue serializable?Reader-writer blocking?
PostgreSQLRead CommittedRepeatable Read (built in)yes, via SSInever
OracleRead CommittedSERIALIZABLE (mislabeled)nonever on reads
SQL ServerRead CommittedSNAPSHOT (opt-in)only via range locksyes, unless RCSI on
MySQL InnoDBRepeatable Read(it is the default)only via lock escalationpartial

When you port code across these, the most common bug is assuming the level name carries the same guarantee. It does not. If you need write-skew safety on a non-PostgreSQL database, you almost always need explicit SELECT ... FOR UPDATE on every row you read for a decision; on PostgreSQL Serializable, you do not.

9. Diagnostics: spotting and tuning the cost

Two views give you almost everything you need, and one column on pg_stat_activity closes the gap.

pg_stat_database reports cumulative counts per database, including the two failure modes that matter for isolation:

select datname, conflicts, deadlocks, xact_commit, xact_rollback
from pg_stat_database
where datname = current_database();

conflicts counts recovery conflicts (mostly relevant on replicas), deadlocks counts deadlocks the deadlock detector resolved, and the ratio of xact_rollback to xact_commit includes serialization-failure rollbacks. A sudden climb in any of these correlates with the right action: rollbacks against commits trending up on a serializable workload means contention is closing more cycles and the retry loop is doing more work.

The error itself arrives as SQLSTATE 40001, could not serialize access due to .... The detail tells you which case:

ERROR:  could not serialize access due to concurrent update
ERROR:  could not serialize access due to read/write dependencies among transactions
HINT:   The transaction might succeed if retried.

The first message is the Repeatable Read or Serializable concurrent-update case. The second is SSI specifically: a read-write cycle was detected and one transaction was sacrificed. Both have the same SQLSTATE, both deserve the same retry, but they tell you different things about which level your contention is hitting.

To find which backend is pinning a long snapshot (and so feeding both bloat from the mvcc-and-tuple-visibility page and serialization pressure here), backend_xmin on pg_stat_activity is the column to watch:

select pid, state, xact_start, query_start,
       age(backend_xmin) as snapshot_age_in_xids,
       left(query, 80) as query_snippet
from pg_stat_activity
where backend_xmin is not null
order by backend_xmin asc
limit 5;

For SSI specifically, monitor whether predicate locks are escalating to relation-level (which inflates the false-positive abort rate). The view pg_locks reports mode = 'SIReadLock' rows for predicate locks; aggregate by locktype to see how often the granularity has coarsened to relation:

select locktype, count(*)
from pg_locks
where mode = 'SIReadLock'
group by locktype
order by 2 desc;

If the relation-level count rises and your 40001 rate climbs with it, raise max_pred_locks_per_transaction so the budget per worker is larger and escalation is rarer. The setting takes effect at server restart, so tune it once during a maintenance window rather than chasing it during an incident.

10. Failure modes experienced engineers still hit

Five failure modes account for most production isolation bugs. Knowing the shape of each is the difference between fixing one and chasing the next.

The first is “Read Committed is transactional enough.” It is, for atomic statements. It is not, for split read-modify-write. The classic double-spend (read the balance, check it covers the charge, debit it) at Read Committed allows two concurrent charges to both pass the check and both succeed. Fix: do the check inside one UPDATE ... WHERE balance >= ?, or move the transaction to Repeatable Read with a retry loop, or use Serializable.

The second is missing the retry loop on Repeatable Read or Serializable. An application that opens a Repeatable Read transaction without handling 40001 will surface random business failures to users under load. The retry loop from rung 7 is mandatory at these levels; treat the absence of one as a bug, not a polish item.

The third is SSI under heavy parallelism with too few predicate-lock slots. With many short Serializable transactions running at once, the per-transaction lock budget is exhausted, predicate locks escalate to relation granularity, false-positive cycles balloon, and the 40001 rate explodes. The symptom is a high retry rate that grows with concurrency rather than transaction complexity. Fix: raise max_pred_locks_per_transaction (and watch pg_locks to confirm escalation drops).

The fourth is SELECT ... FOR UPDATE on the wrong rows. Pessimistic locking is correct when you lock the exact rows you will write; it becomes a deadlock factory when you lock everything you might read. The clue is a pg_stat_database.deadlocks count climbing along with end-user errors. Lock the minimum set, in a consistent order, or drop the explicit lock and let Serializable detect the conflict.

The fifth is trusting a non-PostgreSQL mental model. Engineers from MySQL or Oracle expect SERIALIZABLE to mean “same speed, more locks.” On PostgreSQL it means “same speed, more aborts you must retry.” Engineers from SQL Server expect Read Committed to take read locks. On PostgreSQL it does not, and a long-running read never blocks a writer. Re-read the rung-8 table before designing a new system on PostgreSQL after years on something else.

Two more traps worth knowing

Mastery Questions

  1. A payments service at Read Committed lets a customer transfer money between two accounts. The check is SELECT balance FROM accounts WHERE id = :from; if (balance >= amount) UPDATE accounts SET balance = balance - :amount WHERE id = :from; UPDATE accounts SET balance = balance + :amount WHERE id = :to;. Under load, the same customer occasionally overdraws their account. What is happening at the snapshot level, and what is the simplest correct fix?

    Answer. The bug is a classic lost update across statements. At Read Committed each statement takes its own snapshot, so two concurrent transfers can both run the SELECT, both see a balance of 100, both see their amount (say 80) covered, and both proceed to the two UPDATEs. EvalPlanQual does NOT save you here because the debit UPDATE does not reference the current balance in its WHERE or SET: it writes a precomputed value the application brought in from the read. So both updates execute and the account goes to negative 60. Three simplest fixes, in order of mechanical change. Fold the check into the write: UPDATE accounts SET balance = balance - :amount WHERE id = :from AND balance >= :amount and treat zero rows affected as insufficient funds; this works at Read Committed because EvalPlanQual re-reads the latest balance against the predicate. Or move to Repeatable Read with a retry loop on 40001: a concurrent debit will cause your UPDATE to fail and you retry. Or use SELECT ... FOR UPDATE on the source row inside the transaction to take a row lock before reading the balance. All three are correct; the first is the least intrusive.

  2. Your team enables Serializable on a hot endpoint to fix a write-skew bug in inventory reservations. The bug goes away, but the endpoint’s 40001 rate climbs steadily as traffic grows, far faster than transaction complexity does, and pg_locks shows many SIReadLock rows at locktype = 'relation'. What is happening, and what is the tuning knob?

    Answer. The application is exhausting its predicate-lock budget and PostgreSQL is escalating fine-grained predicate locks to relation-level locks to fit. A relation-level SIReadLock says “this transaction read SOMETHING from this relation,” which makes the SSI cycle detector report many false-positive cycles, because any concurrent writer to that relation now looks like it could close a cycle with your read. The 40001 rate climbs out of proportion to actual conflicts. The tuning knob is max_pred_locks_per_transaction (default 64), which sets the per-active-transaction share of the predicate-lock pool. Raise it (say to 256 or 512) so each transaction holds enough fine-grained locks before escalation, watch pg_locks to confirm relation-level SIReadLock rows fall, and re-measure 40001. A small follow-up: max_pred_locks_per_relation and max_pred_locks_per_page also affect the escalation thresholds and can be tuned per workload. None of this changes correctness; it only changes how often the detector sees a real cycle versus a coarsened approximation.

  3. A long analytical report runs at Read Committed and frequently shows numbers from two different points in time because tables it reads change between its first and last SELECT. Switching it to Repeatable Read fixes the smear but the report now sometimes aborts with 40001 because background writes touch a row it read. Switching to Serializable makes the abort rate worse. What is the right answer, and why?

    Answer. The right answer is BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE. At Read Committed the report reads through one fresh snapshot per statement, which gives you the time-smear you saw. Repeatable Read fixes the smear with one snapshot for the transaction, but a read-only transaction at Repeatable Read can still be aborted with 40001 if it runs a write (and even pure reads can hit the error on certain query plans involving locks). Serializable adds the SSI predicate-lock checks, which are exactly the wrong cost for a long report: more concurrent writes equal more chances to be sacrificed for a detected cycle. The READ ONLY DEFERRABLE mode is the escape hatch: PostgreSQL accepts your serializable read-only transaction, but waits to take a snapshot until it can prove that snapshot cannot participate in any read-write cycle, and then runs the report under that clean snapshot. The transaction starts a little later than it otherwise would and is guaranteed never to be aborted with 40001. For a long report whose retry would be expensive, this is the correct trade.

Recommended next

  • Locking and Deadlocks
    Builds directly on this page: Locking and Deadlocks is the next step in the PostgreSQL performance ladder.
Sources & evidence14 claims · 3 cited

Snapshot rules per level, the silent treatment of READ UNCOMMITTED as Read Committed, the stricter-than-ANSI phantom guarantee at Repeatable Read, SSI predicate-locking and the max_pred_locks_per_* settings, the SQLSTATE 40001 error and the could-not-serialize messages, BEGIN ... ISOLATION LEVEL syntax, and READ ONLY DEFERRABLE are all grounded in the PostgreSQL Transaction Isolation, Concurrency Control (MVCC), and Explicit Locking documentation. The cross-database comparison with Oracle, SQL Server, and MySQL/InnoDB, the retry-loop pattern with exponential backoff and jitter, and the worked write-skew example are treated as stable common knowledge that the listed sources do not cover directly.

  • PostgreSQL accepts READ UNCOMMITTED syntactically but treats it as Read Committed, because MVCC never exposes an uncommitted tuple to any transaction.verified
  • PostgreSQL Repeatable Read is stricter than the SQL standard: it prevents phantom reads in addition to non-repeatable reads, because the whole transaction reads from a single snapshot.verified
  • At Read Committed, every statement takes its own fresh snapshot at the moment it begins, so two SELECTs of the same row in one transaction can correctly return different values.verified
  • When a Read Committed UPDATE finds a row another transaction has already updated, it waits for the other to commit and then re-evaluates its WHERE clause against the new version (the EvalPlanQual re-evaluation) rather than aborting.verified
  • At Repeatable Read, an UPDATE on a row that another transaction has already modified after the snapshot fails with SQLSTATE 40001 (serialization_failure) and the message 'could not serialize access due to concurrent update', requiring the application to retry.verified
  • Serializable in PostgreSQL is implemented as Serializable Snapshot Isolation (SSI): it runs on the same snapshot as Repeatable Read and additionally tracks read-write dependencies via predicate locks (SIREAD), aborting one transaction whose commit would close a dangerous dependency cycle.verified
  • The predicate-lock budget is tuned with max_pred_locks_per_transaction (default 64), max_pred_locks_per_relation, and max_pred_locks_per_page (default 2); exhausting the budget escalates row-level predicate locks to page or relation level and raises the false-positive serialization-failure rate.verified
  • A serializable transaction declared READ ONLY DEFERRABLE waits until it can take a snapshot that is guaranteed not to participate in any read-write cycle, after which it cannot be aborted with a serialization failure.verified
  • Write skew (two transactions reading overlapping sets, writing to disjoint rows, and committing into a state no serial ordering could produce) is possible at Repeatable Read because the writes touch disjoint rows and never trigger a concurrent-update abort, but Serializable detects it through SSI predicate-lock dependency tracking and aborts one transaction with SQLSTATE 40001.verified
  • PostgreSQL accepts BEGIN ISOLATION LEVEL <level> [READ ONLY | READ WRITE] [DEFERRABLE | NOT DEFERRABLE], SET TRANSACTION ISOLATION LEVEL after BEGIN, and SET LOCAL default_transaction_isolation inside a transaction; DEFERRABLE only has effect at Serializable READ ONLY.verified
  • The correct retry pattern for Repeatable Read and Serializable catches SQLSTATE 40001 (serialization_failure) and 40P01 (deadlock_detected) at both statement and commit time, bounds the retry count, applies exponential backoff with jitter, and keeps non-database side effects outside the retried block so they do not execute multiple times.stable common knowledge
  • Oracle SERIALIZABLE is snapshot isolation (closer to PostgreSQL Repeatable Read) and does not prevent write skew; SQL Server defaults to Read Committed with shared read locks and offers SNAPSHOT as an opt-in; MySQL/InnoDB defaults to REPEATABLE READ but uses semi-consistent reads so an UPDATE reads the latest committed row rather than the snapshot version.stable common knowledge
  • pg_stat_database exposes per-database conflicts and deadlocks counters, and serialization failures arrive as SQLSTATE 40001 with either 'could not serialize access due to concurrent update' (Repeatable Read or Serializable concurrent write) or 'could not serialize access due to read/write dependencies among transactions' (SSI cycle detection).verified
  • Read Committed does not protect a read-modify-write that is split across two statements, so the same atomic-feeling logic that is safe inside a single UPDATE ... SET col = expr WHERE id = ? becomes a lost update when written as a SELECT followed by an UPDATE that does not re-read the row.verified

Cited sources