Logical Replication and CDC

How PostgreSQL turns its WAL into a row-level change stream, how publications, subscriptions, replica identity, and slots wire it together, and how that machinery powers near-zero-downtime major-version upgrades, selective analytics replication, and CDC pipelines into Kafka with tools like Debezium.

Learning outcomes

Streaming replication ships the write-ahead log byte for byte and rebuilds an identical replica. Logical replication does something stranger and more useful: it decodes that same log into a stream of row-level changes (insert this row, update that one, delete the other), then ships those changes to a subscriber that may run a different major version, a different schema, or not even be PostgreSQL at all. The same machinery powers cross-version upgrades with seconds of downtime and feeds Debezium pipelines into Kafka. The streaming-and-synchronous-replication page covers the byte-for-byte path; this page is its logical sibling.

After studying this page, you can:

  • Explain how a single WAL stream is decoded into logical row events and why that needs wal_level = logical.
  • Set up a publication and subscription end to end, and predict which DDL and DML operations cross the wire.
  • Choose the right REPLICA IDENTITY for a table and diagnose the errors you get when it is wrong.
  • Run a near-zero-downtime PostgreSQL major-version upgrade using logical replication.
  • Read the diagnostics in pg_replication_slots, pg_stat_subscription, and pg_stat_replication_slots, and unstick a subscription that has stalled on a conflict.
  • Pick between Debezium, Airbyte, AWS DMS, and native tools for an external CDC pipeline, and know which output plugins each speaks.

Before we dive in

You should be comfortable with the write-ahead log as a sequential record of every change, as the wal-checkpoints-and-durability page explains, and with physical streaming replication, as the streaming-and-synchronous-replication page covers. You also need the basics from the mvcc-and-tuple-visibility page: a row update produces a new tuple, and a delete tombstones the old one.

A few terms, defined as we use them. A logical decoder is the publisher-side process that reads WAL records and turns them into row-level change events. An output plugin is the format that decoder emits in, the most common being pgoutput (binary, built in) and wal2json (JSON, contrib). A publication is a server-side declaration of which tables and which operations to publish. A subscription is the receiver: it connects to a publisher, creates a logical replication slot there, performs an initial copy, and then applies the change stream row by row. A logical replication slot is the publisher-side bookmark that pins WAL until the subscriber confirms it has applied past that point. Hold these five. Everything below is built from them.

Mental Model

The tempting wrong picture is that logical replication is just streaming replication with a filter, the same bytes flowing across the wire with some tables left out. Under that picture the subscriber is a frozen image of the publisher, and the version must match.

Logical replication does not work that way. The better model is a translator standing between two databases. The publisher hands the translator a tape of physical changes (the WAL). The translator reads the tape, recognises insert, update, and delete events on specific tables, and dictates them in plain language: “on table orders, insert the row with these column values,” “on table accounts, update the row whose primary key is 42 to balance 90.” The subscriber listens to the dictation and executes those statements against its own copy. It runs its own planner, its own indexes, its own version of PostgreSQL.

Two consequences fall out the moment that picture clicks. First, the publisher and subscriber are decoupled: different majors, different storage layouts, even different extensions. Second, the translator can only describe what it sees on the tape. DDL never goes on the tape (a CREATE INDEX writes WAL about the index build, not the schema intent), so DDL is not replicated. Sequence values, large objects, and system catalogs sit outside the per-table stream too. The translator is faithful to what it reads, but you have to give it the right tape.

Breaking it down

1. Why logical replication exists when streaming already works

Start with what streaming gives you and what it refuses. Streaming replication ships WAL records byte for byte, and the standby replays them through the same redo code path the primary uses for crash recovery. The result is a perfect physical clone: same page layout, same tuple offsets, same major version, same architecture. That precision is the feature.

It is also the limitation. A physical replica cannot be running PostgreSQL 17 while the primary runs 16, because their on-disk formats may differ in ways the redo code does not bridge. You cannot replicate a subset of tables, because WAL records are about pages, not tables. You cannot reshape data on the way across, you cannot replicate into a non-PostgreSQL system, and you cannot run the replica as an active writer for the tables it receives.

Logical replication exists because all four of those wishes are routine. You want to migrate from PostgreSQL 14 to 17 with seconds of downtime. You want to feed three event tables into an analytics warehouse, not the whole database. You want to ship row changes to Kafka so a downstream service can react, what the industry calls analytics-CDC and external bus. The trick is to stop shipping bytes and start shipping meaning: not “page 9 of file 16384 changed like this,” but “row 42 of orders got these new column values.” That is what logical decoding produces.

2. The mechanism: decoding WAL into row-level changes

Here is the machinery. The same WAL that drives streaming replication is also rich enough to reconstruct every row change. PostgreSQL ships a feature called logical decoding that reads WAL records on the publisher and emits a structured stream of events:

BEGIN 742
  INSERT orders (id=101, customer_id=7, total=2999)
  UPDATE accounts SET balance=8501 WHERE id=7
COMMIT 742

Each event names the table, the operation, and the column values involved. The decoder runs on the publisher, inside a walsender process spawned for the subscription. It reads WAL, looks up the table’s schema in the system catalogs as of the moment the change happened, formats the event through the configured output plugin, and writes it to the network socket. The subscriber receives that stream and applies the events one transaction at a time, in commit order.

Two consequences of “decode on the publisher” matter for capacity planning. First, decoding costs CPU and memory on the publisher, not on the subscriber. Second, a long-running transaction on the publisher delays decoding for every subscriber, because the decoder cannot emit a transaction’s events until it sees the commit record, so the events spool in the publisher’s memory or spill to disk while it waits.

This richer machinery costs more WAL than physical replication. With wal_level = logical, PostgreSQL writes extra information into WAL records so the decoder can reconstruct row identities and column values without consulting the live heap, which may have moved on. You must set it explicitly:

# postgresql.conf on the publisher
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10
max_logical_replication_workers = 4

wal_level = logical is a superset of replica, so it covers streaming replication too. Changing it requires a restart.

flowchart LR
    A[Client commits a transaction] --> B[WAL records written]
    B --> C[Logical decoder reads WAL]
    C --> D[Output plugin formats events]
    D --> E[Walsender pushes to subscriber]
    E --> F[Apply worker executes per row]
    F --> G[Subscriber confirms LSN]
    G --> H[Slot advances on publisher]

The diagram is the whole flow in one breath: commit, WAL, decode, format, ship, apply, confirm, advance. The logical replication slot at the publisher end is the bookmark that pins WAL until step 7 happens, which is why a stalled subscriber can fill the publisher’s disk.

Publisher decode to subscriber apply
Client COMMITThe application commits a transaction on the publisher. Its WAL records are written and fsync'd as for any commit.
Step 1 of 7

3. Publications and subscriptions: the wiring

You declare what to publish on the publisher and what to receive on the subscriber. Publishers can publish a chosen list of tables or every table in the database.

-- On the publisher
create publication orders_pub for table orders, order_items;

-- Or, for an upgrade-style "everything" replication:
create publication all_tables_pub for all tables;

A publication also picks which operations to forward. The default is all four (insert, update, delete, truncate); you can restrict it.

create publication orders_inserts for table orders
  with (publish = 'insert');

On the subscriber side, a subscription connects to the publisher, names the publication, and starts work.

-- On the subscriber
create subscription orders_sub
  connection 'host=publisher.example.com dbname=appdb user=replicator password=...'
  publication orders_pub;

Creating the subscription does several things at once: it creates a logical replication slot on the publisher (named after the subscription unless you pass slot_name), starts the initial table sync (a COPY of every published table at its current snapshot), then transitions into apply mode and starts streaming changes. The role used on the publisher needs the REPLICATION attribute and SELECT on every published table.

-- On the publisher, give the replicator role what it needs.
create role replicator with replication login password '...';
grant select on all tables in schema public to replicator;

The connection string carries weight. It is stored on the subscriber and reused on reconnect, so it must point at a host the subscriber can reach forever, not a tunnel that vanishes on session end.

4. REPLICA IDENTITY: how the subscriber finds a row

Inserts are easy: the event carries every column value, and the subscriber inserts a new row. Updates and deletes are harder. To apply “update the row that used to have these key columns to these new values,” the subscriber needs to know which row that is. REPLICA IDENTITY is the table-level setting that decides which columns the publisher writes into WAL for that purpose.

-- The four options.
alter table orders replica identity default;        -- use the primary key (the default)
alter table orders replica identity full;           -- log every column of the old row
alter table orders replica identity using index idx_orders_uniq;  -- use a unique non-PK index
alter table orders replica identity nothing;        -- log no key; UPDATE and DELETE will error

DEFAULT is what almost every table should use, and it requires a primary key. The publisher writes the primary key columns into the WAL record for each update and delete; the subscriber looks the row up by primary key. FULL writes every column of the old row into WAL, which is expensive (every update writes the whole row twice, once as the old image and once as the new), and the subscriber falls back to a full row match, which is a sequential scan on the subscriber unless an index happens to cover the comparison. Use it only for tables that genuinely lack a unique key and only after measuring.

The trap is the table with no primary key and no unique index. Its REPLICA IDENTITY defaults to DEFAULT, but DEFAULT cannot find a key, so the first update or delete will fail at apply time with cannot update table "x" because it does not have a replica identity and publishes updates. Either add a primary key, set a unique index and REPLICA IDENTITY USING INDEX, or accept the cost of FULL. NOTHING is a deliberate “no updates or deletes will replicate” choice for an insert-only stream.

REPLICA IDENTITY in practice

5. What is and is NOT replicated by default

This is the section that catches even experienced operators, because the omissions are not obvious. The decoder emits events for table rows; anything that is not a table-row change is silent.

OperationReplicated?Notes
INSERT, UPDATE, DELETEyesThe core stream.
TRUNCATEyes, since PostgreSQL 11Publication option publish_via_partition_root affects partitions.
DDL (CREATE TABLE, ALTER TABLE, CREATE INDEX)NOYou must run DDL on the subscriber separately, ideally first.
Sequence valuesmostly NO historically; PostgreSQL 16 added per-row sequence replication for sequences explicitly added to a publicationA subscriber may have stale nextval after switchover unless you bump it.
Large objects (pg_largeobject)NOThey live in a system catalog, not user tables.
System catalogsNORoles, tablespaces, extensions, search_path: all out.
Temporary and unlogged tablesNONo WAL, no decoding.

The single largest operational consequence is that DDL must be applied to both sides, in the right order. The safe pattern is: pause writes, apply the DDL on the subscriber first (it tolerates having a column the publisher does not yet write to), then on the publisher, then resume writes. Going the other way (publisher first) means the publisher starts emitting events that reference a column the subscriber lacks, and apply errors out.

Sequence values are the second trap, particularly in the upgrade scenario. After a switchover, the application keeps inserting via nextval, but the subscriber’s sequence may sit at the value it had during the initial copy. The standard fix is to bump every sequence on the subscriber to a safe value before promoting it to primary.

-- On the subscriber, just before promotion.
select setval(pg_get_serial_sequence('orders', 'id'),
              (select max(id) from orders));

6. Initial sync, slot catch-up, and parallel apply

When a subscription is created, every published table moves through a lifecycle the apply worker drives.

  1. init: the subscription is created; no work has started for this table yet.
  2. data sync: a per-table sync worker takes a snapshot via the slot and runs COPY from the publisher into the subscriber. Each table sync has its own short-lived slot named pg_<subid>_sync_<relid> to hold WAL during its copy.
  3. catch-up: once the COPY finishes, the sync worker streams the changes that happened during the COPY, from the snapshot LSN up to the apply worker’s current position. When it catches up, it hands off.
  4. ready: the main apply worker is now responsible for this table, and changes flow continuously.

The detail that matters operationally: until every table is ready, the subscription is holding multiple slots open on the publisher, and the publisher cannot recycle WAL behind the oldest of them. A subscription syncing a 2 TB table can pin WAL for the entire duration of that COPY.

-- Watch the per-table state.
select srrelid::regclass, srsubstate, srsublsn
from pg_subscription_rel
where srsubid = (select oid from pg_subscription where subname = 'orders_sub');

srsubstate is one character: i for init, d for data sync, s for synchronized (catch-up done), r for ready.

Once every table reaches ready, the steady state kicks in: one apply worker per subscription reads the stream and applies transactions. Historically that worker was single-threaded, which made apply lag scale with how fast the subscriber could replay one transaction at a time. PostgreSQL 16 added parallel apply for large in-progress transactions, where the apply worker can hand parts of a streaming transaction off to parallel workers, controlled by max_parallel_apply_workers_per_subscription (default 2). The setting matters most for OLTP workloads where one publisher transaction touches many rows; small transactions still apply on the main worker.

7. Conflicts and how the subscription stalls

A conflict on the subscriber is anything that prevents the apply worker from executing an event: a unique-key violation because some other writer inserted the same key first, a missing target row for an update because someone deleted it, a constraint violation because the subscriber’s schema is stricter. The default behaviour is brutal and intentional: the apply worker raises an error and the subscription stalls. The same event will be retried on the next reconnect attempt, and as long as the conflict is still there, it will fail again.

Why stall instead of skip? Skipping silently would leave the subscriber’s data quietly different from the publisher’s, with no way to know which divergences were “safe.” A stalled subscription is loud: pg_stat_subscription reports a non-empty error in the last_error_message column, and the publisher’s slot stops advancing, so its pg_wal/ starts to grow. The operator must intervene.

The two intervention paths are:

-- Path 1: fix the subscriber data so the event can apply.
delete from orders where id = 101;  -- remove the conflicting row
alter subscription orders_sub enable;

-- Path 2: skip the offending transaction at its LSN (PostgreSQL 15+).
alter subscription orders_sub skip (lsn = '3A/1C5F08');
alter subscription orders_sub enable;

ALTER SUBSCRIPTION ... SKIP is the controlled escape hatch added in PostgreSQL 15: you find the LSN from the error message, mark it for skip, and the apply worker advances past that one commit. PostgreSQL 16 and 17 added more visibility into conflict types and counters, including new system views for conflict tracking, so you can tell at a glance whether the stalled subscription is hitting unique violations, missing rows, or schema mismatches. PostgreSQL 16 also introduced origin filtering through CREATE SUBSCRIPTION ... WITH (origin = 'none'), which lets a subscriber accept only changes that did not come from another logical replication source. That is the building block for active-active topologies (each side replicates only its own writes to the other), with the caveat that genuine write conflicts (the same row updated on both sides) still need application-level conflict resolution.

sequenceDiagram
    participant P as Publisher
    participant S as Subscriber
    participant Op as Operator
    P->>S: insert orders row 101
    S-->>P: error duplicate key 101
    Note over S: subscription stalls and slot stops advancing
    Op->>S: query pg_stat_subscription
    Op->>S: delete conflict row or skip the lsn
    Op->>S: enable the subscription
    S->>P: resume and confirmed_flush_lsn advances

8. Use cases that win: upgrades, analytics, CDC pipelines

Three patterns are where logical replication genuinely beats every alternative.

Cross-major-version upgrades with near-zero downtime. This is the marquee use case. You stand up a new PostgreSQL 17 cluster, set up logical replication from the 16 primary into it, wait for the initial sync to finish and catch-up lag to fall to seconds, and then orchestrate a brief switchover: pause writes on the primary, wait a moment for the last events to apply, bump sequences on the new side, repoint the application, resume writes. Downtime is the switchover window, often under a minute, instead of the hours pg_upgrade --link plus restart would cost on a multi-terabyte database.

Selective replication to an analytics database. You replicate a chosen set of tables (orders, order_items, payments) from the OLTP primary into a separate PostgreSQL instance shaped for analytics: more memory, looser autovacuum, summary indexes, materialized views built on top. The analytics replica runs heavy queries without bothering the OLTP system, and the slice you replicate is exactly the tables you need, not the whole database.

CDC into Kafka or another external bus. This is the analytics-CDC and external bus pattern. A downstream system, often Kafka or a search index or a cache, needs to react to every row change in real time. You connect a CDC tool (Debezium is the dominant one in this space) to the publisher’s replication slot, the tool decodes WAL through the pgoutput or wal2json plugin, and it emits structured messages to Kafka topics. Consumers downstream build whatever they want on top.

Physical streaming vs logical replication
Physical streaming needs wal_level = replica (the default). Logical replication needs wal_level = logical, which is a superset and writes more WAL per change so the decoder can reconstruct row identities. Switching levels requires a publisher restart.

9. Output plugins and the CDC tool landscape

Logical decoding emits events through an output plugin that picks the wire format. Three plugins matter in practice.

  • pgoutput: built into PostgreSQL since version 10, binary, the default that the built-in subscription uses. Most efficient on the wire. Required for in-PostgreSQL logical replication.
  • wal2json: contrib plugin (sometimes packaged separately), emits each change as a JSON object. Easier to debug because you can read the events with your eyes. Common in older Debezium deployments and ad-hoc tooling.
  • test_decoding: built in, emits a human-readable text trace. Useful only for debugging; do not pipe it into a production CDC pipeline.

The CDC tool landscape on top of those plugins is short and worth knowing.

  • Debezium is the dominant Java framework for change data capture. Its PostgreSQL connector reads from a logical replication slot using pgoutput (the modern default) or wal2json (legacy), turns each event into a Kafka Connect record, and writes it to Kafka topics keyed by primary key. The bulk of PostgreSQL-into-Kafka deployments in industry run on Debezium.
  • Airbyte offers a Postgres source connector that uses logical replication under the hood for incremental CDC syncs into data warehouses (Snowflake, BigQuery, Redshift). It targets the ELT use case rather than streaming Kafka, but the publisher-side mechanism is the same slot and decoder.
  • AWS DMS (Database Migration Service) is the managed CDC option in AWS: it consumes from a slot and writes to a long list of targets including RDS, Redshift, S3, and Kafka via MSK. The trade is operational ease for less control over the plugin and behaviour.
  • Native pg_recvlogical is the command-line client shipped with PostgreSQL itself. It reads from a slot and writes the decoded stream to standard out or a file, which is enough for scripts, debugging, and very small pipelines.
# Peek at the change stream from a slot without consuming it (useful for debugging).
psql -d appdb -c "SELECT data
                  FROM pg_logical_slot_peek_changes('orders_slot', NULL, NULL,
                                                    'include-xids', '0');"

# Or, from the command line, tail the slot through pg_recvlogical.
pg_recvlogical -d appdb --slot=orders_slot --start \
  -o include-xids=0 -o include-timestamp=1 -f -

pg_logical_slot_get_changes is the destructive sibling of peek: it consumes events and advances the slot. Reach for peek first while diagnosing.

10. Failure modes and diagnostics

Logical replication has a small set of failure modes that recur. They all show up in the same three views.

-- 1) The publisher's slots: are they advancing?
select slot_name, slot_type, active, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn))
         as retained_wal
from pg_replication_slots;

-- 2) The subscriber's apply state: any errors, how far behind?
select subname, pid, received_lsn, latest_end_lsn, last_msg_send_time, last_msg_receipt_time
from pg_stat_subscription;

-- 3) Decoder timings (PostgreSQL 14+): is decoding itself the bottleneck?
select slot_name, total_txns, total_bytes, spill_txns, spill_bytes,
       stream_txns, stream_bytes
from pg_stat_replication_slots;

Read the views together. If a publisher slot’s retained_wal is huge and confirmed_flush_lsn is stuck, the subscriber is not confirming, which means it is either disconnected or stalled on a conflict. Check pg_stat_subscription for an error or a dead pid. If decoding lag is growing but the subscriber is keeping up, look at spill_txns and spill_bytes in pg_stat_replication_slots: a long transaction on the publisher is forcing the decoder to spill to disk, which is slow.

Here are the failure modes ranked by how often they actually bite production.

  1. Subscriber stalled on a conflict, publisher WAL grows. The most common outage. The slot pins WAL forever, pg_wal/ fills, the publisher eventually halts. Fix the subscriber data or skip the LSN; consider max_slot_wal_keep_size (PostgreSQL 13+) to bound the damage, with the trade-off that the subscriber may need to be rebuilt if the slot drops.
  2. Table without a primary key. The first update or delete on it errors out on the subscriber. Either add a PK, set REPLICA IDENTITY USING INDEX, or accept FULL.
  3. DDL forgotten on one side. A schema change applied on the publisher but not the subscriber will crash apply the moment a row carries the new column. Apply DDL on the subscriber first.
  4. Long-running transaction on the publisher. The decoder cannot emit until commit, so subscription lag grows from the moment the transaction starts. Kill or shorten the transaction; the apply side cannot fix this.
  5. Single-threaded apply lag on a write-heavy publisher. Before PostgreSQL 16, this was a hard ceiling. With 16+, raise max_parallel_apply_workers_per_subscription and design publications so the workload streams in parallelizable chunks.
  6. Missing permissions. The replication role lacks REPLICATION, or it lacks SELECT on a published table. The error is loud at subscription creation time; check pg_hba.conf includes a replication entry for the role and host.
Check yourself
Your logical replication subscription has been stalled for two hours after a unique-key violation. The publisher's pg_wal directory is now 60 percent full and growing. What is the most appropriate immediate action?

A worked example pulls it together. Suppose you are migrating a 1.2 TB PostgreSQL 15 cluster to 17 with a Sunday morning maintenance window.

-- 1) On the new PG17 cluster, restore the schema (pg_dump -s from the old side).
psql -h pg17.example.com -d appdb -f schema.sql

-- 2) On the PG15 publisher, set wal_level=logical (already set if you planned ahead).
-- Then create the publication for everything.
create publication upgrade_pub for all tables;

-- 3) Create a replication role with the right grants and pg_hba.conf entry.
create role upgrader with replication login password '...';
grant select on all tables in schema public to upgrader;

-- 4) On the PG17 subscriber, create the subscription. It immediately starts copying.
create subscription upgrade_sub
  connection 'host=pg15-primary dbname=appdb user=upgrader password=...'
  publication upgrade_pub;

-- 5) Wait. Monitor pg_subscription_rel until every table reaches state 'r' (ready)
-- and pg_stat_subscription shows latest_end_lsn close to pg_current_wal_lsn() on PG15.

-- 6) At the maintenance window: pause writes on the application.
-- Wait briefly for apply to drain (latest_end_lsn == pg_current_wal_lsn() on PG15).

-- 7) On PG17, bump every sequence so nextval is past the largest live id.
select setval(s.relname::regclass,
              greatest((select max(id) from public.orders), 1))
from pg_class s where s.relkind = 'S';  -- repeat per sequence/table in practice

-- 8) Repoint the application to PG17 and resume writes.
-- 9) On PG17, drop the subscription (it has done its job).
alter subscription upgrade_sub disable;
alter subscription upgrade_sub set (slot_name = none);
drop subscription upgrade_sub;

Steps 6 through 8 are the only downtime. On a database measured in terabytes that window can be under a minute, which is the central reason logical replication is the standard tool for major-version upgrades at any scale that cannot tolerate pg_upgrade’s restart window.

Mastery Questions

  1. A teammate sets up logical replication for a single 500 GB events table that has no primary key and no unique index, picks REPLICA IDENTITY DEFAULT, and says, “We only insert into it, so it should be fine.” A week later, a bug-fix script runs a one-row UPDATE and the subscription stalls. What broke, and what would you change?

    Answer. REPLICA IDENTITY DEFAULT means “use the primary key,” but the table has none, so the publisher has no key columns to write into the WAL record for the update. Inserts work because they carry every column anyway, which is why the subscription seemed healthy for a week. The first update errors out on apply because there is no way to identify the target row, and the subscription stalls; the publisher’s slot stops advancing and pg_wal/ starts to grow. There are two clean fixes. The right one for an events table is to add a primary key, even a synthetic bigint generated always as identity, because that is what every other operational tool also wants. If you truly cannot add a key, set a unique non-null index and REPLICA IDENTITY USING INDEX. REPLICA IDENTITY FULL works as a last resort but doubles the WAL volume of every update and turns apply into a sequential scan, which is brutal on a 500 GB table. Long term, treat “no primary key” as a flag the operational tooling will keep catching, not a property to live with.

  2. You are running an upgrade from PostgreSQL 15 to 17 using logical replication. The initial sync of a 2 TB events table has been running for six hours, and you notice that the publisher’s pg_wal directory has grown by 400 GB during that time. Why is WAL piling up, and is anything broken?

    Answer. Nothing is broken; this is the cost the upgrade pattern asks you to plan for. Until the table-sync worker finishes its COPY and catches up to the apply worker, the subscription holds a logical replication slot at the publisher pinned at the LSN where the COPY started. The publisher cannot recycle WAL behind that slot, so every WAL byte written by ongoing OLTP traffic for those six hours accumulates in pg_wal. The remediation is to size the publisher’s disk for the largest plausible sync window before you start (rule of thumb: peak WAL generation rate multiplied by expected sync time, with comfortable headroom), and to set max_slot_wal_keep_size to a value that bounds the worst case at the cost of forcing a slot rebuild if exceeded. The diagnostic to confirm is pg_replication_slots: confirmed_flush_lsn will be far behind pg_current_wal_lsn() for the sync slots, and pg_subscription_rel will show those tables still in state d. Once the COPY finishes, the catch-up phase will run, the slot’s confirmed LSN will leap forward, and the next checkpoint will recycle the retained WAL.

  3. You want to feed every change on three OLTP tables into Kafka so a downstream service can react. A colleague suggests setting up a second PostgreSQL instance as a logical replica and having the downstream service poll it for changes. Why is that strictly worse than running Debezium against the OLTP publisher directly, and what specifically does Debezium use?

    Answer. The polling design pays for logical replication twice and loses freshness. You consume publisher resources to decode WAL and ship it to a second PostgreSQL, you consume disk and CPU on the replica to apply every change, and then you ask the downstream service to scan the replica on an interval to find new rows, which is the opposite of streaming and reintroduces every problem (lag, missed deletes, polling cost) that CDC was meant to solve. Running Debezium against the OLTP publisher uses the same logical decoding infrastructure exactly once. Debezium opens a replication slot, subscribes through the pgoutput plugin (the modern default; older deployments use wal2json), receives the change events directly from the walsender, and emits them as Kafka Connect records into Kafka topics keyed by primary key. There is no second database, no polling, and lag is bounded by network and Kafka, not by an apply worker and a poll interval. The trade is that Debezium is one more piece of operational software to run, and the slot it owns has the same “do not stall it” properties as any other logical slot, so monitoring pg_replication_slots and bounding retention with max_slot_wal_keep_size are still your job.

Recommended next

Sources & evidence14 claims · 4 cited

Core mechanics (wal_level, publications/subscriptions, REPLICA IDENTITY, slot behaviour, what is/is not replicated, conflicts, output plugins) are grounded in the official PostgreSQL 16/17 logical-replication and WAL documentation (src_pg_docs_logical_replication, src_pg_docs_wal, src_pg_docs_warm_standby, src_pg_docs_high_availability). Facts those sections do not cover but are correct and stable (Debezium/Airbyte/AWS DMS tooling, PostgreSQL 16 parallel apply, the bidirectional origin filter, pg_stat_replication_slots) are marked stable-common-knowledge with empty source_ids.

  • Logical decoding runs on the publisher inside the walsender process and reads WAL records to emit a stream of row-level INSERT, UPDATE, DELETE, and TRUNCATE events through a configurable output plugin, requiring wal_level = logical which writes more WAL than wal_level = replica because it records extra information for reconstructing row identities and column values.verified
  • The decoder cannot emit any event of a transaction until it observes the COMMIT record, so a long-running transaction on the publisher delays decoding for every subscriber and forces the decoder to spool events in memory or spill them to disk while it waits.verified
  • CREATE SUBSCRIPTION creates a logical replication slot on the publisher (named after the subscription unless slot_name is given), performs an initial COPY of every published table, and then transitions into continuous apply; the role used on the publisher requires the REPLICATION attribute and SELECT on every published table.verified
  • REPLICA IDENTITY DEFAULT logs only the primary key columns into WAL for UPDATE and DELETE; FULL logs every column of the old row (doubling per-update WAL volume and forcing the subscriber into a sequential-scan match unless a covering index happens to exist); NOTHING makes UPDATE and DELETE error out at apply time; USING INDEX accepts a unique, non-partial, NOT NULL index in place of a primary key.verified
  • DDL changes are not replicated by logical replication and must be applied manually on the subscriber (typically first, so the publisher cannot emit events referencing a column the subscriber lacks); system catalogs, large objects, and temporary or unlogged tables are also outside the per-table stream, and TRUNCATE has been replicated since PostgreSQL 11.verified
  • Each per-table sync worker opens its own short-lived slot named pg_<subid>_sync_<relid> to hold WAL during the COPY phase, so until every table reaches state 'r' (ready) the subscription pins multiple slots on the publisher and prevents WAL recycling behind the oldest of them.verified
  • PostgreSQL 16 introduced parallel apply for in-progress streaming transactions, controlled by max_parallel_apply_workers_per_subscription (default 2), letting the main apply worker hand parts of a large transaction off to parallel workers; before 16, apply for a subscription was strictly single-threaded.stable common knowledge
  • A conflict at apply (unique-key violation, missing target row, constraint failure) raises an error and stalls the entire subscription; the slot stops advancing, pg_wal grows on the publisher, and the operator must either repair the subscriber data or call ALTER SUBSCRIPTION ... SKIP (lsn = '...') (added in PostgreSQL 15) to skip the offending commit before re-enabling apply.verified
  • PostgreSQL 16 added origin filtering through CREATE SUBSCRIPTION ... WITH (origin = 'none'), which is the building block for active-active topologies because it lets a subscriber accept only changes that did not arrive from another logical replication source; genuine write conflicts on the same row still require application-level conflict resolution.stable common knowledge
  • Because the subscriber may run a newer PostgreSQL major version than the publisher, logical replication is the standard tool for near-zero-downtime major-version upgrades: the new cluster is built and synced live, then a brief switchover window pauses writes, drains apply, bumps sequences, and repoints the application, often keeping downtime to under a minute even on multi-terabyte databases.verified
  • Logical decoding emits events through an output plugin: pgoutput (built in since PostgreSQL 10, binary, used by the in-PostgreSQL subscription path), wal2json (contrib, JSON), and test_decoding (built in, human-readable, debugging only).verified
  • Debezium is the dominant Java framework for PostgreSQL CDC: its connector reads from a logical replication slot using pgoutput (modern default) or wal2json (legacy), turns each event into a Kafka Connect record, and emits to Kafka topics keyed by primary key; Airbyte's Postgres source uses logical replication for ELT into warehouses, and AWS DMS is the managed alternative for many targets including Kafka via MSK.stable common knowledge
  • pg_stat_replication_slots (PostgreSQL 14+) exposes per-slot decode metrics including total_txns, total_bytes, spill_txns, and spill_bytes, which let an operator see when a long publisher transaction is forcing the decoder to spill events to disk and slow down apparent decode throughput.stable common knowledge
  • A stalled subscriber pins WAL through its logical slot, so pg_wal grows on the publisher until either the subscriber confirms past the conflict or max_slot_wal_keep_size (PostgreSQL 13+) caps retention, with the trade-off that hitting the cap invalidates the slot and forces a full subscription rebuild.verified