Autovacuum Tuning

How to make the PostgreSQL autovacuum daemon keep up at scale: why the default trigger starves large tables, per-table scale factors, insert-driven autovacuum, cost-based I/O throttling, the shared worker budget, vacuum memory, daemon-level freezing controls, autoanalyze, and the diagnostics that show whether autovacuum is winning or losing.

advanced 26 min read in Configuration

postgresqlautovacuumvacuumbloatconfigurationperformance-tuningfreezingoperations

Learning outcomes

Autovacuum is the background daemon that runs VACUUM and ANALYZE for you so dead tuples get reclaimed and the planner keeps fresh statistics. On a small database the defaults are fine and you never think about it. At scale they quietly fail, and the failure looks like creeping bloat, slowing scans, and one day a database that stops accepting writes. This page is about making the daemon keep up when tables get big and writes get hot.

After studying this page, you can:

Explain why the default trigger formula starves vacuuming on a billion-row table, and fix it with a per-table scale factor.
Configure insert-driven autovacuum so append-only tables still get vacuumed and frozen.
Tune cost-based throttling so vacuum uses the I/O a fast SSD or NVMe disk can actually deliver.
Set autovacuum_max_workers and maintenance_work_mem correctly, knowing the cost budget is shared and a bigger memory pass scans indexes fewer times.
Read pg_stat_user_tables and pg_stat_progress_vacuum to tell whether autovacuum is keeping up or falling behind.
Recognize the failure modes: runaway bloat from defaults, throttle starvation on fast disks, more workers with no more budget, and an anti-wraparound vacuum blocked by a long transaction.

Before we dive in

You should already know why dead tuples exist. The mvcc-and-tuple-visibility page covers that: every UPDATE and DELETE leaves a dead tuple behind, and VACUUM is the cleanup that reclaims their space. This page does not re-teach what one VACUUM does internally; the vacuum-freezing-and-wraparound page covers vacuum internals and the wraparound danger in full. Here we focus on the autovacuum daemon: the launcher and worker processes that decide which tables to vacuum, when, and how fast.

A few terms, defined as we use them. Bloat is the wasted space dead tuples occupy on a table’s pages. The trigger is the condition autovacuum checks to decide a table needs vacuuming. A cost limit is a throttle that caps how much I/O a vacuum may do before it pauses. Freezing is the process that marks very old tuples as permanently visible so transaction ids can be recycled; wraparound is the catastrophe that freezing exists to prevent. reltuples is PostgreSQL’s estimate of a table’s row count, kept in pg_class and refreshed by vacuum and analyze. Hold onto these. Almost every parameter below is a knob on one of them.

Mental Model

The tempting wrong model is that autovacuum is a fixed background chore: it “runs every minute” and either keeps up or it does not, and the only lever you have is whether it is on. Under that model, when bloat grows, the instinct is to add more workers, the way you would add more threads to a slow job.

That model is wrong on both counts. Autovacuum is not a timer that vacuums everything every minute. Every autovacuum_naptime (one minute by default) the launcher wakes, looks at each table’s dead-tuple count against a per-table trigger, and vacuums only the tables that crossed it. And the daemon is not bottlenecked on the number of workers. It is bottlenecked on a shared I/O budget that all workers draw from. Adding a fourth worker to a three-worker pool with the same total cost_limit just slices the same budget four ways instead of three. Each table now gets vacuumed slower.

The better model is a fleet of cleaning crews sharing one fuel tank. The trigger decides which rooms are dirty enough to clean. The number of crews decides how many rooms you can clean at once. But the fuel tank, the cost limit, decides how fast any cleaning actually happens, and it is shared. To make vacuuming faster you usually do not hire more crews. You make the trigger fire sooner (so rooms never get filthy) and you fill the tank (so the crews can move). Keep this picture. It tells you which knob to reach for every time.

Breaking it down

1. Why the defaults fail on a large table

Start with the formula, because every default failure traces back to it. Autovacuum decides a table needs vacuuming when its dead-tuple count crosses a threshold computed like this:

trigger = autovacuum_vacuum_threshold
        + autovacuum_vacuum_scale_factor * reltuples

The defaults are autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2. The threshold is a flat floor; the scale factor is the part that scales with table size. So the rule is roughly “vacuum once 20 percent of the rows are dead, plus a small fixed slack.”

Twenty percent sounds reasonable until you put a real row count in. Walk it up:

A 1,000-row table triggers at 50 + 0.2 * 1000 = 250 dead tuples. Fine.
A 1-million-row table triggers at about 200,050 dead tuples. Already a lot of slack, but survivable.
A 1-billion-row table triggers at about 200 million dead tuples.

Read that last line again. On a billion-row table, the default settings let 200 million dead tuples accumulate before autovacuum even looks at it. That is a fifth of the table sitting dead on its pages. By then the table is badly bloated, scans are dragging through dead weight, and the single vacuum that finally fires has an enormous amount of work to do all at once. The defaults did not break. They did exactly what 0.2 times a billion says to do.

The slider below lets you feel the cliff. Drag the table size up and watch the trigger threshold climb with it.

The default trigger grows with the table

Table size1000000 rows

1000 rows1000000000 rows

Dead tuples before autovacuum fires (threshold = 50 + 0.2 * rows)200,050 dead tuples

Small table: the default 20 percent is harmless

The lesson is not “lower the global scale factor for everyone.” A single global value cannot fit both a 1,000-row lookup table and a 1-billion-row events table: what is too aggressive for one is far too lax for the other. The fix is per-table, and that is the next rung.

2. Per-table tuning: the one technique that matters most

Here is the single most important autovacuum technique, and most production tuning is some version of it. You override the scale factor for the big, hot tables, and you leave everyone else on the defaults.

You set it with table storage parameters:

-- A billion-row events table: vacuum once 1 percent is dead, not 20 percent.
ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.01);

-- An even hotter table can go lower still.
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.005,
  autovacuum_vacuum_threshold = 1000
);

With autovacuum_vacuum_scale_factor = 0.01, the billion-row table now triggers at about 10 million dead tuples instead of 200 million, a twentyfold improvement. Drop it to 0.005 and it triggers at 5 million; to 0.001 and it triggers at 1 million. The very largest, hottest tables in busy systems often run at 0.005 or 0.001 precisely so vacuum fires early and often, keeping each pass small and the table close to its live size.

There is a real trade-off in how low you go. A smaller scale factor means more frequent vacuums, and each vacuum has fixed overhead: it must scan the table’s indexes at least once. Set it absurdly low on a table with many large indexes and you can spend more time in index scans than you save in reduced bloat. The sweet spot keeps the table from ever getting filthy without vacuuming it so often that index maintenance dominates. Start at 0.01 for a large hot table, measure, and tighten only if dead tuples still climb between runs.

The mirror-image mistake is setting an aggressive value on a tiny table where it does nothing useful. A 500-row config table does not need scale_factor = 0.001; the flat threshold of 50 already governs it, and over-tuning it just adds noise. Per-table tuning is a scalpel for your handful of giant hot tables, not a blanket policy.

3. Insert-driven autovacuum for append-only tables

So far the trigger only counts dead tuples, which come from updates and deletes. That leaves a blind spot: a table you only ever INSERT into. It produces no dead tuples, so the classic trigger never fires, yet it still needs vacuuming. Why? Two reasons. Vacuum sets the visibility map, the per-page record of which pages hold only all-visible tuples, and that map is what enables index-only scans and lets a later vacuum skip pages. And on an append-only table, freezing must eventually happen or transaction ids run toward wraparound.

Before PostgreSQL 13, append-only tables were a known trap: they sat unvacuumed for ages, their visibility maps went stale, index-only scans stopped working, and the only vacuum that ever touched them was the emergency anti-wraparound one. Version 13 added an insert-driven trigger to fix exactly this. It mirrors the dead-tuple formula but counts tuples inserted since the last vacuum:

insert trigger = autovacuum_vacuum_insert_threshold
               + autovacuum_vacuum_insert_scale_factor * reltuples

The defaults are autovacuum_vacuum_insert_threshold = 1000 and autovacuum_vacuum_insert_scale_factor = 0.2. So a table accumulating inserts gets vacuumed on insert volume alone, even with zero deletes. On a large append-only table you tune this the same way you tune the dead-tuple scale factor, lowering the insert scale factor so the visibility map and freezing stay current:

-- A high-volume append-only log table: vacuum on insert volume too.
ALTER TABLE access_log SET (autovacuum_vacuum_insert_scale_factor = 0.01);

If you run anything older than PostgreSQL 13, this trigger does not exist, and append-only tables need a scheduled manual VACUUM or you wait for the anti-wraparound vacuum to do it the hard way.

4. Cost-based throttling, the part most people get wrong

This is the rung that separates engineers who think autovacuum is “on or off” from those who can actually make it keep up. Autovacuum does not run flat out. It throttles itself with a cost-based budget so that background cleanup does not saturate your disks and hurt foreground queries.

Here is the accounting. As vacuum works, it adds up a running cost for the pages it touches:

Action	Parameter	Default cost
Page found in shared buffers	`vacuum_cost_page_hit`	1
Page read from disk	`vacuum_cost_page_miss`	2
Page dirtied (written)	`vacuum_cost_page_dirty`	20

When the accumulated cost exceeds autovacuum_vacuum_cost_limit, the worker sleeps for autovacuum_vacuum_cost_delay milliseconds, then resets the counter and continues. The defaults are autovacuum_vacuum_cost_limit = 200 and autovacuum_vacuum_cost_delay = 2ms. A special value, autovacuum_vacuum_cost_limit = -1, means “fall back to the plain vacuum_cost_limit.” Dirtying pages is the expensive action by design, because writing is what actually stresses the disk.

Now do the arithmetic the way most people never do. A limit of 200 lets vacuum touch about 200 buffer hits, or 100 disk reads, or 10 page dirties, before it sleeps 2 milliseconds. Those defaults were chosen for spinning disks a decade ago. On a modern NVMe array that can do hundreds of thousands of IOPS, this throttle is absurdly conservative: vacuum spends most of its life asleep while the disk sits nearly idle. This is the classic “autovacuum is running but never catching up” symptom on fast hardware. The daemon is awake and working; it is just hobbled.

The fix is to raise the budget so vacuum can use the I/O you actually have. Two equivalent levers: raise autovacuum_vacuum_cost_limit or lower autovacuum_vacuum_cost_delay.

# postgresql.conf, for a server on fast SSD/NVMe
autovacuum_vacuum_cost_limit = 2000     # 10x the default budget per sleep
autovacuum_vacuum_cost_delay = 2ms      # keep the sleep short

Raising cost_limit from 200 to 1000 or 2000 gives vacuum five to ten times the work between naps, and on fast storage that is often the difference between falling behind and keeping up. The model below lets you compute the effective throughput and see the throttle lift.

How much I/O the throttle allows

cost_limit200

2004000

cost_delay (ms)2 ms

1 ms20 ms

Pages dirtied per second5,000 pages/s

Default budget: fine for spinning disks, starves vacuum on NVMe

A word of caution in the other direction. If you raise the budget too far on a disk that cannot sustain it, vacuum’s writes compete with foreground queries and you trade bloat for latency spikes. The right value is the most I/O your disk can give vacuum without hurting the queries that pay the bills. Measure under real load, do not guess.

You can run several vacuums at once. autovacuum_max_workers (default 3) sets how many tables can be vacuumed in parallel by separate worker processes. The launcher starts a new worker when there is work and a free slot, wakes on its schedule, and checks for tables over their trigger.

Here is the subtlety that wrecks a lot of tuning attempts. The cost limit is shared across all workers, not granted per worker. PostgreSQL takes the total autovacuum_vacuum_cost_limit and divides it among the currently running workers. So if you raise autovacuum_max_workers from 3 to 6 without touching the cost limit, you have not given autovacuum more total I/O. You have split the same budget across twice as many workers, and each individual vacuum now runs at half speed. Big tables take longer, not shorter.

flowchart TB
    L["Autovacuum launcher (wakes every naptime)"]
    B["Shared cost budget: total cost_limit"]
    W1["Worker 1: table A"]
    W2["Worker 2: table B"]
    W3["Worker 3: table C"]
    L --> W1
    L --> W2
    L --> W3
    B -. divided among .-> W1
    B -. divided among .-> W2
    B -. divided among .-> W3

So workers and budget are two different knobs for two different problems. Raise autovacuum_max_workers when you have many tables crossing their trigger at once and they queue up waiting for a free worker. Raise autovacuum_vacuum_cost_limit when any single vacuum is too slow. If you add workers, raise the total cost limit alongside, or you have made every vacuum slower while feeling productive.

The other timing knob is autovacuum_naptime, default one minute. It is how long the launcher sleeps between rounds of checking tables. On most systems the default is right. Lower it only if you have so many databases that one minute is not enough to cycle through them, since the launcher tries to visit each database roughly once per naptime.

6. Memory: how much a single pass can collect

Vacuum needs memory to remember the dead tuples it finds. As it scans the heap, it collects the tuple ids of dead tuples into a buffer. Then it scans every index to remove pointers to those dead tuples, and finally returns to the heap to reclaim the space. The size of that buffer is maintenance_work_mem, or autovacuum_work_mem if you set it (autovacuum workers use autovacuum_work_mem when it is not -1, otherwise they fall back to maintenance_work_mem).

Why does the buffer size matter so much? Because if the buffer fills before vacuum finishes scanning the heap, vacuum must stop, run a full pass over every index to clear what it has, empty the buffer, and resume the heap scan. Each index scan is expensive. So a buffer that is too small forces multiple index passes over one vacuum, multiplying the cost on a table with large indexes. A bigger buffer holds more dead tuple ids, so one vacuum collects more before it has to pause, and it scans the indexes fewer times. On a big table the difference between one index pass and five is enormous.

# postgresql.conf
maintenance_work_mem = 1GB      # used by manual VACUUM, CREATE INDEX, etc.
autovacuum_work_mem = 1GB       # used by autovacuum workers specifically

There is a practical ceiling worth knowing. In the dead-tuple-id storage used through PostgreSQL 16, the collection was capped at roughly 1 GB regardless of how high you set maintenance_work_mem, so values above about 1 GB bought a single autovacuum pass nothing extra. Setting it to 1 GB was the practical maximum that helped one pass. (PostgreSQL 17 replaced that storage with a more compact structure that both uses memory more efficiently and lifts the old 1 GB cap, so a single pass can track far more dead tuples; on 17 a higher value can genuinely help.) Either way, remember that autovacuum_work_mem is multiplied by the number of running workers: set it to 1 GB with six workers and autovacuum can use up to 6 GB at once. Size it against your RAM, not in isolation.

7. Freezing controls at the daemon level

Freezing is the deep topic of the vacuum-freezing-and-wraparound page, so here we cover only the daemon-level knobs and the trade-off they set. The short version: PostgreSQL recycles transaction ids, and a tuple older than a certain age must be frozen (marked permanently visible) before its creating transaction id can be reused. If freezing falls too far behind, the system approaches wraparound and will eventually stop accepting writes to protect your data.

Two parameters govern the daemon’s freezing behavior. vacuum_freeze_min_age (default 50 million) is how old a tuple must be before a normal vacuum bothers to freeze it. autovacuum_freeze_max_age (default 200 million) is the hard line: when a table’s oldest unfrozen transaction id reaches this age, autovacuum launches a special anti-wraparound vacuum on it. That vacuum is not optional and cannot be skipped the way a normal autovacuum can be. It runs even if autovacuum is otherwise disabled, because the alternative is data loss.

The trade-off in autovacuum_freeze_max_age cuts both ways. Set it too low and you force anti-wraparound vacuums often, re-freezing tuples and burning I/O on work you did not need yet. Set it too high and you let unfrozen age build up, so when the anti-wraparound vacuum finally fires it has a colossal amount of freezing to do all at once, and you have less safety margin before wraparound. Many large deployments raise it (to 1 billion or more) to make anti-wraparound vacuums rarer and larger, but only after ensuring normal vacuuming and freezing keep up so the table never actually reaches the limit by surprise.

-- Make anti-wraparound vacuums rarer on a huge table that vacuums well otherwise.
ALTER TABLE events SET (autovacuum_freeze_max_age = 1000000000);

stateDiagram-v2
    [*] --> Normal
    Normal: Normal autovacuum (skippable, throttled)
    AntiWrap: Anti-wraparound autovacuum (forced, must finish)
    Normal --> AntiWrap: oldest xid age reaches autovacuum_freeze_max_age
    AntiWrap --> Normal: freezing done, age reset
    note right of AntiWrap: cannot be skipped; blocks if a long transaction pins old xids

The danger to flag now, covered fully on the wraparound page, is the interaction with long transactions. An anti-wraparound vacuum cannot advance past tuples that a still-running transaction might need. So a long-open transaction can pin the horizon while the forced vacuum runs and runs without making progress, and the wraparound clock keeps ticking. An anti-wraparound vacuum stuck behind a long transaction is a genuine emergency: end the transaction.

8. Keeping statistics fresh with autoanalyze

Autovacuum has a second job that is easy to forget: it also runs ANALYZE to keep the planner’s statistics current. Stale statistics are their own performance bug. If the planner thinks a table has 10,000 rows when it has 10 million, it picks the wrong join order and the wrong scan type, and a query that should take milliseconds takes minutes. The-query-planner-and-cost-model page covers how the planner uses those statistics; here we make sure they stay fresh.

The trigger mirrors the vacuum trigger but counts rows changed since the last analyze (inserts, updates, and deletes all count):

analyze trigger = autovacuum_analyze_threshold
                + autovacuum_analyze_scale_factor * reltuples

The defaults are autovacuum_analyze_threshold = 50 and autovacuum_analyze_scale_factor = 0.1. So by default a table is re-analyzed once 10 percent of its rows have changed. On a very large table this has the same scaling problem as the vacuum trigger: 10 percent of a billion rows is 100 million changes before the planner gets fresh numbers, which is far too stale for a table whose data distribution shifts. Lower the per-table analyze scale factor on big or fast-moving tables for the same reason you lower the vacuum one:

ALTER TABLE events SET (autovacuum_analyze_scale_factor = 0.02);

A subtle case to watch: a table whose row count is stable but whose distribution drifts, like a status column that moves from mostly pending to mostly done over a day. Few rows are inserted or deleted, so the analyze trigger barely moves, yet the statistics the planner relies on go badly stale. For those tables, a scheduled manual ANALYZE or a lower analyze threshold keeps the planner honest.

9. Diagnostics: is autovacuum keeping up?

Tuning blind is guessing. PostgreSQL exposes exactly what you need to see whether the daemon is winning or losing. Start with pg_stat_user_tables, which carries the running tallies autovacuum itself watches:

SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

This is your first read of “who is bloating.” If n_dead_tup is large and climbing across snapshots while last_autovacuum stays old, that table’s trigger is set too high for its write rate: lower its scale factor. If last_autovacuum is recent but n_dead_tup is still huge, autovacuum is running but not keeping up, which usually points at the cost throttle (rung 4) on fast disks, not the trigger.

To watch a vacuum that is in flight, use pg_stat_progress_vacuum. It reports the live phase and how far the scan has gotten:

SELECT p.pid, t.relname, p.phase,
       p.heap_blks_total, p.heap_blks_scanned,
       round(100.0 * p.heap_blks_scanned / nullif(p.heap_blks_total, 0), 1) AS pct_scanned,
       p.num_dead_item_ids
FROM pg_stat_progress_vacuum p
JOIN pg_stat_user_tables t ON t.relid = p.relid;

The phase column is the tell. scanning heap is the first heap pass; vacuuming indexes and vacuuming heap are the index-and-reclaim work. If you see a vacuum bounce between vacuuming indexes and scanning heap repeatedly on one table, its maintenance_work_mem buffer is filling and forcing multiple index passes (rung 6). Raise it.

Two cross-cutting checks complete the picture. First, measure vacuum lag by trending n_dead_tup over time per table: a flat or sawtooth line means autovacuum is keeping pace; a steadily rising line means it is losing. Second, remember the xmin horizon from the mvcc-and-tuple-visibility page. No amount of autovacuum tuning can remove a dead tuple that a long-running transaction might still need to see. If dead tuples climb everywhere at once despite aggressive settings, the cause is almost certainly a transaction holding the horizon back, not your autovacuum config. Confirm it before you touch a single parameter:

-- The oldest transaction pinning the xmin horizon, database-wide.
SELECT pid, state, age(backend_xmin) AS xmin_age,
       now() - xact_start AS txn_age, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;

Reading the symptoms: which knob to turn

Check yourself

A billion-row table on NVMe storage shows last_autovacuum updating every few minutes, yet n_dead_tup never drops below 40 million. What is the most likely fix?

Mastery Questions

Your busiest table is a 2-billion-row events table on an NVMe-backed server. Monitoring shows it bloating steadily: on-disk size keeps growing faster than the data, and n_dead_tup trends upward all day. last_autovacuum on the table updates roughly once an hour. Walk through how you would diagnose and fix this, naming the parameters.

Answer. Two defaults are working against you, and you address them in order. First the trigger: at the default autovacuum_vacuum_scale_factor = 0.2, this table does not even qualify for vacuuming until about 400 million dead tuples exist, so it is filthy long before autovacuum touches it. Set a per-table scale factor, ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.01) or lower, so vacuum fires at roughly 20 million dead tuples instead and each pass stays smaller. Second, and the reason last_autovacuum updating hourly still is not enough: the cost throttle. On NVMe the default autovacuum_vacuum_cost_limit = 200 leaves vacuum asleep most of the time while the disk sits idle. Raise it to 1000 or 2000 (or lower autovacuum_vacuum_cost_delay) so vacuum uses the I/O the disk can deliver. If a single vacuum still scans the table’s indexes several times per run, raise maintenance_work_mem or autovacuum_work_mem toward 1 GB so each pass collects more dead tuple ids before it has to drain through the indexes. Confirm at every step with pg_stat_user_tables (is n_dead_tup trending down between runs now?) and pg_stat_progress_vacuum (is the vacuum spending its time in scanning heap rather than looping through vacuuming indexes?). Before any of this, rule out a long transaction pinning the xmin horizon with the pg_stat_activity query, because if one is, no tuning will help.
A teammate reports that autovacuum “isn’t keeping up” and proposes raising autovacuum_max_workers from 3 to 8. The cost limit is at its default. Why is this likely to make things worse, and what would you do instead?

Answer. The flaw is that autovacuum_vacuum_cost_limit is shared across all running workers, not granted per worker. PostgreSQL divides the total budget among however many workers are active. Going from 3 to 8 workers with the same cost_limit = 200 means each worker now gets one-eighth of the budget instead of one-third, so every individual vacuum runs roughly two-and-a-half times slower. The big, hot tables that were already struggling now take even longer to vacuum. More workers only helps when the problem is that many tables cross their trigger simultaneously and queue up waiting for a free worker slot; it does nothing for a single table whose vacuum is too slow. The correct move depends on the real symptom. If individual vacuums are slow (the common case on fast disks), raise the total autovacuum_vacuum_cost_limit so there is more I/O budget to go around. If many tables genuinely queue at once, then raise autovacuum_max_workers, but raise the total cost_limit in proportion so each worker still gets enough budget to make progress. Adding workers without adding budget is the textbook way to feel busy while getting slower.
You run an append-only audit_log table: rows are only ever inserted, never updated or deleted. A colleague argues it never needs vacuuming because it produces no dead tuples. On PostgreSQL 12 this caused a production incident; on PostgreSQL 16 it is fine by default. Explain both why the table does need vacuuming and what changed between those versions.

Answer. No dead tuples does not mean no vacuum needed. An append-only table still requires vacuuming for two reasons. First, vacuum sets the visibility map, the per-page record of all-visible pages; without it kept current, index-only scans on the table stop working and later vacuums cannot skip already-clean pages. Second, and more dangerous, every inserted tuple carries a transaction id that must eventually be frozen, or the table marches toward wraparound. The classic dead-tuple trigger only counts updates and deletes, so on PostgreSQL 12 this table never qualified for a normal autovacuum: its visibility map went stale, and the only thing that ever vacuumed it was the emergency anti-wraparound vacuum when its oldest xid hit autovacuum_freeze_max_age, which is exactly the surprise that caused the incident. PostgreSQL 13 fixed this by adding insert-driven autovacuum: autovacuum_vacuum_insert_threshold (default 1000) plus autovacuum_vacuum_insert_scale_factor (default 0.2) times reltuples, a trigger that counts inserts since the last vacuum. So on 16 the table gets vacuumed on insert volume alone, keeping its visibility map fresh and freezing tuples gradually instead of waiting for a forced anti-wraparound pass. On a high-volume version you would still lower autovacuum_vacuum_insert_scale_factor per table so it stays current, but the default now prevents the failure mode entirely.

Recommended next

Reading EXPLAIN ANALYZE
Builds directly on this page: Reading EXPLAIN ANALYZE is the next step in the PostgreSQL performance ladder.

Sources & evidence14 claims · 3 cited

Parameter names, defaults, and behaviors are grounded in the PostgreSQL documentation for Automatic Vacuuming runtime parameters, Routine Vacuuming, and Resource Consumption (cost-based delay). The PostgreSQL 13 introduction of insert-driven autovacuum and the PostgreSQL 16-vs-17 change in dead-tuple-id storage and its ~1 GB practical cap are treated as stable common knowledge where the listed runtime sections do not state them; illustrative trigger arithmetic (for example 0.2 times one billion) is computed from the documented formula.

Autovacuum triggers a vacuum when dead tuples exceed autovacuum_vacuum_threshold (default 50) plus autovacuum_vacuum_scale_factor (default 0.2) times the estimated row count, so a one-billion-row table waits for roughly 200 million dead tuples before its first autovacuum.verified
A per-table autovacuum_vacuum_scale_factor set with ALTER TABLE (for example 0.01, 0.005, or 0.001) overrides the global default so large hot tables are vacuumed at a few percent dead rather than 20 percent, because one global value cannot fit both a 1000-row and a 1-billion-row table.verified
PostgreSQL 13 added an insert-driven autovacuum trigger, autovacuum_vacuum_insert_threshold (default 1000) plus autovacuum_vacuum_insert_scale_factor (default 0.2) times the row count, so append-only tables with no dead tuples are still vacuumed to set the visibility map and freeze tuples.verified
Vacuum accrues cost per page (vacuum_cost_page_hit 1, vacuum_cost_page_miss 2, vacuum_cost_page_dirty 20) and sleeps for autovacuum_vacuum_cost_delay (default 2ms) whenever the accumulated cost exceeds autovacuum_vacuum_cost_limit (default 200, or -1 to inherit vacuum_cost_limit).verified
On fast SSD or NVMe storage the default cost limit of 200 leaves vacuum asleep most of the time and is far too conservative, so raising autovacuum_vacuum_cost_limit to roughly 1000 to 2000 (or lowering the delay) lets vacuum keep up with a heavy write workload.verified
The cost-based delay budget (autovacuum_vacuum_cost_limit) is shared and divided among all running autovacuum workers, so raising autovacuum_max_workers (default 3) without raising the total cost limit only slices the same I/O budget thinner and makes each individual vacuum slower.verified
The autovacuum launcher wakes every autovacuum_naptime (default one minute) to check tables against their triggers and tries to visit each database roughly once per naptime.verified
Autovacuum workers use autovacuum_work_mem when it is not -1, otherwise maintenance_work_mem, to buffer the ids of dead tuples; a larger buffer collects more dead tuples per pass so vacuum scans the table's indexes fewer times, and autovacuum_work_mem is consumed per running worker.verified
Through PostgreSQL 16 the dead-tuple-id collection was effectively capped near 1 GB regardless of how high maintenance_work_mem was set, so values above about 1 GB did not help a single pass; PostgreSQL 17 replaced that storage with a more compact structure that lifts the cap so a higher value can genuinely help.stable common knowledge
When a table's oldest unfrozen transaction id reaches autovacuum_freeze_max_age (default 200 million), autovacuum launches an anti-wraparound vacuum that cannot be skipped and runs even when autovacuum is otherwise disabled, while vacuum_freeze_min_age (default 50 million) sets how old a tuple must be before a normal vacuum freezes it.verified
Setting autovacuum_freeze_max_age too low forces frequent anti-wraparound vacuums that re-freeze tuples and waste I/O, while setting it too high lets unfrozen age build up so the eventual forced vacuum is huge and the safety margin before wraparound shrinks.verified
Autovacuum also runs ANALYZE, triggered when rows changed since the last analyze exceed autovacuum_analyze_threshold (default 50) plus autovacuum_analyze_scale_factor (default 0.1) times the row count, so the planner does not work from stale statistics.verified
pg_stat_user_tables reports n_dead_tup, n_live_tup, last_autovacuum, and autovacuum_count per table, and pg_stat_progress_vacuum exposes the live phase, heap_blks_total, and heap_blks_scanned of an in-flight vacuum, which together show whether autovacuum is keeping up.verified
No autovacuum tuning can remove a dead tuple that a long-running or idle-in-transaction transaction might still need, because that transaction holds back the xmin horizon, so dead tuples climbing on every table at once points to a held-open transaction rather than an autovacuum misconfiguration.verified

Cited sources

PostgreSQL Documentation: Automatic Vacuuming Parameters · PostgreSQL Global Development Group
PostgreSQL Documentation: Routine Vacuuming · PostgreSQL Global Development Group
PostgreSQL Documentation: Resource Consumption (Memory) · PostgreSQL Global Development Group