Tuning the Planner Cost Settings

How to change PostgreSQL's planner cost constants so its plan choices match real hardware: lowering random_page_cost on SSD, sizing effective_cache_size and effective_io_concurrency, tuning parallel query and JIT, and rolling changes out safely.

Learning outcomes

The query-planner-and-cost-model page taught you how the planner scores plans: it costs each candidate against a model of your hardware and runs the cheapest estimate. That model is built from a handful of numeric constants, and they ship with defaults chosen for a machine almost nobody runs anymore. This page is about changing those constants so the planner’s model matches the disks and memory you actually have. Get them right and the planner picks good plans on its own. Leave them at the defaults on modern hardware and it will systematically avoid your best indexes.

After studying this page, you can:

  • Explain why the default random_page_cost of 4.0 makes the planner shun indexes on SSD and NVMe, and choose a correct value.
  • Set effective_cache_size and effective_io_concurrency to match your real memory and storage, and predict how each biases plan choice.
  • Decide when raising parallel-query limits speeds up analytics and when it slows down OLTP.
  • Judge when to disable JIT or raise its cost threshold, and recognize the latency it adds to short queries.
  • Use the enable_* flags to prove a costing problem in EXPLAIN, and roll out a cost change without a restart or an outage.

Before we dive in

You should have read the query-planner-and-cost-model page, or already know that PostgreSQL is cost-based: it enumerates candidate plans, assigns each an estimated cost in abstract units, and runs the lowest. You should know how to read the four numbers on an EXPLAIN node (cost=startup..total rows=estimate width=bytes). You do not need to know the scan or join algorithms in detail.

A few terms, defined as we use them. A cost constant is one of the planner’s numeric parameters, like random_page_cost, that sets the price of one kind of work. The whole cost model is relative: seq_page_cost (default 1.0) is the yardstick, the cost of reading one page in sequential order, and every other constant is priced against it. A planner-only parameter is one that changes the planner’s estimates but allocates no memory and moves no data; effective_cache_size is the clearest example. Hold those three. The rest of the page is about choosing values for cost constants and planner-only parameters so the estimate matches your machine.

Mental Model

The wrong model, and it is an easy one to fall into, is that these settings make queries faster directly, the way raising work_mem lets a sort stay in memory. Under that model you would turn random_page_cost down to “speed up index access,” as if the number controlled the disk.

It does not. The better model is that you are calibrating a price list the planner shops from. The planner never measures your disk. It reads the price list, adds up what each candidate plan would cost at those prices, and buys the cheapest. The numbers do not change how fast anything runs. They change which plan the planner believes is cheapest. If the price list says random reads cost four times sequential reads, the planner will avoid plans full of random reads, even when your NVMe drive does random reads almost as fast as sequential ones. The drive did not get slower. The planner was just told it is slow.

Keep this picture. Tuning cost constants is correcting the planner’s beliefs about your hardware, not tuning the hardware. Once that clicks, every setting on this page reads the same way: which belief am I fixing, and which plans will the planner prefer once it believes the truth?

Breaking it down

1. The constants are a hardware model, and the defaults are old

Start with where the defaults came from, because that is the whole reason you need to change them. The cost constants were calibrated for a server with spinning hard drives and a few gigabytes of RAM, the typical machine when the model was written. Every default encodes an assumption about that hardware.

Here are the planner cost constants and their PostgreSQL 16 and 17 defaults, the ones this page tunes.

# postgresql.conf: planner cost constants and planner-only parameters (defaults)
seq_page_cost = 1.0              # read one page sequentially (the yardstick, leave it)
random_page_cost = 4.0           # read one page at a random offset
cpu_tuple_cost = 0.01            # process one row through the executor
cpu_index_tuple_cost = 0.005     # process one index entry
cpu_operator_cost = 0.0025       # evaluate one operator or function
effective_cache_size = '4GB'     # how much data the planner assumes is cached
effective_io_concurrency = 1     # concurrent I/O requests for bitmap-heap prefetch

Two of these are the high-leverage ones on modern hardware, and both encode a storage assumption that no longer holds. The random_page_cost of 4.0 says a random page read costs four times a sequential one, which was true for a disk that had to physically seek the head. The effective_cache_size of 4GB says only about four gigabytes of your data is likely cached, which was generous in 2005 and tiny on a machine with 128 GB of RAM today. Leave them at the defaults on an SSD with lots of memory and the planner is reasoning about a server you do not own.

The rule for this page: you keep seq_page_cost at 1.0 as the fixed yardstick and move the others relative to it. Changing the yardstick itself just rescales everything and teaches you nothing. Everything below is a ratio against that 1.0.

2. random_page_cost: the one setting that fixes the most plans

This is the single highest-impact planner setting, and the one most often left wrong. Set it correctly and a surprising number of “why won’t it use my index” complaints disappear on their own.

Here is why it matters so much. The planner uses random_page_cost to price exactly the operations an index scan is made of. An index scan walks the index, then fetches each matching row from the heap at a random page location, so its cost is dominated by random page reads. A sequential scan reads pages in order, priced at seq_page_cost. So the ratio of these two numbers, 4.0 / 1.0, is the exchange rate that decides how selective a condition must be before an index beats a scan. At 4.0, an index has to save reading four times as many pages just to break even on each fetch, which biases the planner hard toward sequential scans.

That ratio was right for spinning disks, where a random seek really did cost several times a sequential read because the drive head had to move. On SSD and NVMe there is no head to move, so the gap between random and sequential I/O nearly vanishes. The standard fix is to lower random_page_cost toward the cost of a sequential read.

# postgresql.conf: SSD / NVMe tuning, the most common single change you will make
random_page_cost = 1.1           # SSD/NVMe: random reads cost almost the same as sequential
# random_page_cost = 1.0         # aggressive; some shops on fast NVMe set it here
# random_page_cost = 4.0         # the default, correct only for spinning disks

The common value is 1.1, with 1.0 to 1.5 all defensible. Keeping it slightly above 1.0 reflects that even on SSD a random access has marginally more overhead than a sequential one, and it nudges the planner to still prefer a sequential scan on a true tie. The failure mode of leaving it at 4.0 on an SSD is the classic one: the planner believes random reads are expensive, so it avoids index scans the index was built to serve, and falls back to scanning whole tables. You see it in EXPLAIN as a Seq Scan where you expected an Index Scan, on a selective condition, with no good reason.

How random_page_cost tips index vs sequential scan
random_page_cost1.1
14
SSD / NVMe range: index scans priced fairly, the planner uses your indexes

Drag it and read the verdict as the planner’s bias. Low values price an index fetch close to a sequential read, so the planner reaches for indexes freely. High values make every random fetch look expensive, so only the most selective conditions can justify an index, and on SSD that is simply wrong about the hardware.

3. effective_cache_size: telling the planner what is already in memory

The second high-leverage setting is effective_cache_size, and the first thing to understand is what it does not do. It allocates nothing. It is a pure hint, a planner-only parameter, that tells the planner roughly how much of your data is likely sitting in memory, counting both PostgreSQL’s own shared buffers and the operating system’s file cache. PostgreSQL does not reserve that memory; it just plans as if that much cache exists.

Why the planner needs the hint: an index scan often reads the same heap and index pages many times, and if those pages are cached, the repeated reads are nearly free. A large effective_cache_size tells the planner “assume most of those repeated reads hit cache,” which lowers the estimated cost of index scans and biases the planner toward them. A small value tells the planner to assume cold reads, which makes index scans look expensive. So this setting works alongside random_page_cost: one prices a random read, the other estimates how often that read is avoided entirely by the cache.

The standard value is roughly 50 to 75 percent of total system RAM. You are estimating how much memory is realistically available to cache database pages after the OS and other processes take their share.

# postgresql.conf: on a server with 64 GB of RAM
effective_cache_size = '48GB'    # ~75% of RAM; a planner hint, allocates nothing
shared_buffers = '16GB'          # this one DOES allocate; effective_cache_size counts it plus OS cache

Note the relationship in that snippet. shared_buffers actually reserves memory for PostgreSQL’s own cache; effective_cache_size is the larger figure that includes those shared buffers plus the OS page cache. Because it allocates nothing, setting it too high only risks the planner being slightly over-optimistic about cache hits; setting it too low, at the 4GB default on a big machine, makes the planner needlessly pessimistic about index scans, the same failure direction as a too-high random_page_cost.

4. effective_io_concurrency: prefetching during bitmap scans

This setting controls a real I/O behavior, not just an estimate, and it targets one specific plan node: the bitmap heap scan. Recall the plan-node menu from the query-planner-and-cost-model page. A bitmap scan first builds a bitmap of which heap pages contain matching rows, then reads those pages. Because it knows the full list of pages it needs before it starts reading them, it can ask the operating system to fetch several at once instead of one at a time.

effective_io_concurrency is how many such reads PostgreSQL will have in flight at once during that heap fetch. The point is to keep a fast storage device busy. A single spinning disk can really only service one request at a time, so the default of 1 matches it. An SSD, and especially an NVMe drive, has many independent channels and thrives on parallel requests, so it sits idle waiting if you ask for only one page at a time.

# postgresql.conf: prefetch depth for bitmap heap scans
effective_io_concurrency = 200   # SSD/NVMe: let many heap-page reads run concurrently
# effective_io_concurrency = 1   # default; correct for a single spinning disk
maintenance_io_concurrency = 10  # the same idea for maintenance work like VACUUM

The practical range on SSD and NVMe is 100 to 300; many production systems land at 200. Leave it at 1 on NVMe and your bitmap scans fetch pages one at a time, leaving most of the drive’s parallelism unused, so a scan that could finish in one wave of concurrent reads instead crawls through them serially. There is a sibling, maintenance_io_concurrency, that applies the same prefetching to maintenance operations such as VACUUM; raise it on fast storage too.

5. The CPU cost constants, and when to touch them

These three you will rarely change, but you should understand them, because they set the balance between I/O work and CPU work in every estimate.

# postgresql.conf: CPU cost constants (defaults; usually left alone)
cpu_tuple_cost = 0.01            # process one row through the executor
cpu_index_tuple_cost = 0.005     # process one index entry
cpu_operator_cost = 0.0025       # evaluate one operator or function call

Read them as ratios against the 1.0 page-read yardstick. Processing a row in the CPU is priced at one hundredth of reading a page from disk, because in the original model the disk was the bottleneck by a wide margin. The relative weights are the point: an operator evaluation (0.0025) costs a quarter of processing a row (0.01), which costs a hundredth of a page read.

The one you might deliberately raise is cpu_operator_cost, and only for a specific workload. If your queries call expensive functions per row, say a heavy regex, a slow user-defined function, or complex expression arithmetic over millions of rows, the real cost of each operator evaluation is far above the default. Raising cpu_operator_cost tells the planner those per-row evaluations are pricey, which biases it toward plans that evaluate the expression on fewer rows, for example filtering hard with an index before the expensive operator runs. It is a niche tuning knob, but for a CPU-bound expression-heavy workload it can change the plan in your favor. Leave the other two alone unless you have measured a specific reason.

6. Parallel query: paying for workers that earn their keep

Parallel query lets one query split a big scan, join, or aggregate across several worker processes, then combine the results at a Gather node, as the query-planner-and-cost-model page introduced. The planner only goes parallel when it estimates the plan is expensive enough that the speedup outweighs the cost of starting workers and shipping rows between them. The settings below control both how many workers are available and how eager the planner is to use them.

# postgresql.conf: parallel query (PostgreSQL 16/17 defaults)
max_parallel_workers_per_gather = 2    # workers one Gather node may use
max_parallel_workers = 8               # total parallel workers across the cluster
max_worker_processes = 8               # the hard ceiling all background workers share
parallel_setup_cost = 1000             # fixed cost the planner charges to start workers
parallel_tuple_cost = 0.1              # cost to ship one row from a worker to the leader
min_parallel_table_scan_size = '8MB'   # a table must be at least this big to go parallel

Two costs are why parallelism is not free, and they are exactly the two failure points. The parallel_setup_cost of 1000 is a large fixed charge the planner adds to any parallel plan, modeling the real time to spin up workers; it is why short queries never go parallel, since the setup would dwarf the work. The parallel_tuple_cost of 0.1 charges for every row a worker passes back to the leader, modeling the overhead of moving rows between processes; it is why parallelism hurts when a plan produces many rows that must all funnel through the leader.

Parallelism wins on big sequential scans and large aggregates over many cores, where each worker chews through a slice of the table and only a small summarized result comes back. It hurts on short OLTP queries that touch few rows, where the worker startup and row-shipping overhead exceed any savings. For an analytics workload on a many-core box, you raise the worker limits and let more queries go parallel.

# postgresql.conf: an analytics box with many cores, tuned for parallelism
max_parallel_workers_per_gather = 8    # let one big query fan out to 8 workers
max_parallel_workers = 16              # raise the cluster-wide pool
max_worker_processes = 16              # raise the shared ceiling to match

Note the ordering constraint: max_parallel_workers cannot exceed max_worker_processes, and max_parallel_workers_per_gather is capped by max_parallel_workers, so raise the ceilings together. The over-parallelizing failure is the mirror image of under-parallelizing: crank these up on a busy OLTP system and short queries that should never have forked now pay worker startup on every call, and you exhaust the worker pool so the queries that truly benefit get starved.

When parallel query helps and when it hurts
A SELECT with aggregation scanning a 50 GB fact table on a 16-core box. Each worker scans a slice, computes a partial aggregate, and returns one small summary row. The setup cost of 1000 is trivial against minutes of scan work, and almost nothing is shipped through the leader. Raise max_parallel_workers_per_gather to put more cores on it.

7. JIT: compiling expressions for big analytic queries

PostgreSQL can just-in-time compile the expressions in a query, turning row-by-row interpreted evaluation into native machine code, which speeds up queries that evaluate the same expression over millions of rows. JIT is on by default and is gated by cost thresholds, so it only kicks in for plans the planner already considers expensive.

# postgresql.conf: JIT (on by default, gated by plan cost)
jit = on                         # master switch
jit_above_cost = 100000          # only JIT a plan whose total cost exceeds this
jit_inline_above_cost = 500000   # inline small functions above this cost
jit_optimize_above_cost = 500000 # run expensive optimization passes above this cost

Read the thresholds as a staircase. Below a total plan cost of 100000, JIT does nothing. Above it, PostgreSQL compiles the expressions. Above 500000, it also inlines functions and runs heavier optimization passes, because the query is expensive enough to repay that extra compile effort. The design intent is that only big analytic queries, where compilation is amortized over millions of rows, ever pay to compile.

The trap is that compiling takes real wall-clock time before the query runs, and that latency is pure overhead if the query then finishes quickly. A query whose cost estimate creeps just past 100000 but actually runs in tens of milliseconds can spend more time compiling than executing. On an OLTP system full of short-but-not-trivial queries, this shows up as mysterious latency spikes: the query did the same work as always, but now it paused to JIT first. The fixes are direct. Either turn JIT off for that workload, or raise jit_above_cost so only genuinely huge plans trigger it.

# postgresql.conf: OLTP system seeing JIT compile latency on borderline queries
jit = off                        # simplest fix: no compile latency at all
# jit_above_cost = 1000000       # alternative: only JIT truly large analytic plans

You confirm JIT is the culprit with EXPLAIN (ANALYZE), which prints a JIT section with the time spent on Generation, Inlining, Optimization, and Emission. If those add up to a meaningful slice of total execution time on a query you expected to be fast, JIT is your latency.

EXPLAIN (ANALYZE) SELECT sum(amount) FROM orders WHERE region = 'EU';

 ...
 JIT:
   Functions: 4
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.2 ms, Inlining 8.4 ms, Optimization 24.1 ms, Emission 12.0 ms, Total 45.7 ms
 Execution Time: 51.3 ms

In that output, JIT spent 45.7 ms compiling for a query that ran in 51.3 ms total: almost all the time was compilation. That is the signature that tells you to raise the threshold or disable JIT for this query.

8. default_statistics_target: sharper estimates feed every cost

Every cost on this page is built on top of a row estimate, so the accuracy of the cost constants only matters if the row counts feeding them are right. default_statistics_target controls how detailed the per-column statistics are that the planner reads, and it is set in the same configuration file, so it belongs here even though the statistics-and-row-estimation page covers the machinery in depth.

# postgresql.conf
default_statistics_target = 100  # default: histogram buckets and most-common-values kept per column

The number is, roughly, how many histogram buckets and how many most-common-values the ANALYZE command records for each column. The default of 100 is a balance: enough resolution for most columns, cheap enough to gather quickly. Raising it, globally or per column, gives the planner a finer-grained picture, which produces better selectivity estimates on skewed or unevenly distributed columns. The cost is that ANALYZE runs slower and the statistics take more space, so you usually raise it surgically on the few columns whose estimates actually go wrong, rather than across the board.

-- Raise resolution on one problem column, then refresh its statistics
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;

The connection to the rest of this page is the point. Tuning random_page_cost to price index scans fairly does nothing if the planner thinks a selective condition matches half the table because its histogram is too coarse to see the skew. Good cost constants and good statistics are two halves of the same goal: a cost estimate that matches reality.

9. The enable flags as a diagnostic, not a setting

PostgreSQL exposes a set of enable_* flags, one per plan-node type: enable_seqscan, enable_indexscan, enable_nestloop, enable_hashjoin, enable_bitmapscan, and more. They look like on/off switches for plan types, and that framing is exactly the trap. They are diagnostic tools, not production settings.

Here is the mechanism. Turning a flag off does not actually forbid that node type. It adds a huge cost penalty (effectively 1e10) to plans that use it, so the planner avoids that node unless it has no other legal choice. That penalty makes them perfect for an experiment: you force the planner away from the plan it chose and read the cost of the alternative it falls back to. Comparing the two costs tells you whether the planner picked the cheaper plan because it really is cheaper, or because a cost constant is lying.

-- The plan you got: a Seq Scan you suspect is wrong on your SSD
EXPLAIN SELECT * FROM orders WHERE customer_id = 4217;

-- Force the planner off the seq scan FOR THIS SESSION ONLY, and read the index plan's cost
SET enable_seqscan = off;
EXPLAIN SELECT * FROM orders WHERE customer_id = 4217;
RESET enable_seqscan;

Read the two EXPLAIN outputs side by side. If the forced index plan shows a higher total cost than the sequential scan, the planner was right and the index genuinely does not help. If the forced index plan shows a much lower cost than the runtime suggests it should, or you know from EXPLAIN ANALYZE that the index plan is actually faster despite a higher estimated cost, then a cost constant is miscalibrated, and random_page_cost is the usual suspect. You have confirmed a costing problem rather than guessed at one.

The hard rule: never leave an enable_* flag off in production. They were built to diagnose, and the right durable fix is to correct the cost constant they helped you find, not to bolt the planner’s choices open. The classic failure is shipping enable_seqscan = off in postgresql.conf because it fixed one query in testing. Now every query in the database pays the giant penalty on sequential scans, including the many queries for which a sequential scan is genuinely the best plan, and the planner makes bizarre choices across the board to avoid a node it should sometimes love.

Using the enable flags to diagnose, then throwing them away

10. Rolling out a cost change safely

You change these settings without a restart, and you should test every change before trusting it. ALTER SYSTEM writes the value to a managed file, and pg_reload_conf() makes the running server re-read its configuration. All the planner parameters on this page take effect on reload, no restart required.

-- Change a cost constant cluster-wide, no restart needed
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_cache_size = '48GB';
ALTER SYSTEM SET effective_io_concurrency = 200;

-- Apply it to the running server
SELECT pg_reload_conf();

-- Confirm the new value is live
SHOW random_page_cost;

The safe rollout is to test the change before it touches everyone. Because these are planner parameters, you can set them at the session level with SET and run EXPLAIN to see how the plan responds, all without affecting any other connection. Only once a session test shows the plan you want do you promote the change to ALTER SYSTEM.

-- Try the value in your own session first, look at the plan, then decide
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 4217;
RESET random_page_cost;

You also do not have to change a setting for the whole cluster. You can scope a value to one database or one role, which is the right tool when an analytics database and an OLTP database share one server but want different tuning. The OLTP database keeps a conservative parallelism setting; the analytics database raises it.

-- Per-database and per-role overrides: different tuning on one server
ALTER DATABASE analytics SET max_parallel_workers_per_gather = 8;
ALTER DATABASE oltp      SET max_parallel_workers_per_gather = 0;
ALTER ROLE reporting     SET jit = on;
flowchart LR
    A["Suspect a bad plan"] --> B["SET in your session, run EXPLAIN ANALYZE"]
    B --> C{"Plan improved?"}
    C -->|no| D["Wrong setting, RESET and rethink"]
    C -->|yes| E["ALTER SYSTEM SET the value"]
    E --> F["SELECT pg_reload_conf()"]
    F --> G["SHOW to confirm, watch real queries"]

Read the flow as the discipline: prove it in a session, promote it cluster-wide, reload without a restart, confirm it is live, then watch real traffic. A cost change is reversible and cheap to test, so there is never a reason to push one straight to production untested.

Mastery Questions

  1. A team migrates a database from spinning disks to NVMe SSDs and reports that queries are no faster, and some are slower. EXPLAIN shows sequential scans on tables where selective indexes exist. Nothing else changed. What is the most probable cause, and what would you change first?

    Answer. The cost constants still describe the old spinning disks, so the planner is reasoning about hardware that no longer exists. The prime suspect is random_page_cost, left at its default of 4.0. That value tells the planner a random page read, the operation an index scan is built from, costs four times a sequential read, so the planner prices index scans high and falls back to scanning whole tables. On the old disks that ratio was roughly true; on NVMe, random and sequential reads cost almost the same, so the 4.0 is simply a false belief about the new drives. I would lower random_page_cost to 1.1 (anywhere in 1.0 to 1.5 is reasonable), and I would also raise effective_io_concurrency from 1 to around 200 so bitmap heap scans actually prefetch across the SSD’s many channels, and confirm effective_cache_size reflects the real RAM. I would test the random_page_cost change in a session with SET and EXPLAIN first, watch the seq scans turn into index scans on the selective queries, then promote it with ALTER SYSTEM SET random_page_cost = 1.1 followed by pg_reload_conf(). The hardware got faster; the planner just had to be told.

  2. After an upgrade, an OLTP service sees intermittent latency spikes on queries that used to be uniformly fast. EXPLAIN (ANALYZE) on a slow instance shows a JIT section reporting tens of milliseconds of Optimization and Emission time on a query that does little actual work. What is happening, and how do you fix it without hurting the analytics jobs on the same server?

    Answer. JIT compilation is the culprit. JIT is on by default and triggers when a plan’s estimated total cost exceeds jit_above_cost (default 100000). A query whose cost estimate creeps just past that threshold but actually runs in a few milliseconds pays the compile time as pure overhead, because the compilation cannot be amortized over enough rows; the JIT timing in the plan, dominated by Optimization and Emission, is the smoking gun. The naive fix, jit = off cluster-wide, would also strip JIT from the big analytic queries on the same server that genuinely benefit from it. The targeted fix uses scoping. I would either raise jit_above_cost so only truly enormous plans compile, which lets the short OLTP queries skip JIT while the genuinely expensive analytic plans still cross the higher bar, or I would scope the change per database or per role: ALTER DATABASE oltp SET jit = off while leaving the analytics database with jit = on. That keeps compilation where it pays for itself and removes it where it only adds latency.

  3. An engineer fixes a slow query by adding enable_seqscan = off to postgresql.conf and reloading, and it works. A week later, unrelated queries across the database have developed strange, slow plans. Explain the mechanism, and describe the correct way the original problem should have been diagnosed and fixed.

    Answer. The enable_* flags are diagnostic probes, not production settings, and shipping one cluster-wide is the mistake. Turning enable_seqscan off does not forbid sequential scans; it adds an enormous cost penalty to any plan that uses one, so the planner avoids sequential scans unless it has no legal alternative. That is fine as a one-session experiment, but cluster-wide it poisons every query in the database, including the many for which a sequential scan is genuinely the cheapest plan, such as reading most of a small table. To dodge the penalized node, the planner now picks contorted index-based plans that are slower, which is exactly the wave of strange slow plans that appeared. The correct diagnosis would have used the flag the right way: SET enable_seqscan = off in a single session, re-run EXPLAIN and EXPLAIN ANALYZE to compare the forced index plan against the original sequential scan, and check whether the index plan was actually faster than its estimated cost implied. If it was, that proves a cost constant is miscalibrated, almost always random_page_cost sitting at 4.0 on SSD. The durable fix is to lower random_page_cost to about 1.1 with ALTER SYSTEM and reload, then RESET enable_seqscan. The flag finds the problem; correcting the cost constant cures it for every query at once.

Recommended next

  • Autovacuum Tuning
    Builds directly on this page: Autovacuum Tuning is the next step in the PostgreSQL performance ladder.
Sources & evidence14 claims · 4 cited

Quantitative defaults and mechanism claims are grounded in the PostgreSQL query-planning, parallel-query, planner-statistics, and EXPLAIN docs. Widely-known operational facts the listed docs do not state in exact terms (the 1.1 SSD convention, the 100-300 effective_io_concurrency range, the enable_* penalty as a diagnostic, ALTER SYSTEM/pg_reload_conf rollout, per-database/role scoping) are marked stable-common-knowledge with empty source_ids.

  • The whole planner cost model is relative to seq_page_cost (default 1.0), the cost of reading one page sequentially; you keep it as the fixed yardstick and move random_page_cost, the CPU constants, and the planner-only parameters relative to it.verified
  • random_page_cost defaults to 4.0, modeling a random page fetch as four times a sequential one as on a seeking spinning disk; on SSDs it is commonly lowered toward 1.1 because random and sequential reads cost nearly the same, which stops the planner from shunning index scans.verified
  • Because an index scan's cost is dominated by random heap-page fetches priced at random_page_cost while a sequential scan reads pages at seq_page_cost, the ratio of the two constants sets how selective a condition must be before an index scan is estimated cheaper than a sequential scan.verified
  • effective_cache_size (default 4GB) is a planner-only hint that allocates no memory; it estimates how much data, across PostgreSQL's shared buffers and the OS page cache, is likely cached, and a larger value biases the planner toward index scans because it assumes repeated reads hit cache.verified
  • effective_cache_size is conventionally set to roughly 50-75% of total system RAM, and because it allocates nothing, the only risk of setting it too high is mild planner over-optimism about cache hits.stable common knowledge
  • effective_io_concurrency (default 1) sets how many concurrent I/O requests PostgreSQL issues to prefetch heap pages during a bitmap heap scan; it is left at 1 for a single spinning disk and raised to roughly 100-300 on SSD/NVMe to exploit the device's parallel channels, with maintenance_io_concurrency applying the same idea to maintenance work.verified
  • cpu_tuple_cost (0.01), cpu_index_tuple_cost (0.005), and cpu_operator_cost (0.0025) weight per-row CPU work relative to the 1.0 page-read yardstick; they are rarely changed, but raising cpu_operator_cost can help CPU-bound, expression-heavy queries by biasing the planner toward plans that evaluate the costly operator on fewer rows.verified
  • max_parallel_workers_per_gather (default 2) bounds workers under one Gather node, while total parallel workers are bounded by max_parallel_workers (default 8) and the shared max_worker_processes ceiling, so raising parallelism for analytics means raising these limits together.verified
  • parallel_setup_cost (1000) is a fixed charge for starting workers and parallel_tuple_cost (0.1) charges per row shipped from a worker to the leader, so parallelism helps large scans and aggregates but hurts short OLTP queries and high-row-transfer plans where these overheads dominate; min_parallel_table_scan_size (8MB) is the floor below which a scan stays serial.verified
  • JIT is on by default and only compiles a plan whose estimated total cost exceeds jit_above_cost (100000), with jit_inline_above_cost and jit_optimize_above_cost gating heavier passes; compilation adds wall-clock latency, so a borderline-cost but short-running query can spend more time compiling than executing, which is why JIT is disabled or its threshold raised for OLTP.verified
  • EXPLAIN (ANALYZE) prints a JIT section reporting time spent on Generation, Inlining, Optimization, and Emission, which lets you confirm that JIT compilation, not execution, is the source of a query's latency.verified
  • default_statistics_target (default 100) sets how many histogram buckets and most-common-values ANALYZE records per column; raising it, globally or per column with ALTER TABLE ... SET STATISTICS, improves selectivity estimates on skewed data at the cost of slower ANALYZE, and is set in the same configuration as the cost constants.verified
  • The enable_* flags such as enable_seqscan and enable_nestloop do not forbid a node type; setting one off adds a very large cost penalty so the planner avoids that node only when an alternative exists, which makes them session-only diagnostics for forcing and comparing plan costs, never production settings.stable common knowledge
  • Planner cost parameters take effect without a restart: ALTER SYSTEM SET writes the value and SELECT pg_reload_conf() reloads it, while SET applies it to a single session for testing with EXPLAIN, and ALTER DATABASE or ALTER ROLE scope a value to one database or role so OLTP and analytics workloads on one server can differ.stable common knowledge

Cited sources