The Query Planner and Cost Model

How the PostgreSQL planner sits in the query lifecycle and picks the lowest-estimated-cost plan, covering cost units, the sequential-scan cost formula, startup vs total cost, selectivity estimation, the plan-node menu, join-order search, plan caching, reading EXPLAIN, and the failure modes of a wrong estimate.

Learning outcomes

The planner is the brain of PostgreSQL. You write a query that says WHAT you want, and the planner decides HOW to get it: which index to use, which order to join tables in, whether to sort or hash. Every performance question you will ever debug, from a slow report to an index the database “refuses” to use, is really a question about a decision the planner made. This page shows you how it decides, so those decisions stop being mysterious.

After studying this page, you can:

  • Trace a query through its five lifecycle stages and explain why the planner is the one that determines speed.
  • Explain why PostgreSQL picks plans by lowest estimated cost, and why that estimate is the thing it optimizes, never the real runtime.
  • Compute the cost of a sequential scan by hand from the documented cost parameters, and predict how LIMIT shifts the winning plan.
  • Describe how row-count estimates come from per-column statistics, and where the independence assumption breaks on correlated columns.
  • Read the cost, rows, and width numbers on any EXPLAIN node, and recognize a misestimate before it cascades into a bad join.

Before we dive in

You should be comfortable writing SQL with joins and WHERE clauses, and you should know that an index is a separate structure that lets PostgreSQL find rows without scanning the whole table. You do not need to know index internals or the exact scan algorithms yet. This page is the map of how the planner thinks; the deep dives on individual scan types, join algorithms, and statistics each get their own page.

A few terms, defined as we use them. A plan (or query plan) is the concrete tree of operations PostgreSQL will run to answer your query: scans at the leaves, joins and sorts above them, one final result at the root. The planner (also called the optimizer) is the component that builds and chooses that tree. Cost is a unitless number the planner assigns to each plan to compare them; lower is better. Cardinality is the number of rows a step is expected to produce, and selectivity is the fraction of rows a condition is expected to keep. Hold those four. Everything below is built from them.

Mental Model

The wrong model, and almost everyone starts here, is that PostgreSQL “runs your SQL” the way it is written: it reads the tables in the order you typed them, applies the WHERE clause, and uses an index if one exists on the column. Under that model the query text controls the execution, so reordering your joins or rewriting a condition should change the speed.

PostgreSQL does not work that way. The better model is a travel-planning app. You ask for a route from A to B. The app does not blindly take the roads in the order you might name them. It enumerates many possible routes, estimates a cost for each (distance, traffic, tolls), and picks the cheapest estimate. The route you get depends on the app’s traffic data, not on how you phrased the request. The planner is exactly this: it generates candidate plans, scores each with an estimated cost, and runs only the cheapest one.

Two consequences fall out of this picture, and they shape everything else. First, the planner optimizes an ESTIMATE. If its traffic data (the table statistics) is wrong, it confidently picks a bad route. Second, the order you write joins and conditions usually does not matter, because the planner reorders them itself. Keep the travel-planner image. Once it clicks, EXPLAIN, statistics, and “why won’t it use my index” all become the same question: what did the planner estimate, and was it right?

Breaking it down

1. The query lifecycle: where the planner sits

Start with the journey a query takes, because the planner only makes sense once you see what feeds it. When you send a SQL string to PostgreSQL, it passes through five stages in order before any data moves.

flowchart LR
    A["SQL text"] --> B["Parse: syntax to a raw tree"]
    B --> C["Analyze: resolve tables, columns, types"]
    C --> D["Rewrite: expand views and rules"]
    D --> E["Plan / optimize: choose the cheapest plan"]
    E --> F["Execute: run the plan, return rows"]

Walk the five stages. Parse turns your text into a raw parse tree, checking only grammar; at this point SELCT fails but a SELECT from a non-existent table does not. Analyze is the semantic pass: it looks up the real tables and columns in the catalog, resolves data types, and binds names to objects, so now a missing table is caught. Rewrite applies the rule system: it expands views into their underlying queries and applies any rewrite rules, which is also how row-level security conditions get folded in. Plan, the stage this whole page is about, takes that fully-resolved query and chooses how to run it. Execute finally runs the chosen plan and streams rows back.

The planner sits fourth, and that position is the point. By the time it runs, the query is fully understood: every table is real, every type is known, every view is expanded into plain table accesses. The planner does not worry about syntax or naming. Its one job is the hardest one: of all the correct ways to compute this exact result, which is fastest? That is why it is the brain. Parse and analyze decide whether the query is valid; the planner decides whether it is fast.

The rewrite stage matters more than it looks, because it can multiply the work the planner faces. A query against a view is really a query against the view’s definition, spliced in. Query through three nested views and the planner sees one large flattened query, not three small ones. We name the rewrite system here and leave its details to the query-rewriting page.

2. Cost-based, not rule-based: optimizing an estimate

Here is the single idea that organizes the rest of PostgreSQL performance. The planner is cost-based. It does not follow a fixed list of rules like “always use an index for an equality condition.” Instead it enumerates candidate plans, assigns each an estimated cost, and runs the one with the lowest estimate.

Contrast the two philosophies, because the difference is not academic. A rule-based optimizer has a priority list: index beats sequential scan, this join type beats that one, and so on. It is predictable but dumb, because the best choice genuinely depends on the data. Reading an index to fetch ninety percent of a table is slower than just scanning the table, and no fixed rule captures “ninety percent.” A cost-based optimizer estimates the actual work for each option on this specific table with this specific data, and lets the numbers decide. PostgreSQL is firmly cost-based.

Now the consequence that you must internalize before anything else makes sense. The planner optimizes the estimate, never the truth. It has no idea how long your query will actually take. It builds a model of the work, scores each plan against that model, and trusts the model completely. When the model is accurate, you get a great plan. When the model is wrong, because the statistics are stale or the data is skewed, the planner picks a bad plan with total confidence and no warning.

This is why the rest of performance work exists. EXPLAIN exists so you can see the estimate the planner committed to. EXPLAIN ANALYZE exists so you can compare that estimate against reality. ANALYZE and the statistics system exist to make the estimate accurate. Every one of those tools is, at heart, about closing the gap between the planner’s model and the real data. Keep this framing: a slow query is usually not a slow plan, it is the wrong plan, chosen from a wrong estimate.

3. The cost units and the sequential-scan formula

To compare plans, the planner needs a currency, and it has one. Cost is measured in abstract units pinned to a single yardstick: the cost of reading one page sequentially from disk. That parameter is seq_page_cost, and it defaults to 1.0. Every other cost is expressed relative to it.

Five parameters set the exchange rates, and these are the planner cost constants you will see again and again. Here are the PostgreSQL 16 and 17 defaults.

# postgresql.conf: planner cost constants (defaults)
seq_page_cost = 1.0          # read one page in sequential order (the yardstick)
random_page_cost = 4.0       # read one page at a random offset
cpu_tuple_cost = 0.01        # process one row through the executor
cpu_index_tuple_cost = 0.005 # process one index entry
cpu_operator_cost = 0.0025   # evaluate one operator or function, e.g. one comparison

Read those numbers as a story about hardware. A random page read costs four times a sequential one, because the original model assumed spinning disks where seeking is expensive. Processing a row in the CPU is a hundred times cheaper than reading a page from disk. These ratios, not the absolute values, are what steer the planner. We tune them, especially random_page_cost for SSDs, in the planner-cost-settings page; here we just take the defaults.

Now build the cost of the simplest plan, a sequential scan, from these parts. A sequential scan reads every page of the table and processes every row. So its total cost is the page-read work plus the per-row CPU work plus the per-condition CPU work:

total_cost = (pages * seq_page_cost)
           + (rows * cpu_tuple_cost)
           + (rows * cpu_operator_cost * number_of_WHERE_conditions)

Put real numbers on it. Suppose a table has 10,000 pages and 1,000,000 rows, and you scan it with one condition in the WHERE clause. The page work is 10000 * 1.0 = 10000. The per-row work is 1000000 * 0.01 = 10000. The condition work is 1000000 * 0.0025 * 1 = 2500. Add them: the estimated total cost is 22500. That single number is what the planner compares against every alternative, such as an index scan, to decide what to run.

The sequential-scan cost, built from its parts
Table size10000 pages
100 pages100000 pages
Rows in table1000000 rows
10000 rows5000000 rows
WHERE conditions1
05
Estimated total cost22,500
Mid-size: the planner weighs scan against index by selectivity

Drag the inputs and watch the two halves of the formula trade places. On a small table the page work dominates and a sequential scan is hard to beat. As the table grows, a scan reads every one of those pages whether you want one row or all of them, which is exactly the opening an index needs to exploit.

4. Startup cost versus total cost, and why LIMIT changes the plan

Every plan node carries not one cost but two, and the difference decides surprisingly many plans. The startup cost is the work done before the node can emit its first row. The total cost is the work to emit the last row. EXPLAIN always prints both, as cost=startup..total.

The gap between them depends on the operation. A sequential scan has a near-zero startup cost: it reads the first page and immediately hands you the first matching row, so its startup is roughly 0. A sort is the opposite: it must consume every input row before it can emit even one, because the smallest value might be the last row it reads. So a sort’s startup cost is almost its entire cost. Hold that contrast, because it is the whole reason the next part works.

Now bring in LIMIT. When your query ends in LIMIT 10, the planner does not need the plan with the lowest TOTAL cost. It needs the plan that produces ten rows cheapest. It interpolates: for a node estimated to produce 1000 rows, taking 10 of them costs roughly startup + (10 / 1000) * (total - startup). A plan with a high startup cost, like one that sorts the whole table, gets no discount from LIMIT, because you pay the full startup before the first row appears. A plan with a low startup cost, like an index scan that walks rows already in order, gets almost the whole LIMIT discount.

This is why adding LIMIT can flip the chosen plan entirely. Without it, the planner might pick a sequential scan plus a sort because that has the lowest total cost. Add LIMIT 10 and an ordered index scan suddenly wins, because it delivers ten rows almost immediately while the sort is still reading the table. The plan changed not because your data changed, but because what you asked to optimize changed: first-row speed instead of last-row speed.

ORDER BY created_at on a million-row table
The planner compares whole-result costs. A sequential scan plus an in-memory or on-disk sort has the lowest total cost, so it scans every page, sorts every row, then returns them. High startup, but lowest total.

5. Cardinality and selectivity: where the row counts come from

Costs are built from row counts, so the planner’s accuracy lives or dies on one question: how many rows will each step produce? That count is the cardinality, and estimating it is the hardest and most consequential thing the planner does. A wrong row estimate poisons every cost above it in the tree.

The estimates come from statistics that the ANALYZE command gathers and stores per column in the catalog. For each column, PostgreSQL keeps a small statistical summary, visible in the pg_stats view:

  • n_distinct: the number of distinct values in the column (or a negative ratio for very large tables).
  • most_common_vals and most_common_freqs: the most frequent values and how often each appears. This captures skew, like a status column that is 95 percent 'active'.
  • histogram_bounds: boundary values that split the rest of the column into roughly equal-population buckets, used to estimate range conditions.
  • null_frac: the fraction of the column that is NULL.

From these, the planner computes a selectivity for each condition: the fraction of rows the condition is expected to keep, a number between 0 and 1. Two worked examples show the mechanism.

For an equality like WHERE status = 'active', if 'active' appears in most_common_vals with a frequency of 0.95, the selectivity is simply 0.95. On a million-row table the planner estimates 950000 rows pass. If the value is not in the common list, it falls back to roughly (1 - sum_of_common_freqs) / number_of_other_distinct_values. For a range like WHERE created_at > '2026-01-01', the planner finds where that boundary lands among the histogram_bounds and estimates the fraction of buckets above it; if the date sits at the 80th percentile of the histogram, the selectivity is about 0.20, so 200000 rows pass.

Now the assumption that breaks in production. To combine two conditions, the planner multiplies their selectivities, because it assumes the columns are independent. WHERE city = 'Paris' AND country = 'France' is estimated as selectivity(city) * selectivity(country). But those columns are perfectly correlated: every Paris row is already a France row, so multiplying drastically underestimates the result. The planner thinks the condition is far more selective than it is, and may pick a plan tuned for a handful of rows when millions match. The fix, extended statistics with CREATE STATISTICS, and the full estimation machinery, belong to the statistics-and-row-estimation page; here you just need to know that selectivities multiply, and that correlation is where multiplication lies.

-- Inspect the per-column statistics the planner reads
SELECT attname, n_distinct, null_frac, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname IN ('status', 'created_at');

-- See the planner's row estimate for a specific condition, without running it
EXPLAIN SELECT * FROM orders WHERE status = 'active' AND created_at > '2026-01-01';

6. The plan-node menu

The planner assembles a plan from a fixed vocabulary of operations, the plan nodes. You do not need their internals yet, but you need the menu, because reading EXPLAIN means recognizing these names and knowing in one line why each was chosen. Each goes deep in the query-optimization track; here is the map.

flowchart TB
    subgraph Scans["Scan nodes: get rows from a table"]
      S1["Seq Scan"]
      S2["Index Scan"]
      S3["Index Only Scan"]
      S4["Bitmap Index + Heap Scan"]
    end
    subgraph Joins["Join nodes: combine two inputs"]
      J1["Nested Loop"]
      J2["Hash Join"]
      J3["Merge Join"]
    end
    subgraph Other["Shaping and parallel"]
      O1["Sort"]
      O2["Aggregate / HashAggregate"]
      O3["Gather"]
    end
The plan-node menu, and when each is chosen
Seq Scan reads every page; chosen when a large fraction of rows match or the table is tiny. Index Scan walks an index then fetches matching rows from the heap; chosen for selective conditions. Index Only Scan answers entirely from the index without touching the heap; chosen when the index covers every column the query needs. Bitmap Index + Heap Scan builds a bitmap of matching pages then reads them in physical order; chosen for medium selectivity or when combining several indexes.

One line ties the menu back to cost. The planner does not prefer any of these nodes; it costs each viable option and keeps the cheapest. “Use an index” is never a rule. It is the outcome when an index scan’s estimated cost beats the sequential scan’s, which happens only when the condition is selective enough that fetching matching rows costs less than reading every page.

7. Join-order search and its blow-up

Joining tables is where the planner’s job explodes, and where it has to cheat to stay fast. The trouble is that join order matters enormously for cost, and the number of possible orders grows brutally with the number of tables.

Consider why order matters. To join four tables, the planner can join any pair first, then bring in a third, then the fourth. Joining the two tables that produce the fewest combined rows first keeps every later step small. Joining the wrong pair first can produce a huge intermediate result that every later join must wade through. So the planner genuinely needs to search join orders, not just accept the order you wrote.

But the search space is enormous. The number of ways to order joins grows faster than exponentially in the number of tables. For a handful of tables an exhaustive search is fine. For a dozen, considering every order would take longer than running the query. PostgreSQL caps the damage with three controls.

# postgresql.conf: join search controls (defaults)
join_collapse_limit = 8   # flatten explicit JOINs into the search up to this many tables
from_collapse_limit = 8   # flatten subqueries in FROM up to this many tables
geqo = on                 # use the genetic optimizer above the threshold
geqo_threshold = 12       # switch to GEQO at this many tables in one join problem

Read them as a staged defense. Below join_collapse_limit (default 8) tables, the planner flattens your joins and considers reorderings freely. The from_collapse_limit (also 8) does the same for subqueries pulled up from the FROM clause. Once a single join problem reaches geqo_threshold (default 12) tables, exhaustive search is hopeless, so PostgreSQL switches to GEQO, the genetic query optimizer: it treats join orders as a population and evolves toward a good-enough order instead of proving the best one. GEQO trades optimality for a bounded planning time.

The practical edge cases follow directly. If you raise join_collapse_limit, the planner considers more orders and may find a better plan, but planning itself gets slower, sometimes dramatically. If you set join_collapse_limit = 1, you tell the planner to join in exactly the order you wrote, which is an escape hatch when you know better than the estimates. And because GEQO is partly random, two runs of the same many-table query can occasionally produce different plans. The deep treatment of join algorithms and order lives in the join-algorithms-and-join-order page.

8. Plan caching for prepared statements

Planning is not free, so for queries you run repeatedly, PostgreSQL can reuse a plan. This happens with prepared statements: you PREPARE a parameterized query once and EXECUTE it many times with different parameter values. The question is whether to reuse one plan across all those executions, and PostgreSQL has a specific heuristic for it.

There are two kinds of cached plan. A custom plan is planned fresh for the specific parameter values you pass, so the planner can use the statistics for those exact values. A generic plan is planned once with the parameters left as unknowns, then reused for every execution, so it pays the planning cost only once. Custom plans are smarter; generic plans are cheaper to produce. PostgreSQL tries to get both.

The default behavior is a heuristic built on counting. For the first five executions of a prepared statement, PostgreSQL builds a custom plan each time and records its cost. Then it estimates what a generic plan would cost. If the average custom-plan cost is not meaningfully cheaper than the generic plan, it switches to the generic plan permanently and stops re-planning. If custom plans keep winning, because the parameter values lead to very different good plans, it keeps building custom plans. You control this with plan_cache_mode.

-- Force one behavior or the other for a session
SET plan_cache_mode = 'force_custom_plan';   -- always re-plan per parameter values
SET plan_cache_mode = 'force_generic_plan';  -- always reuse one parameterized plan
SET plan_cache_mode = 'auto';                -- the default 5-execution heuristic

The trap is skew. Suppose a status column is 95 percent 'done' and 1 percent 'error'. A query parameterized on status wants a sequential scan for 'done' (most rows match) but an index scan for 'error' (few match). If the heuristic locks in a generic plan, you get one of those for both, and the wrong half of your traffic runs the wrong plan. The fix is usually force_custom_plan for that statement. The mechanics of how parameters flow through rewriting belong to the query-rewriting page; what matters here is that a cached plan can be stale for the values you actually pass.

Check yourself
A prepared statement filters on a heavily-skewed status column. After a few hundred executions it suddenly gets slow for some parameter values but not others. What is the most likely cause?

9. Reading EXPLAIN: the four numbers on every node

You now have everything needed to read a plan, which is the whole point of understanding the planner. The EXPLAIN command shows the plan the planner chose, without running it. Every node prints the same four-part summary, and once you can read those four numbers, the planner’s reasoning is visible.

EXPLAIN SELECT * FROM orders WHERE status = 'active';

                          QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..22500.00 rows=950000 width=124)
   Filter: (status = 'active'::text)

Read the four numbers left to right. cost=0.00..22500.00 is the startup cost (0.00, this scan emits its first row almost immediately) and the total cost (22500.00, the work to emit the last row). rows=950000 is the estimated cardinality: the planner expects 950,000 rows out of this node, which from rung 5 means it thinks status = 'active' has a selectivity of 0.95. width=124 is the estimated average row size in bytes, which the planner uses to predict memory use and whether a sort will spill.

That is the planner’s entire claim about this node: it will cost about 22500, produce about 950000 rows, each about 124 bytes wide. Read the same four numbers on every node up the tree and you see the plan’s whole theory of the query.

There is a crucial limit. These are all ESTIMATES. EXPLAIN never ran the query; it only printed the model. To compare the estimate against what actually happens, you run EXPLAIN ANALYZE, which executes the query and adds the real time and the real row counts next to the estimates. Comparing estimated rows against actual rows is the single most useful diagnostic in PostgreSQL, and the reading-explain-analyze page is devoted to it. For now, fix the four-number reading firmly: cost=startup..total rows=estimate width=bytes.

10. Failure modes: when the estimate lies

Everything the planner does rests on estimates, so every serious planner failure is an estimate that lied. Three failure modes cover almost all of them, and they share one root cause.

The first is the cascading misestimate. The planner estimates rows at the bottom of the tree, and that estimate flows upward into every cost above it. Suppose a scan is estimated at 10 rows but actually returns 100,000, because of the correlated-columns problem from rung 5. The planner, seeing 10 rows, picks a Nested Loop join, which is excellent for 10 rows and catastrophic for 100,000, because it probes the inner side once per outer row. One wrong number at a leaf becomes a join that runs ten thousand times too often. The cost was tiny; the runtime is minutes. You spot it in EXPLAIN ANALYZE as a node where estimated rows and actual rows differ by orders of magnitude.

The second is stale statistics. The planner’s row estimates are only as fresh as the last ANALYZE. Load a million new rows with a new range of dates and never re-analyze, and the planner’s histogram still describes the old data; it will badly misjudge any condition on the new range. Autovacuum runs ANALYZE automatically after enough changes, but a bulk load or a brand-new table can outrun it, so after a large data change you run ANALYZE yourself.

-- Refresh statistics after a bulk load so the planner sees the new data
ANALYZE orders;

-- Check how stale a table's statistics are
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

The third is the famous complaint: “the planner is ignoring my index.” It almost never is. The planner considered the index, estimated its cost, and found a sequential scan cheaper, which usually means one of two things. Either the condition is not selective enough, so fetching matching rows one by one through the index costs more than reading every page in bulk, in which case the planner is right and the index simply does not help here. Or the row estimate is wrong, so the planner thinks the index would return far more rows than it really would, in which case the fix is better statistics, not forcing the index. Reaching for enable_seqscan = off to bully the planner treats the symptom; the disease is the estimate.

Three things engineers blame, and what is really wrong

The lesson under all three is the framing from rung 2: the planner optimizes an estimate. When a plan is bad, do not fight the plan. Find the node where the estimate diverged from reality, and fix the estimate. The planner will choose correctly the moment it can see the data clearly.

Mastery Questions

  1. You add LIMIT 20 to a reporting query that ends in ORDER BY created_at DESC, and the plan changes from a sequential scan followed by a sort to an index scan on created_at. The data did not change. Explain precisely why the cheaper plan flipped.

    Answer. The planner optimizes for what you asked, and LIMIT changes the request from “the whole sorted result” to “the first 20 rows of it.” Cost has two parts: startup (work before the first row) and total (work to the last row). A sort has a very high startup cost because it must read and order every input row before it can emit even one, so it gets no benefit from LIMIT; you still pay the full sort to get 20 rows. An index scan on created_at walks rows already in order with a near-zero startup cost, so under LIMIT 20 the planner estimates roughly startup + (20 / estimated_rows) * (total - startup), which is tiny, and the scan can stop after 20 rows. Without LIMIT the planner compares total costs, where the seq-scan-plus-sort is cheapest; with LIMIT it compares first-20-rows costs, where the ordered index scan wins. Nothing about the data changed; the quantity being optimized did. This is the startup-versus-total distinction from rung 4 in action.

  2. A four-table join runs in milliseconds, but the “same” query with two more tables joined in suddenly takes seconds just to start returning rows, and the plan looks strange. The tables are small. What is happening, and which settings would you look at first?

    Answer. This is planning-time blow-up, not execution-time. The number of join orders the planner must consider grows faster than exponentially with the table count, so six tables is far more search than four. Two things can be biting. First, if the join problem reached geqo_threshold (default 12) it would switch to the genetic optimizer, but at six tables it is still doing a near-exhaustive search, which can itself be slow if other subqueries inflated the effective count. Second, and more likely at six tables, the planner is spending real time enumerating orders under join_collapse_limit and from_collapse_limit (both default 8). If the query has been written with many flattenable subqueries or the limits were raised, planning time can dominate a fast execution. Look first at EXPLAIN versus EXPLAIN ANALYZE to separate planning time from execution time, then at join_collapse_limit, from_collapse_limit, and geqo_threshold. If you know the right order, setting join_collapse_limit = 1 forces your written order and skips the search entirely.

  3. An engineer files a bug: “PostgreSQL refuses to use the index I created on orders(status), even though my query filters on status.” The index is valid and the column is indexed. Walk through how you would determine whether this is actually a bug, and what the most probable real cause is.

    Answer. It is almost certainly not a bug; the planner considered the index and costed a sequential scan cheaper. I would run EXPLAIN on the query and read the estimated rows on the scan node, then compare it to reality with EXPLAIN ANALYZE. Two cases follow. If the condition genuinely matches a large fraction of the table, say status = 'active' where most rows are active, then an index scan would fetch most rows one at a time through the heap, which costs more than reading every page in bulk; the sequential scan is correctly cheaper and the index simply does not help for that value. That is the planner being right. The other case is a misestimate: if estimated rows is tiny but actual rows is huge, the statistics are stale or a correlation is fooling the independence assumption, so the planner mis-prices both plans. The fix there is ANALYZE to refresh statistics, or CREATE STATISTICS for correlated columns, after which the planner will pick the index on its own. The thing I would not do is force it with enable_seqscan = off, because that masks the real problem. The framing from rung 2 settles it: the planner optimizes an estimate, so “it ignored my index” is a question about the estimate, not a bug in plan selection.

Recommended next

Sources & evidence14 claims · 4 cited

Cost parameter defaults and the EXPLAIN cost/rows/width format are grounded in src_pg_docs_runtime_query and src_pg_docs_explain; statistics, selectivity, and join-search settings in src_pg_docs_planner_stats; Gather/parallel in src_pg_docs_parallel_query. The five-execution custom/generic plan heuristic and the worked arithmetic examples are stable common knowledge from PostgreSQL behavior not pinned to a single cited paragraph.

  • A query passes through parse, analyze (semantic resolution against the catalog), rewrite (view and rule expansion), plan/optimize, and execute, in that order, with the planner running fourth on a fully resolved query.stable common knowledge
  • PostgreSQL is a cost-based optimizer: it enumerates candidate plans, assigns each an estimated cost, and executes the plan with the lowest estimated total cost rather than following fixed priority rules.verified
  • The planner cost constants default to seq_page_cost = 1.0, random_page_cost = 4.0, cpu_tuple_cost = 0.01, cpu_index_tuple_cost = 0.005, and cpu_operator_cost = 0.0025 in PostgreSQL 16 and 17, with seq_page_cost as the yardstick all other costs are relative to.verified
  • A sequential scan's estimated total cost is pages * seq_page_cost + rows * cpu_tuple_cost + rows * cpu_operator_cost per WHERE condition; for 10,000 pages and 1,000,000 rows with one condition this is 10000 + 10000 + 2500 = 22500.stable common knowledge
  • Every plan node has a startup cost (work before the first row) and a total cost (work to the last row); a LIMIT favors low-startup plans like an ordered index scan over high-startup plans like a full sort, which can flip the chosen plan.verified
  • Row estimates come from per-column statistics gathered by ANALYZE, including n_distinct, most_common_vals, most_common_freqs, histogram_bounds, and null_frac, exposed through the pg_stats view.verified
  • The planner combines the selectivities of multiple conditions by multiplying them under an independence assumption, which underestimates result size when the columns are correlated.verified
  • The planner assembles plans from scan nodes (Seq Scan, Index Scan, Index Only Scan, Bitmap Index/Heap Scan), join nodes (Nested Loop, Hash Join, Merge Join), and shaping nodes (Sort, Aggregate/HashAggregate), with a Gather node collecting rows from parallel worker processes at the top of any parallel plan.verified
  • join_collapse_limit and from_collapse_limit both default to 8, bounding how many tables the planner flattens into its join-order search before limiting it.verified
  • Once a single join problem reaches geqo_threshold (default 12) tables, PostgreSQL switches from near-exhaustive search to the genetic query optimizer (GEQO), which evolves toward a good-enough order rather than proving the optimal one.verified
  • For a prepared statement, PostgreSQL builds custom plans for the first five executions, then compares their average cost to a generic plan and switches to the generic plan permanently if it is not meaningfully more expensive; plan_cache_mode (auto, force_custom_plan, force_generic_plan) overrides this heuristic.stable common knowledge
  • EXPLAIN prints, per node, cost=startup..total, an estimated rows (cardinality), and an estimated width in bytes, without executing the query; these are all estimates from the planner's model.verified
  • A misestimated row count at a leaf cascades upward, for example a 10-row estimate that is really 100,000 rows leading the planner to pick a Nested Loop that probes the inner side far too many times; the divergence is visible in EXPLAIN ANALYZE as estimated rows versus actual rows differing by orders of magnitude.verified
  • The complaint that the planner ignores an index is almost always a costing or estimation issue, not a bug: the planner costed the index scan against a sequential scan and either the condition was not selective enough or the row estimate was wrong, so the fix is fresher or extended statistics rather than forcing the index.verified

Cited sources