Scan Methods and When Each Wins

How PostgreSQL's planner chooses among Sequential, Index, Index Only, and Bitmap scans, why physical-row correlation and the visibility map decide whether an index helps, and how to read the plan and tune the boundary.

Learning outcomes

Once you can read a plan, the next question is the one that decides your latency: which scan node did the planner pick, and was it the right one? The Reading EXPLAIN ANALYZE page taught you to read the tree. This page teaches you to judge the leaves of that tree. Every query touches its tables through a scan node, and PostgreSQL has five real choices for how to do it. Picking well is the single largest lever on a point-lookup or a range query, and the planner’s reasoning is more learnable than it looks.

After studying this page, you can:

  • Name the five scan nodes (Seq Scan, Index Scan, Index Only Scan, Bitmap Heap Scan, plus the minor ones) and say in one sentence when each wins.
  • Explain why physical-row correlation makes the same index a hero on one column and a liability on another.
  • Read the Heap Fetches line of an index-only scan and trace a stale visibility map back to it.
  • Recognize a lossy bitmap and a Recheck Cond line, and connect them to work_mem.
  • Estimate where the index-versus-seq-scan tipping point sits, and move it on purpose by tuning cost settings.
  • Drive EXPLAIN (ANALYZE, BUFFERS), pg_stats.correlation, and enable_seqscan to confirm or override the planner.

Before we dive in

You should be comfortable reading an EXPLAIN (ANALYZE, BUFFERS) plan tree: spotting the node type, its estimated versus actual rows, and its buffer counts. The Reading EXPLAIN ANALYZE page covers that, and we lean on it constantly here. You also want a working picture of how rows live on disk: a table is a sequence of fixed-size pages (8 kB by default), each holding several tuples, as the Heap Pages and TOAST page describes.

A few terms, defined as we use them. Selectivity is the fraction of a table’s rows a condition keeps: a filter matching 50 rows out of a million has a selectivity near 0.00005. Sequential I/O reads pages in physical order, the cheap case, because the storage device and the OS read-ahead are built for it. Random I/O jumps to a page out of order, the expensive case. The heap is the table’s own pages, where full rows live; an index is a separate structure that maps a key to the row’s physical location. Hold those four. The whole chapter is about trading sequential reads of the heap against random reads driven by an index.

Mental Model

The tempting wrong model is “an index is always faster than a scan, so the planner should use the index whenever one exists.” Under that model a sequential scan is a failure, something to fix by adding an index.

That model breaks the moment your query returns many rows. The better picture is a librarian fetching books by a list of call numbers. If you want three specific books, the card catalog (the index) plus three walks to the right shelf beats reading every book in the building. But if you want a thousand books scattered across every aisle, a thousand catalog lookups and a thousand separate trips lose badly to simply walking the stacks shelf by shelf and grabbing what you need as you pass it. Same building, same catalog, opposite answer.

The deciding factor is not “does an index exist” but “how much of the table do I want, and are the rows I want clustered together or scattered.” Sequential reading is cheap per page; random reading is expensive per page; an index trades fewer pages for more expensive ones. The planner is doing that arithmetic on every query. Once you see scan choice as a cost trade rather than a right-or-wrong, the rest of this page is just the details of the trade.

Breaking it down

1. Why there is more than one way to read a table

Start with the constraint the whole topic lives inside: storage rewards order. Reading 1,000 pages in physical sequence is dramatically cheaper than reading 1,000 pages at random addresses, because both the OS and the device prefetch the next sequential page while you process the current one. A random read defeats that prefetch and pays a latency penalty on every page.

PostgreSQL’s cost model bakes this in with two constants. seq_page_cost is 1.0, the price of reading one page sequentially, and it is the yardstick the whole model is measured against. random_page_cost defaults to 4.0, meaning the planner assumes a random page fetch costs four times a sequential one. That single ratio, four to one, is the gravity that pulls the planner toward sequential reads and away from index-driven random reads. The Tuning the Planner Cost Settings page is where you change it; here we just need to know it exists.

So the planner’s job at each table is to pick the access method whose estimated cost is lowest for the number of rows it expects. Five methods compete. The next rungs build them up one at a time, cheapest-to-reason-about first.

flowchart TD
    Q["Query needs rows from a table"] --> SEL{"How many rows, and how scattered?"}
    SEL -->|"large fraction, or tiny table"| SEQ["Sequential Scan"]
    SEL -->|"few rows, well correlated"| IDX["Index Scan"]
    SEL -->|"few rows, all columns in index"| IOS["Index Only Scan"]
    SEL -->|"medium fraction, or several indexes"| BMP["Bitmap Heap Scan"]

2. Sequential Scan: read every page in order

A Sequential Scan, shown as Seq Scan in a plan, reads every page of the table from the first to the last and checks each tuple against the query’s filter. It uses no index at all. Because it touches pages in physical order, every read is sequential and priced at seq_page_cost, so the cost per page is as low as PostgreSQL ever pays.

A seq scan wins in three situations, and they are more common than beginners expect. First, when the query returns a large fraction of the table: if you want half the rows, you will end up visiting most pages anyway, so paying the cheap sequential price for all of them beats paying the random price for most of them through an index. Second, when the table is tiny, a handful of pages, the overhead of descending an index is not worth it. Third, when no useful index exists for the predicate, a seq scan is the only option.

Here is the shape of one. Note that the filter is applied as the scan runs, and the rows removed by the filter are reported so you can see the selectivity.

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'shipped';
 Seq Scan on orders  (cost=0.00..21925.00 rows=512000 width=64)
                     (actual time=0.014..142.300 rows=511800 loops=1)
   Filter: (status = 'shipped'::text)
   Rows Removed by Filter: 488200
   Buffers: shared hit=8000 read=2925

Half the table matched, so the planner correctly chose to read all 10,925 pages sequentially rather than fetch 511,800 rows one random page at a time. For a large table that scan is also a candidate for a Parallel Seq Scan, where several worker processes divide the pages between them and a Gather node merges the results. Parallelism only kicks in above min_parallel_table_scan_size (8 MB by default), because the cost of launching workers does not pay off on a small table.

3. Index Scan, and the correlation that makes or breaks it

An Index Scan is the opposite strategy. PostgreSQL descends the B-tree for the matching key range, and for each entry it finds there, it follows the pointer to the heap and fetches that row. The Advanced Indexing Techniques page goes deep on the index structures; here the key fact is the two-step rhythm: read the index, then jump to the heap, over and over.

That second step, the jump to the heap, is where the cost lives. Each jump is potentially a random read priced near random_page_cost. So an index scan wins when selectivity is high, meaning the query keeps very few rows: ten random heap fetches to find ten rows easily beats reading the whole table. It loses when it would drag you to thousands of scattered heap pages, because thousands of random reads cost more than one sequential pass.

Now the subtle part, the one that separates engineers who memorized “index for few rows” from those who can actually predict a plan. Whether those heap fetches are truly random depends on correlation. Correlation measures how closely the index’s key order matches the physical order of rows on disk.

Picture two columns on the same orders table. created_at is appended in time order, so rows that are next to each other in the index are also next to each other on disk: high correlation, near 1.0. A range scan on created_at walks the index and finds the matching heap rows sitting in a handful of consecutive pages, so the heap fetches are nearly sequential and cheap. Now take customer_id, scattered all over the table as customers placed orders at random times: low correlation, near 0. A range scan on customer_id walks the index and is flung to a different page on almost every fetch, so the heap reads are genuinely random and expensive.

flowchart LR
    subgraph HIGH["High correlation: created_at"]
        I1["index order"] --> H1["heap pages 4,4,5,5,6"]
    end
    subgraph LOW["Low correlation: customer_id"]
        I2["index order"] --> H2["heap pages 91,3,57,8,40"]
    end

This is why the same index helps one column and not another, and why an index that looks useless can be rescued by physically reordering the table. The CLUSTER command rewrites a table in the order of a chosen index, driving that column’s correlation toward 1.0 and turning its index scans nearly sequential. It is a one-time, locking rewrite and new writes are not kept in order, so it is a periodic maintenance act, not a guarantee. You read the current correlation straight from the statistics catalog.

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'orders'
ORDER BY abs(correlation) DESC;
   attname    | correlation
--------------+-------------
 created_at   |       0.998
 id           |       0.994
 customer_id  |       0.012

A value near 1.0 or near -1.0 means index scans on that column will have cheap, near-sequential heap access; a value near 0 warns you they will be random and the planner will be reluctant to use them.

4. Index Only Scan: skipping the heap with the visibility map

The expensive half of an index scan is the heap fetch. So PostgreSQL has a way to skip it entirely. An Index Only Scan answers the query from the index alone, never touching the heap, when two conditions both hold.

First, every column the query needs must be present in the index. If you select and filter only on columns the index already stores, the index has the whole answer. Second, PostgreSQL must be sure the row is visible to your transaction without checking the heap, and it learns this from the visibility map: a compact bitmap, two bits per heap page, where the all-visible bit means every tuple on that page is visible to all current transactions. When the page is marked all-visible, the index entry can be trusted and the heap fetch is skipped.

The catch is the second condition, and it is where index-only scans quietly fail. The visibility map is maintained by VACUUM. After a burst of inserts or updates, the affected pages are not yet marked all-visible, so even a perfectly covering index has to fall back and check the heap for those pages. PostgreSQL reports exactly how often this happened on the Heap Fetches line.

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, created_at
FROM orders
WHERE customer_id = 42;
 Index Only Scan using orders_cust_created_idx on orders
     (cost=0.43..8.55 rows=12 width=12)
     (actual time=0.021..0.040 rows=12 loops=1)
   Index Cond: (customer_id = 42)
   Heap Fetches: 0
   Buffers: shared hit=4

Heap Fetches: 0 is the win: not one heap page was touched. If that number is high relative to the rows returned, your visibility map is stale and the “index only” scan is doing heap work behind your back. The fix is to vacuum the table so the map catches up.

VACUUM (VERBOSE) orders;

You can make far more queries qualify for index-only scans by widening what the index stores. An INCLUDE clause adds non-key payload columns to the index leaf, so a query can be answered from the index without those columns being part of the search key. The Advanced Indexing Techniques page covers covering indexes in full; the point here is that INCLUDE is the deliberate lever for turning an ordinary index scan into an index-only one.

CREATE INDEX CONCURRENTLY orders_cust_created_idx
  ON orders (customer_id) INCLUDE (created_at, status);
The same query, with a healthy vs a stale visibility map
VACUUM has set the all-visible bit on those pages, so Heap Fetches drops to 0 and the scan reads only index pages. This is the latency you expected when you built the covering index.

5. Bitmap scans: the in-between, and combining indexes

Sequential scan is cheap per page but reads everything. Index scan is precise but pays a random fetch per row. There is a large middle ground where you want, say, two percent of the table: too many rows for a tidy index scan, too few to justify reading the whole thing. That middle ground belongs to the bitmap scan, which always appears as a pair of nodes.

The lower node, Bitmap Index Scan, walks the index but does not jump to the heap per row. Instead it builds an in-memory bitmap marking which heap pages (and which tuples within them) hold matching rows. The upper node, Bitmap Heap Scan, then reads those pages, and here is the trick: it visits them in physical page order, once each. So a low-correlation column whose index scan would be a storm of random reads becomes, under a bitmap scan, a single ordered sweep of just the pages that matter. The bitmap converts random access into sorted access.

flowchart TD
    BIS["Bitmap Index Scan: walk index, mark matching pages"] --> BM["in-memory bitmap of tuple locations"]
    BM --> BHS["Bitmap Heap Scan: read marked pages in physical order"]
    BHS --> R["Recheck Cond re-applies the predicate"]

The bitmap form also unlocks something a plain index scan cannot do: combining indexes. PostgreSQL can build a bitmap from one index and another bitmap from a second index, then merge them with BitmapAnd (for AND predicates) or BitmapOr (for OR), and only then touch the heap. This is why two single-column indexes can together serve a two-column query, no composite index required.

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE status = 'returned' AND region = 'EU';
 Bitmap Heap Scan on orders  (cost=24.30..1842.50 rows=900 width=64)
                             (actual time=0.512..3.880 rows=874 loops=1)
   Recheck Cond: ((status = 'returned') AND (region = 'EU'))
   Heap Blocks: exact=812
   Buffers: shared hit=820
   ->  BitmapAnd  (cost=24.30..24.30 rows=900 width=0)
         ->  Bitmap Index Scan on orders_status_idx
               Index Cond: (status = 'returned')
         ->  Bitmap Index Scan on orders_region_idx
               Index Cond: (region = 'EU')

One detail on that plan controls whether a bitmap scan stays fast. The exact bitmap, one bit per tuple, lives in work_mem. If the result set is large enough that the bitmap would not fit, PostgreSQL degrades it to lossy mode, where it remembers only that a whole page contains some match, not which tuples. That costs you on the heap side: every tuple on a lossy page must be re-tested, which is exactly what the Recheck Cond line does. A bitmap scan always carries a recheck, but on a lossy bitmap the recheck does real filtering work instead of being a formality. You will see Heap Blocks: lossy=NNNN appear next to or instead of exact=. The cure is to raise work_mem for that query or narrow the result so the exact bitmap fits.

SET work_mem = '128MB';

6. The decision boundary, and how to shift it

Now put the pieces together into the question you actually face: where is the line between a seq scan and an index scan? The folklore answer is “around five to ten percent selectivity, switch to an index.” That number is a useful gut-check, but it is a symptom, not the rule. The real rule is cost.

The planner estimates the index path’s cost as roughly the number of matching rows, times how random their heap pages are (driven by correlation), times random_page_cost, plus the index descent. It estimates the seq scan as the page count times seq_page_cost. It picks the smaller. The five-to-ten percent figure is just where those two estimates tend to cross for default settings on a well-correlated column. Change any input and the crossover moves: a low-correlation column pushes the boundary down (the index gives up sooner, because its heap fetches are pricier), and a covering index that enables an index-only scan pushes it way up (no heap fetches to price at all).

The most consequential input you control is random_page_cost. On the spinning disks of 2005, four-to-one was honest. On NVMe SSDs a random read is nearly as cheap as a sequential one, so leaving random_page_cost at 4.0 makes the planner systematically overprice every index path and reach for seq scans far too late. Lowering it toward 1.1, as the Tuning the Planner Cost Settings page details, shifts the whole boundary toward index scans, because the random fetches an index needs are no longer being charged a disk-seek tax that your hardware does not pay.

Drag the slider to feel where the boundary lives. The bands are the planner’s rough zones on the raw selectivity percent.

Which scan wins, by selectivity
Fraction of the table the query returns3%
0%100%
Bitmap Heap Scan: too many for a tidy index scan, sorted-page sweep wins

Treat the band edges as soft. A column with correlation near 1.0 keeps index scans winning well past 8 percent; a column near 0 hands the job to a bitmap scan almost immediately. And remember the slider shows the default-hardware story: lower random_page_cost and every edge slides to the right.

7. The smaller scan nodes and the parallel variants

The five-way contest above covers nearly every table access, but a few more scan nodes show up in plans and you should recognize them so they do not surprise you.

A TID Scan appears when you query by ctid, the physical tuple address itself, for example WHERE ctid = '(0,1)'. It is the most direct read possible: PostgreSQL goes straight to that page and item, no index, no scan. A Function Scan is how a set-returning function in the FROM clause, such as generate_series(1, 1000), feeds rows into the plan. A Values Scan is the node for an inline VALUES list, the rows you write literally inside the query. These three are not choices the planner agonizes over; they are the only way to read their particular sources, and they cost almost nothing.

The parallel variants matter more for performance. Beyond the Parallel Seq Scan from rung 2, PostgreSQL can run a Parallel Index Scan and a Parallel Bitmap Heap Scan. In the parallel bitmap case, one worker builds the bitmap and all workers then divide up the marked heap pages, which is a strong plan for a large medium-selectivity query on a many-core box. You will see a Gather or Gather Merge node above them in the plan, collecting the workers’ output. The Reading EXPLAIN ANALYZE page explains how to read the loops and per-worker rows on these nodes.

Recognizing the rarer nodes in a plan

8. Diagnosing and overriding the planner’s choice

You now know the choices; here is how to inspect and challenge them on a live system. Three tools cover almost every investigation.

The first is EXPLAIN (ANALYZE, BUFFERS). ANALYZE runs the query and reports actual rows and time, so you can compare the planner’s estimate to reality, and the gap between estimated and actual rows is usually the root cause of a bad scan choice. BUFFERS reports pages read, which tells you whether an “index only” scan is secretly fetching from the heap and whether a bitmap went lossy. Always read both together.

The second is pg_stats.correlation, from rung 3. Before you blame the planner for refusing an index, check whether the column is well correlated. A near-zero correlation explains a “stubborn” seq scan honestly: the index would cause random I/O the planner is right to avoid, and the fix is CLUSTER or a bitmap-friendly query, not forcing the index.

The third is the enable_* family, and it is a diagnostic, not a setting. SET enable_seqscan = off does not forbid sequential scans; it adds a large cost penalty so the planner avoids one when any alternative exists. You use it for a single session to see what the index plan would have cost, then compare.

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
RESET enable_seqscan;

If the forced index plan is dramatically cheaper than the seq scan the planner chose, your cost settings are mismodeling your hardware, and the durable fix is to lower random_page_cost, not to ship enable_seqscan = off in production. If the forced plan is more expensive, the planner was right and you just confirmed it.

Check yourself
A covering index-only scan on a freshly bulk-loaded table is slow, and EXPLAIN (ANALYZE, BUFFERS) shows 'Heap Fetches: 480000' on 500,000 rows. What is the cause and the fix?

The mistakes that bite experienced engineers are nearly always one of four. Expecting an index scan while selecting half the table: the planner is right to read it sequentially, and no amount of indexing changes that. A seq scan chosen on an SSD because random_page_cost is still 4.0: the planner is overpricing the index, lower the constant. An index-only scan that is secretly hitting the heap: a stale visibility map, vacuum the table. And a bitmap heap scan crawling on a huge result: it went lossy, raise work_mem or narrow the query. Each failure mode has a named line in the plan that points straight at it, which is why you read the plan first and guess never.

Mastery Questions

  1. You have a 50-million-row events table. A query filtering WHERE tenant_id = 7 returns about 200 rows but runs as a Bitmap Heap Scan with Heap Blocks: exact=198, touching almost 200 separate pages, when you expected a fast Index Scan. The tenant_id index exists and is used. Why a bitmap and 198 page reads for 200 rows, and what would make it faster?

    Answer. The bitmap and the near-one-page-per-row reads are both symptoms of low correlation. Those 200 rows for tenant 7 are physically scattered across the table, because events arrive interleaved from all tenants, so pg_stats.correlation for tenant_id is near zero. A plain index scan would do 200 random heap fetches in index order; the planner instead chose a bitmap so it could read those scattered pages in sorted physical order, one pass, which is cheaper than 200 random jumps. But the rows still live on roughly 200 distinct pages, so the page count cannot drop below the number of distinct pages holding a match, regardless of scan type. To make it genuinely fast you must change the physical layout or the index. Clustering the table on tenant_id (or partitioning by tenant) packs each tenant’s rows together, collapsing those 198 pages toward a handful, after which an index scan becomes cheap. Alternatively, a covering index, for example on (tenant_id) INCLUDE (the selected columns), lets an index-only scan answer the query without touching the heap at all, sidestepping the scatter entirely as long as the visibility map is current. The lesson is that “few rows” does not guarantee “few pages”; correlation decides how many pages those few rows occupy.

  2. A reporting query was fast for months as an Index Only Scan, then degraded overnight to high latency with no code or schema change. EXPLAIN (ANALYZE, BUFFERS) still shows an Index Only Scan node, but Heap Fetches is now huge and Buffers read jumped. What happened, and how do you confirm and fix it?

    Answer. The index-only scan stopped being able to skip the heap, because the visibility map went stale. An index-only scan reads from the index alone only on heap pages whose all-visible bit is set in the visibility map; on any page where that bit is unset, it must fetch the heap to confirm visibility, and each such fetch is counted in Heap Fetches. The bit gets unset whenever a page is modified and is only re-set by VACUUM. So the overnight trigger is almost certainly a surge of writes to the table combined with autovacuum falling behind, perhaps because a long-running transaction is holding back the cleanup horizon, or because autovacuum is throttled and the table churns faster than it can keep up. The node type stayed the same, which is exactly why the symptom is confusing: the plan still says “Index Only Scan,” but operationally it is doing heap work. You confirm it by reading Heap Fetches against the row count (a healthy index-only scan shows near zero) and by checking pg_stat_user_tables for a high n_dead_tup and an old last_autovacuum. The fix is to vacuum the table to rebuild the visibility map, then prevent recurrence by tuning autovacuum more aggressively for this table and clearing any long-open transaction that pins the horizon.

  3. On a new NVMe-backed server, point lookups are fast but range queries that should use an index keep coming back as sequential scans, even though the index exists and ANALYZE has run with accurate statistics. Forcing the index with SET enable_seqscan = off produces a plan that actually runs several times faster. What is the most likely cause, and what is the correct fix?

    Answer. The planner is mispricing index access for this hardware, almost certainly because random_page_cost is still at its default of 4.0. That default models a spinning disk, where a random page fetch really is about four times the cost of a sequential one. On NVMe a random read is nearly as cheap as a sequential read, so charging index paths a four-to-one tax makes the planner systematically overestimate their cost and prefer a sequential scan past the point where the index would actually win. The enable_seqscan = off experiment is the proof: it penalizes the seq scan enough to force the index plan, and that plan running faster shows the index path was genuinely cheaper all along, the planner just could not see it through the wrong constant. The correct fix is not to ship enable_seqscan = off, which is a session-only diagnostic that would distort every other query, but to lower random_page_cost toward 1.1 to match the device, ideally alongside a realistic effective_cache_size, as the Tuning the Planner Cost Settings page describes. After that the planner reaches for the index on its own, at the right selectivity boundary, for every query, not just the one you hand-forced.

Recommended next

Sources & evidence14 claims · 4 cited

Mechanism and metric claims are grounded in the PostgreSQL index-types, EXPLAIN, and planner-statistics docs. Widely-known operational facts those docs do not state in exact numeric terms (the seq_page_cost 1.0 / random_page_cost 4.0 defaults and the SSD 1.1 convention, the 8MB min_parallel_table_scan_size, the rough 5-10% selectivity folklore, enable_seqscan as a session-only diagnostic, the 8kB default page size) are marked stable-common-knowledge with empty source_ids.

  • PostgreSQL's cost model is relative to seq_page_cost (default 1.0, the price of one sequential page read), and random_page_cost defaults to 4.0, so the planner assumes a random page fetch costs four times a sequential one; that ratio is the gravity pulling plans toward sequential reads and away from index-driven random reads.stable common knowledge
  • A Sequential Scan reads every page of the table in physical order with no index, applies the predicate as a Filter, and wins when the query returns a large fraction of the table, the table is tiny, or no useful index exists, because every read is priced at the cheap seq_page_cost.verified
  • A large table's sequential scan can run as a Parallel Seq Scan with several workers dividing the pages under a Gather node, but parallelism only engages above min_parallel_table_scan_size (8MB by default) because worker startup does not pay off on a small table.stable common knowledge
  • An Index Scan descends the B-tree for the matching key range and follows each entry's pointer to the heap to fetch the row, so its cost is dominated by per-row heap fetches priced near random_page_cost; it wins at high selectivity (few rows) and loses when it would drag the query to thousands of scattered heap pages.verified
  • The pg_stats.correlation value measures how closely a column's index order matches the physical order of rows on disk; near 1.0 (or -1.0) the heap fetches of an index scan are nearly sequential and cheap, while near 0 they are random and expensive, which is why the same index helps one column and not another, and CLUSTER rewrites the table in an index's order to drive that correlation toward 1.0.verified
  • An Index Only Scan answers a query from the index alone, skipping the heap, only when every column the query needs is in the index AND the heap page is marked all-visible in the visibility map; otherwise it must fetch the heap to confirm visibility.verified
  • The Heap Fetches line in EXPLAIN reports how often an index-only scan had to visit the heap because a page was not marked all-visible; a high value relative to rows returned indicates a stale, under-vacuumed visibility map, and running VACUUM re-sets the all-visible bits so Heap Fetches drops toward zero.verified
  • An INCLUDE clause adds non-key payload columns to a B-tree index's leaf entries so a query can be answered from the index without those columns being part of the search key, deliberately turning ordinary index scans into index-only scans.verified
  • A Bitmap Index Scan walks an index and builds an in-memory bitmap of matching tuple locations rather than fetching per row, and the Bitmap Heap Scan then visits the marked heap pages once in physical page order, which converts the random access of a low-correlation index scan into a single sorted sweep and wins at medium selectivity.verified
  • Because matches are accumulated in an intermediate bitmap, PostgreSQL can combine multiple indexes by merging their bitmaps with BitmapAnd (for AND predicates) or BitmapOr (for OR) before touching the heap, so two single-column indexes can together serve a multi-column query without a composite index.verified
  • The exact per-tuple bitmap lives in work_mem; when the result set is large enough that it would not fit, the bitmap degrades to lossy mode at page granularity, recording only that a page contains some match, which forces the Recheck Cond on the Bitmap Heap Scan to re-test every tuple on those pages and shows up as Heap Blocks lossy, curable by raising work_mem or narrowing the query.verified
  • The seq-versus-index choice is cost-driven, with the index path estimated as roughly matching rows times their heap-page randomness (correlation) times random_page_cost; the folklore 5-10% selectivity tipping point is just where those estimates cross for default settings on a well-correlated column, and lowering random_page_cost toward 1.1 on SSD shifts the whole boundary toward index scans.stable common knowledge
  • Besides the main four, plans can show a TID Scan (reads an exact ctid location with no index), a Function Scan (rows from a set-returning function in FROM, with a default row estimate of 1000 unless the function declares ROWS), and a Values Scan (an inline VALUES list), plus parallel variants including Parallel Index Scan and Parallel Bitmap Heap Scan where workers share one bitmap and split the marked heap pages under a Gather node.verified
  • EXPLAIN (ANALYZE, BUFFERS) reveals the chosen scan node, the gap between estimated and actual rows, and via BUFFERS whether an index-only scan is secretly reading the heap or a bitmap went lossy; pg_stats.correlation explains a 'stubborn' seq scan honestly; and SET enable_seqscan = off does not forbid seq scans but adds a large cost penalty, making it a session-only diagnostic for comparing plan costs rather than a production setting.verified

Cited sources