7.2. Index-Only Scans

To reduce the I/O (input/output) cost, index-only scans (often called index-only access) directly use the index key without accessing the corresponding table pages when all of the target entries of the SELECT statement are included in the index key. This technique is provided by almost all commercial RDBMS, such as DB2 and Oracle. PostgreSQL has introduced this option since version 9.2.

In the following, using a specific example, a description of how index-only scans in PostgreSQL perform is given.

The assumptions of the example are explained below:

  • Table definition
    We have a table ’tbl’ of which the definition is shown below:
testdb=# \d tbl
      Table "public.tbl"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 name   | text    | 
 data   | text    | 
Indexes:
    "tbl_idx" btree (id, name)
  • Index
    The table ’tbl’ has an index ’tbl_idx’, which is composed of two columns: ‘id’ and ’name’.

  • Tuples
    ’tbl’ has already inserted tuples.

    • Tuple_18, of which the id is 18 and name is ‘Queen’, is stored in the 0th page.
    • Tuple_19, of which the id is 19 and name is ‘BOSTON’, is stored in the 1st page.
  • Visibility
    All tuples in the 0th page are always visible; the tuples in the 1st page are not always visible. Note that the visibility of each page is stored in the corresponding visibility map (VM), and the VM is described in Section 6.2.

Let us explore how PostgreSQL reads tuples when the following SELECT command is executed.

testdb=# SELECT id, name FROM tbl WHERE id BETWEEN 18 and 19;
 id |  name   
----+--------
 18 | Queen
 19 | Boston
(2 rows)

This query gets data from two columns of the table, ‘id’ and ’name’, and the index ’tbl_idx’ is composed of these columns. Thus, it seems at first glance that accessing the table pages is not required when using index scan, because the index tuples contain the necessary data.

However, in fact, PostgreSQL has to check the visibility of the tuples in principle. The index tuples do not have any information about transactions, such as the t_xmin and t_xmax of the heap tuples, which are described in Section 5.2.

Therefore, PostgreSQL has to access the table data to check the visibility of the data in the index tuples. This is like putting the cart before the horse.

To avoid this dilemma, PostgreSQL uses the visibility map of the target table. If all tuples stored in a page are visible, PostgreSQL uses the key of the index tuple and does not access the table page that is pointed at from the index tuple to check its visibility. Otherwise, PostgreSQL reads the table tuple that is pointed at from the index tuple and checks the visibility of the tuple, which is the ordinary process.

In this example, Tuple_18 does not need to be accessed because the 0th page that stores Tuple_18 is visible. That is, all tuples in the 0th page, including Tuple_18, are visible. In contrast, Tuple_19 needs to be accessed to handle concurrency control because the visibility of the 1st page is not visible. See Fig. 7.7.

Fig. 7.7. How Index-Only Scans performs