8.1. Overview

This section introduces key concepts that are necessary to understand the descriptions in the subsequent sections.

8.1.1. Buffer Manager Structure

The PostgreSQL buffer manager comprises a buffer table, buffer descriptors, and a buffer pool, which are described in the next section.

The buffer pool layer stores data file pages, such as tables and indexes, as well as freespace maps and visibility maps.

The buffer pool is an array in which each slot stores one page of a data file. The indices of the buffer pool array are referred to as buffer_ids.

Sections 8.2 and 8.3 describe the details of the buffer manager internals.

8.1.2. Buffer Tag

In PostgreSQL, every page of every data file can be assigned a unique tag, called a buffer tag. When the buffer manager receives a request, it identifies the desired page by its buffer_tag.

The buffer_tag has five values:

spcOid: The OID of the tablespace to which the relation containing the target page belongs.
dbOid: The OID of the database to which the relation containing the target page belongs.
relNumber: The number of the relation file that contains the target page.
forkNum: The fork number of the relation that the page belongs to. The fork numbers of tables, freespace maps, and visibility maps are defined as 0, 1, and 2, respectively.
blockNum: The block number of the target page in the relation.

buffer_tag

/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, InitBufferTag will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
	Oid			spcOid;			/* tablespace oid */
	Oid			dbOid;			/* database oid */
	RelFileNumber relNumber;	/* relation file number */
	ForkNumber	forkNum;		/* fork number */
	BlockNumber blockNum;		/* blknum relative to begin of reln */
} BufferTag;

For example, the buffer_tag ‘{16821, 16384, 37721, 0, 7}’ identifies the page at block number 7 (block numbers start at 0) of the table whose relation file number and fork number are 37721 and 0, respectively. The table is contained in the database whose OID is 16384 under the tablespace whose OID is 16821.

Similarly, the buffer_tag ‘{16821, 16384, 37721, 1, 3}’ identifies the page at block number 3 of the freespace map whose relation file number and fork number are 37721 and 1, respectively.
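The two example tags can be built and compared with a minimal sketch such as the following. The typedefs are simplified stand-ins for the PostgreSQL ones, and DemoBufferTag, make_tag, and tags_equal are hypothetical names introduced here for illustration; they are not part of the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the PostgreSQL typedefs (assumptions). */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;
typedef uint32_t BlockNumber;
typedef int32_t  ForkNumber;

/* Same field order as the BufferTag struct shown above. */
typedef struct
{
    Oid           spcOid;     /* tablespace oid */
    Oid           dbOid;      /* database oid */
    RelFileNumber relNumber;  /* relation file number */
    ForkNumber    forkNum;    /* fork number */
    BlockNumber   blockNum;   /* block number within the relation */
} DemoBufferTag;

/* Hypothetical helper: zero the whole struct first, so that any pad
 * bytes are defined before the tag is used as a lookup key. */
static DemoBufferTag make_tag(Oid spc, Oid db, RelFileNumber rel,
                              ForkNumber fork, BlockNumber blk)
{
    DemoBufferTag tag;

    assert(fork >= 0);        /* this sketch only uses forks 0, 1, 2 */
    memset(&tag, 0, sizeof(tag));
    tag.spcOid = spc;
    tag.dbOid = db;
    tag.relNumber = rel;
    tag.forkNum = fork;
    tag.blockNum = blk;
    return tag;
}

/* Two tags name the same page only if all five values match. */
static bool tags_equal(const DemoBufferTag *a, const DemoBufferTag *b)
{
    return a->spcOid == b->spcOid &&
           a->dbOid == b->dbOid &&
           a->relNumber == b->relNumber &&
           a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}
```

With these helpers, make_tag(16821, 16384, 37721, 0, 7) and make_tag(16821, 16384, 37721, 1, 3) produce tags that compare as different, because they differ in fork number and block number even though they belong to the same relation file.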

8.1.3. How a Backend Process Reads Pages

This subsection describes how a backend process reads a page from the buffer manager (Figure 8.2).

Figure 8.2. How a backend reads a page from the buffer manager.

(1) When reading a table or index page, a backend process sends a request that includes the page’s buffer_tag to the buffer manager.
(2) The buffer manager returns the buffer_id of the slot that stores the requested page. If the requested page is not stored in the buffer pool, the buffer manager loads the page from persistent storage into one of the buffer pool slots and then returns the buffer_id of that slot.
(3) The backend process accesses the slot identified by the buffer_id to read the desired page.
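The three steps above can be sketched as follows. This is a toy model under stated assumptions, not the PostgreSQL implementation: the tag is reduced to three fields, the lookup is a linear scan rather than a hash table, a full pool simply returns -1 instead of choosing a victim, and ReadTag, read_buffer, buffer_page, and load_page_from_storage are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define READ_PAGE_SIZE 8192   /* default PostgreSQL page size */
#define READ_NUM_SLOTS 4      /* hypothetical, tiny pool for illustration */

/* Reduced tag for illustration: relation file number, fork, block. */
typedef struct
{
    uint32_t relNumber;
    int32_t  forkNum;
    uint32_t blockNum;
} ReadTag;

typedef struct
{
    bool    valid;            /* does this slot currently hold a page? */
    ReadTag tag;              /* which page it holds */
} ReadDesc;

static ReadDesc read_descs[READ_NUM_SLOTS];
static char     read_pool[READ_NUM_SLOTS][READ_PAGE_SIZE];
static int      storage_reads = 0;   /* counts simulated storage reads */

/* Stand-in for reading a block from persistent storage: it fills the
 * page with zeros and stamps the block number at the start. */
static void load_page_from_storage(const ReadTag *tag, char *dst)
{
    memset(dst, 0, READ_PAGE_SIZE);
    memcpy(dst, &tag->blockNum, sizeof(tag->blockNum));
    storage_reads++;
}

static bool same_tag(const ReadTag *a, const ReadTag *b)
{
    return a->relNumber == b->relNumber &&
           a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}

/* Steps (1) and (2): take a tag, return the buffer_id of the slot that
 * holds the page, loading it on a miss. Returns -1 if no slot is free. */
static int read_buffer(ReadTag tag)
{
    int free_slot = -1;

    for (int id = 0; id < READ_NUM_SLOTS; id++)
    {
        if (read_descs[id].valid && same_tag(&read_descs[id].tag, &tag))
            return id;                        /* page already in the pool */
        if (!read_descs[id].valid && free_slot < 0)
            free_slot = id;                   /* remember first empty slot */
    }
    if (free_slot < 0)
        return -1;                            /* pool full: needs a victim */

    load_page_from_storage(&tag, read_pool[free_slot]);
    read_descs[free_slot].tag = tag;
    read_descs[free_slot].valid = true;
    return free_slot;
}

/* Step (3): the backend accesses the slot through its buffer_id. */
static const char *buffer_page(int buffer_id)
{
    assert(buffer_id >= 0 && buffer_id < READ_NUM_SLOTS);
    return read_pool[buffer_id];
}
```

Requesting the same tag twice returns the same buffer_id, and only the first request touches storage; the second is served from the pool.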

When a backend process modifies a page in the buffer pool (e.g., by inserting tuples), the modified page, which has not yet been flushed to storage, is referred to as a dirty page.

Section 8.4 describes how the buffer manager works in more detail.

8.1.4. Page Replacement Algorithm

When all buffer pool slots are occupied and the requested page is not among them, the buffer manager must select one page in the buffer pool to be replaced by the requested page. In computer science, algorithms that make this selection are called page replacement algorithms, and the selected page is referred to as a victim page.

Research on page replacement algorithms has been ongoing since the advent of computer science. Many replacement algorithms have been proposed, and PostgreSQL has used the clock sweep algorithm since version 8.1. Clock sweep is simpler and more efficient than the LRU algorithm used in previous versions.
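As a rough illustration of the idea, the following sketch shows a generic clock sweep over a small array of descriptors. It is a simplified model under stated assumptions: each slot carries a pin count and a usage counter, the sweep skips pinned slots, decrements non-zero usage counters, and picks the first unpinned slot whose counter is already zero. The names SweepDesc, sweep_descs, next_victim, and clock_sweep are hypothetical, and the sketch ignores locking entirely.

```c
#include <assert.h>

#define SWEEP_NUM_SLOTS 4     /* hypothetical, tiny pool for illustration */

typedef struct
{
    int refcount;             /* number of backends currently using the slot */
    int usage_count;          /* how recently/often the slot was accessed */
} SweepDesc;

static SweepDesc sweep_descs[SWEEP_NUM_SLOTS];
static int       next_victim = 0;   /* the "clock hand" */

/* Return the buffer_id of the victim slot, or -1 if a full pass found
 * only pinned slots. */
static int clock_sweep(void)
{
    int tries_left = SWEEP_NUM_SLOTS;

    for (;;)
    {
        int        id = next_victim;
        SweepDesc *d = &sweep_descs[id];

        /* Advance the hand, wrapping around at the end of the array. */
        next_victim = (next_victim + 1) % SWEEP_NUM_SLOTS;

        assert(d->refcount >= 0 && d->usage_count >= 0);

        if (d->refcount == 0)
        {
            if (d->usage_count == 0)
                return id;            /* unpinned and not recently used */
            d->usage_count--;         /* give the page another chance */
            tries_left = SWEEP_NUM_SLOTS;
        }
        else if (--tries_left == 0)
            return -1;                /* every slot seen in a row was pinned */
    }
}
```

Compared with a strict LRU list, this approach needs no reordering on every access: an access only bumps a counter, and the cost of finding a victim is paid during the sweep.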

Section 8.4.4 describes the details of clock sweep.

Historical Information

Until version 7.4, PostgreSQL used a simple LRU algorithm.

Although PostgreSQL implemented ARC (Adaptive Replacement Cache) in version 8.0, there was concern that this implementation might infringe IBM’s patent.

As a result, the PostgreSQL community temporarily changed the replacement algorithm to 2Q, and then adopted clock sweep in version 8.1.

8.1.5. Flushing Dirty Pages

Dirty pages should eventually be flushed to storage. However, the buffer manager requires help to perform this task. In PostgreSQL, two background processes, the checkpointer and the background writer, are responsible for it.
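The basic idea of flushing can be sketched as a pass over the slots that writes out every dirty page and clears its dirty flag. This is only an illustration under stated assumptions: it does not model how the checkpointer and the background writer differ or when they run, and FlushDesc, mark_dirty, flush_dirty_pages, and write_page_to_storage are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define FLUSH_NUM_SLOTS 4     /* hypothetical, tiny pool for illustration */

typedef struct
{
    bool valid;               /* slot holds a page */
    bool dirty;               /* page was modified and not yet flushed */
} FlushDesc;

static FlushDesc flush_descs[FLUSH_NUM_SLOTS];
static int       pages_written = 0;   /* counts simulated storage writes */

/* Stand-in for writing one page back to persistent storage. */
static void write_page_to_storage(int buffer_id)
{
    assert(buffer_id >= 0 && buffer_id < FLUSH_NUM_SLOTS);
    pages_written++;
}

/* Called after a backend modifies the page in the given slot. */
static void mark_dirty(int buffer_id)
{
    assert(buffer_id >= 0 && buffer_id < FLUSH_NUM_SLOTS);
    assert(flush_descs[buffer_id].valid);
    flush_descs[buffer_id].dirty = true;
}

/* Write every dirty page and clear its flag; return how many were flushed. */
static int flush_dirty_pages(void)
{
    int flushed = 0;

    for (int id = 0; id < FLUSH_NUM_SLOTS; id++)
    {
        if (flush_descs[id].valid && flush_descs[id].dirty)
        {
            write_page_to_storage(id);
            flush_descs[id].dirty = false;   /* page now matches storage */
            flushed++;
        }
    }
    return flushed;
}
```

A second pass immediately after the first writes nothing, since no page remains dirty until a backend modifies one again.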

Section 8.6 describes the checkpointer and background writer.

Direct I/O

PostgreSQL versions 15 and earlier do not support direct I/O, although it has been discussed. Refer to this article on the pgsql-ML and this article.

In version 16, the debug_io_direct option was added. This option is intended for developers working to improve the use of direct I/O in PostgreSQL. If development goes well, direct I/O may be officially supported in a future release.