8.1. Overview

This section introduces key concepts necessary to understand the descriptions in the subsequent sections.

8.1.1. Buffer Manager Structure

The PostgreSQL buffer manager comprises a buffer table, buffer descriptors, and a buffer pool. The next section describes these components.

The buffer pool layer stores pages of data files, such as those of tables and indexes, as well as pages of free space maps and visibility maps.

The buffer pool is an array where each slot stores one page of a data file. The indices of the buffer pool array are referred to as buffer_ids.
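The array layout described above can be pictured with a minimal sketch. The names PageSlot, buffer_pool, slot_page, and NBUFFERS are hypothetical and chosen only for illustration; BLCKSZ is set to 8192 bytes, the default PostgreSQL page size. This is not how PostgreSQL allocates its shared buffers; it only shows that a buffer_id is an index into an array of page-sized slots.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ   8192   /* default PostgreSQL page size in bytes */
#define NBUFFERS 16     /* hypothetical pool size for this sketch */

/* One slot holds exactly one page image. */
typedef struct PageSlot
{
    char        data[BLCKSZ];
} PageSlot;

/* The pool is a plain array; an index into it plays the role of a buffer_id. */
static PageSlot buffer_pool[NBUFFERS];

/* Return a pointer to the page stored in the slot named by buffer_id. */
static char *
slot_page(int buffer_id)
{
    assert(buffer_id >= 0 && buffer_id < NBUFFERS);
    return buffer_pool[buffer_id].data;
}
```

Because every slot has the same size, the address of a page follows directly from its buffer_id, which is why a small integer is enough to name a slot.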

Sections 8.2 and 8.3 describe the details of the buffer manager internals.

8.1.2. Buffer Tag

In PostgreSQL, each page of every data file can be assigned a unique tag, i.e., a buffer tag. When the buffer manager receives a request, it uses the buffer_tag to identify the desired page.

The buffer_tag consists of five values:

  • spcOid: The OID of the tablespace to which the relation belongs.
  • dbOid: The OID of the database to which the relation belongs.
  • relNumber: The number of the relation file that contains the target page.
  • forkNum: The fork number of the relation to which the page belongs. The fork numbers for tables, free space maps, and visibility maps are defined as 0, 1, and 2, respectively.
  • blockNum: The block number of the target page in the relation.

In the PostgreSQL source code, the tag is defined as the BufferTag structure:

/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, InitBufferTag will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
	Oid			spcOid;			/* tablespace oid */
	Oid			dbOid;			/* database oid */
	RelFileNumber relNumber;	/* relation file number */
	ForkNumber	forkNum;		/* fork number */
	BlockNumber blockNum;		/* blknum relative to begin of reln */
} BufferTag;

For example, the buffer_tag ‘{16821, 16384, 37721, 0, 7}’ identifies block number 7 of the table whose relation file number is 37721 and whose fork number is 0. This table is located in the database with OID 16384 under the tablespace with OID 16821. The values appear in the order of the BufferTag fields: spcOid, dbOid, relNumber, forkNum, and blockNum.

Similarly, the buffer_tag ‘{16821, 16384, 37721, 1, 3}’ identifies block number 3 of the free space map (fork number 1) of the relation whose relation file number is 37721.
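The two example tags can be built and compared with a small self-contained sketch. SketchBufferTag, make_tag, and tags_equal are hypothetical names used only here, and the typedefs are simplified stand-ins for the PostgreSQL types, so this should be read as an illustration under those assumptions rather than as PostgreSQL code. The fork number constants use the values 0, 1, and 2 given above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumptions of this sketch). */
typedef uint32_t Oid;
typedef Oid RelFileNumber;
typedef uint32_t BlockNumber;

typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,           /* table or index data */
    FSM_FORKNUM = 1,            /* free space map */
    VISIBILITYMAP_FORKNUM = 2   /* visibility map */
} ForkNumber;

/* Same five fields, in the same order, as the BufferTag structure. */
typedef struct SketchBufferTag
{
    Oid           spcOid;
    Oid           dbOid;
    RelFileNumber relNumber;
    ForkNumber    forkNum;
    BlockNumber   blockNum;
} SketchBufferTag;

/* Build a tag from its five values. */
static SketchBufferTag
make_tag(Oid spc, Oid db, RelFileNumber rel, ForkNumber fork, BlockNumber blk)
{
    SketchBufferTag tag = {spc, db, rel, fork, blk};

    return tag;
}

/* Two tags name the same page only if all five fields match. */
static bool
tags_equal(const SketchBufferTag *a, const SketchBufferTag *b)
{
    return a->spcOid == b->spcOid &&
           a->dbOid == b->dbOid &&
           a->relNumber == b->relNumber &&
           a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}
```

With these helpers, make_tag(16821, 16384, 37721, MAIN_FORKNUM, 7) and make_tag(16821, 16384, 37721, FSM_FORKNUM, 3) produce the two tags from the examples, and tags_equal reports that they name different pages.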

8.1.3. How a Backend Process Reads Pages

This subsection describes how a backend process reads a page from the buffer manager (Figure 8.2).

Figure 8.2. How a backend reads a page from the buffer manager.
  • (1) When reading a table or index page, a backend process sends a request including the page’s buffer_tag to the buffer manager.

  • (2) The buffer manager returns the buffer_id of the slot that stores the requested page. If the requested page is not in the buffer pool, the buffer manager loads the page from persistent storage into a buffer pool slot and then returns the buffer_id.

  • (3) The backend process accesses the slot of the buffer_id to read the desired page.
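
The three steps above can be sketched as follows. This is a deliberately simplified model: the tag has only three fields, the lookup is a linear scan rather than a hash table, storage is simulated by filling the page with a byte derived from the block number, and page replacement is not handled. All names (SketchTag, Slot, read_buffer, read_from_storage, storage_reads) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 8192
#define NSLOTS    4     /* hypothetical pool size */

/* Reduced tag for this sketch; the real buffer_tag has five fields. */
typedef struct SketchTag
{
    unsigned    rel;
    unsigned    fork;
    unsigned    block;
} SketchTag;

typedef struct Slot
{
    bool        valid;          /* does the slot hold a page? */
    SketchTag   tag;            /* which page it holds */
    char        page[PAGE_SIZE];
} Slot;

static Slot pool[NSLOTS];
static int  storage_reads = 0;  /* counts simulated reads from storage */

/* Stand-in for reading a page from persistent storage. */
static void
read_from_storage(const SketchTag *tag, char *page)
{
    assert(page != NULL);
    memset(page, (int) (tag->block & 0x7f), PAGE_SIZE);
    storage_reads++;
}

static bool
same_tag(const SketchTag *a, const SketchTag *b)
{
    return a->rel == b->rel && a->fork == b->fork && a->block == b->block;
}

/*
 * Steps (1) and (2): take a tag and return the buffer_id of the slot that
 * holds the page, loading it into a free slot on a miss.  Returns -1 when
 * no slot is free, because replacement is outside this sketch.
 */
static int
read_buffer(SketchTag tag)
{
    int         free_id = -1;

    for (int id = 0; id < NSLOTS; id++)
    {
        if (pool[id].valid && same_tag(&pool[id].tag, &tag))
            return id;          /* hit: page is already in the pool */
        if (!pool[id].valid && free_id < 0)
            free_id = id;       /* remember the first empty slot */
    }

    if (free_id < 0)
        return -1;

    read_from_storage(&tag, pool[free_id].page);
    pool[free_id].tag = tag;
    pool[free_id].valid = true;
    return free_id;
}
```

Step (3) then corresponds to the caller reading pool[buffer_id].page. Requesting the same tag twice returns the same buffer_id and triggers only one simulated read from storage.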

When a backend process modifies a page in the buffer pool (e.g., by inserting tuples), the modified page is referred to as a dirty page because it has not yet been flushed to storage.

Section 8.4 describes buffer manager operations in more detail.

8.1.4. Page Replacement Algorithm

When all buffer pool slots are occupied and the requested page is not in the pool, the buffer manager must select a page in the buffer pool to be replaced. In computer science, these selection algorithms are called page replacement algorithms, and the selected page is referred to as a victim page.

Research on page replacement algorithms has been ongoing since the advent of computer science. Many algorithms have been proposed; PostgreSQL has used the clock sweep algorithm since version 8.1. Clock sweep is simpler and more efficient than the LRU algorithm used in versions up to 7.4.

Section 8.4.4 describes the details of clock sweep.
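
As a preview, the general idea of a clock sweep can be sketched in a few lines. This is a simplified illustration of the technique, not PostgreSQL's implementation: it assumes each slot has a usage counter and a pinned flag, and it omits locking entirely. The names SketchDesc, descs, next_victim, and clock_sweep are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 4    /* hypothetical pool size */

typedef struct SketchDesc
{
    int         usage_count;    /* raised on access, lowered by the sweep */
    bool        pinned;         /* in use right now; must not be evicted */
} SketchDesc;

static SketchDesc descs[NSLOTS];
static int  next_victim = 0;    /* the "clock hand" */

/*
 * Advance the hand around the pool.  An unpinned slot whose usage_count is
 * zero becomes the victim; an unpinned slot with a positive count gets the
 * count decremented and is skipped.  Returns -1 if a full rotation sees
 * only pinned slots.
 */
static int
clock_sweep(void)
{
    int         pinned_in_a_row = 0;

    for (;;)
    {
        int         id = next_victim;
        SketchDesc *d = &descs[id];

        next_victim = (next_victim + 1) % NSLOTS;

        if (!d->pinned)
        {
            if (d->usage_count == 0)
                return id;      /* victim found */
            d->usage_count--;   /* give the page another chance */
            pinned_in_a_row = 0;
        }
        else if (++pinned_in_a_row >= NSLOTS)
            return -1;          /* nothing can be evicted */
    }
}
```

Pages that are accessed often keep a high usage counter and survive several rotations of the hand, while pages that are not touched drift toward zero and are chosen as victims.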

Historical Information

PostgreSQL relied on a simple LRU algorithm until version 7.4.

Version 8.0 (released on 19 January 2005) implemented Adaptive Replacement Cache (ARC), but concerns arose over a potential violation of an IBM patent.

As a result, the PostgreSQL community replaced ARC with the 2Q algorithm in version 8.0.2 (7 April 2005), before subsequently adopting the clock sweep algorithm in version 8.1.

8.1.5. Flushing Dirty Pages

Dirty pages must eventually be flushed to storage. However, the buffer manager requires assistance to perform this task. In PostgreSQL, two background processes, the checkpointer and background writer, are responsible for this task.

Section 8.6 describes the checkpointer and background writer.
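
To make the notion of flushing concrete, the following sketch writes dirty slots to a simulated storage layer and clears their dirty flags. It is only a model under stated assumptions: it does not distinguish between the checkpointer and the background writer, it ignores locking and WAL, and all names (SketchSlot, slots, write_page_to_storage, flush_dirty_pages, pages_written) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 4    /* hypothetical pool size */

typedef struct SketchSlot
{
    bool        valid;  /* slot holds a page */
    bool        dirty;  /* page was modified and not yet written out */
} SketchSlot;

static SketchSlot slots[NSLOTS];
static int  pages_written = 0;  /* counts simulated writes */

/* Stand-in for writing one page to persistent storage. */
static void
write_page_to_storage(int buffer_id)
{
    assert(buffer_id >= 0 && buffer_id < NSLOTS);
    pages_written++;
}

/*
 * Write out at most max_pages dirty pages and mark them clean.
 * Returns the number of pages flushed in this pass.
 */
static int
flush_dirty_pages(int max_pages)
{
    int         flushed = 0;

    for (int id = 0; id < NSLOTS && flushed < max_pages; id++)
    {
        if (slots[id].valid && slots[id].dirty)
        {
            write_page_to_storage(id);
            slots[id].dirty = false;
            flushed++;
        }
    }
    return flushed;
}
```

The max_pages parameter is an assumption of this sketch; it illustrates that a flushing pass may write only part of the dirty pages and leave the rest for a later pass.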

Direct I/O

PostgreSQL versions 15 and earlier do not support direct I/O. Refer to this article on the pgsql-ML and this article.

In version 16, the debug_io_direct developer option was added. This option lets developers experiment with and improve the use of direct I/O in PostgreSQL. If development proceeds successfully, PostgreSQL may officially support direct I/O in a future release.