1.3. Internal Layout of a Heap Table File

A data file (heap table, index, free space map, or visibility map) is divided into fixed-length pages (or blocks), which are 8192 bytes (8 KB) by default. The pages within each file are numbered sequentially from 0, and these numbers are called block numbers. When the file is full, PostgreSQL adds a new empty page to the end of the file to increase the file size.
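The block numbering above implies simple arithmetic for locating a page inside a file. The following is a minimal sketch, assuming the default 8 KB page size; the names PAGE_SIZE, block_start_offset, and page_count are hypothetical and do not come from the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>

/* Default page size in bytes (8 KB), as stated in the text. */
#define PAGE_SIZE 8192u

/* Byte offset within the file at which the given block starts. */
uint64_t block_start_offset(uint32_t block_no)
{
    return (uint64_t) block_no * PAGE_SIZE;
}

/* Number of pages in a file of the given size (must be whole pages). */
uint32_t page_count(uint64_t file_size)
{
    assert(file_size % PAGE_SIZE == 0);
    return (uint32_t) (file_size / PAGE_SIZE);
}
```

For example, block 3 would start at byte 24576, and a 16384-byte file would hold two pages (blocks 0 and 1).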

The internal layout of pages depends on the data file type. This section describes the table layout, because this information is required in the following chapters.

Fig. 1.4. Page layout of a heap table file.

A page within a table contains three kinds of data:

  1. heap tuple(s) – A heap tuple is the record data itself. Heap tuples are stacked in order from the bottom of the page.
    The internal structure of a tuple is described in Section 5.2 and Chapter 9, as it requires knowledge of both concurrency control (CC) and write-ahead logging (WAL) in PostgreSQL.

  2. line pointer(s) – A line pointer is 4 bytes long and holds a pointer to a heap tuple. It is also called an item pointer.
    Line pointers form a simple array that plays the role of an index to the tuples. The entries are numbered sequentially from 1, and each number is called an offset number. When a new tuple is added to the page, a new line pointer is also appended to the array to point to the new tuple.

  3. header data – The header data, defined by the structure PageHeaderData, is allocated at the beginning of the page. It is 24 bytes long and contains general information about the page.
    The major variables of the structure are described below:

    /* Defined in src/include/storage/bufpage.h */
    typedef struct PageHeaderData
    {
      /* XXX LSN is member of *any* block, not only page-organized ones */
      PageXLogRecPtr pd_lsn;       /* LSN: next byte after last byte of xlog
                                    * record for last change to this page */
      uint16         pd_checksum;  /* checksum */
      uint16         pd_flags;     /* flag bits, see below */
      LocationIndex  pd_lower;     /* offset to start of free space */
      LocationIndex  pd_upper;     /* offset to end of free space */
      LocationIndex  pd_special;   /* offset to start of special space */
      uint16         pd_pagesize_version;
      TransactionId  pd_prune_xid; /* oldest prunable XID, or zero if none */
      ItemIdData     pd_linp[1];   /* beginning of line pointer array */
    } PageHeaderData;

    typedef PageHeaderData *PageHeader;

    typedef uint64 XLogRecPtr;

    • pd_lsn – This variable stores the LSN of the XLOG record written by the last change to this page. It is an 8-byte unsigned integer and is related to the WAL (Write-Ahead Logging) mechanism. The details are described in Section 9.1.2.

    • pd_checksum – This variable stores the checksum value of this page. (Note that this variable is supported in versions 9.3 and later; in earlier versions, this part stored the timelineId of the page.)

    • pd_lower, pd_upper – pd_lower points to the end of the line pointers, and pd_upper points to the beginning of the newest heap tuple.

    • pd_special – This variable is for indexes. In a table page, it points to the end of the page. (In an index page, it points to the beginning of the special space, which is a data area held only by indexes and contains data specific to the index type, such as B-tree, GiST, or GIN.)

The empty space between the end of the line pointers and the beginning of the newest tuple is referred to as free space or a hole.
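The way the line pointer array and the tuples grow toward each other can be illustrated with a small model. This is a minimal sketch, not PostgreSQL's code: ToyPage and the toy_page_* functions are hypothetical names, the struct tracks only stand-ins for pd_lower and pd_upper, and tuple alignment is ignored. It assumes the default 8192-byte page, the 24-byte header, and 4-byte line pointers described above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE         8192u  /* default page size from the text */
#define HEADER_SIZE       24u    /* page header size from the text */
#define LINE_POINTER_SIZE 4u     /* line pointer size from the text */

/* Hypothetical, simplified page: only the two free-space bounds. */
typedef struct ToyPage {
    uint16_t lower;  /* plays the role of pd_lower */
    uint16_t upper;  /* plays the role of pd_upper */
} ToyPage;

/* An empty table page: no line pointers, no tuples, no special space. */
void toy_page_init(ToyPage *p)
{
    p->lower = HEADER_SIZE;
    p->upper = PAGE_SIZE;
}

/* Size of the hole between the line pointers and the tuples. */
uint16_t toy_page_free_space(const ToyPage *p)
{
    assert(p->lower <= p->upper);
    return (uint16_t) (p->upper - p->lower);
}

/* Number of line pointers currently in the array. */
uint16_t toy_page_line_pointer_count(const ToyPage *p)
{
    return (uint16_t) ((p->lower - HEADER_SIZE) / LINE_POINTER_SIZE);
}

/*
 * Reserve room for one tuple of tuple_len bytes plus its line pointer.
 * Returns the new offset number (counted from 1), or 0 if the page
 * does not have enough free space.
 */
uint16_t toy_page_add_tuple(ToyPage *p, uint16_t tuple_len)
{
    if ((uint32_t) tuple_len + LINE_POINTER_SIZE > toy_page_free_space(p))
        return 0;
    p->lower += LINE_POINTER_SIZE;  /* line pointer array grows forward */
    p->upper -= tuple_len;          /* tuples are stacked from the bottom */
    return toy_page_line_pointer_count(p);
}
```

In this model an empty page has 8168 bytes of free space. Adding a 100-byte tuple and then a 200-byte tuple yields offset numbers 1 and 2 and leaves the lower bound at 32 and the upper bound at 7892.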

To identify a tuple within the table, a tuple identifier (TID) is used internally. A TID comprises a pair of values: the block number of the page that contains the tuple, and the offset number of the line pointer that points to the tuple. A typical example of its usage is an index, whose entries refer to tuples by TID. See Section 1.4.2 for more detail.
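As an illustration of how a TID narrows down a location, the sketch below computes where in the file the line pointer named by a TID would sit. It is a simplified model under stated assumptions: ToyTid and toy_tid_line_pointer_pos are hypothetical names, PostgreSQL's own TID type is not reproduced here, and the default 8192-byte page, the 24-byte header, and 4-byte line pointers are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE         8192u  /* default page size from the text */
#define HEADER_SIZE       24u    /* page header size from the text */
#define LINE_POINTER_SIZE 4u     /* line pointer size from the text */

/* Hypothetical, simplified TID: (block number, offset number). */
typedef struct ToyTid {
    uint32_t block_no;   /* page within the file, counted from 0 */
    uint16_t offset_no;  /* line pointer within the page, counted from 1 */
} ToyTid;

/* Byte position in the file of the line pointer that the TID names. */
uint64_t toy_tid_line_pointer_pos(ToyTid tid)
{
    assert(tid.offset_no >= 1);  /* offset numbers start at 1 */
    return (uint64_t) tid.block_no * PAGE_SIZE
           + HEADER_SIZE
           + (uint64_t) (tid.offset_no - 1) * LINE_POINTER_SIZE;
}
```

Under these assumptions, the TID (0, 1) names the line pointer at byte 24 of the file, directly after the first page's header. The line pointer in turn holds the position of the tuple within that page, which this sketch does not model.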

Info

The structure PageHeaderData is defined in src/include/storage/bufpage.h.

Info

In the field of computer science, this type of page is called a slotted page, and the line pointers correspond to a slot array.

In addition, a heap tuple whose size is greater than about 2 KB (about 1/4 of 8 KB) is stored and managed using a method called TOAST (The Oversized-Attribute Storage Technique). Refer to the PostgreSQL documentation for details.