New WAL format in PostgreSQL 9.5

This post is a part of my document.

The XLOG data format has changed in version 9.5.

Why has the format changed?

In version 9.4 or earlier, there was no common format for XLOG records, so each resource manager had to define its own format. As a result, it became increasingly difficult to maintain the source code and to implement new features related to WAL. To deal with this issue, a common structured format that does not depend on resource managers has been introduced in version 9.5.

New format

An XLOG record is composed of a general header portion and an associated data portion. In version 9.5, a common format has been defined for the data portion. See Figure 1.

Figure 1: The new XLOG record format

The details are described below.

* If you want to know about the old format, please read my document.

General header portion

The general header portion of an XLOG record is defined by the XLogRecord structure.

This portion has not changed in version 9.5, except that one variable (xl_len) has been removed.

Data portion in version 9.5

The data portion of an XLOG record can be divided into two parts: header and data.

The header part contains zero or more XLogRecordBlockHeaders and zero or one XLogRecordDataHeaderShort (or XLogRecordDataHeaderLong); it must contain at least one of them. When the record stores a full-page image (i.e. a backup block), the XLogRecordBlockHeader includes an XLogRecordBlockImageHeader, and also an XLogRecordBlockCompressHeader if the image is compressed and has a hole.

The data part is composed of zero or more block data and zero or one main data, which correspond to the XLogRecordBlockHeader(s) and to the XLogRecordDataHeader, respectively.

Examples

Some specific examples are shown below.

Figure 2: Examples of XLOG records (version 9.5)

(a) Backup block

A backup block created by an INSERT statement is shown in Figure 2 (a). It is composed of four data structures and one data object:

  • XLogRecord (general header-portion)
  • XLogRecordBlockHeader including one XLogRecordBlockImageHeader
  • XLogRecordDataHeaderShort
  • a backup block (block data)
  • xl_heap_insert structure defined in heapam_xlog.h (main data)

XLogRecordBlockHeader contains the variables that identify the block in the database cluster (the relfilenode, the fork number, and the block number); XLogRecordBlockImageHeader contains the length of the page image and the offset of the hole (the unused space) within the page. (Together, these two header structures can store the same data as the BkpBlock structure used until version 9.4.)

XLogRecordDataHeaderShort stores the length of the xl_heap_insert structure, which is the main data of the record.

The main data of an XLOG record that contains a full-page image is not used except in some special cases (e.g. logical decoding and speculative insertions). It is ignored when the record is replayed, so it is redundant data. This might be improved in the future.

In addition, the main data of a backup block record depends on the statement that creates it. For example, an UPDATE statement appends xl_heap_lock or xl_heap_update.

(b) XLOG record created by INSERT statement

A non-backup block record created by an INSERT statement is described as follows (see also Figure 2 (b)). It is composed of four data structures and one data object:

  • XLogRecord (general header-portion)
  • XLogRecordBlockHeader
  • XLogRecordDataHeaderShort
  • an inserted tuple (to be exact, an xl_heap_header structure and the entire inserted data)
  • xl_heap_insert structure defined in heapam_xlog.h (main data)

XLogRecordBlockHeader contains three values (the relfilenode, the fork number, and the block number) that specify the block into which the tuple was inserted, and the length of the data portion of the inserted tuple. XLogRecordDataHeaderShort contains the length of the new xl_heap_insert structure, which is the main data of this record.

The new xl_heap_insert contains only two values: the offset number of this tuple within the block and a set of visibility flags. It became very simple because XLogRecordBlockHeader stores most of the data contained in the old one.

(c) XLOG record created by CHECKPOINT

As the final example, a checkpoint record is shown in Figure 2 (c). It is composed of three data structures:

  • XLogRecord (general header-portion)
  • XLogRecordDataHeaderShort containing the main data length
  • CheckPoint structure defined in pg_control.h (main data)

Size of XLOG records

Many types of XLOG records, especially full-page writes and insertions, are usually smaller than in version 9.4 or earlier.

Type of XLOG record   9.5                           9.4 or earlier
full-page writes      54 bytes + page               56 bytes + page
insertion             49 bytes + tuple              56 bytes + tuple
checkpoint            26 bytes + checkpoint data    32 bytes + checkpoint data

Conclusion

Though the new format is a little complicated for us, it is well designed for the resource managers that parse it, and many types of XLOG records are usually smaller than in previous versions.