πŸŽ‰ 10th Anniversary! πŸŽ‰

10.1. Base Backup

Before the introduction of the pg_basebackup utility in version 9.1, online (full) backups were performed using the pg_backup_start and pg_backup_stop commands (versions 8.0 to 9.0). While these commands are no longer commonly used, understanding them is essential for comprehending PostgreSQL’s backup and Point-in-Time Recovery (PITR) mechanisms. We will explore them in more detail in the following subsections and then discuss how pg_basebackup works.

Figure 10.1 shows the standard procedure for taking a base backup using these commands is as follows:

  1. Issue the pg_backup_start command (versions 14 or earlier, pg_start_backup).
  2. Take a snapshot of the database cluster using the archiving command of your choice.
  3. Issue the pg_backup_stop command (versions 14 or earlier, pg_stop_backup).
Figure 10.1. Making a base backup.

This procedure does not require table locks, allowing all users to continue issuing queries without being affected by the backup operation. This is a significant advantage over other major open-source RDBMSs.

The pg_basebackup utility internally invokes the commands described above and therefore inherits their advantages.

Info

The pg_backup_start and pg_backup_stop commands are defined here: src/backend/access/transam/xlogfuncs.c.

10.1.1. pg_backup_start (Ver.14 or earlier, pg_start_backup)

The pg_backup_start command, internally invokes do_pg_backup_start() function, prepares for making a base backup.

As discussed in Section 9.8, the recovery process starts from a REDO point, so the pg_backup_start command must do a checkpoint to explicitly create a REDO point at the start of making a base backup. Moreover, the checkpoint location of its checkpoint must be saved in a file other than pg_control because regular checkpoints might be done a number of times during the backup. Therefore, the pg_backup_start performs the following four operations:

  1. Force the database into full-page write mode.

  2. Switch to the new WAL segment file (versions 8.4 or later).

  3. Do a checkpoint.

  4. Create a backup_label file – This file, created in the top level of the base directory, contains essential information about base backup itself, such as the checkpoint location of this checkpoint.

The third and fourth operations are the heart of this command. The first and second operations are performed to recover a database cluster more reliably.

A backup_label file contains the following six items (versions 11 or later, seven items):

  • CHECKPOINT LOCATION – This is the LSN location where the checkpoint created by this command has been recorded.

  • START WAL LOCATION – This is not used with PITR, but used with the streaming replication, which is described in Chapter 11. It is named ‘START WAL LOCATION’ because the standby server in replication-mode reads this value only once at initial startup.

  • BACKUP METHOD – This is the method used to make this base backup.

  • BACKUP FROM – This shows whether this backup is taken from the primary or standby server.

  • START TIME – This is the timestamp when the pg_backup_start command was executed.

  • LABEL – This is the label specified at the pg_backup_start command.

  • START TIMELINE – This is the timeline that the backup started. This is for a sanity check and has been introduced in version 11.

backup_label

An actual example of a backup_label file in version 16, which is taken by using pg_basebackup, is shown below:

$ cat /usr/local/pgsql/data/backup_label
START WAL LOCATION: 0/1B000028 (file 00000001000000000000001B)
CHECKPOINT LOCATION: 0/1B000060
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2024-1-1 11:45:19 GMT
LABEL: pg_basebackup base backup
START TIMELINE: 1

As you may imagine, when you recover a database using this base backup, PostgreSQL takes the ‘CHECKPOINT LOCATION’ from the backup_label file to read the checkpoint record from the appropriate archive log. It then gets the REDO point from the record and starts the recovery process. (The details will be described in the next section.)

10.1.2. pg_backup_stop (Ver.14 or earlier, pg_stop_backup)

The pg_backup_stop command, internally invokes do_pg_backup_stop() function, performs the following five operations to complete the backup:

  1. Reset to non-full-page writes mode if it has been forcibly changed by the pg_backup_start command.

  2. Write a XLOG record of backup end.

  3. Switch the WAL segment file.

  4. Create a backup history file. This file contains the contents of the backup_label file and the timestamp that the pg_backup_stop command was executed.

  5. Delete the backup_label file. The backup_label file is required for recovery from the base backup, but once copied, it is not necessary in the original database cluster.

Info

The naming method for backup history file is shown below.

{WAL segment}.{offset value at the time the base backup was started}.backup

10.1.3. pg_basebackup

pg_basebackup is a utility for taking online backups.

In version 16 or earlier, pg_basebackup supported full backups of the entire database cluster. In version 17, incremental backups were added. This section explains how to take a full backup. Incremental backups will be discussed in Section 10.5.

To perform a backup remotely, pg_basebackup utilizes the walsender process, a component of streaming replication, which will be explained in Chapter 11.

For instance, to take a full backup from host 192.168.1.10 to the local server at ‘/usr/local/pgsql/backup/full’, we can execute the following command:

$ pg_basebackup -h 192.168.1.10 -p 5432 -D /usr/local/pgsql/backup/full -X stream -P -v

Figure 10.2 shows the sequence of how the pg_basebackup takes a full backup:

Figure 10.2. The sequence of how the pg_basebackup takes a full backup.
  1. Connection request:
    pg_basebackup requests the PostgreSQL server to connect a walsender process.
  2. Create walsender process:
    The PostgreSQL server creates a walsender process and establishes a connection between the walsender and pg_basebackup.
  3. Base backup request:
    pg_basebackup requests a base backup.
  4. Do do_pg_backup_start():
    The walsender process executes the do_pg_backup_start() function.
  5. Send all files:
    The walsender process sends all files under the database cluster except WAL segment files in the pg_wal directory.
  6. Do do_pg_backup_stop():
    The walsender process executes the do_pg_backup_stop() function.
  7. Send WAL files:
    The walsender process sends all WAL segment files in the pg_wal directory if the ‘–wal-method’ option is set to a value other than ’none’.
  8. Send backup_manifest file:
    The walsender process creates and sends the backup_manifest file, which is explained in the following subsection.

The reason why WAL segment files are not sent in step 5 is to ensure that the last WAL segment files will be sent to pg_basebackup. In the step 6, the current WAL segment is switched by executing the do_pg_backup_stop() function, which means that all WAL segment files generated during the backup procedure are certainly stored in the pg_wal directory.

$ ls /usr/local/pgsql/backup/full
PG_VERSION        global        pg_ident.conf  pg_serial     pg_tblspc             postgresql.conf
backup_label      log           pg_logical     pg_snapshots  pg_twophase
backup_manifest   pg_commit_ts  pg_multixact   pg_stat       pg_wal
base              pg_dynshmem   pg_notify      pg_stat_tmp   pg_xact
current_logfiles  pg_hba.conf   pg_replslot    pg_subtrans   postgresql.auto.conf
Why does pg_basebackup use walsender?

Walsender is a process for replication, as explained in Chapter 11. pg_basebackup does not have a direct relationship with replication.

However, when pg_basebackup was developed (and still today), the only processes that could be started as a PostgreSQL server process from an external program were ‘postgres’ and ‘walsender’.

Therefore, the walsender protocol was extended to accommodate pg_basebackup.

10.1.3.1. Backup Manifest Files

A backup manifest file is a JSON file that contains metadata and verification information about the backup.

Key Components are shown as follows:

Key Values
PostgreSQL-Backup-Manifest-Version Backup manifest version number.
Files List of objects that contains all file’s path, size, checksum, etc.
WAL-Ranges Timeline and the LSN range during the backup procedure:
* Start-LSN: The LSN of the REDO point generated by CHECKPOINT when the do_pg_backup_start function is invoked.
* End-LSN: The LSN of the WAL log created by the do_pg_backup_stop function.
Manifest-Checksum The checksum value of this manifest file.

Here’s a cited example of a backup manifest file:

$ cat /usr/local/pgsql/backup/full/backup_manifest
{ "PostgreSQL-Backup-Manifest-Version": 2,
"System-Identifier": 7426689740139212305,
"Files": [
{ "Path": "backup_label", "Size": 225, "Last-Modified": "2024-10-17 10:41:48 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "1950abcb" },
{ "Path": "postgresql.conf", "Size": 30771, "Last-Modified": "2024-10-17 10:29:00 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "a9c769e0" },
{ "Path": "postgresql.auto.conf", "Size": 88, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "536f950b" },
{ "Path": "pg_ident.conf", "Size": 2640, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "0ce04d87" },
{ "Path": "log/postgresql-2024-10-17_193211.log", "Size": 1359, "Last-Modified": "2024-10-17 10:41:48 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "62ed7886" },
{ "Path": "log/postgresql-2024-10-17_192939.log", "Size": 5631, "Last-Modified": "2024-10-17 10:32:11 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "e9891cb3" },
{ "Path": "pg_xact/0000", "Size": 8192, "Last-Modified": "2024-10-17 10:41:48 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "4c2ce5fc" },
{ "Path": "pg_hba.conf", "Size": 5711, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "d62da38c" },
{ "Path": "PG_VERSION", "Size": 3, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "64440205" },
{ "Path": "base/4/113", "Size": 8192, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "d1bc40bb" },
{ "Path": "base/4/1417", "Size": 0, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
{ "Path": "base/4/2610_fsm", "Size": 24576, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b9b5f34f" },
{ "Path": "base/4/3542", "Size": 16384, "Last-Modified": "2024-10-17 10:29:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "e7f849bf" },

... snip ...

{ "Path": "global/pg_control", "Size": 8192, "Last-Modified": "2024-10-17 10:41:48 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "43872087" }
],
"WAL-Ranges": [
{ "Timeline": 1, "Start-LSN": "0/4000028", "End-LSN": "0/4000120" }
],
"Manifest-Checksum": "4c6d8a85379990904f6986f5bfd98db9f4640cfc96f440f8674abe6251cfffb8"}