Base backup and PITR

This post is the first two sections of my document.

In PostgreSQL, online physical full backup has been available since version 8.0, and a snapshot of a running whole database cluster (i.e. physical backup data) is known as a base backup.

Point-in-Time Recovery (PITR), which has also been available since version 8.0, is the feature to restore a database cluster to any point in time using a base backup and archive logs created by continuous archiving feature.

For example, even if you made a critical mistake (e.g. truncating all tables), this feature enables you to restore the database of the point just before the mistake you have made.

In this post, following topics are described:

What base backup is
How PITR works

Base backup

First of all, the standard procedure to make a base backup using the low-level commands is shown below:

Issue the pg_start_backup command
Take a snapshot of the database cluster with the archiving command you want to use
Issue the pg_stop_backup command

Figure 1: Making a base backup

This simple procedure is easy-to-use for database system administrators, because it requires no special tools but common tools such as copy command or a similar archiving tools to create a base backup. In addition, in this procedure, no table locks are required and all users can issue queries without being affected by backup operation. Those are big advantages over other major open source RDBMS.

More simple way to make a base backup is to use the pg_basebackup utility, but it internally issues those low-level commands.

As the pair of those commands is clearly one of the key points for understanding the PITR, we will explore them in the next section.

pg_start_backup

The pg_start_backup prepares for making a base backup. As discussed in my document, recovery process starts from a REDO point, so the pg_start_backup must do checkpoint to explicitly create a REDO point at the start of making a base backup. Moreover, the checkpoint location of its checkpoint must be saved in a file other than pg_control because regular checkpoint might be done a number of times during backup. Hence the pg_start_backup performs the following four operations:

Force into the full-page wirte mode.
Switch to the current WAL segment file (version 8.4 or later).
Do checkpoint.
Create a backup_label file – This file, created in the top level of the base directory, contains essential information about base backup itself, such as the checkpoint location of this checkpoint.

The third and fourth operations are the heart of this command; the first and second operations are performed to recover a database cluster more reliably.

A backup_label file contains the following five items:

CHECKPOINT LOCATION – This is the LSN location where the checkpoint created by this command has been recorded.
START WAL LOCATION – This is not used with PITR, but used with the streaming replication, which is described in Chapter 11. It is named “START WAL LOCATION” because standby server in replication-mode reads this value only once at initial startup.
BACKUP METHOD – This is the method used to make this base backup. (Either ‘pg_start_backup’ or ‘pg_basebackup’.)
START TIME – This is the timestamp when the pg_start_backup has executed.
LABEL – This is the label specified at the pg_start_backup.

An actual example of backup_label file is shown in the following:

postgres&gt; cat /usr/local/pgsql/data/backup_label
START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
CHECKPOINT LOCATION: 0/9000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-1-26 11:45:19 GMT
LABEL: Weekly Backup

As you may imagine, when you recover a database using this base backup, PostgreSQL takes the “CHECKPOINT LOCATION” out of the backup_label file to read the checkpoint record from the appropriate archive log, and then, gets the REDO point from its record and starts recovery process. (The details will be described in the next section.)

pg_stop_backup

The pg_stop_backup performs the following five operations to complete the backup.

Reset to non-full-page writes mode if it has been forcibly changed by the pg_start_backup.
Write a XLOG record of backup end.
Switch the WAL segment file.
Create a backup history file – This file contains the contents of the backup_label file and the timestamp that the pg_stop_backup has been executed.
Delete the backup_label file – The backup_label file is required for recovery from the base backup and once copied, it is not necessary in the original database cluster.

How Point-in-Time recovery works

Figure 2 shows the basic concept of PITR. PostgreSQL in PITR-mode replays the WAL data of the archive logs on the base backup, from the REDO point created by the pg_start_backup up to the point you want to recover. In PostgreSQL, the point to be recovered is referred to as a recovery target.

Figure 2: Basic concept of PITR

Here is the description of how PITR works. Suppose that you made a mistake at 12:05 GMT of 26 January, 2016. You should remove the database cluster and restore the new one using the base backup you made before that. Then, create a recovery.conf file, and set the time of the parameter recovery_target_time within this file at the point you made the mistake (in this case, 12:05 GMT). The recovery.conf file is shown below:

# Place archive logs under /mnt/server/archivedir directory.
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = "2016-1-26 12:05 GMT"

When PostgreSQL starts up, it enters into PITR mode if there are a recovery.conf and a backup_label in the database cluster.

PITR process is almost the same as the normal recovery process described in my document; the only differences between them are the following two points:

Where are WAL segments/Archive logs read from?
- Normal recovery mode – from the pg_xlog subdirectory under the base directory.
- PITR mode – from an archival directory set in the configuration parameter archive_command.
Where is the checkpoint location read from?
- Normal recovery mode – from a pg_control file.
- PITR mode – from a backup_label file.

The outline of PITR process is described as follows:

In order to find the REDO point, PostgreSQL reads the value of “CHECKPOINT LOCATION” from the backup_label file with the internal function read_backup_label.
PostgreSQL reads some values of parameters from the recovery.conf; in this example, restore_command and recovery_target_time.
PostgreSQL starts replaying WAL data from the REDO point, which can be easily obtained from the value of “CHECKPOINT LOCATION”.
The WAL data are read from archive logs which are copied from the archival area to a temporary area by executing the command written in the parameter resotere_command. (The copied log files in the temporary area are removed after using.)
In this example, PostgreSQL reads and replays WAL data from the REDO point to the one before the timestamp “2016-1-26 12:05:00” because the parameter recovery_target_time is set to this timestamp. If a recovery target is not set to the recovery.conf, PostgreSQL will replay until end of archiving logs.
When the recovery process completes, a timeline history file, such as “00000002.history”, is created in the pg_xlog subdirectory; if archiving log feature is enabled, same named file is also created in the archival directory. (The contents and role of this file will be posted on another day.)

The records of commit and abort actions contain the timestamp at which each action has done (XLOG data portion of both actions are defined in xl_xact_commit and xl_xact_abort respectively). Therefore, if a target time is set to the parameter recovery_target_time, PostgreSQL may select whether to continue recovery or not, whenever it replays XLOG record of either commit or abort action. When XLOG record of each action is replayed, PostgreSQL compares the target time and each timestamp written in the record; and if the timestamp exceed the target time, PITR process will be finished.

Why can we use common archiving tools to make a base backup?

Recovery process is a process to make a database cluster in a consistent state, though the cluster is inconsistent. As PITR is based on the recovery process, it can recover the database cluster even if a base backup is a bunch of inconsistent files. This is the reason why we can use common archiving tools without a file system snapshot capability or a special tool.