10.2. How Point-in-Time Recovery Works

Figure 10.3 shows the basic concept of PITR. In PITR mode, PostgreSQL replays the WAL data of the archive logs on the base backup, from the REDO point created by the pg_backup_start up to the point you want to recover. In PostgreSQL, the point to be recovered is referred to as a recovery target.

Fig. 10.3. Basic concept of PITR.

Here is the description of how PITR works.

Suppose that you made a mistake at 12:05 GMT of 1 January, 2024. You should remove the database cluster and restore the new one using the base backup you made before that.

At first, you need to set the command of the restore_command parameter, and also set the time of the recovery_target_time parameter to the point you made the mistake (in this case, 12:05 GMT) in a postgresql.conf (versions 11 or earlier, recovery.conf).

# Place archive logs under /mnt/server/archivedir directory.
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = "2024-1-1 12:05 GMT"

When PostgreSQL starts up, it enters into PITR mode if there is a ‘backup_label’ file and a ‘recovery.signal’ file (versions 11 or earlier, ‘recovery.conf’) in the database cluster.

recovery.conf / recovery.signal

The recovery.conf file has been abolished in version 12, and all recovery-related parameters should be written in postgresql.conf. See the official document in detail.

In versions 12 or later, when you restore your server from a basebackup, you need to create an empty file called recovery.signal in the database cluster directory.

$ touch /usr/local/pgsql/data/recovery.signal

The PITR (Point-in-Time Recovery) process is almost the same as the normal recovery process described in Section 9.8. The only two differences are:

  • Where are WAL segments/Archive logs read from?

    • Normal recovery mode – from the pg_wal subdirectory (in versions 9.6 or earlier, pg_xlog subdirectory) under the base directory.
    • PITR mode – from an archival directory set in the archive_command parameter.
  • Where is the checkpoint location read from?

    • Normal recovery mode – from the pg_control file.
    • PITR mode – from a backup_label file.

The outline of PITR process is as follows:

  1. PostgreSQL reads the value of ‘CHECKPOINT LOCATION’ from the backup_label file using the internal function read_backup_label() to find the REDO point.

  2. PostgreSQL reads some values of parameters from the postgresql.conf (versions 11 or earlier, recovery.conf), such as restore_command and recovery_target_time.

  3. PostgreSQL starts replaying WAL data from the REDO point, which can be easily obtained from the value of ‘CHECKPOINT LOCATION’. The WAL data are read from archive logs that are copied from the archival area to a temporary area by executing the command written in the restore_command parameter. (The copied log files in the temporary area are removed after use.)
    In this example, PostgreSQL reads and replays WAL data from the REDO point to the one before the timestamp ‘2024-1-1 12:05:00’ because the recovery_target_time parameter is set to this timestamp. If a recovery target is not set to the postgresql.conf (versions 11 or earlier, recovery.conf), PostgreSQL will replay until the end of the archiving logs.

  4. When the recovery process completes, a timeline history file, such as ‘00000002.history’, is created in the pg_wal subdirectory (in versions 9.6 or earlier, pg_xlog subdirectory).
    If archiving log feature is enabled, the same named file is also created in the archival directory. The contents and role of this file are described in the following sections.

The records of commit and abort actions contain the timestamp at which each action has done (XLOG data portion of both actions are defined in xl_xact_commit and xl_xact_abort respectively).

typedef struct xl_xact_commit
{
        TimestampTz	xact_time;          /* time of commit */
        uint32          xinfo;              /* info flags */
        int            	nrels;              /* number of RelFileNodes */
        int            	nsubxacts;          /* number of subtransaction XIDs */
        int            	nmsgs;              /* number of shared inval msgs */
        Oid            	dbId;               /* MyDatabaseId */
        Oid            	tsId;               /* MyDatabaseTableSpace */
        /* Array of RelFileNode(s) to drop at commit */
        RelFileNode     xnodes[1];          /* VARIABLE LENGTH ARRAY */
        /* ARRAY OF COMMITTED SUBTRANSACTION XIDs FOLLOWS */
        /* ARRAY OF SHARED INVALIDATION MESSAGES FOLLOWS */
} xl_xact_commit;
typedef struct xl_xact_abort
{
        TimestampTz     xact_time;          /* time of abort */
        int            	nrels;              /* number of RelFileNodes */
        int             nsubxacts;          /* number of subtransaction XIDs */
        /* Array of RelFileNode(s) to drop at abort */
        RelFileNode     xnodes[1];          /* VARIABLE LENGTH ARRAY */
        /* ARRAY OF ABORTED SUBTRANSACTION XIDs FOLLOWS */
} xl_xact_abort;

Therefore, if a target time is set to the recovery_target_time parameter, PostgreSQL may select whether to continue recovery or not, whenever it replays XLOG record of either commit or abort action. When XLOG record of each action is replayed, PostgreSQL compares the target time and each timestamp written in the record, and if the timestamp exceed the target time, PITR process will be finished.

Info

The function read_backup_label() is defined in src/backend/access/transam/xlog.c.

The structure xl_xact_commit and xl_xact_abort are defined in src/include/access/xact.h.

Why can we use common archiving tools to make a base backup?

The recovery process is a process of restoring a database cluster to a consistent state, even if the cluster is inconsistent. PITR is based on the recovery process, so it can recover a database cluster even if the base backup is a bunch of inconsistent files. This is why we can use common archiving tools without the need for a file system snapshot capability or a special tool.