12.8. Internal Mechanism of Restart and Crash Recovery

Alpha Version: Work in progress.

This section provides an overview of Write-Ahead Log (WAL) data management in logical replication. Based on these fundamentals, it then details the sequences for subscriber restarts and crash recovery.

12.8.1. WAL Data Management

Replication progress is managed based on the Log Sequence Number (LSN) of WAL data.

Unlike streaming replication, where the primary and standby share the exact same WAL space, logical replication requires a mapping between the publisher’s and subscriber’s WAL spaces, as each utilizes its own independent LSN.

In the PostgreSQL implementation, the subscriber is responsible for this mapping. While the publisher consistently uses its own LSNs, the subscriber maintains a record of the correspondence between its local LSNs and the remote LSNs received from the publisher.
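To make the idea of two independent WAL spaces concrete, the following minimal Python sketch converts LSNs between PostgreSQL's textual form (two hexadecimal numbers separated by a slash) and a 64-bit integer, then pairs one publisher LSN with one subscriber LSN. The helper names parse_lsn and format_lsn, the mapping dictionary, and the sample positions are hypothetical and chosen only for illustration.

```python
def parse_lsn(text: str) -> int:
    """Convert the textual form 'HIGH/LOW' (two hex numbers) to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Convert a 64-bit integer back to the 'HIGH/LOW' textual form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


# Hypothetical positions: one in the publisher's WAL, one in the subscriber's.
publisher_commit_lsn = parse_lsn("0/16B3748")
subscriber_commit_lsn = parse_lsn("2/A0000028")

# The subscriber-side mapping pairs the two values; neither can be derived
# from the other, because each server numbers its own WAL independently.
mapping = {"remote_lsn": publisher_commit_lsn, "local_lsn": subscriber_commit_lsn}
```

The two integers are comparable only with other LSNs from the same server, which is why the pair has to be recorded explicitly.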

12.8.1.1. LSN Management Mechanisms for Publisher and Subscriber

Publisher

The publisher manages the confirmed_flush_lsn within its replication slot:

  • confirmed_flush_lsn: The LSN up to which the logical slot’s consumer has confirmed data reception. Transactions committed prior to this LSN are no longer available for replication.
  • Storage: Replication slot information is held in memory and typically persisted to storage during each checkpoint.

Subscriber

As mentioned in Section 12.7.1.2, WAL records generated by COMMIT (and ABORT) statements on the subscriber include metadata from the publisher: the origin_id, the publisher’s commit LSN (final_lsn), and the commit_timestamp.

Furthermore, the subscriber maintains the most recent commit LSN mapping in memory:

  • local_lsn: The LSN of the subscriber’s own commit.
  • remote_lsn: The publisher’s final_lsn corresponding to that commit.
  • Visibility: These values can be monitored via the pg_replication_origin_status system view.

Because pg_replication_origin_status is a system view over in-memory state rather than a persistent table, that state is written to the \$PGDATA/pg_logical/replorigin_checkpoint file at every checkpoint.

Note on Crash Safety: If the subscriber crashes unexpectedly, the most recent mapping in the replorigin_checkpoint file may be lost. However, during the subsequent recovery process, the latest mapping is automatically reconstructed. The details will be described in Section 12.8.3.
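The relationship between the in-memory mapping and the checkpoint file can be sketched as a toy model in Python. This is not PostgreSQL code: the class names OriginProgress and OriginTracker, the method names, and the numeric LSNs are all assumptions made for illustration, and a plain dictionary stands in for the replorigin_checkpoint file.

```python
import copy
from dataclasses import dataclass


@dataclass
class OriginProgress:
    remote_lsn: int  # publisher's final_lsn of the last applied commit
    local_lsn: int   # subscriber's own commit LSN for that transaction


class OriginTracker:
    """Toy model of the subscriber's in-memory state and its checkpoint copy."""

    def __init__(self):
        self.memory = {}           # origin_id -> OriginProgress
        self.checkpoint_file = {}  # stands in for replorigin_checkpoint

    def advance(self, origin_id, remote_lsn, local_lsn):
        # Called after each applied commit: only the memory copy changes.
        self.memory[origin_id] = OriginProgress(remote_lsn, local_lsn)

    def checkpoint(self):
        # At a checkpoint the in-memory state is copied to the file.
        self.checkpoint_file = copy.deepcopy(self.memory)


tracker = OriginTracker()
tracker.advance(origin_id=1, remote_lsn=100, local_lsn=5000)
tracker.checkpoint()
tracker.advance(origin_id=1, remote_lsn=200, local_lsn=5100)  # after the checkpoint
```

At the end of this run the memory copy is ahead of the checkpoint copy, which is exactly the gap a crash would expose.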

12.8.1.2. LSN Data Flow in a Normal Sequence

The following example illustrates how LSNs are managed when a publisher executes an INSERT command.

Figure 12.30. LSN Mapping Flow during a Normal Transaction.

While this figure illustrates the origin_id and origin_lsn within the COMMIT WAL record written by the subscriber, the precise structure is as follows:
In the COMMIT (or ABORT) WAL record, the origin_id is included in the header portion, whereas the origin_commit_lsn (or origin_abort_lsn) and origin_commit_timestamp (or origin_abort_timestamp) are appended to the extended section.

Figure 12.30 [1]

The confirmed_flush_lsn of the replication slot reflects $\text{LSN}^{P}_{0}$, confirming that the subscriber has successfully applied and flushed changes from the previous transaction.

The publisher then executes an INSERT (txid=100), where the commit WAL record starts at $\text{LSN}^{P}_{1}$ and ends at $\text{LSN}^{P}_{2}$.

The pgoutput plugin generates messages containing these LSNs:

  • B (Begin): final_lsn = $\text{LSN}^{P}_{1}$
  • C (Commit): commit_lsn = $\text{LSN}^{P}_{1}$, end_lsn = $\text{LSN}^{P}_{2}$

Figure 12.30 [2]

The subscriber’s memory initially holds local_lsn = $\text{LSN}^{S}_{0}$ and remote_lsn = $\text{LSN}^{P}_{0}$, consistent with the replorigin_checkpoint file.

Once the apply worker applies the changes, a commit record is written to the subscriber’s WAL at $\text{LSN}^{S}_{1}$. This record includes the origin_id and the publisher’s final_lsn ($\text{LSN}^{P}_{1}$). The memory state is then updated to local_lsn = $\text{LSN}^{S}_{1}$ and remote_lsn = $\text{LSN}^{P}_{1}$.

Note that the replorigin_checkpoint file remains unchanged until the next checkpoint.

Figure 12.30 [3]

Upon transaction completion, the subscriber sends an ACK containing write_lsn, flush_lsn, and apply_lsn.

In this scenario, all are set to $\text{LSN}^{P}_{1}$. It is critical to note that the subscriber is returning LSNs relative to the publisher’s WAL space.

The publisher then updates its replication slot’s confirmed_flush_lsn to $\text{LSN}^{P}_{1}$ based on this ACK.
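The three steps of Figure 12.30 can be traced with a small Python simulation. It is a sketch under stated assumptions: the class names, the dictionary layout of the messages and of the WAL record, the origin_id value 1, and the integer LSNs are placeholders, and the slot is assumed to advance on the flush_lsn field of the ACK.

```python
from dataclasses import dataclass


@dataclass
class ReplicationSlot:      # publisher side
    confirmed_flush_lsn: int


@dataclass
class OriginProgress:       # subscriber side, held in memory
    local_lsn: int
    remote_lsn: int


# Symbolic positions; the numbers are arbitrary placeholders.
LSN_P0, LSN_P1, LSN_P2 = 100, 200, 240   # publisher WAL space
LSN_S0, LSN_S1 = 5000, 5100              # subscriber WAL space

slot = ReplicationSlot(confirmed_flush_lsn=LSN_P0)
origin = OriginProgress(local_lsn=LSN_S0, remote_lsn=LSN_P0)
subscriber_wal = []

# [1] pgoutput messages generated for txid=100.
begin_msg = {"type": "B", "final_lsn": LSN_P1}
commit_msg = {"type": "C", "commit_lsn": LSN_P1, "end_lsn": LSN_P2}

# [2] The apply worker commits locally and records the origin metadata.
subscriber_wal.append(
    {"lsn": LSN_S1, "origin_id": 1, "origin_lsn": begin_msg["final_lsn"]}
)
origin.local_lsn = LSN_S1
origin.remote_lsn = begin_msg["final_lsn"]

# [3] The ACK carries publisher-side LSNs; the slot advances accordingly.
ack = {
    "write_lsn": origin.remote_lsn,
    "flush_lsn": origin.remote_lsn,
    "apply_lsn": origin.remote_lsn,
}
slot.confirmed_flush_lsn = ack["flush_lsn"]
```

Note that no subscriber-side LSN ever appears in the ACK; the subscriber's own position is only recorded in its local mapping.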

12.8.2. Restart Sequence

Once logical replication is configured, the publisher and subscriber begin the replication process. If replication stops, for example because either party restarts, it resumes automatically.

Figure 12.31. Logical Replication Restart Sequence.

  1. Read Checkpoint: The subscriber reads the replorigin_checkpoint file to initialize local_lsn and remote_lsn.
  2. Launch Worker: The logical replication launcher starts an apply worker.
  3. Connection Request: The apply worker requests a connection from the publisher.
  4. Spawn Walsender: The publisher’s postmaster spawns a walsender process.
  5. Establish Connection: The apply worker connects to the walsender.
  6. Negotiate LSN: The apply worker and walsender negotiate the starting point using the remote_lsn. The subscriber sends the remote_lsn, and the walsender begins decoding WAL from that publisher-side position.
  7. Resume Replication: The process resumes from the specified LSN.
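The essence of steps 1, 6, and 7 can be sketched in Python as follows. This is a toy model rather than the actual protocol: the function names restart_subscriber and walsender_stream, the tuple layout, and the sample data are hypothetical, and the model assumes that each transaction is identified by the LSN of its commit record and that everything at or before the requested position has already been applied.

```python
def restart_subscriber(checkpoint_file, origin_id):
    """Step 1: initialize the in-memory mapping from the checkpoint copy."""
    remote_lsn, local_lsn = checkpoint_file[origin_id]
    return {"remote_lsn": remote_lsn, "local_lsn": local_lsn}


def walsender_stream(publisher_commits, start_lsn):
    """Steps 6-7: yield the committed transactions located after start_lsn."""
    for commit_lsn, payload in publisher_commits:
        if commit_lsn > start_lsn:
            yield commit_lsn, payload


# Stand-in for replorigin_checkpoint: origin_id -> (remote_lsn, local_lsn).
checkpoint_file = {1: (200, 5100)}

# Stand-in for the publisher's WAL: (commit LSN, transaction label).
publisher_commits = [(100, "tx 99"), (200, "tx 100"), (300, "tx 101"), (400, "tx 102")]

state = restart_subscriber(checkpoint_file, origin_id=1)
resumed = list(walsender_stream(publisher_commits, state["remote_lsn"]))
```

Only the transactions after the stored remote_lsn are sent again, so the subscriber does not reapply work it has already committed.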

12.8.3. Recovery Sequence

The recovery sequence describes the process following a subscriber crash.

Unlike a standard restart, the local_lsn and remote_lsn stored in the replorigin_checkpoint file may be obsolete, reflecting only the state as of the last checkpoint. Therefore, the subscriber must scan its WAL data to reconstruct the latest mapping before connecting to the publisher.

Figure 12.32. Logical Replication Recovery Sequence.

[1] Scanning WAL Segments

During standard crash recovery (redo replay), the system scans the WAL segments and extracts the origin_id and origin_lsn from the publisher-originated commit records.

[2] Restoring origin status

After recovery is complete, the most recent origin_lsn (the publisher’s commit point) and its corresponding subscriber commit LSN are restored to memory as the new remote_lsn and local_lsn, respectively. The replorigin_checkpoint file is also brought up to date, so that it again matches the in-memory state.

From this point, the system proceeds with the standard restart sequence to reconnect with the publisher.
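The reconstruction performed in steps [1] and [2] can be illustrated with a short Python sketch. It is a simplified model, not the redo code itself: the function name recover_origin_state, the dictionary-based WAL records, and the sample LSNs are assumptions, and the record list is assumed to contain only the records replayed after the last checkpoint, in LSN order.

```python
def recover_origin_state(checkpoint_state, wal_records):
    """Rebuild origin_id -> (remote_lsn, local_lsn) by replaying WAL records."""
    state = dict(checkpoint_state)  # start from the possibly stale checkpoint copy
    for record in wal_records:
        origin_id = record.get("origin_id")
        if origin_id is None:
            continue  # record without origin metadata: not a replicated commit
        # A later commit record for the same origin overrides the earlier one.
        state[origin_id] = (record["origin_lsn"], record["lsn"])
    return state


# State as of the last checkpoint: origin_id -> (remote_lsn, local_lsn).
checkpoint_state = {1: (100, 5000)}

# Records written to the subscriber's WAL after that checkpoint.
wal_after_checkpoint = [
    {"lsn": 5100, "origin_id": 1, "origin_lsn": 200},
    {"lsn": 5150},  # unrelated local record
    {"lsn": 5200, "origin_id": 1, "origin_lsn": 300},
]

recovered = recover_origin_state(checkpoint_state, wal_after_checkpoint)
```

The recovered remote_lsn is the value the apply worker would then present to the publisher in the restart sequence.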