How streaming replication works

Posted on Jan 21, 2016

#postgres #replication

This post is a part of my document.

I briefly explain that how streaming replication works in the following.

This replication feature is based on the log shipping, one of the general replication techniques, in which a primary server continues to send WAL data and then, each standby server replays the received data immediately.

Communication between a primary and a synchronous standby

First, I will explain how the primary manages a single synchronous standby.

In Streaming Replication, three kinds of processes work cooperatively. A walsender process on the primary server sends WAL data to standby server; and then, a walreceiver and a startup processes on standby server receives and replays these data. A walsender and a walreceiver communicate using a single TCP connection.

Suppose that one backend process on the primary server issues a simple INSERT statement in autocommit mode. The backend starts a transaction, issues an INSERT statement, and then commits the transaction immediately. Let’s explore how this commit action. See the following sequence diagram in Figure 1:

Figure 1: Streaming Replication’s communication sequence diagram

The backend process writes and flushes WAL data to a WAL segment file by executing the functions XLogInsert() and XLogFlush().
The walsender process sends the WAL data written into the WAL segment to the walreceiver process.
After sending the WAL data, the backend process continues to wait for an ACK response from the standby server. More precisely, the backend process gets a latch by executing the internal function SyncRepWaitForLSN(), and waits for it to be released.
The walreceiver on standby server writes the received WAL data into the standby’s WAL segment using the write() system call, and returns an ACK response to the walsender.
The walreceiver flushes the WAL data to the WAL segment using the system call such as fsync(), returns another ACK response to the walsender, and informs the startup process about WAL data updated.
The startup process replays the WAL data, which has been written to the WAL segment.
The walsender releases the latch of the backend process on receiving the ACK response from the walreceiver, and then, backend process’s commit or abort action will be completed. The timing for latch-release depends on the parameter synchronous_commit. It is ‘on’ (default), the latch is released when the ACK of 5. received, whereas it is ‘remote_write’, when the ACK of 4 received.

Each ACK response informs the primary server of the internal information of standby server. It contains four items below:

LSN location where the latest WAL data has been written.
LSN location where the latest WAL data has been flushed.
LSN location where the latest WAL data has been replayed in the startup process.
The timestamp when this response has be sent.

Walreceiver returns ACK responses not only when WAL data have been written and flushed, but also periodically as the heartbeat of standby server. The primary server therefore always grasps the status of all connected standby servers.

By issuing the queries as shown below, the LSN related information of the connected standby servers can be displayed.

sampledb=# SELECT application_name AS host,
sampledb-#        write_location AS write_LSN, flush_location AS flush_LSN, 
sampledb-#        replay_location AS replay_LSN FROM pg_stat_replication;

   host   | write_lsn | flush_lsn | replay_lsn 
----------+-----------+-----------+------------
 standby1 | 0/5000280 | 0/5000280 | 0/5000280
 standby2 | 0/5000280 | 0/5000280 | 0/5000280
(2 rows)

The heartbeat’s interval is set to the parameter wal_receiver_status_interval, which is 10 seconds by default.

How the primary manages multiple standbys

Next, I will explain how the primary manages multiple standbys. To simplify the explanation, I suppose that there are two standbys: one synchronous standby and one potential standby.

The primary server waits for ACK responses from the synchronous standby server alone. In other words, the primary server confirms only synchronous standby’s writing and flushing of WAL data. Streaming replication, therefore, ensures that only synchronous standby is in the consistent and synchronous state with the primary.

Figure 2 shows the case in which the ACK response of potential standby has been returned earlier than that of the primary standby. There, the primary server does not complete the commit action of the current transaction, and continues to wait for the primary’s ACK response. And then, when the primary’s response is received, the backend process releases the latch and completes the current transaction processing.

Figure 2: Managing multiple standby servers

Below is the detailed description of the Figure 2:

In spite of receiving an ACK response from the potential standby server, the primary’s backend process continues to wait for an ACK response from the synchronous-standby server.
The primary’s backend process releases the latch, completes the current transaction processing.

In the opposite case (i.e. the primary’s ACK response has been returned earlier than the potential’s one), the primary server immediately completes the commit action of the current transaction without ensuring if the potential standby writes and flushes WAL data or not.

If you want to know more details about Streaming Replication, see my document.