11.2. How to Conduct Streaming Replication

Streaming replication has two aspects: log shipping and database synchronization. Log shipping is the main aspect of streaming replication: the primary server sends WAL (Write-Ahead Log) data to the connected standby servers whenever it is written. Database synchronization is required for synchronous replication, where the primary server communicates with each standby server to keep their database clusters synchronized.

To accurately understand how streaming replication works, we need to understand how one primary server manages multiple standby servers. We will start with the simple case (i.e., single-primary single-standby system) in this section, and then discuss the general case (single-primary multi-standby system) in the next section.

11.2.1. Communication Between a Primary and a Synchronous Standby

Assume that the standby server is in synchronous replication mode, that the configuration parameter hot_standby is disabled, and that wal_level is ‘replica’. The main parameters of the primary server are shown below:

synchronous_standby_names = 'standby1'
hot_standby = off
wal_level = replica
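
With these parameters, the standby that connects with the application_name ‘standby1’ is treated as the synchronous standby; this name must match the application_name given in the standby’s connection settings (primary_conninfo). One way to confirm the standby’s mode from the primary is to check the sync_state column of pg_stat_replication:

testdb=# SELECT application_name, state, sync_state FROM pg_stat_replication;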

Additionally, among the three triggers to write the WAL data mentioned in Section 9.5, we focus on transaction commits here.

Suppose that one backend process on the primary server issues a simple INSERT statement in autocommit mode. The backend starts a transaction, issues an INSERT statement, and then commits the transaction immediately. Let’s explore further how this commit action will be completed. See the following sequence diagram in Fig. 11.2:

Fig. 11.2. Streaming Replication's communication sequence diagram.
  • (1) The backend process writes and flushes WAL data to a WAL segment file by executing the functions XLogInsert() and XLogFlush().

  • (2) The walsender process sends the WAL data written into the WAL segment to the walreceiver process.

  • (3) After the WAL data has been sent, the backend process waits for an ACK response from the standby server. More precisely, the backend process acquires a latch by executing the internal function SyncRepWaitForLSN(), and waits for it to be released.

  • (4) The walreceiver on the standby server writes the received WAL data into the standby’s WAL segment using the write() system call, and returns an ACK response to the walsender.

  • (5) The walreceiver flushes the WAL data to the WAL segment using a system call such as fsync(), returns another ACK response to the walsender, and informs the startup process that the WAL data has been updated.

  • (6) The startup process replays the WAL data, which has been written to the WAL segment.

  • (7) The walsender releases the latch of the backend process upon receiving the ACK response from the walreceiver, and then the backend process’s commit or abort action is completed. The timing of the latch release depends on the parameter synchronous_commit:
    if it is ‘on’ (the default), the latch is released when the ACK of step (5) is received, whereas if it is ‘remote_write’, the latch is released when the ACK of step (4) is received. (An example of changing this parameter follows the list.)
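
Note that synchronous_commit can also be set per session or per transaction, so the latch-release behavior in step (7) can be relaxed for individual workloads without changing the server configuration. A minimal sketch (the table t here is hypothetical):

testdb=# SET synchronous_commit = remote_write;  -- wait only for the ACK of step (4)
testdb=# INSERT INTO t VALUES (1);               -- returns once the standby has written the WAL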

Each ACK response informs the primary server of the internal status of the standby server. It contains the four items below:

  • The LSN location where the latest WAL data has been written.
  • The LSN location where the latest WAL data has been flushed.
  • The LSN location where the latest WAL data has been replayed by the startup process.
  • The timestamp when this response was sent.

XLogWalRcvSendReply()@src/backend/replication/walreceiver.c
	/* Construct a new message */
	writePtr = LogstreamResult.Write;      /* latest LSN written by the walreceiver */
	flushPtr = LogstreamResult.Flush;      /* latest LSN flushed to the WAL segment */
	applyPtr = GetXLogReplayRecPtr(NULL);  /* latest LSN replayed by the startup process */

	resetStringInfo(&reply_message);
	pq_sendbyte(&reply_message, 'r');      /* message type: standby status report */
	pq_sendint64(&reply_message, writePtr);
	pq_sendint64(&reply_message, flushPtr);
	pq_sendint64(&reply_message, applyPtr);
	pq_sendint64(&reply_message, GetCurrentTimestamp());  /* send timestamp */
	pq_sendbyte(&reply_message, requestReply ? 1 : 0);    /* ask the primary to reply? */

The walreceiver returns ACK responses not only when WAL data have been written and flushed, but also periodically as a heartbeat from the standby server. The primary server therefore always has an accurate understanding of the status of all connected standby servers.
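
The standby side of this exchange can be observed (on PostgreSQL 9.6 or later) through the pg_stat_wal_receiver view. For example, the following query, run on the standby, shows the walreceiver’s status and the last WAL location and time it reported to the walsender:

testdb=# SELECT status, latest_end_lsn, latest_end_time FROM pg_stat_wal_receiver;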

The LSN-related information of the connected standby servers can be displayed by issuing the query shown below:

testdb=# SELECT application_name AS host,
        write_location AS write_LSN, flush_location AS flush_LSN, 
        replay_location AS replay_LSN FROM pg_stat_replication;

   host   | write_lsn | flush_lsn | replay_lsn 
----------+-----------+-----------+------------
 standby1 | 0/5000280 | 0/5000280 | 0/5000280
 standby2 | 0/5000280 | 0/5000280 | 0/5000280
(2 rows)
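
Combining these columns with the primary’s current WAL position gives a rough replication lag in bytes. A sketch, assuming PostgreSQL 10 or later (older releases spell these functions pg_current_xlog_location() and pg_xlog_location_diff()):

testdb=# SELECT application_name AS host,
        pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
        FROM pg_stat_replication;

PostgreSQL 10 also added the write_lag, flush_lag, and replay_lag interval columns to pg_stat_replication for the same purpose.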
Info

The heartbeat interval is set by the parameter wal_receiver_status_interval, which is 10 seconds by default.
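
The current value can be checked on the standby with SHOW:

testdb=# SHOW wal_receiver_status_interval;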

11.2.2. Behavior When a Failure Occurs

In this subsection, I describe how the primary server behaves when a synchronous standby server fails, and how to deal with the situation.

Even if a synchronous standby server fails and is no longer able to return an ACK response, the primary server will continue to wait for responses forever. This means that running transactions cannot commit and subsequent query processing cannot be started. In other words, all primary server operations are effectively stopped. (Streaming replication does not support a function to automatically revert to asynchronous mode after a timeout.)
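
Such stuck backends are easy to identify from another session. On PostgreSQL 9.6 or later, where wait events are exposed, they appear in pg_stat_activity with the SyncRep wait event:

testdb=# SELECT pid, state, wait_event_type, wait_event
        FROM pg_stat_activity WHERE wait_event = 'SyncRep';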

There are two ways to avoid such a situation. One is to use multiple standby servers to increase system availability; the other is to manually switch from synchronous to asynchronous mode by performing the following steps:

  • (1) Set the parameter synchronous_standby_names to an empty string.

    synchronous_standby_names = ''

  • (2) Execute the pg_ctl command with the reload option.

    postgres> pg_ctl -D $PGDATA reload

This procedure does not affect connected clients. The primary server will continue to process transactions and all sessions between clients and their respective backend processes will be maintained.
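
Alternatively, on PostgreSQL 9.4 or later, where ALTER SYSTEM is available, the same switch can be performed entirely from a psql session:

testdb=# ALTER SYSTEM SET synchronous_standby_names = '';
testdb=# SELECT pg_reload_conf();

Note that ALTER SYSTEM writes the setting to postgresql.auto.conf, which overrides postgresql.conf until the entry is removed with ALTER SYSTEM RESET.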