11.3. Managing Multiple-Standby Servers

In this section, I describe the way streaming replication works with multiple standby servers.

11.3.1. sync_priority and sync_state

The primary server assigns the sync_priority and sync_state attributes to all managed standby servers, and treats each standby server according to its respective values. (The primary server assigns these values even if it manages just one standby server; this was not mentioned in the previous section.)

The sync_priority attribute indicates the priority of the standby server in synchronous mode. The lower the value, the higher the priority. The special value 0 means that the standby server is ‘in asynchronous mode’. The priorities of the standby servers are assigned in the order listed in the primary server’s configuration parameter synchronous_standby_names. For example, in the following configuration, the priorities of standby1 and standby2 are 1 and 2, respectively.

synchronous_standby_names = 'standby1, standby2'

(Standby servers that are not listed in this parameter are in asynchronous mode and have a priority of 0.)

sync_state is the state of the standby server. The sync_state attribute indicates the state of the standby server. It can be one of the following values:

  • sync: The standby server is in synchronous mode and is the highest priority standby server that is currently working.

  • potential: The standby server is in synchronous mode and is a lower priority standby server that is currently working. If the current sync standby server fails, this standby server will be promoted to sync state.

  • async: The standby server is in asynchronous mode. (It will never be in ‘sync’ or ‘potential’ mode.)

The priority and state of the standby servers can be shown by issuing the following query:

testdb=# SELECT application_name AS host, 
         sync_priority, sync_state FROM pg_stat_replication;
   host   | sync_priority | sync_state
----------+---------------+------------
 standby1 |             1 | sync
 standby2 |             2 | potential
(2 rows)

11.3.2. How the Primary Manages Multiple-standbys

The primary server waits for ACK responses from the synchronous standby server alone. In other words, the primary server confirms only the synchronous standby’s writing and flushing of WAL data. Streaming replication, therefore, ensures that only the synchronous standby is in a consistent and synchronous state with the primary.

Figure 11.3 shows the case in which the ACK response of the potential standby has been returned earlier than that of the primary standby. In this case, the primary server does not complete the commit action of the current transaction and continues to wait for the primary’s ACK response. When the primary’s response is received, the backend process releases the latch and completes the current transaction processing.

The sync_state of standby1 and standby2 are ‘sync’ and ‘potential’, respectively. (1) The primary’s backend process continues to wait for an ACK response from the synchronous standby server, even though it has received an ACK response from the potential standby server. (2) After receiving the ACK response from the synchronous standby server, the primary’s backend process releases the latch and completes the current transaction processing.
Fig. 11.3. Managing multiple standby servers.

The sync_state of standby1 and standby2 are ‘sync’ and ‘potential’, respectively.
(1) The primary’s backend process continues to wait for an ACK response from the synchronous standby server, even though it has received an ACK response from the potential standby server.
(2) After receiving the ACK response from the synchronous standby server, the primary’s backend process releases the latch and completes the current transaction processing.

In the opposite case (i.e., the primary’s ACK response has been returned earlier than the potential’s), the primary server immediately completes the commit action of the current transaction without ensuring if the potential standby writes and flushes WAL data or not.

11.3.3. Behavior When a Failure Occurs

Once again, let’s see how the primary server behaves when a standby server has failed.

When either a potential or an asynchronous standby server has failed, the primary server terminates the walsender process connected to the failed standby and continues all processing. In other words, transaction processing on the primary server would not be affected by the failure of either type of standby server.

When a synchronous standby server has failed, the primary server terminates the walsender process connected to the failed standby, and replaces the synchronous standby with the highest priority potential standby. See Fig. 11.4. In contrast to the failure described above, query processing on the primary server will be paused from the point of failure to the replacement of the synchronous standby. (Therefore, failure detection of the standby server is a very important function to increase the availability of the replication system. Failure detection will be described in the next section.)

Fig. 11.4. Replacing of synchronous standby server.

In any case, if one or more standby servers are running in synchronous mode, the primary server keeps only one synchronous standby server at all times, and the synchronous standby server is always in a consistent and synchronous state with the primary.