5.9. Serializable Snapshot Isolation

PostgreSQL has embedded Serializable Snapshot Isolation (SSI) since version 9.1 to realize a true SERIALIZABLE isolation level.

Since the explanation of SSI is complex, this section provides only an outline. For details, see the original paper: “Serializable Snapshot Isolation in PostgreSQL”

In the following, several technical terms are used without definitions. Please refer to the references for these terms:

precedence graph (also known as dependency graph and serialization graph)
serialization anomalies (e.g. Write-Skew)

References

Abraham Silberschatz, Henry F. Korth, and S. Sudarshan, “Database System Concepts”, McGraw-Hill Education, ISBN-13: 978-0073523323
Thomas M. Connolly, and Carolyn E. Begg, “Database Systems”, Pearson, ISBN-13: 978-0321523068

Section Contents

5.9.1. Basic Strategy for SSI Implementation

A serialization anomaly occurs if a cycle is present in the precedence graph. The simplest anomaly, write-skew, demonstrates this.

Figure 5.12(1) shows a schedule. Here, Transaction_A reads Tuple_B and Transaction_B reads Tuple_A. Then, Transaction_A writes Tuple_A and Transaction_B writes Tuple_B. In this case, two rw-conflicts exist. These conflicts form a cycle in the precedence graph of this schedule, as shown in Figure 5.12(2). Thus, this schedule contains a serialization anomaly, Write-Skew.

Figure 5.12. Write-Skew schedule and its precedence graph.

Conceptually, three types of conflicts exist: wr-conflicts (Dirty Reads), ww-conflicts (Lost Updates), and rw-conflicts. However, PostgreSQL ignores wr- and ww-conflicts because it prevents them as shown in the previous sections. Therefore, the SSI implementation in PostgreSQL only considers rw-conflicts.

PostgreSQL adopts the following strategy for SSI implementation:

Record all objects (tuples, pages, relations) accessed by transactions as SIREAD locks.
Detect rw-conflicts using SIREAD locks whenever any heap or index tuple is written.
Abort the transaction if a serialization anomaly is detected by checking the detected rw-conflicts.

5.9.2. Implementing SSI in PostgreSQL

To realize the strategy described above, PostgreSQL implements various functions and data structures. This section focuses on two primary data structures to describe the SSI mechanism: SIREAD locks and rw-conflicts. These structures are stored in shared memory.

Note

For simplicity, this documentation omits some important data structures, such as SERIALIZABLEXACT. Consequently, the explanations of functions — specifically CheckForSerializableConflictOut(), CheckForSerializableConflictIn(), and PreCommit_CheckForSerializationFailure() — are also highly simplified.

For example, this section indicates which functions detect conflicts but does not explain the detection details. Refer to the source code for detailed information: src/backend/storage/lmgr/predicate.c.

5.9.2.1. SIREAD locks

An SIREAD lock, internally called a predicate lock, is a pair consisting of an object and (virtual) txids. It stores information about which transactions have accessed which objects.

Note that this description omits virtual txids. The term txid is used rather than virtual txid to simplify the following explanation.

The CheckForSerializableConflictOut() function creates SIREAD locks whenever a DML command is executed in SERIALIZABLE mode. For example, if txid 100 reads Tuple_1 of a table, an SIREAD lock {Tuple_1, {100}} is created. If txid 101 also reads Tuple_1, the SIREAD lock is updated to {Tuple_1, {100, 101}}.

An SIREAD lock is also created when an index page is read. This occurs during Index-Only Scans (described in Section 7.2), where the index is read without accessing the table page.

Lock Levels and Lock Aggregation:

SIREAD locks have three levels: tuple, page, and relation.

PostgreSQL aggregates SIREAD locks to reduce memory space. If SIREAD locks are created for all tuples within a single page, they are merged into a single page-level SIREAD lock, and the individual tuple-level locks are released (refer to Figure 5.13). The same logic applies when all pages in a relation are read.

Figure 5.13. Example of SIREAD Lock Aggregation.
Transaction Tx reads tuple_1 and tuple_2 in Page_1, creating two tuple-level SIREAD locks. When Tx subsequently reads tuple_3, completing the scan of Page_1, PostgreSQL replaces the individual tuple-level locks with a single page-level SIREAD lock.

When using a sequential scan, PostgreSQL creates a relation-level SIREAD lock from the beginning, regardless of indexes or WHERE clauses. In certain situations, this implementation can cause false-positive detections of serialization anomalies. Details are provided in Section 5.9.4.

5.9.2.2. rw-conflicts

A rw-conflict is a triplet consisting of an SIREAD lock and two txids: one that reads and one that writes the object associated with the SIREAD lock.

The CheckForSerializableConflictIn() function is invoked whenever an INSERT, UPDATE, or DELETE command is executed in SERIALIZABLE mode. This function creates rw-conflicts when it detects a conflict by checking existing SIREAD locks.

For example, txid 100 reads Tuple_1, and then txid 101 updates Tuple_1. In this case, CheckForSerializableConflictIn(), invoked by the UPDATE command in txid 101, detects a rw-conflict on Tuple_1 between txid 100 and 101. It then creates a rw-conflict {r=100, w=101, {Tuple_1}}.

5.9.2.3. Conflict Detection and First-Committer-Win

Both the CheckForSerializableConflictOut() and CheckForSerializableConflictIn() functions, as well as the PreCommit_CheckForSerializationFailure() function, which is invoked when the COMMIT command is executed in SERIALIZABLE mode, check serialization anomalies using the created rw-conflicts. If they detect anomalies, only the first-committed transaction is committed and the other transactions are aborted by the first-committer-win scheme.

5.9.3. How SSI Performs

This section describes how SSI resolves Write-Skew anomalies using the simple table tbl shown below:

testdb=# CREATE TABLE tbl (id INT primary key, flag bool DEFAULT false);
testdb=# INSERT INTO tbl (id) SELECT generate_series(1,2000);
testdb=# ANALYZE tbl;

Transactions Tx_A and Tx_B execute the commands shown in Figure 5.14.

Assume all commands use index scans. When executed, these commands read both heap tuples and index pages. Each index page contains the index tuple that points to the corresponding heap tuple (Figure 5.15).

Figure 5.15. Relationship between the index and table in the scenario shown in Figure 5.14.

T1: Tx_A executes a SELECT command. This command reads a heap tuple (Tuple_2000) and one page of the primary key (Pkey_2).
T2: Tx_B executes a SELECT command. This command reads a heap tuple (Tuple_1) and one page of the primary key (Pkey_1).
T3: Tx_A executes an UPDATE command to update Tuple_1.
T4: Tx_B executes an UPDATE command to update Tuple_2000.
T5: Tx_A commits.
T6: Tx_B attempts to commit; however, it aborts due to a Write-Skew anomaly.

Figure 5.16 shows how PostgreSQL detects and resolves the Write-Skew anomaly in this scenario.

Figure 5.16. SIREAD locks and rw-conflicts, and schedule of the scenario shown in Figure 5.14.

T1:
During the execution of Tx_A’s SELECT command, CheckForSerializableConflictOut() creates SIREAD locks. In this scenario, the function creates two SIREAD locks, L1 and L2, associated with Pkey_2 and Tuple_2000, respectively.
T2:
During the execution of Tx_B’s SELECT command, CheckForSerializableConflictOut() creates two SIREAD locks, L3 and L4, associated with Pkey_1 and Tuple_1, respectively.
T3:
When Tx_A executes its UPDATE command, the system invokes both CheckForSerializableConflictOut() and CheckForSerializableConflictIn() before and after ExecUpdate.

In this scenario, CheckForSerializableConflictOut() does nothing.

CheckForSerializableConflictIn() creates an rw-conflict, C1, which involves both Pkey_1 and Tuple_1 between Tx_B and Tx_A. This occurs because both Pkey_1 and Tuple_1 were read by Tx_B and subsequently written by Tx_A.
T4:
When Tx_B executes its UPDATE command, CheckForSerializableConflictIn() creates an rw-conflict, C2, which involves both Pkey_2 and Tuple_2000 between Tx_A and Tx_B.

In this scenario, C1 and C2 form a cycle in the precedence graph, placing Tx_A and Tx_B in a non-serializable state.

However, because neither transaction has committed yet, CheckForSerializableConflictIn() does not abort Tx_B. This behavior occurs because PostgreSQL’s SSI implementation is based on the first-committer-win scheme.
T5:
When Tx_A attempts to commit, PreCommit_CheckForSerializationFailure() is invoked. This function detects serialization anomalies and executes a commit action if possible.

In this scenario, Tx_A commits because Tx_B is still in progress.
T6:
When Tx_B attempts to commit, PreCommit_CheckForSerializationFailure() detects a serialization anomaly. Since Tx_A has already committed, Tx_B aborts.

5.9.3.1. Other Scenarios

If Tx_B executes the UPDATE command after Tx_A has committed (after T5), Tx_B aborts immediately. This occurs because CheckForSerializableConflictIn(), invoked by Tx_B’s UPDATE command, detects a serialization anomaly (Figure 5.17(1)).

If Tx_B executes a SELECT command instead of COMMIT at T6, Tx_B aborts immediately. This occurs because CheckForSerializableConflictOut(), invoked by Tx_B’s SELECT command, detects a serialization anomaly (Figure 5.17(2)).

Figure 5.17. Other Write-Skew scenarios.

Info

This Wiki details several more complex anomalies.

5.9.4. False-Positive Serialization Anomalies

In SERIALIZABLE mode, PostgreSQL always fully guarantees the serializability of concurrent transactions because false-negative serialization anomalies never occur.

However, PostgreSQL may detect false-positive anomalies under some circumstances. Users should consider this behavior when using SERIALIZABLE mode.

The following describes situations where PostgreSQL detects false-positive anomalies.

5.9.4.1. False-Positive Scenario 1.

Figure 5.18 shows a scenario where a false-positive serialization anomaly occurs.

Figure 5.18. Scenario where false-positive serialization anomaly occurs.

As mentioned in the explanation of SIREAD locks, PostgreSQL creates a relation-level SIREAD lock when using a sequential scan.

Figure 5.19(1) shows the SIREAD locks and rw-conflicts during a sequential scan.

In this case, rw-conflicts C1 and C2 are associated with the relation-level SIREAD lock of ’tbl’. These conflicts create a cycle in the precedence graph.

Consequently, PostgreSQL detects a false-positive Write-Skew anomaly and aborts either Tx_A or Tx_B even though no actual conflict exists.

Figure 5.19. False-positive anomaly (1) - Using sequential scan.

5.9.4.2. False-Positive Scenario 2.

PostgreSQL also detects a false-positive anomaly during an index scan if both transactions Tx_A and Tx_B acquire the same index-page SIREAD lock. Figure 5.20 illustrates this situation.

Figure 5.20. False-positive anomaly (2) - Index scan using the same index page.

Assume that index page Pkey_1 contains two index items: one pointing to Tuple_1 and the other pointing to Tuple_2.

When Tx_A and Tx_B execute their respective SELECT and UPDATE commands, both transactions read and write Pkey_1.

In this case, rw-conflicts C1 and C2, both associated with Pkey_1, create a cycle in the precedence graph. Thus, PostgreSQL detects a false-positive Write-Skew anomaly.

(If Tx_A and Tx_B acquire SIREAD locks for different index pages, no false-positive is detected and both transactions can commit.)

5.9. Serializable Snapshot Isolation

5.9.1. Basic Strategy for SSI Implementation

Figure 5.12. Write-Skew schedule and its precedence graph.

5.9.2. Implementing SSI in PostgreSQL

5.9.2.1. SIREAD locks

Lock Levels and Lock Aggregation:

Figure 5.13. Example of SIREAD Lock Aggregation.

5.9.2.2. rw-conflicts

5.9.2.3. Conflict Detection and First-Committer-Win

5.9.3. How SSI Performs

Figure 5.14. Write-Skew scenario.

Figure 5.15. Relationship between the index and table in the scenario shown in Figure 5.14.

Figure 5.16. SIREAD locks and rw-conflicts, and schedule of the scenario shown in Figure 5.14.

5.9.3.1. Other Scenarios

Figure 5.17. Other Write-Skew scenarios.

5.9.4. False-Positive Serialization Anomalies

5.9.4.1. False-Positive Scenario 1.

Figure 5.18. Scenario where false-positive serialization anomaly occurs.

Figure 5.19. False-positive anomaly (1) - Using sequential scan.

5.9.4.2. False-Positive Scenario 2.

Figure 5.20. False-positive anomaly (2) - Index scan using the same index page.