3.5.2. Merge Join

Unlike the nested loop join, the merge join is restricted to natural joins and equi-joins.

The cost of a merge join is estimated by the initial_cost_mergejoin() and final_cost_mergejoin() functions.

Since the exact cost estimation is complex, this section focuses only on the runtime order of the merge join algorithm.

Similar to the nested loop join, the merge join in PostgreSQL includes four variations.

3.5.2.1. Merge Join

Figure 3.23 provides a conceptual illustration of a merge join.

Figure 3.23. Merge join.

If all tuples fit in memory, the sorting operations are performed entirely in memory. Otherwise, the executor falls back to using temporary files on disk.

A specific example of the EXPLAIN result for a merge join is shown below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
testdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND b.id < 1000;
                               QUERY PLAN
-------------------------------------------------------------------------
 Merge Join  (cost=944.71..984.71 rows=1000 width=16)
   Merge Cond: (a.id = b.id)
   ->  Sort  (cost=809.39..834.39 rows=10000 width=8)
         Sort Key: a.id
         ->  Seq Scan on tbl_a a  (cost=0.00..145.00 rows=10000 width=8)
   ->  Sort  (cost=135.33..137.83 rows=1000 width=8)
         Sort Key: b.id
         ->  Seq Scan on tbl_b b  (cost=0.00..85.50 rows=1000 width=8)
               Filter: (id < 1000)
(9 rows)
  • Line 9: The executor sorts the inner table tbl_b, which is retrieved via sequential scan (Line 11).
  • Line 6: The executor sorts the outer table tbl_a, which is retrieved via sequential scan (Line 8).
  • Line 4: The executor carries out the merge join operation using the sorted tbl_a as the outer table and the sorted tbl_b as the inner table.

3.5.2.2. Materialized Merge Join

Similar to the nested loop join, the merge join supports a materialized variation. This approach materializes the inner table to make the subsequent inner table scans more efficient.

Figure 3.24. Materialized merge join.

The following example shows the EXPLAIN result for a materialized merge join. The primary difference from the standard merge join is the addition of the Materialize node in Line 9.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
testdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Merge Join  (cost=10466.08..10578.58 rows=5000 width=2064)
   Merge Cond: (a.id = b.id)
   ->  Sort  (cost=6708.39..6733.39 rows=10000 width=1032)
         Sort Key: a.id
         ->  Seq Scan on tbl_a a  (cost=0.00..1529.00 rows=10000 width=1032)
   ->  Materialize  (cost=3757.69..3782.69 rows=5000 width=1032)
         ->  Sort  (cost=3757.69..3770.19 rows=5000 width=1032)
               Sort Key: b.id
               ->  Seq Scan on tbl_b b  (cost=0.00..1193.00 rows=5000 width=1032)
(9 rows)
  • Line 10: The executor sorts the inner table tbl_b, which is retrieved via sequential scan (Line 12).
  • Line 9: The executor creates a materialized representation of the sorted tbl_b.
  • Line 6: The executor sorts the outer table tbl_a, which is retrieved via sequential scan (Line 8).
  • Line 4: The executor carries out the merge join operation using the sorted *tbl_a( as the outer table and the materialized sorted tbl_b as the inner table.

3.5.2.3. Other Variations

Similar to the nested loop join, the merge join in PostgreSQL includes variations where an index scan is used for the outer table.

Figure 3.25. The three variations of the merge join with an outer index scan.

The EXPLAIN results for these join variations are shown below:

(a) Merge join with outer index scan

testdb=# SET enable_hashjoin TO off;
SET
testdb=# SET enable_nestloop TO off;
SET
testdb=# EXPLAIN SELECT * FROM tbl_c AS c, tbl_b AS b WHERE c.id = b.id AND b.id < 1000;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=135.61..322.11 rows=1000 width=16)
   Merge Cond: (c.id = b.id)
   ->  Index Scan using tbl_c_pkey on tbl_c c  (cost=0.29..318.29 rows=10000 width=8)
   ->  Sort  (cost=135.33..137.83 rows=1000 width=8)
         Sort Key: b.id
         ->  Seq Scan on tbl_b b  (cost=0.00..85.50 rows=1000 width=8)
               Filter: (id < 1000)
(7 rows)

(b) Materialized merge join with outer index scan

testdb=# SET enable_hashjoin TO off;
SET
testdb=# SET enable_nestloop TO off;
SET
testdb=# EXPLAIN SELECT * FROM tbl_c AS c, tbl_b AS b WHERE c.id = b.id AND b.id < 4500;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=421.84..672.09 rows=4500 width=16)
   Merge Cond: (c.id = b.id)
   ->  Index Scan using tbl_c_pkey on tbl_c c  (cost=0.29..318.29 rows=10000 width=8)
   ->  Materialize  (cost=421.55..444.05 rows=4500 width=8)
         ->  Sort  (cost=421.55..432.80 rows=4500 width=8)
               Sort Key: b.id
               ->  Seq Scan on tbl_b b  (cost=0.00..85.50 rows=4500 width=8)
                     Filter: (id < 4500)
(8 rows)

(c) Indexed merge join with outer index scan

testdb=# SET enable_hashjoin TO off;
SET
testdb=# SET enable_nestloop TO off;
SET
testdb=# EXPLAIN SELECT * FROM tbl_c AS c, tbl_d AS d WHERE c.id = d.id AND d.id < 1000;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=0.57..226.07 rows=1000 width=16)
   Merge Cond: (c.id = d.id)
   ->  Index Scan using tbl_c_pkey on tbl_c c  (cost=0.29..318.29 rows=10000 width=8)
   ->  Index Scan using tbl_d_pkey on tbl_d d  (cost=0.28..41.78 rows=1000 width=8)
         Index Cond: (id < 1000)
(5 rows)