4.2. How the Postgres_fdw Extension Performs
The postgres_fdw extension is a specialized module officially maintained by the PostgreSQL Global Development Group. Its source code is integrated directly into the PostgreSQL source code tree.
postgres_fdw is subject to continuous improvement. Table 4.1 summarizes the release notes related to postgres_fdw from the official documentation.
| Version | Description | 9.3 |
|
|---|---|
| 9.6 |
|
| 10 |
|
| 11 |
|
| 12 |
|
| 14 |
|
| 15 |
|
| 16 |
|
| 17 |
|
While the previous section detailed how postgres_fdw processes single-table queries, the following subsections describe the processing of multi-table queries, sort operations, and aggregate functions.
Although this section focuses primarily on SELECT statements, postgres_fdw also supports other DML operations, including INSERT, UPDATE, and DELETE.
Note: PostgreSQL’s FDW does not detect deadlock
The postgres_fdw extension and the underlying FDW framework do not support for distributed lock management and distributed deadlock detection. Consequently, cross-server deadlocks can occur.
For instance, consider a scenario where Client A updates a local table tbl_local followed by a foreign table tbl_remote, while Client B updates tbl_remote followed by tbl_local in the opposite order.
Although these two transactions enter a state of mutual deadlock, PostgreSQL is unable to detect the dependency across the remote boundary. As a result, both transactions remain stalled indefinitely and cannot be committed.
localdb=# -- Client A
localdb=# BEGIN;
BEGIN
localdb=# UPDATE tbl_local SET data = 0 WHERE id = 1;
UPDATE 1
localdb=# UPDATE tbl_remote SET data = 0 WHERE id = 1;
UPDATE 1localdb=# -- Client B
localdb=# BEGIN;
BEGIN
localdb=# UPDATE tbl_remote SET data = 0 WHERE id = 1;
UPDATE 1
localdb=# UPDATE tbl_local SET data = 0 WHERE id = 1;
UPDATE 14.2.1. Multi-Table Query
To execute a multi-table query, postgres_fdw typically fetches each foreign table using a single-table SELECT statement and then performs the join on the local server.
In version 9.5 and earlier, postgres_fdw fetches foreign tables individually even if they are stored on the same remote server. The join operation is always created and executed locally.
In version 9.6 and later, postgres_fdw has been improved to support remote join operations. When the foreign tables reside on the same remote server and the use_remote_estimate option is enabled, the planner can create a join path to be executed directly on the remote server.
The execution details are described below.
4.2.1.1. Versions 9.5 or earlier
The following describes how PostgreSQL processes a query that joins two foreign tables, ’tbl_a’ and ’tbl_b'.
localdb=# SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND a.id < 200;The result of the EXPLAIN command for this query is shown below:
|
|
The output indicates that the executor selects a merge join, which is processed through the following steps:
- Line 8: The executor fetches rows from table tbl_a using a foreign scan.
- Line 6: The executor sorts the fetched rows of tbl_a on the local server.
- Line 11: The executor fetches rows from table tbl_b using a foreign scan.
- Line 9: The executor sorts the fetched rows of tbl_b on the local server.
- Line 4: The executor carries out a merge join operation on the local server.
The sequence of row retrieval is described below (refer to Figure 4.6):
Figure 4.6. Sequence of SQL statements to execute the Multi-Table Query in versions 9.5 or earlier.
-
(5-1) Transaction Start: Initiates a remote transaction.
-
(5-2) Cursor Declaration (c1): Declares cursor c1 with the following SELECT statement:
SELECT id, data FROM public.tbl_a WHERE (id < 200) -
(5-3) Fetching (c1): Executes FETCH commands to retrieve the results of cursor c1.
-
(5-4) Cursor Declaration (c2): Declares cursor c2 with the following SELECT statement:
The following examines the declaration of cursor c2. Although the original query filter is tbl_a.id = tbl_b.id AND tbl_a.id < 200, which logically implies a tbl_b.id < 200 constraint, postgres_fdw in these versions cannot perform this inference. Consequently, the executor executes the statement without a WHERE clause and must retrieve all rows from the foreign table tbl_b.SELECT id, data FROM public.tbl_b
This behavior is inefficient because unnecessary data is transmitted from the remote server over the network. -
(5-5) Fetching (c2): Executes FETCH commands to retrieve the results of cursor c2.
-
(5-6) Cursor Closure (c1): Closes cursor c1.
-
(5-7) Cursor Closure (c2): Closes cursor c2.
-
(5-8) Commitment: Commits the remote transaction.
After the rows are received, the executor sorts the data from both tbl_a and tbl_b and then completes the merge join operation using the sorted results.
4.2.1.2. Versions 9.6 or later
If the use_remote_estimate option is set to ‘on’ (the default is ‘off’), postgres_fdw sends multiple EXPLAIN commands to obtain cost estimates for all plan paths related to the foreign tables.
To achieve this, postgres_fdw issues EXPLAIN commands for each single-table query as well as for SELECT statements that represent potential remote join operations.
In this example, the following seven EXPLAIN commands are sent to the remote server. The planner then utilizes these results to select the cheapest plan.
(1) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200))
(2) EXPLAIN SELECT id, data FROM public.tbl_b
(3) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
(4) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((((SELECT null::integer)::integer) = id)) AND ((id < 200))
(5) EXPLAIN SELECT id, data FROM public.tbl_b ORDER BY id ASC NULLS LAST
(6) EXPLAIN SELECT id, data FROM public.tbl_b WHERE ((((SELECT null::integer)::integer) = id))
(7) EXPLAIN SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200))))Local EXPLAIN output shows the plan selected by the planner.
localdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND a.id < 200;
QUERY PLAN
-----------------------------------------------------------
Foreign Scan (cost=134.35..244.45 rows=80 width=16)
Relations: (public.tbl_a a) INNER JOIN (public.tbl_b b)
(2 rows)The output confirms that the planner selects an inner join to be processed on the remote server, which significantly improves efficiency by reducing data transfer.
The sequence of operations performed by postgres_fdw is described below (refer to Figure 4.7):
Figure 4.7. Sequence of SQL statements to execute the remote-join operation in versions 9.6 or later.
- (3-1) Transaction Start: Initiates a remote transaction.
- (3-2) Cost Estimation: Executes the EXPLAIN commands to estimate the cost of each potential plan path.
In this specific case, seven EXPLAIN commands are executed. The planner then identifies the SELECT query with the lowest cost based on the returned values.
- (5-1) Cursor Declaration (c1): Declares cursor c1 using the following remote-join statement:
SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200)))) - (5-2) Result Retrieval: Receives the joined results from the remote server.
- (5-3) Cursor Closure (c1): Closes cursor c1.
- (5-4) Commitment: Commits the remote transaction.
It should be noted that if the use_remote_estimate option remains ‘off’, a remote-join query is rarely selected. This is because, without remote feedback, the planner estimates the cost of remote joins using a very large embedded constant value, making local joins appear more favorable by comparison.
4.2.2. Sort Operations
4.2.2.1. Versions 9.5 or earlier
In these versions, sort operations — such as those triggered by an ORDER BY clause — are processed exclusively on the local server. Consequently, the local server must retrieve all target rows from the remote server before the sort operation can begin.
The following EXPLAIN output illustrates how a simple query with an ORDER BY clause is processed:
|
|
- Line 6: The executor sends the following query to the remote server and retrieves the results:
SELECT id, data FROM public.tbl_a WHERE ((id < 200)) - Line 4: The executor performs the sort operation on the retrieved rows of tbl_a locally.
4.2.2.2. Versions 9.6 or later
Starting with version 9.6, postgres_fdw can execute SELECT statements with an ORDER BY clause directly on the remote server when possible.
|
|
- Line 4: The executor sends a query containing an ORDER BY clause to the remote server. The retrieved results are already sorted, eliminating the need for local sorting:
SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
This improvement reduces the computational workload on the local server and can significantly decrease overall execution time.
4.2.3. Aggregate Functions
4.2.3.1. Versions 9.6 or earlier
Similar to the sort operations described in the previous subsection, aggregate functions such as AVG() and COUNT() are processed on the local server. The process consists of the following steps:
|
|
- Line 5: The executor retrieves all target rows from the remote server by sending the following query:
SELECT id, data FROM public.tbl_a WHERE ((id < 200)) - Line 4: The executor calculates the average of the retrieved rows on the local server.
This approach is inefficient when dealing with large datasets, as transmitting a high volume of rows consumes significant network bandwidth and increases execution time.
4.2.3.2. Versions 10 or later
Starting with version 10, postgres_fdw can execute SELECT statements containing aggregate functions directly on the remote server when possible.
|
|
- Line 4: The executor sends a query containing the AVG() function to the remote server and retrieves only the final result.
SELECT avg(data) FROM public.tbl_a WHERE ((id < 200))
This process is significantly more efficient because the remote server performs the calculation and transmits only a single row as the result, minimizing network traffic.
Push-down is the process of delegating tasks, such as aggregation or sorting, to the remote server.