Citus with BRIN index
In this weekend, I tried Citus. The latest version of Citus is for PostgreSQL 9.5, so it can be used BRIN index.
Refer to the document.
In this time, I used the append mode to create distributed tables.
Refer to document.
$ csql -h localhost -p 9700 -d postgres
csql (9.5.3)
Type "help" for help.
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+----------
public | github_events | table | postgres
(1 row)
postgres=# \d github_events
Table "public.github_events"
Column | Type | Modifiers
--------------+-----------------------------+-----------
event_id | bigint |
event_type | text |
event_public | boolean |
repo_id | bigint |
payload | jsonb |
repo | jsonb |
actor | jsonb |
org | jsonb |
created_at | timestamp without time zone |
postgres=# SELECT count(*) FROM github_events;
count
-------
39599
(1 row)
At first, let’s create an index of event_id with B-tree and do SELECT statement.
postgres=# CREATE INDEX github_events_event_id_idx ON github_events (event_id);
CREATE INDEX
postgres=# ANALYZE;
ANALYZE
postgres=# EXPLAIN ANALYZE SELECT event_id, event_type, created_at FROM github_events
postgres-# WHERE event_id <= 2489378095;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Distributed Query into pg_merge_job_0049
Executor: Real-Time
Task Count: 7
Tasks Shown: One of 7
-> Task
Node: host=localhost port=9701 dbname=postgres
-> Index Scan using github_events_event_id_idx_102008 on github_events_102008 github_events (cost=0.28..418.
48 rows=2640 width=27) (actual time=0.020..2.219 rows=2638 loops=1)
Index Cond: (event_id <= '2489378095'::bigint)
Planning time: 0.529 ms
Execution time: 2.762 ms
Master Query
-> Seq Scan on pg_merge_job_0049 (cost=0.00..0.00 rows=0 width=0) (actual time=0.003..0.197 rows=2638 loops=1)
Planning time: 0.260 ms
Execution time: 0.312 ms
(14 rows)
The created index is used.
Next, let’s create BRIN index for created_at column.
postgres=# CREATE INDEX github_events_created_at_brin_idx ON github_events
postgres-# USING brin(created_at) WITH (pages_per_range = 10);
CREATE INDEX
postgres=# ANALYZE;
ANALYZE
postgres=# \d github_events
Table "public.github_events"
Column | Type | Modifiers
--------------+-----------------------------+-----------
event_id | bigint |
event_type | text |
event_public | boolean |
repo_id | bigint |
payload | jsonb |
repo | jsonb |
actor | jsonb |
org | jsonb |
created_at | timestamp without time zone |
Indexes:
"github_events_created_at_brin_idx" brin (created_at) WITH (pages_per_range='10')
"github_events_event_id_idx" btree (event_id)
Do SELECT statement below:
postgres=# EXPLAIN ANALYZE SELECT event_id, event_type, created_at FROM github_events
postgres-# WHERE event_id <= 2489378095
postgres-# AND created_at BETWEEN '2015-01-01 00:10:00'
postgres-# AND '2015-01-01 00:12:00';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Distributed Query into pg_merge_job_0051
Executor: Real-Time
Task Count: 2
Tasks Shown: One of 2
-> Task
Node: host=localhost port=9701 dbname=postgres
-> Bitmap Heap Scan on github_events_102008 github_events (cost=71.30..366.45 rows=100 width=27) (actual tim
e=0.394..0.648 rows=280 loops=1)
Recheck Cond: ((created_at >= '2015-01-01 00:10:00'::timestamp without time zone) AND (created_at <= '20
15-01-01 00:12:00'::timestamp without time zone) AND (event_id <= '2489378095'::bigint))
Rows Removed by Index Recheck: 110
Heap Blocks: lossy=50
-> BitmapAnd (cost=71.30..71.30 rows=100 width=0) (actual time=0.286..0.286 rows=0 loops=1)
-> Bitmap Index Scan on github_events_created_at_brin_idx_102008 (cost=0.00..14.92 rows=292 width
=0) (actual time=0.170..0.170 rows=500 loops=1)
Index Cond: ((created_at >= '2015-01-01 00:10:00'::timestamp without time zone) AND (created
_at <= '2015-01-01 00:12:00'::timestamp without time zone))
-> Bitmap Index Scan on github_events_event_id_idx_102008 (cost=0.00..56.08 rows=2640 width=0) (
actual time=0.007..0.007 rows=2638 loops=1)
Index Cond: (event_id <= '2489378095'::bigint)
Planning time: 0.460 ms
Execution time: 0.833 ms
Master Query
-> Seq Scan on pg_merge_job_0051 (cost=0.00..0.00 rows=0 width=0) (actual time=0.003..0.019 rows=280 loops=1)
Planning time: 1.247 ms
Execution time: 0.043 ms
(21 rows)
The BRIN index works! Great!
Tresure Data is a big data management company, and they have a distributed analytical database called Plazma. Plazma is based on PostgreSQL, and the part of realtime storage uses a GiST Index of timestamp range column.
BRIN is easy-to-use for time sequential data, but it may not be effective in complicated tables and/or queries. On the other hand, GiST can be used for general purpose, and it already has a proven track record in the TD’s Plazma.
In any case, now I am very interested in capability of Citus with BRIN (and GiST). I am going to investigate the internal of Citus and the limit of BRIN.