Citus with BRIN index

In this weekend, I tried Citus. The latest version of Citus is for PostgreSQL 9.5, so it can be used BRIN index.

Installation

Refer to the document.

Set up

In this time, I used the append mode to create distributed tables. Refer to document.

Login and check

$ csql -h localhost -p 9700 -d postgres
csql (9.5.3)
Type "help" for help.

postgres=# \d
             List of relations
 Schema |     Name      | Type  |  Owner   
--------+---------------+-------+----------
 public | github_events | table | postgres
(1 row)

postgres=# \d github_events
              Table "public.github_events"
    Column    |            Type             | Modifiers 
--------------+-----------------------------+-----------
 event_id     | bigint                      | 
 event_type   | text                        | 
 event_public | boolean                     | 
 repo_id      | bigint                      | 
 payload      | jsonb                       | 
 repo         | jsonb                       | 
 actor        | jsonb                       | 
 org          | jsonb                       | 
 created_at   | timestamp without time zone | 

postgres=# SELECT count(*) FROM github_events;
 count 
-------
 39599
(1 row)

Creating Indexes

B-Tree

At first, let’s create an index of event_id with B-tree and do SELECT statement.

postgres=# CREATE INDEX github_events_event_id_idx ON github_events (event_id);
CREATE INDEX
postgres=# ANALYZE;
ANALYZE

postgres=# EXPLAIN ANALYZE SELECT event_id, event_type, created_at FROM github_events 
postgres-#                                     WHERE event_id <= 2489378095;
                         QUERY PLAN
------------------------------------------------------------------------------------------------------
 Distributed Query into pg_merge_job_0049
   Executor: Real-Time
   Task Count: 7
   Tasks Shown: One of 7
   ->  Task
         Node: host=localhost port=9701 dbname=postgres
         ->  Index Scan using github_events_event_id_idx_102008 on github_events_102008 github_events  (cost=0.28..418.
48 rows=2640 width=27) (actual time=0.020..2.219 rows=2638 loops=1)
               Index Cond: (event_id <= '2489378095'::bigint)
             Planning time: 0.529 ms
             Execution time: 2.762 ms
 Master Query
   ->  Seq Scan on pg_merge_job_0049  (cost=0.00..0.00 rows=0 width=0) (actual time=0.003..0.197 rows=2638 loops=1)
 Planning time: 0.260 ms
 Execution time: 0.312 ms
(14 rows)

The created index is used.

BRIN

Next, let’s create BRIN index for created_at column.

postgres=# CREATE INDEX github_events_created_at_brin_idx ON github_events
postgres-#       USING brin(created_at) WITH (pages_per_range = 10);
CREATE INDEX
postgres=# ANALYZE;
ANALYZE

postgres=# \d github_events
              Table "public.github_events"
    Column    |            Type             | Modifiers 
--------------+-----------------------------+-----------
 event_id     | bigint                      | 
 event_type   | text                        | 
 event_public | boolean                     | 
 repo_id      | bigint                      | 
 payload      | jsonb                       | 
 repo         | jsonb                       | 
 actor        | jsonb                       | 
 org          | jsonb                       | 
 created_at   | timestamp without time zone | 
Indexes:
    "github_events_created_at_brin_idx" brin (created_at) WITH (pages_per_range='10')
    "github_events_event_id_idx" btree (event_id)

Do SELECT statement below:

postgres=# EXPLAIN ANALYZE SELECT event_id, event_type, created_at FROM github_events 
postgres-#     WHERE event_id <= 2489378095 
postgres-#         AND created_at BETWEEN '2015-01-01 00:10:00' 
postgres-#                            AND '2015-01-01 00:12:00';
                         QUERY PLAN
------------------------------------------------------------------------------------------------------
 Distributed Query into pg_merge_job_0051
   Executor: Real-Time
   Task Count: 2
   Tasks Shown: One of 2
   ->  Task
         Node: host=localhost port=9701 dbname=postgres
         ->  Bitmap Heap Scan on github_events_102008 github_events  (cost=71.30..366.45 rows=100 width=27) (actual tim
e=0.394..0.648 rows=280 loops=1)
               Recheck Cond: ((created_at >= '2015-01-01 00:10:00'::timestamp without time zone) AND (created_at <= '20
15-01-01 00:12:00'::timestamp without time zone) AND (event_id <= '2489378095'::bigint))
               Rows Removed by Index Recheck: 110
               Heap Blocks: lossy=50
               ->  BitmapAnd  (cost=71.30..71.30 rows=100 width=0) (actual time=0.286..0.286 rows=0 loops=1)
                     ->  Bitmap Index Scan on github_events_created_at_brin_idx_102008  (cost=0.00..14.92 rows=292 width
=0) (actual time=0.170..0.170 rows=500 loops=1)
                           Index Cond: ((created_at >= '2015-01-01 00:10:00'::timestamp without time zone) AND (created
_at <= '2015-01-01 00:12:00'::timestamp without time zone))
                     ->  Bitmap Index Scan on github_events_event_id_idx_102008  (cost=0.00..56.08 rows=2640 width=0) (
actual time=0.007..0.007 rows=2638 loops=1)
                           Index Cond: (event_id <= '2489378095'::bigint)
             Planning time: 0.460 ms
             Execution time: 0.833 ms
 Master Query
   ->  Seq Scan on pg_merge_job_0051  (cost=0.00..0.00 rows=0 width=0) (actual time=0.003..0.019 rows=280 loops=1)
 Planning time: 1.247 ms
 Execution time: 0.043 ms
(21 rows)

The BRIN index works! Great!

BRIN vs GiST

Tresure Data is a big data management company, and they have a distributed analytical database called Plazma. Plazma is based on PostgreSQL, and the part of realtime storage uses a GiST Index of timestamp range column.

Note: When they created the Plazma, PostgreSQL did not support BRIN.

BRIN is easy-to-use for time sequential data, but it may not be effective in complicated tables and/or queries. On the other hand, GiST can be used for general purpose, and it already has a proven track record in the TD’s Plazma.

In any case, now I am very interested in capability of Citus with BRIN (and GiST). I am going to investigate the internal of Citus and the limit of BRIN.