44
55## Introduction
66
7- The ** rum** module provides access method to work with RUM index. It is based
8- on the GIN access methods code.
7+ The ** rum** module provides access method to work with ` RUM ` index. It is based
8+ on the ` GIN ` access methods code.
9+
10+ ` GIN ` index allows to perform fast full text search using ` tsvector ` and
11+ ` tsquery ` types. But full text search with GIN index has several problems:
12+
13+ - Slow ranking. It is need position information about lexems to ranking. ` GIN `
14+ index doesn't store positions of lexems. So after index scan we need additional
15+ heap scan to retreive lexems positions.
16+ - There isn't phrase search with ` GIN ` index. This problem relates with previous
17+ problem. It is need position information to perform phrase search.
18+ - Slow ordering by timestamp. ` GIN ` index can't store some related information
19+ in index with lexemes. So it is necessary to perform additional heap scan.
20+
21+ ` RUM ` solves this problems by storing additional information in posting tree.
22+ For example, positional information of lexemes or timestamps. You can get an
23+ idea of ` RUM ` by the following picture:
24+
25+ ![ How RUM stores additional information] ( img/gin_rum.png )
26+
27+ Drawback of ` RUM ` is that it has slower build and insert time than ` GIN ` .
28+ It is because we need to store additional information besides keys and because
29+ ` RUM ` uses generic WAL.
930
1031## License
1132
@@ -16,7 +37,7 @@ This module available under the same license as
1637
1738Before build and install ** rum** you should ensure following:
1839
19- * PostgreSQL version is 9.6.
40+ * PostgreSQL version is 9.6+ .
2041
2142Typical installation procedure may look like this:
2243
@@ -27,18 +48,27 @@ Typical installation procedure may look like this:
2748 $ make USE_PGXS=1 installcheck
2849 $ psql DB -c "CREATE EXTENSION rum;"
2950
30- ## New access method and operator class
31-
32- The ** rum** module provides the access method ** rum** and the operator class
33- ** rum_tsvector_ops** .
51+ ## Common operators and functions
3452
35- The module provides new operators.
53+ ** rum ** module provides next operators.
3654
3755| Operator | Returns | Description
3856| -------------------- | ------- | ----------------------------------------------
3957| tsvector < ; => ; tsquery | float4 | Returns distance between tsvector and tsquery.
58+ | timestamp < ; => ; timestamp | float8 | Returns distance between two timestamps.
59+ | timestamp < ; =| ; timestamp | float8 | Returns distance only for left timestamps.
60+ | timestamp | ; => ; timestamp | float8 | Returns distance only for right timestamps.
61+
62+ ## Operator classes
4063
41- ## Examples
64+ ** rum** provides next operator classes
65+
66+ ### rum_tsvector_ops
67+
68+ For type: ` tsvector `
69+
70+ This operator class stores ` tsvector ` lexemes with positional information. Supports
71+ ordering by ` <=> ` operator and prefix search. There is the example.
4272
4373Let us assume we have the table:
4474
@@ -70,9 +100,9 @@ And we can execute the following queries:
70100
71101``` sql
72102= # SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
73- FROM test_rum
74- WHERE a @@ to_tsquery(' english' , ' beautiful | place' )
75- ORDER BY a <=> to_tsquery(' english' , ' beautiful | place' );
103+ FROM test_rum
104+ WHERE a @@ to_tsquery(' english' , ' beautiful | place' )
105+ ORDER BY a <=> to_tsquery(' english' , ' beautiful | place' );
76106 t | rank
77107-- -------------------------------+-----------
78108 The situation is most beautiful | 0 .0303964
@@ -81,16 +111,128 @@ And we can execute the following queries:
81111(3 rows)
82112
83113= # SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
84- FROM test_rum
85- WHERE a @@ to_tsquery(' english' , ' place | situation' )
86- ORDER BY a <=> to_tsquery(' english' , ' place | situation' );
114+ FROM test_rum
115+ WHERE a @@ to_tsquery(' english' , ' place | situation' )
116+ ORDER BY a <=> to_tsquery(' english' , ' place | situation' );
87117 t | rank
88118-- -------------------------------+-----------
89119 The situation is most beautiful | 0 .0303964
90120 It looks like a beautiful place | 0 .0303964
91121(2 rows)
92122```
93123
124+ ### rum_tsvector_hash_ops
125+
126+ For type: ` tsvector `
127+
128+ This operator class stores hash of ` tsvector ` lexemes with positional information.
129+ Supports ordering by ` <=> ` operator. But ** doesn't** support prefix search.
130+
131+ ### rum_timestamp_ops
132+
133+ For type: ` timestamp `
134+
135+ Operator class provides fast search and ordering by timestamp fields. Supports
136+ ordering by ` <=> ` , ` <=| ` and ` |=> ` operators. Can be used
137+ with ` rum_tsvector_timestamp_ops ` operator class.
138+
139+ ### rum_timestamptz_ops
140+
141+ For type: ` timestamptz `
142+
143+ Operator class provides fast search and ordering by timestamptz fields. Supports
144+ ordering by ` <=> ` , ` <=| ` and ` |=> ` operators. Can be used
145+ with ` rum_tsvector_timestamptz_ops ` operator class.
146+
147+ ### rum_tsvector_timestamp_ops
148+
149+ For type: ` tsvector `
150+
151+ This operator class stores ` tsvector ` lexems with timestamp field. There is the example:
152+
153+ Let us assume we have the table:
154+ ``` sql
155+ CREATE TABLE tsts (id int , t tsvector, d timestamp );
156+
157+ \copy tsts from ' rum/data/tsts.data'
158+
159+ CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_timestamp_ops, d)
160+ WITH (attach = ' d' , to = ' t' );
161+ ```
162+
163+ Now we can execute the following queries:
164+ ``` sql
165+ EXPLAIN (costs off)
166+ SELECT id, d, d <=> ' 2016-05-16 14:21:25' FROM tsts WHERE t @@ ' wr&qh' ORDER BY d <=> ' 2016-05-16 14:21:25' LIMIT 5 ;
167+ QUERY PLAN
168+ -- ---------------------------------------------------------------------------------
169+ Limit
170+ - > Index Scan using tsts_idx on tsts
171+ Index Cond: (t @@ ' ' ' wr' ' & ' ' qh' ' ' ::tsquery)
172+ Order By : (d <=> ' Mon May 16 14:21:25 2016' ::timestamp without time zone )
173+ (4 rows)
174+
175+ SELECT id, d, d <=> ' 2016-05-16 14:21:25' FROM tsts WHERE t @@ ' wr&qh' ORDER BY d <=> ' 2016-05-16 14:21:25' LIMIT 5 ;
176+ id | d | ?column?
177+ -- ---+---------------------------------+---------------
178+ 355 | Mon May 16 14 :21 :22 .326724 2016 | 2 .673276
179+ 354 | Mon May 16 13 :21 :22 .326724 2016 | 3602 .673276
180+ 371 | Tue May 17 06 :21 :22 .326724 2016 | 57597 .326724
181+ 406 | Wed May 18 17 :21 :22 .326724 2016 | 183597 .326724
182+ 415 | Thu May 19 02 :21 :22 .326724 2016 | 215997 .326724
183+ (5 rows)
184+ ```
185+
186+ ### rum_tsvector_timestamptz_ops
187+
188+ For type: ` tsvector `
189+
190+ See comments for ` rum_tsvector_timestamp_ops ` operator class.
191+
192+ ### rum_tsvector_hash_timestamp_ops
193+
194+ For type: ` tsvector `
195+
196+ This operator class stores hash of ` tsvector ` lexems with timestamp field.
197+ ** Doesn't** support prefix search.
198+
199+ ### rum_tsvector_hash_timestamptz_ops
200+
201+ For type: ` tsvector `
202+
203+ This operator class stores hash of ` tsvector ` lexems with timestamptz field.
204+ ** Doesn't** support prefix search.
205+
206+ ### rum_tsquery_ops
207+
208+ For type: ` tsquery `
209+
210+ Stores branches of query tree in additional information. For example we have the table:
211+ ``` sql
212+ CREATE TABLE query (q tsquery, tag text );
213+
214+ INSERT INTO query VALUES (' supernova & star' , ' sn' ),
215+ (' black' , ' color' ),
216+ (' big & bang & black & hole' , ' bang' ),
217+ (' spiral & galaxy' , ' shape' ),
218+ (' black & hole' , ' color' );
219+
220+ CREATE INDEX query_idx ON query USING rum(q);
221+ ```
222+
223+ Now we can execute the following fast query:
224+ ``` sql
225+ SELECT * FROM query
226+ WHERE to_tsvector(' black holes never exists before we think about them' ) @@ q;
227+ q | tag
228+ -- ----------------+-------
229+ ' black' | color
230+ ' black' & ' hole' | color
231+ (2 rows)
232+
233+
234+ ```
235+
94236## Authors
95237
96238Alexander Korotkov <a.korotkov@postgrespro.ru > Postgres Professional Ltd., Russia
0 commit comments