docs/guides/best-practices/skipping-indexes-examples.md
+35 -26 (35 additions & 26 deletions)
@@ -7,25 +7,30 @@ title: 'Data Skipping Index Examples'
7
7
doc_type: 'guide'
8
8
---
9
9
10
-
# Data Skipping Index Examples {#data-skipping-index-examples}
10
+
# Data skipping index examples {#data-skipping-index-examples}
11
11
12
-
This page consolidates ClickHouse data skipping index examples, showing how to declare each type, when to use them, and how to verify they're applied. All features work with MergeTree-family tables.
12
+
This page consolidates ClickHouse data skipping index examples, showing how to declare each type, when to use them, and how to verify they're applied. All features work with [MergeTree-family tables](/engines/table-engines/mergetree-family/mergetree).
13
13
14
-
**Index syntax:** `INDEX name expr TYPE type(...) [GRANULARITY N]`
14
+
**Index syntax:**
15
+
16
+
```sql
17
+
INDEX name expr TYPE type(...) [GRANULARITY N]
```
15
18
16
19
ClickHouse supports five skip index types:
17
20
18
-
* **minmax** - Tracks minimum and maximum values in each granule
19
-
* **set(N)** - Stores up to N distinct values per granule
20
-
* **bloom_filter([false_positive_rate])** - Probabilistic filter for existence checks
21
-
* **ngrambf_v1** - N-gram bloom filter for substring searches
22
-
* **tokenbf_v1** - Token-based bloom filter for full-text searches
21
+
| Index Type | Description |
22
+
|------------|-------------|
23
+
| **minmax** | Tracks minimum and maximum values in each granule |
24
+
| **set(N)** | Stores up to N distinct values per granule |
25
+
| **bloom_filter([false_positive_rate])** | Probabilistic filter for existence checks |
26
+
| **ngrambf_v1** | N-gram bloom filter for substring searches |
27
+
| **tokenbf_v1** | Token-based bloom filter for full-text searches |
23
28
24
29
Each section provides examples with sample data and demonstrates how to verify index usage in query execution.
25
30
26
31
## MinMax index {#minmax-index}
27
32
28
-
Best for range predicates on loosely sorted data or columns correlated with ORDER BY.
33
+
The `minmax` index is best for range predicates on loosely sorted data or columns correlated with `ORDER BY`.
29
34
30
35
```sql
31
36
-- Define in CREATE TABLE
@@ -43,19 +48,19 @@ ORDER BY ts;
43
48
ALTER TABLE events ADD INDEX ts_minmax ts TYPE minmax GRANULARITY 1;
44
49
ALTER TABLE events MATERIALIZE INDEX ts_minmax;
45
50
46
-
-- Query that benefits
51
+
-- Query that benefits from the index
47
52
SELECT count() FROM events WHERE ts >= now() - 3600;
48
53
49
54
-- Verify usage
50
55
EXPLAIN indexes = 1
51
56
SELECT count() FROM events WHERE ts >= now() - 3600;
52
57
```
53
58
54
-
See a [worked example](/best-practices/use-data-skipping-indices-where-appropriate#example) with EXPLAIN and pruning.
59
+
See a [worked example](/best-practices/use-data-skipping-indices-where-appropriate#example) with `EXPLAIN` and pruning.
55
60
56
61
## Set index {#set-index}
57
62
58
-
Use when local (per-block) cardinality is low; not helpful if each block has many distinct values.
63
+
Use the `set` index when local (per-block) cardinality is low. It is not helpful if each block has many distinct values.
59
64
60
65
```sql
61
66
ALTER TABLE events ADD INDEX user_set user_id TYPE set(100) GRANULARITY 1;
@@ -67,11 +72,11 @@ EXPLAIN indexes = 1
67
72
SELECT * FROM events WHERE user_id IN (101, 202);
68
73
```
69
74
70
-
Creation/materialization workflow and before/after effect are shown in the [basic operation guide](/optimize/skipping-indexes#basic-operation).
75
+
A creation/materialization workflow and the before/after effect are shown in the [basic operation guide](/optimize/skipping-indexes#basic-operation).
Good for "needle in a haystack" equality/IN membership. Optional parameter is the false-positive rate (default 0.025).
79
+
The `bloom_filter` index is good for "needle in a haystack" equality/`IN` membership checks. It accepts an optional false-positive rate parameter (default 0.025).
75
80
76
81
```sql
77
82
ALTER TABLE events ADD INDEX value_bf value TYPE bloom_filter(0.01) GRANULARITY 3;
@@ -85,7 +90,7 @@ SELECT * FROM events WHERE value IN (7, 42, 99);
85
90
86
91
## N-gram Bloom filter (ngrambf_v1) for substring search {#n-gram-bloom-filter-ngrambf-v1-for-substring-search}
87
92
88
-
Splits strings into n-grams; works well for LIKE '%...%'. Supports String/FixedString/Map (via mapKeys/mapValues). Tunable size, hash count, seed. See documentation for [N-gram bloom filter](/engines/table-engines/mergetree-family/mergetree#n-gram-bloom-filter).
93
+
The `ngrambf_v1` index splits strings into n-grams and works well for `LIKE '%...%'` queries. It supports `String`, `FixedString`, and `Map` (via `mapKeys`/`mapValues`) columns, and its filter size, hash count, and seed are tunable. See the [N-gram bloom filter](/engines/table-engines/mergetree-family/mergetree#n-gram-bloom-filter) documentation for further details.
89
94
90
95
```sql
91
96
-- Create index for substring search
@@ -121,7 +126,9 @@ See [parameter docs](/engines/table-engines/mergetree-family/mergetree#n-gram-bl
121
126
122
127
## Token Bloom filter (tokenbf_v1) for word-based search {#token-bloom-filter-tokenbf-v1-for-word-based-search}
123
128
124
-
Indexes tokens separated by non-alphanumeric characters; use with hasToken, LIKE word patterns, equals/IN. Supports String/FixedString/Map. See: [Token bloom filter](/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) and [Bloom filter types](/optimize/skipping-indexes#skip-index-types).
129
+
`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. Use it with [`hasToken`](/sql-reference/functions/string-search-functions#hastoken), `LIKE` word patterns, or equality/`IN` predicates. It supports the `String`, `FixedString`, and `Map` types.
130
+
131
+
See the [Token bloom filter](/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) and [Bloom filter types](/optimize/skipping-indexes#skip-index-types) pages for more details.
125
132
126
133
```sql
127
134
ALTER TABLE logs ADD INDEX msg_token lower(msg) TYPE tokenbf_v1(10000, 7, 7) GRANULARITY 1;
@@ -138,7 +145,7 @@ See observability examples and guidance on token vs ngram [here](/use-cases/obse
138
145
139
146
## Add indexes during CREATE TABLE (multiple examples) {#add-indexes-during-create-table-multiple-examples}
140
147
141
-
Also supports composite expressions and Map/Tuple/Nested.
148
+
Skipping indexes also support composite expressions and `Map`/`Tuple`/`Nested` types. This is demonstrated in the example below:
142
149
143
150
```sql
144
151
CREATE TABLE t
@@ -159,7 +166,7 @@ ORDER BY u64;
159
166
160
167
## Materializing on existing data and verifying {#materializing-on-existing-data-and-verifying}
161
168
162
-
Add an index to existing data parts using MATERIALIZE, and inspect pruning with EXPLAIN or trace logs.
169
+
You can add an index to existing data parts using `MATERIALIZE`, and inspect pruning with `EXPLAIN` or trace logs, as shown below:
163
170
164
171
```sql
165
172
ALTER TABLE t MATERIALIZE INDEX idx_bf;
@@ -171,26 +178,28 @@ SELECT count() FROM t WHERE u64 IN (123, 456);
171
178
SET send_logs_level = 'trace';
172
179
```
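
`MATERIALIZE INDEX` runs as a mutation, so it can help to confirm it has finished before reading too much into `EXPLAIN` output. The query below is a minimal sketch, assuming the table `t` from the example above lives in the current database. Filtering on the `command` text is only an illustrative way to find the relevant row.

```sql
-- Check whether the MATERIALIZE INDEX mutation on table t has completed.
-- Assumption: t is in the current database; adjust the filters as needed.
SELECT
    mutation_id,
    command,
    parts_to_do,   -- parts still waiting to be processed
    is_done        -- 1 once the mutation has finished
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 't'
  AND command ILIKE '%MATERIALIZE INDEX%'
ORDER BY create_time DESC;
```

If `is_done` is still 0, some parts do not yet carry the index, and pruning counts may understate its effect.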
173
180
174
-
A [worked minmax example](/best-practices/use-data-skipping-indices-where-appropriate#example) demonstrates EXPLAIN output structure and pruning counts.
181
+
This [worked minmax example](/best-practices/use-data-skipping-indices-where-appropriate#example) demonstrates `EXPLAIN` output structure and pruning counts.
175
182
176
-
## When use and when to avoid {#when-use-and-when-to-avoid}
183
+
## When to use and when to avoid skipping indexes {#when-use-and-when-to-avoid}
177
184
178
185
**Use skip indexes when:**
179
186
180
187
* Filter values are sparse within data blocks
181
-
* Strong correlation exists with ORDER BY columns or data ingestion patterns group similar values together
182
-
* Performing text searches on large log datasets (ngrambf_v1/tokenbf_v1 types)
188
+
* Strong correlation exists with `ORDER BY` columns or data ingestion patterns group similar values together
189
+
* Performing text searches on large log datasets (`ngrambf_v1`/`tokenbf_v1` types)
183
190
184
191
**Avoid skip indexes when:**
185
192
186
193
* Most blocks likely contain at least one matching value (blocks will be read regardless)
187
194
* Filtering on high-cardinality columns with no correlation to data ordering
188
195
189
-
**Important considerations:** If a value appears even once in a data block, ClickHouse must read the entire block. Test indexes with realistic datasets and adjust granularity and type-specific parameters based on actual performance measurements.
196
+
:::note Important considerations
197
+
If a value appears even once in a data block, ClickHouse must read the entire block. Test indexes with realistic datasets and adjust granularity and type-specific parameters based on actual performance measurements.
198
+
:::
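
One way to weigh pruning benefit against index size is to check how much space each index occupies. This is a sketch rather than a definitive recipe: it assumes the `events` table used earlier on this page and a ClickHouse version in which `system.data_skipping_indices` exposes the size columns shown.

```sql
-- List skip indexes on the events table with their on-disk footprint.
-- Assumption: events is in the current database.
SELECT
    name,
    type,
    granularity,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.data_skipping_indices
WHERE database = currentDatabase()
  AND table = 'events';
```

Comparing these sizes before and after changing `GRANULARITY` or type-specific parameters gives a rough cost figure to set against the pruning observed with `EXPLAIN`.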
190
199
191
200
## Temporarily ignore or force indexes {#temporarily-ignore-or-force-indexes}
192
201
193
-
Disable specific indexes by name for individual queries during testing and troubleshooting. Settings also exist to force index usage when needed. See [ignore_data_skipping_indices](/operations/settings/settings#ignore_data_skipping_indices).
202
+
Disable specific indexes by name for individual queries during testing and troubleshooting. Settings also exist to force index usage when needed. See [`ignore_data_skipping_indices`](/operations/settings/settings#ignore_data_skipping_indices).
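
As a rough illustration, the per-query settings could be applied as follows. The sketch assumes the `events` table and the `value_bf` index from the Bloom filter example above. The first query gives a baseline without the index, and the second fails rather than silently running without it.

```sql
-- Baseline: run the query while ignoring the named index
SELECT count() FROM events WHERE value IN (7, 42, 99)
SETTINGS ignore_data_skipping_indices = 'value_bf';

-- Guard: raise an error unless the named index is used
SELECT count() FROM events WHERE value IN (7, 42, 99)
SETTINGS force_data_skipping_indices = 'value_bf';
```
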
* Only supported on MergeTree-family tables; pruning happens at the granule/block level.
213
+
* Skipping indexes are only supported on [MergeTree-family tables](/engines/table-engines/mergetree-family/mergetree); pruning happens at the granule/block level.
205
214
* Bloom-filter-based indexes are probabilistic (false positives cause extra reads but won't skip valid data).
206
-
* Bloom filters and other skip indexes should be validated with EXPLAIN and tracing; adjust granularity to balance pruning vs. index size.
215
+
* Bloom filters and other skip indexes should be validated with `EXPLAIN` and tracing; adjust granularity to balance pruning vs. index size.
207
216
208
217
## Related docs {#related-docs}
209
218
- [Data skipping index guide](/optimize/skipping-indexes)