Commit 2bfa48a

dhtclk and Blargian authored
Apply suggestions from code review
Co-authored-by: Shaun Struwig <41984034+Blargian@users.noreply.github.com>
1 parent 185ab05 commit 2bfa48a

1 file changed

+35
-26
lines changed

docs/guides/best-practices/skipping-indexes-examples.md

Lines changed: 35 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -7,25 +7,30 @@ title: 'Data Skipping Index Examples'
77
doc_type: 'guide'
88
---
99

10-
# Data Skipping Index Examples {#data-skipping-index-examples}
10+
# Data skipping index examples {#data-skipping-index-examples}
1111

12-
This page consolidates ClickHouse data skipping index examples, showing how to declare each type, when to use them, and how to verify they're applied. All features work with MergeTree-family tables.
12+
This page consolidates ClickHouse data skipping index examples, showing how to declare each type, when to use them, and how to verify they're applied. All features work with [MergeTree-family tables](/engines/table-engines/mergetree-family/mergetree).
1313

14-
**Index syntax:** `INDEX name expr TYPE type(...) [GRANULARITY N]`
14+
**Index syntax:**
15+
16+
```sql
17+
INDEX name expr TYPE type(...) [GRANULARITY N]
```
1518
1619
ClickHouse supports five skip index types:
1720
18-
* **minmax** \- Tracks minimum and maximum values in each granule
19-
* **set(N)** \- Stores up to N distinct values per granule
20-
* **bloom\_filter(\[false\_positive\_rate\])** \- Probabilistic filter for existence checks
21-
* **ngrambf\_v1** \- N-gram bloom filter for substring searches
22-
* **tokenbf\_v1** \- Token-based bloom filter for full-text searches
21+
| Index Type | Description |
22+
|------------|-------------|
23+
| **minmax** | Tracks minimum and maximum values in each granule |
24+
| **set(N)** | Stores up to N distinct values per granule |
25+
| **bloom_filter([false_positive_rate])** | Probabilistic filter for existence checks |
26+
| **ngrambf_v1** | N-gram bloom filter for substring searches |
27+
| **tokenbf_v1** | Token-based bloom filter for full-text searches |
2328
2429
Each section provides examples with sample data and demonstrates how to verify index usage in query execution.
2530
2631
## MinMax index {#minmax-index}
2732
28-
Best for range predicates on loosely sorted data or columns correlated with ORDER BY.
33+
The `minmax` index is best for range predicates on loosely sorted data or columns correlated with `ORDER BY`.
2934
3035
```sql
3136
-- Define in CREATE TABLE
@@ -43,19 +48,19 @@ ORDER BY ts;
4348
ALTER TABLE events ADD INDEX ts_minmax ts TYPE minmax GRANULARITY 1;
4449
ALTER TABLE events MATERIALIZE INDEX ts_minmax;
4550

46-
-- Query that benefits
51+
-- Query that benefits from the index
4752
SELECT count() FROM events WHERE ts >= now() - 3600;
4853

4954
-- Verify usage
5055
EXPLAIN indexes = 1
5156
SELECT count() FROM events WHERE ts >= now() - 3600;
5257
```
5358

54-
See a [worked example](/best-practices/use-data-skipping-indices-where-appropriate#example) with EXPLAIN and pruning.
59+
See a [worked example](/best-practices/use-data-skipping-indices-where-appropriate#example) with `EXPLAIN` and pruning.
5560

5661
## Set index {#set-index}
5762

58-
Use when local (per-block) cardinality is low; not helpful if each block has many distinct values.
63+
Use the `set` index when local (per-block) cardinality is low; it is not helpful if each block has many distinct values.
5964

6065
```sql
6166
ALTER TABLE events ADD INDEX user_set user_id TYPE set(100) GRANULARITY 1;
@@ -67,11 +72,11 @@ EXPLAIN indexes = 1
6772
SELECT * FROM events WHERE user_id IN (101, 202);
6873
```
6974

70-
Creation/materialization workflow and before/after effect are shown in the [basic operation guide](/optimize/skipping-indexes#basic-operation).
75+
A creation/materialization workflow and the before/after effect are shown in the [basic operation guide](/optimize/skipping-indexes#basic-operation).
7176

7277
## Generic Bloom filter (scalar) {#generic-bloom-filter-scalar}
7378

74-
Good for "needle in a haystack" equality/IN membership. Optional parameter is the false-positive rate (default 0.025).
79+
The `bloom_filter` index is good for "needle in a haystack" equality and `IN` membership checks. It accepts one optional parameter, the false-positive rate (default 0.025).
7580

7681
```sql
7782
ALTER TABLE events ADD INDEX value_bf value TYPE bloom_filter(0.01) GRANULARITY 3;
@@ -85,7 +90,7 @@ SELECT * FROM events WHERE value IN (7, 42, 99);
8590

8691
## N-gram Bloom filter (ngrambf\_v1) for substring search {#n-gram-bloom-filter-ngrambf-v1-for-substring-search}
8792

88-
Splits strings into n-grams; works well for LIKE '%...%'. Supports String/FixedString/Map (via mapKeys/mapValues). Tunable size, hash count, seed. See documentation for [N-gram bloom filter](/engines/table-engines/mergetree-family/mergetree#n-gram-bloom-filter).
93+
The `ngrambf_v1` index splits strings into n-grams and works well for `LIKE '%...%'` queries. It supports `String`/`FixedString`/`Map` types (via `mapKeys`/`mapValues`), and its size, hash count, and seed are tunable. See the documentation for [N-gram bloom filter](/engines/table-engines/mergetree-family/mergetree#n-gram-bloom-filter) for further details.
8994

9095
```sql
9196
-- Create index for substring search
@@ -121,7 +126,9 @@ See [parameter docs](/engines/table-engines/mergetree-family/mergetree#n-gram-bl
121126

122127
## Token Bloom filter (tokenbf\_v1) for word-based search {#token-bloom-filter-tokenbf-v1-for-word-based-search}
123128

124-
Indexes tokens separated by non-alphanumeric characters; use with hasToken, LIKE word patterns, equals/IN. Supports String/FixedString/Map. See: [Token bloom filter](/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) and [Bloom filter types](/optimize/skipping-indexes#skip-index-types).
129+
`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. You should use it with [`hasToken`](/sql-reference/functions/string-search-functions#hastoken), `LIKE` word patterns or equals/IN. It supports `String`/`FixedString`/`Map` types.
130+
131+
See the [Token bloom filter](/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) and [Bloom filter types](/optimize/skipping-indexes#skip-index-types) pages for more details.
125132

126133
```sql
127134
ALTER TABLE logs ADD INDEX msg_token lower(msg) TYPE tokenbf_v1(10000, 7, 7) GRANULARITY 1;
@@ -138,7 +145,7 @@ See observability examples and guidance on token vs ngram [here](/use-cases/obse
138145

139146
## Add indexes during CREATE TABLE (multiple examples) {#add-indexes-during-create-table-multiple-examples}
140147

141-
Also supports composite expressions and Map/Tuple/Nested.
148+
Skipping indexes also support composite expressions and `Map`/`Tuple`/`Nested` types. This is demonstrated in the example below:
142149

143150
```sql
144151
CREATE TABLE t
@@ -159,7 +166,7 @@ ORDER BY u64;
159166

160167
## Materializing on existing data and verifying {#materializing-on-existing-data-and-verifying}
161168

162-
Add an index to existing data parts using MATERIALIZE, and inspect pruning with EXPLAIN or trace logs.
169+
You can add an index to existing data parts using `MATERIALIZE`, and inspect pruning with `EXPLAIN` or trace logs, as shown below:
163170

164171
```sql
165172
ALTER TABLE t MATERIALIZE INDEX idx_bf;
@@ -171,26 +178,28 @@ SELECT count() FROM t WHERE u64 IN (123, 456);
171178
SET send_logs_level = 'trace';
172179
```
173180

174-
A [worked minmax example](/best-practices/use-data-skipping-indices-where-appropriate#example) demonstrates EXPLAIN output structure and pruning counts.
181+
This [worked minmax example](/best-practices/use-data-skipping-indices-where-appropriate#example) demonstrates `EXPLAIN` output structure and pruning counts.
175182

176-
## When use and when to avoid {#when-use-and-when-to-avoid}
183+
## When to use and when to avoid skipping indexes {#when-use-and-when-to-avoid}
177184

178185
**Use skip indexes when:**
179186

180187
* Filter values are sparse within data blocks
181-
* Strong correlation exists with ORDER BY columns or data ingestion patterns group similar values together
182-
* Performing text searches on large log datasets (ngrambf\_v1/tokenbf\_v1 types)
188+
* Strong correlation exists with `ORDER BY` columns or data ingestion patterns group similar values together
189+
* Performing text searches on large log datasets (`ngrambf_v1`/`tokenbf_v1` types)
183190

184191
**Avoid skip indexes when:**
185192

186193
* Most blocks likely contain at least one matching value (blocks will be read regardless)
187194
* Filtering on high-cardinality columns with no correlation to data ordering
188195

189-
**Important considerations:** If a value appears even once in a data block, ClickHouse must read the entire block. Test indexes with realistic datasets and adjust granularity and type-specific parameters based on actual performance measurements.
196+
:::note Important considerations
197+
If a value appears even once in a data block, ClickHouse must read the entire block. Test indexes with realistic datasets and adjust granularity and type-specific parameters based on actual performance measurements.
198+
:::
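
When measuring, it can help to weigh the on-disk cost of each index against the pruning it delivers. The query below is a minimal sketch, assuming the `events` table from the earlier examples lives in the current database and that your ClickHouse version exposes the size columns of `system.data_skipping_indices`:

```sql
-- List every skip index on `events` with its on-disk footprint.
-- Assumption: the size columns below are available in your server version.
SELECT
    name,
    type,
    expr,
    granularity,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.data_skipping_indices
WHERE database = currentDatabase() AND table = 'events'
ORDER BY data_compressed_bytes DESC;
```

An index that is large relative to the column it covers and rarely prunes granules is a candidate for a different type, different parameters, or removal.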
190199

191200
## Temporarily ignore or force indexes {#temporarily-ignore-or-force-indexes}
192201

193-
Disable specific indexes by name for individual queries during testing and troubleshooting. Settings also exist to force index usage when needed. See [ignore\_data\_skipping\_indices](/operations/settings/settings#ignore_data_skipping_indices).
202+
Disable specific indexes by name for individual queries during testing and troubleshooting. Settings also exist to force index usage when needed. See [`ignore_data_skipping_indices`](/operations/settings/settings#ignore_data_skipping_indices).
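
For the forcing side, a minimal sketch using `force_data_skipping_indices`, which makes a query fail with an error if a listed index is not used. It assumes the `logs` table and the `msg_token` index from the token Bloom filter section; the search term `'exception'` is only an illustration:

```sql
-- Raise an error unless the msg_token index is used for this query
SELECT count()
FROM logs
WHERE hasToken(lower(msg), 'exception')
SETTINGS force_data_skipping_indices = 'msg_token';
```

This is mainly useful in tests, to catch predicates that silently stop matching the index expression (here `lower(msg)`).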
194203

195204
```sql
196205
-- Ignore an index by name
@@ -201,9 +210,9 @@ SETTINGS ignore_data_skipping_indices = 'msg_token';
201210

202211
## Notes and caveats {#notes-and-caveats}
203212

204-
* Only supported on MergeTree-family tables; pruning happens at the granule/block level.
213+
* Skipping indexes are only supported on [MergeTree-family tables](/engines/table-engines/mergetree-family/mergetree); pruning happens at the granule/block level.
205214
* Bloom-filter-based indexes are probabilistic (false positives cause extra reads but won't skip valid data).
206-
* Bloom filters and other skip indexes should be validated with EXPLAIN and tracing; adjust granularity to balance pruning vs. index size.
215+
* Bloom filters and other skip indexes should be validated with `EXPLAIN` and tracing; adjust granularity to balance pruning vs. index size.
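
As a sketch of the granularity adjustment mentioned above, an index can be dropped, re-added with a different `GRANULARITY`, and rebuilt for existing parts. The `events` table and `value_bf` index come from the Bloom filter section; the value `8` is an arbitrary illustration, not a recommendation:

```sql
-- Replace the index with a coarser granularity (8 is a hypothetical value)
ALTER TABLE events DROP INDEX value_bf;
ALTER TABLE events ADD INDEX value_bf value TYPE bloom_filter(0.01) GRANULARITY 8;

-- Rebuild the index for data parts that already exist
ALTER TABLE events MATERIALIZE INDEX value_bf;

-- Re-check pruning with the new granularity
EXPLAIN indexes = 1
SELECT * FROM events WHERE value IN (7, 42, 99);
```

A larger granularity gives a smaller index, but each index entry covers more rows and so prunes less precisely. Compare the `EXPLAIN` output before and after the change.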
207216

208217
## Related docs {#related-docs}
209218
- [Data skipping index guide](/optimize/skipping-indexes)
