import stackOverflowSchema from '@site/static/images/data-modeling/stackoverflow-schema.png';
import schemaDesignTypes from '@site/static/images/data-modeling/schema-design-types.png';
import schemaDesignIndices from '@site/static/images/data-modeling/schema-design-indices.png';
import Image from '@theme/IdealImage';
Understanding effective schema design is key to optimizing ClickHouse performance. Schema design often involves trade-offs, with the optimal approach depending on the queries being served as well as factors such as data update frequency, latency requirements, and data volume. This guide provides an overview of schema design best practices and data modeling techniques for ClickHouse.
For the examples in this guide, we use a subset of the Stack Overflow dataset.
> The primary keys and relationships indicated are not enforced through constraints (Parquet is a file format, not a table format); they purely indicate how the data is related and the unique keys it possesses.
Users coming from OLTP databases often look for the equivalent concept in ClickHouse.
At the scale at which ClickHouse is often used, memory and disk efficiency are paramount. Data is written to ClickHouse tables in chunks known as parts, with rules applied for merging the parts in the background. In ClickHouse, each part has its own primary index. When parts are merged, the merged part's primary indexes are also merged. The primary index for a part has one index entry per group of rows, a technique known as sparse indexing.
<Image img={schemaDesignIndices} size="md" alt="Sparse Indexing in ClickHouse"/>
The selected key in ClickHouse determines not only the index, but also the order in which data is written on disk. Because of this, it can dramatically impact compression levels, which can in turn affect query performance. An ordering key which causes the values of most columns to be written in contiguous order will allow the selected compression algorithm (and codecs) to compress the data more effectively.
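As an illustrative sketch, an ordering key reflecting these trade-offs might be declared as follows. The table and column names here are assumptions loosely based on the Stack Overflow posts data, not the exact schema used elsewhere in this guide:

```sql
-- Illustrative only: table and column names are assumptions.
CREATE TABLE posts
(
    `Id` Int32,
    `PostTypeId` UInt8,
    `CreationDate` DateTime,
    `OwnerUserId` Int32,
    `Body` String
)
ENGINE = MergeTree
-- Lower-cardinality columns first: each column's values are then written
-- in long contiguous runs, which compresses well, while range filters on
-- CreationDate can still use the sparse primary index.
ORDER BY (PostTypeId, toDate(CreationDate), OwnerUserId);
```

Here `ORDER BY` also acts as the primary key, so filters on `PostTypeId` and `CreationDate` benefit from the sparse index as well as the improved compression.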
`docs/dictionary/index.md`
import dictionaryUseCases from '@site/static/images/dictionary/dictionary-use-cases.png';
import dictionaryLeftAnyJoin from '@site/static/images/dictionary/dictionary-left-any-join.png';
import Image from '@theme/IdealImage';
# Dictionary
Dictionaries are useful for:
- Improving the performance of queries, especially when used with `JOIN`s
- Enriching ingested data on the fly without slowing down the ingestion process
<Image img={dictionaryUseCases} size="lg" alt="Use cases for Dictionary in ClickHouse"/>
## Speeding up joins using a Dictionary {#speeding-up-joins-using-a-dictionary}
Dictionaries can be used to speed up a specific type of `JOIN`: the [`LEFT ANY` type](/sql-reference/statements/select/join#supported-types-of-join) where the join key needs to match the key attribute of the underlying key-value storage.
<Image img={dictionaryLeftAnyJoin} size="sm" alt="Using Dictionary with LEFT ANY JOIN"/>
If this is the case, ClickHouse can exploit the dictionary to perform a [Direct Join](https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4#direct-join). This is ClickHouse's fastest join algorithm and is applicable when the underlying [table engine](/engines/table-engines) for the right-hand side table supports low-latency key-value requests. ClickHouse has three table engines providing this: [Join](/engines/table-engines/special/join) (which is essentially a pre-calculated hash table), [EmbeddedRocksDB](/engines/table-engines/integrations/embedded-rocksdb) and [Dictionary](/engines/table-engines/special/dictionary). We will describe the dictionary-based approach, but the mechanics are the same for all three engines.
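A hedged sketch of the dictionary-based approach follows. The dictionary definition, table names, and columns are illustrative assumptions, not a schema defined elsewhere in this guide:

```sql
-- Assumed: a `users` table with a UInt64 `Id` key and a `DisplayName` column.
CREATE DICTIONARY users_dict
(
    `Id` UInt64,
    `DisplayName` String
)
PRIMARY KEY Id
SOURCE(CLICKHOUSE(TABLE 'users'))
LIFETIME(MIN 600 MAX 900)
LAYOUT(HASHED());

-- The join key matches the dictionary's key attribute, so ClickHouse
-- can resolve each row with a low-latency key-value lookup.
SELECT p.Title, u.DisplayName
FROM posts AS p
LEFT ANY JOIN users_dict AS u ON p.OwnerUserId = u.Id
SETTINGS join_algorithm = 'direct';
```

Setting `join_algorithm = 'direct'` asks ClickHouse to use the Direct Join when the right-hand side supports it; the same query shape works with the Join and EmbeddedRocksDB engines.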
`docs/managing-data/drop_partition.md`
The examples below assume a table partitioned with `PARTITION BY toYear(CreationDate)`.
Read about setting the partition expression in the section [How to set the partition expression](/sql-reference/statements/alter/partition/#how-to-set-partition-expression).
In ClickHouse, users should principally consider partitioning to be a data management feature, not a query optimization technique. By separating data logically based on a key, each partition can be operated on independently (e.g. deleted). This allows users to efficiently move partitions, and thus subsets of the data, between [storage tiers](/integrations/s3#storage-tiers) as data ages, or to [expire data and efficiently delete it from the cluster](/sql-reference/statements/alter/partition).
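For example, with yearly partitioning on `toYear(CreationDate)`, an entire year of data can be removed in one lightweight metadata operation. The table name here is an assumption:

```sql
-- Inspect which partitions currently exist for the table.
SELECT DISTINCT partition
FROM system.parts
WHERE table = 'posts' AND active;

-- Remove all rows for 2008 without rewriting any other data.
ALTER TABLE posts DROP PARTITION '2008';
```

Because a partition maps to its own set of parts on disk, `DROP PARTITION` deletes those parts directly rather than scanning and mutating rows.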
`docs/materialized-view/incremental-materialized-view.md`
import materializedViewDiagram from '@site/static/images/materialized-view/materialized-view-diagram.png';
import Image from '@theme/IdealImage';
# Incremental Materialized Views
Materialized views in ClickHouse are updated in real time as data flows into the table they are based on, functioning more like continually updating indexes. This is in contrast to other databases where materialized views are typically static snapshots of a query that must be refreshed (similar to ClickHouse [refreshable materialized views](/sql-reference/statements/create/view#refreshable-materialized-view)).
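As a minimal sketch of these trigger-like mechanics, with hypothetical source and target tables (a `posts` table aggregated into daily counts):

```sql
-- Target table holding the pre-aggregated results.
CREATE TABLE posts_per_day
(
    `Day` Date,
    `PostCount` UInt64
)
ENGINE = SummingMergeTree
ORDER BY Day;

-- The view runs its SELECT over each newly inserted block of `posts`
-- and writes the partial result into `posts_per_day`; the
-- SummingMergeTree engine collapses partial counts during merges.
CREATE MATERIALIZED VIEW posts_per_day_mv TO posts_per_day AS
SELECT
    toDate(CreationDate) AS Day,
    count() AS PostCount
FROM posts
GROUP BY Day;
```

Note the view only ever sees newly inserted rows; it never re-reads the full source table, which is what keeps it cheap at ingest time.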
<iframe width="1024" height="576" src="https://www.youtube.com/embed/-A3EtQgDn_0?si=TBiN_E80BKZ0DPpd" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
import refreshableMaterializedViewDiagram from '@site/static/images/materialized-view/refreshable-materialized-view-diagram.png';
import Image from '@theme/IdealImage';
[Refreshable materialized views](/sql-reference/statements/create/view#refreshable-materialized-view) are conceptually similar to materialized views in traditional OLTP databases, storing the result of a specified query for quick retrieval and reducing the need to repeatedly execute resource-intensive queries. Unlike ClickHouse's [incremental materialized views](/materialized-view/incremental-materialized-view), they require the periodic execution of the query over the full dataset, the results of which are stored in a target table for querying. This result set should, in theory, be smaller than the original dataset, allowing the subsequent query to execute faster.
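A sketch of the scheduled re-execution, assuming hypothetical table names (`posts` as the source, `top_tags` as an existing target table):

```sql
-- Re-executes the full query every hour; the results replace the
-- contents of the `top_tags` target table.
CREATE MATERIALIZED VIEW top_tags_mv
REFRESH EVERY 1 HOUR TO top_tags AS
SELECT
    Tag,
    count() AS PostCount
FROM posts
GROUP BY Tag
ORDER BY PostCount DESC
LIMIT 10;
```

Between refreshes, queries against `top_tags` read the previously materialized result rather than re-running the aggregation.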
The following diagram explains how refreshable materialized views work:
import postgres_replacingmergetree from '@site/static/images/migrations/postgres-replacingmergetree.png';
import Image from '@theme/IdealImage';
While transactional databases are optimized for transactional update and delete workloads, OLAP databases offer reduced guarantees for such operations. Instead, they optimize for immutable data inserted in batches, for the benefit of significantly faster analytical queries. While ClickHouse offers update operations through mutations, as well as a lightweight means of deleting rows, its column-oriented structure means these operations should be scheduled with care. These operations are handled asynchronously, processed with a single thread, and require (in the case of updates) data to be rewritten on disk. They should thus not be used for high numbers of small changes.
To process a stream of update and delete rows while avoiding the above usage patterns, we can use the ClickHouse table engine ReplacingMergeTree.
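A sketch of the engine in use; the schema, version column (`UpdatedAt`), and deleted flag are illustrative assumptions:

```sql
CREATE TABLE users
(
    `Id` UInt64,
    `DisplayName` String,
    `UpdatedAt` DateTime,
    `Deleted` UInt8
)
-- On merge, only the row with the highest UpdatedAt per Id is kept;
-- rows whose Deleted flag is 1 are dropped once merged.
ENGINE = ReplacingMergeTree(UpdatedAt, Deleted)
ORDER BY Id;

-- Merges are asynchronous, so FINAL forces deduplication at query time
-- if the latest state is needed before merges have completed.
SELECT *
FROM users
FINAL;
```

Updates and deletes are thus expressed as ordinary batch inserts of new row versions, which fits ClickHouse's append-oriented write path.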
As a result of this merge process, we have four rows representing the final state.