docs/concepts/why-clickhouse-is-so-fast.md
To avoid that too many parts accumulate, ClickHouse runs a [merge](/merges) operation …
This approach has several advantages: All data processing can be [offloaded to background part merges](/concepts/why-clickhouse-is-so-fast#storage-layer-merge-time-computation), keeping data writes lightweight and highly efficient. Individual inserts are "local" in the sense that they do not need to update global, i.e. per-table data structures. As a result, multiple simultaneous inserts need no mutual synchronization or synchronization with existing table data, and thus inserts can be performed almost at the speed of disk I/O.
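This insert-then-merge design can be sketched in a few lines. The following is a minimal illustration of the idea, not ClickHouse's actual implementation: each insert produces its own small sorted part without touching any shared state, and a background merge later combines the parts into one bigger part.

```python
# Illustrative sketch (not ClickHouse internals): every insert creates a
# small, immutable sorted "part"; a background merge combines parts.
import heapq

class Table:
    def __init__(self):
        self.parts = []          # each part: an immutable sorted list of keys

    def insert(self, rows):
        # An insert only sorts its own batch; it never updates other parts
        # or global structures, so concurrent inserts need no coordination.
        self.parts.append(sorted(rows))

    def merge(self):
        # Background merge: k-way merge of all parts into one bigger part.
        merged = list(heapq.merge(*self.parts))
        self.parts = [merged]

t = Table()
t.insert([3, 1])
t.insert([2, 4])
t.merge()
print(t.parts)   # [[1, 2, 3, 4]]
```

Because each insert writes only data it owns, the write path is limited by disk I/O rather than by lock contention.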
🤿 Deep dive into this in the [On-Disk Format](/docs/academic_overview#3-1-on-disk-format) section of the web version of our VLDB 2024 paper.
## Storage Layer: Concurrent inserts and selects are isolated {#storage-layer-concurrent-inserts-and-selects-are-isolated}
<iframe width="768" height="432" src="https://www.youtube.com/embed/dvGlPh2bJFo?si=F3MSALPpe0gAoq5k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
Inserts are fully isolated from SELECT queries, and merging inserted data parts happens in the background without affecting concurrent queries.
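One way to picture this isolation (a hedged sketch, not the real mechanism): a query works off a snapshot of the current part list, and a merge swaps in a new list afterwards without disturbing readers holding the old snapshot.

```python
# Sketch: readers snapshot the part list; a background merge replaces the
# list, but parts are immutable, so existing snapshots stay valid.
class Table:
    def __init__(self):
        self.parts = [[1, 2], [3]]

    def snapshot(self):
        return list(self.parts)       # cheap copy of part references

    def merge(self):
        self.parts = [sorted(x for p in self.parts for x in p)]

t = Table()
snap = t.snapshot()   # a running SELECT works off this snapshot
t.merge()             # background merge swaps the part list...
assert snap == [[1, 2], [3]]   # ...but the reader's view is unchanged
```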
🤿 Deep dive into this in the [Storage Layer](/docs/academic_overview#3-storage-layer) section of the web version of our VLDB 2024 paper.
On the one hand, user queries may become significantly faster, sometimes by 1000…
On the other hand, the majority of the runtime of merges is consumed by loading the input parts and saving the output part. The additional effort to transform the data during the merge usually does not impact its runtime much. All of this magic is completely transparent and does not affect the result of queries (besides their performance).
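As a toy illustration of merge-time computation (a sketch in the spirit of ClickHouse's SummingMergeTree, not its implementation), a merge can collapse rows with equal keys by summing their values, so later queries read pre-aggregated data:

```python
# Hypothetical sketch: a merge that also pre-aggregates. Loading the input
# parts dominates the cost; the extra summing work piggybacks on it cheaply.
from collections import defaultdict

def merge_parts(parts):
    totals = defaultdict(int)
    for part in parts:                 # part: list of (key, value) rows
        for key, value in part:
            totals[key] += value
    return sorted(totals.items())      # one merged, pre-aggregated part

parts = [[("a", 1), ("b", 2)], [("a", 5)]]
print(merge_parts(parts))   # [('a', 6), ('b', 2)]
```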
🤿 Deep dive into this in the [Merge-time Data Transformation](/docs/academic_overview#3-3-merge-time-data-transformation) section of the web version of our VLDB 2024 paper.
## Storage Layer: Data pruning {#storage-layer-data-pruning}
In practice, many queries are repetitive, i.e., run unchanged or only with slightly …
All three techniques aim to skip as many rows during full-column reads as possible because the fastest way to read data is to not read it at all.
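Min/max-based skipping, one of the pruning techniques, can be sketched as follows (an illustration, not ClickHouse's index format): each block of rows stores the minimum and maximum of a column, and a range query only reads blocks whose [min, max] interval overlaps the filter.

```python
# Sketch of min/max pruning: skip whole blocks whose value range cannot
# intersect the query's filter range.
blocks = [
    {"min": 0,  "max": 9,  "rows": list(range(0, 10))},
    {"min": 10, "max": 19, "rows": list(range(10, 20))},
    {"min": 20, "max": 29, "rows": list(range(20, 30))},
]

def scan(blocks, lo, hi):
    hits = []
    for b in blocks:
        if b["max"] < lo or b["min"] > hi:
            continue                     # pruned: no row in here can match
        hits += [r for r in b["rows"] if lo <= r <= hi]
    return hits

print(scan(blocks, 12, 14))   # [12, 13, 14] -- only the middle block is read
```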
🤿 Deep dive into this in the [Data Pruning](/docs/academic_overview#3-2-data-pruning) section of the web version of our VLDB 2024 paper.
## Storage Layer: Data compression {#storage-layer-data-compression}
Users can [specify](https://clickhouse.com/blog/optimize-clickhouse-codecs-compr…
Data compression not only reduces the storage size of the database tables, but in many cases, it also improves query performance as local disks and network I/O are often constrained by low throughput.
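The effect is easy to demonstrate with a generic codec (a sketch using Python's zlib, not one of ClickHouse's actual codecs): columnar data with low entropy compresses very well, so far fewer bytes cross the slow disk or network link.

```python
# Sketch: a repetitive column compresses to a small fraction of its raw
# size, so I/O-bound reads move much less data.
import zlib

column = b"".join(b"2024-01-01" for _ in range(10_000))   # repetitive column
packed = zlib.compress(column)
print(len(column), len(packed))        # compressed size is a tiny fraction
assert len(packed) < len(column) // 10
assert zlib.decompress(packed) == column   # lossless round trip
```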
🤿 Deep dive into this in the [On-Disk Format](/docs/academic_overview#3-1-on-disk-format) section of the web version of our VLDB 2024 paper.
Modern systems have dozens of CPU cores. To utilize all cores, ClickHouse unfolds …
If a single node becomes too small to hold the table data, further nodes can be added to form a cluster. Tables can be split ("sharded") and distributed across the nodes. ClickHouse will run queries on all nodes that store table data and thereby scale "horizontally" with the number of available nodes.
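The scatter/gather pattern behind this can be sketched minimally (an illustration of distributed aggregation in general, not ClickHouse's distributed engine): rows are split across shards, each shard computes a partial aggregate, and the initiating node merges the partial results.

```python
# Sketch of horizontal scaling: partial aggregates per shard, merged once.
def shard(rows, n):
    # Round-robin split of rows across n shards (a stand-in for sharding).
    return [rows[i::n] for i in range(n)]

def distributed_sum(rows, n_shards):
    partials = [sum(s) for s in shard(rows, n_shards)]  # runs on each node
    return sum(partials)                                # merged on initiator

rows = list(range(100))
assert distributed_sum(rows, 4) == sum(rows)   # same answer, n-way parallel
```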
🤿 Deep dive into this in the [Query Processing Layer](/academic_overview#4-query-processing-layer) section of the web version of our VLDB 2024 paper.
## Meticulous attention to detail {#meticulous-attention-to-detail}
The [hash table implementation in ClickHouse](https://clickhouse.com/blog/hash-t…
Algorithms that rely on data characteristics often perform better than their generic counterparts. If the data characteristics are not known in advance, the system can try various implementations and choose the one that works best at runtime. For an example, see the [article on how LZ4 decompression is implemented in ClickHouse](https://habr.com/en/company/yandex/blog/457612/).
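Picking an implementation at runtime can be sketched like this (a toy benchmark harness, not ClickHouse's dispatch machinery): time each candidate on a sample of the data and keep the fastest for the rest of the run.

```python
# Sketch: benchmark candidate kernels on a data sample, keep the winner.
import timeit

def sum_builtin(xs):
    return sum(xs)

def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

def pick_fastest(candidates, sample):
    timed = [(timeit.timeit(lambda f=f: f(sample), number=50), f)
             for f in candidates]
    return min(timed, key=lambda t: t[0])[1]

best = pick_fastest([sum_builtin, sum_loop], list(range(10_000)))
print(best.__name__)   # whichever kernel was fastest on this machine
```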
🤿 Deep dive into this in the [Holistic Performance Optimization](/academic_overview#4-4-holistic-performance-optimization) section of the web version of our VLDB 2024 paper.