|[Table parts](/parts)| Learn what table parts are in ClickHouse. |
|[Table partitions](/partitions)| Learn what table partitions are and what they are used for. |
|[Table part merges](/merges)| Learn what table part merges are and what they are used for. |
|[Table shards and replicas](/shards)| Learn what table shards and replicas are and what they are used for. |
|[Primary indexes](/guides/best-practices/sparse-primary-indexes)| A deep dive into ClickHouse indexing, including how it differs from other database systems, how ClickHouse builds and uses a table's sparse primary index, and what some of the best practices are for indexing in ClickHouse. |
|[Architectural Overview](/academic_overview)| A concise academic overview of all components of the ClickHouse architecture, based on our VLDB 2024 scientific paper. |
import image_01 from '@site/static/images/managing-data/core-concepts/shards_01.png'
import image_02 from '@site/static/images/managing-data/core-concepts/shards_02.png'
import image_03 from '@site/static/images/managing-data/core-concepts/shards_03.png'
import image_04 from '@site/static/images/managing-data/core-concepts/shards_04.png'
import image_05 from '@site/static/images/managing-data/core-concepts/shards_replicas_01.png'
## What are table shards in ClickHouse? {#what-are-table-shards-in-clickhouse}
<br/>

:::note
This topic doesn’t apply to ClickHouse Cloud, where [Parallel Replicas](/docs/deployment-guides/parallel-replicas) serve the same purpose as multiple shards in traditional shared-nothing ClickHouse clusters.
:::

In traditional [shared-nothing](https://en.wikipedia.org/wiki/Shared-nothing_architecture) ClickHouse clusters, sharding is used when ① the data is too large for a single server or ② a single server is too slow for processing the data. The next figure illustrates case ①, where the [uk_price_paid_simple](/parts) table exceeds a single machine’s capacity:

<img src={image_01} alt='SHARDS' class='image' />
<br/>

In such a case the data can be split over multiple ClickHouse servers in the form of shards:

<img src={image_02} alt='SHARDS' class='image' />
<br/>
Each shard holds a subset of the data and functions as a regular ClickHouse table that can be queried independently. However, queries will only process that subset, which may be a valid use case depending on data distribution. Typically, a [distributed table](/docs/engines/table-engines/special/distributed) (often per server) provides a unified view of the full dataset. It doesn’t store data itself but forwards **SELECT** queries to all shards, assembles the results, and routes **INSERTS** to distribute data evenly.
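
As an illustrative sketch only (the cluster name `my_cluster` and the local table name are assumptions for this example, not definitions from this page), a distributed table on top of per-shard local tables might be declared like this:

```sql
-- Hypothetical sketch: a local table on each shard, plus a Distributed
-- "umbrella" table over a cluster assumed to be named `my_cluster`.
CREATE TABLE uk_price_paid_local ON CLUSTER my_cluster
(
    street String,
    town   String,
    price  UInt32
)
ENGINE = MergeTree
ORDER BY (town, street);

-- The Distributed table stores no data itself: it forwards SELECT queries
-- to all shards and routes INSERTs according to the sharding expression
-- (here rand(), which spreads rows evenly across shards).
CREATE TABLE uk_price_paid ON CLUSTER my_cluster AS uk_price_paid_local
ENGINE = Distributed(my_cluster, default, uk_price_paid_local, rand());
```

Queries against `uk_price_paid` then behave like queries against a single table, even though the data lives on multiple servers.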
This diagram shows how SELECT queries are processed with a distributed table in ClickHouse:

Then, the ClickHouse server hosting the initially targeted distributed table ③ collects all local results, ④ merges them into the final global result, and ⑤ returns it to the query sender.
## What are table replicas in ClickHouse? {#what-are-table-replicas-in-clickhouse}
Replication in ClickHouse ensures **data integrity** and **failover** by maintaining **copies of shard data** across multiple servers. Since hardware failures are inevitable, replication prevents data loss by ensuring that each shard has multiple replicas. Writes can be directed to any replica, either directly or via a [distributed table](#distributed-table-creation), which selects a replica for the operation. Changes are automatically propagated to other replicas. In case of a failure or maintenance, data remains available on other replicas, and once a failed host recovers, it synchronizes automatically to stay up to date.
Note that replication requires a [Keeper](https://clickhouse.com/clickhouse/keeper) component in the [cluster architecture](/docs/architecture/horizontal-scaling#architecture-diagram).
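
As a minimal sketch (not a definition from this page), a replicated shard-local table is typically created with a `Replicated*` table engine; this example assumes the common `{shard}` and `{replica}` macros are defined in each server's configuration:

```sql
-- Hypothetical sketch: ReplicatedMergeTree keeps all replicas of a shard in
-- sync via Keeper. The Keeper path and the {shard}/{replica} macros are
-- assumptions for this example.
CREATE TABLE uk_price_paid_local ON CLUSTER my_cluster
(
    street String,
    town   String,
    price  UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/uk_price_paid_local', '{replica}')
ORDER BY (town, street);
```

Each server substitutes its own macro values, so every replica of a shard registers under the same Keeper path and replicates writes from its peers automatically.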
The following diagram illustrates a ClickHouse cluster with six servers, where the two table shards `Shard-1` and `Shard-2` introduced earlier each have three replicas. A query is sent to this cluster:

<img src={image_05} alt='SHARDS' class='image' />
<br/>
Query processing works similarly to setups without replicas, with only a single replica from each shard executing the query.
> Replicas not only ensure data integrity and failover but also improve query processing throughput by allowing multiple queries to run in parallel across different replicas.
① A query targeting the distributed table is sent to the corresponding ClickHouse server, either directly or via a load balancer.
② The distributed table forwards the query to one replica from each shard, and each ClickHouse server hosting a selected replica computes its local query result in parallel.
The rest works the [same](#select-forwarding) as in setups without replicas and is not shown in the diagram above. The ClickHouse server hosting the initially targeted distributed table collects all local results, merges them into the final global result, and returns it to the query sender.
Note that ClickHouse allows configuring the query forwarding strategy for ②. By default—unlike in the diagram above—the distributed table [prefers](/docs/operations/settings/settings#prefer_localhost_replica) a local replica if available, but other load balancing [strategies](/docs/operations/settings/settings#load_balancing) can be used.
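
For example, both behaviors can be overridden per query (the table name below is hypothetical; the settings are the ones linked above):

```sql
-- Hypothetical sketch: disable the local-replica preference and pick a
-- replica from each shard at random instead.
SELECT count()
FROM uk_price_paid
SETTINGS prefer_localhost_replica = 0, load_balancing = 'random';
```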
## Where to find more information {#where-to-find-more-information}
For more details beyond this high-level introduction to table shards and replicas, check out our [deployment and scaling guide](/docs/architecture/horizontal-scaling).
We also highly recommend this tutorial video for a deeper dive into ClickHouse shards and replicas:
<iframe width="768" height="432" src="https://www.youtube.com/embed/vBjCJtw_Ei0?si=WqopTrnti6usCMRs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>