
Commit f517797

Merge pull request #3445 from ClickHouse/replicas
Replicas section at the end of shards core concept doc.
2 parents: 0ff61cd + 9a6bda0

7 files changed (+45 −16 lines)

docs/managing-data/core-concepts/index.md

Lines changed: 7 additions & 7 deletions
@@ -8,11 +8,11 @@ keywords: [concepts, part, partition, primary index]
 In this section of the documentation,
 you will learn some of the core concepts of how ClickHouse works.
 
-| Page                                                             | Description |
-|------------------------------------------------------------------|-------------|
-| [Table parts](/parts)                                            | Learn what table parts are in ClickHouse. |
-| [Table partitions](/partitions)                                  | Learn what table partitions are and what they are used for. |
-| [Table part merges](/merges)                                     | Learn what table part merges are and what they are used for. |
-| [Table shards](/shards)                                          | Learn what table shards are and what they are used for. |
+| Page                                                      | Description |
+|-----------------------------------------------------------|-------------|
+| [Table parts](/parts)                                     | Learn what table parts are in ClickHouse. |
+| [Table partitions](/partitions)                           | Learn what table partitions are and what they are used for. |
+| [Table part merges](/merges)                              | Learn what table part merges are and what they are used for. |
+| [Table shards and replicas](/shards)                      | Learn what table shards and replicas are and what they are used for. |
 | [Primary indexes](/guides/best-practices/sparse-primary-indexes) | A deep dive into ClickHouse indexing including how it differs from other DB systems, how ClickHouse builds and uses a table's sparse primary index and what some of the best practices are for indexing in ClickHouse. |
-| [Architectural Overview](/academic_overview) | A concise academic overview of all components of the ClickHouse architecture, based on our VLDB 2024 scientific paper. |
+| [Architectural Overview](/academic_overview)              | A concise academic overview of all components of the ClickHouse architecture, based on our VLDB 2024 scientific paper. |

docs/managing-data/core-concepts/shards.md

Lines changed: 38 additions & 9 deletions
@@ -1,22 +1,26 @@
 ---
 slug: /shards
-title: Table shards
-description: What are table shards in ClickHouse
-keywords: [shard, shards, sharding]
+title: Table shards and replicas
+description: What are table shards and replicas in ClickHouse
+keywords: [shard, shards, sharding, replica, replicas]
 ---
 
 import image_01 from '@site/static/images/managing-data/core-concepts/shards_01.png'
 import image_02 from '@site/static/images/managing-data/core-concepts/shards_02.png'
 import image_03 from '@site/static/images/managing-data/core-concepts/shards_03.png'
 import image_04 from '@site/static/images/managing-data/core-concepts/shards_04.png'
+import image_05 from '@site/static/images/managing-data/core-concepts/shards_replicas_01.png'
 
 ## What are table shards in ClickHouse? {#what-are-table-shards-in-clickhouse}
+<br/>
+
+:::note
+This topic doesn’t apply to ClickHouse Cloud, where [Parallel Replicas](/docs/deployment-guides/parallel-replicas) serve the same purpose as multiple shards in traditional shared-nothing ClickHouse clusters.
+:::
 
-> This topic doesn’t apply to ClickHouse Cloud, where [Parallel Replicas](/docs/deployment-guides/parallel-replicas) serve the same purpose.
 
-<br/>
 
-In ClickHouse OSS, sharding is used when ① the data is too large for a single server or ② a single server is too slow for processing. The next figure illustrates case ①, where the [uk_price_paid_simple](/parts) table exceeds a single machine’s capacity:
+In traditional [shared-nothing](https://en.wikipedia.org/wiki/Shared-nothing_architecture) ClickHouse clusters, sharding is used when ① the data is too large for a single server or ② a single server is too slow for processing the data. The next figure illustrates case ①, where the [uk_price_paid_simple](/parts) table exceeds a single machine’s capacity:
 
 <img src={image_01} alt='SHARDS' class='image' />
 <br/>
@@ -26,7 +30,7 @@ In such a case the data can be split over multiple ClickHouse servers in the for
 <img src={image_02} alt='SHARDS' class='image' />
 <br/>
 
-Each shard holds a subset of the data and functions as a regular ClickHouse table that can be queried independently. However, queries will only process that subset, which may be valid depending on data distribution. Typically, a [distributed table](/docs/engines/table-engines/special/distributed) (often per server) provides a unified view of the full dataset. It doesn’t store data itself but forwards **SELECT** queries to all shards, assembles the results, and routes **INSERTS** to distribute data evenly.
+Each shard holds a subset of the data and functions as a regular ClickHouse table that can be queried independently. However, queries will only process that subset, which may be a valid use case depending on data distribution. Typically, a [distributed table](/docs/engines/table-engines/special/distributed) (often per server) provides a unified view of the full dataset. It doesn’t store data itself but forwards **SELECT** queries to all shards, assembles the results, and routes **INSERTS** to distribute data evenly.
 
 ## Distributed table creation {#distributed-table-creation}
 
@@ -75,11 +79,36 @@ This diagram shows how SELECT queries are processed with a distributed table in
 
 Then, the ClickHouse server hosting the initially targeted distributed table ③ collects all local results, ④ merges them into the final global result, and ⑤ returns it to the query sender.
 
+## What are table replicas in ClickHouse? {#what-are-table-replicas-in-clickhouse}
+
+Replication in ClickHouse ensures **data integrity** and **failover** by maintaining **copies of shard data** across multiple servers. Since hardware failures are inevitable, replication prevents data loss by ensuring that each shard has multiple replicas. Writes can be directed to any replica, either directly or via a [distributed table](#distributed-table-creation), which selects a replica for the operation. Changes are automatically propagated to other replicas. In case of a failure or maintenance, data remains available on other replicas, and once a failed host recovers, it synchronizes automatically to stay up to date.
+
+Note that replication requires a [Keeper](https://clickhouse.com/clickhouse/keeper) component in the [cluster architecture](/docs/architecture/horizontal-scaling#architecture-diagram).
+
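A sketch of how a replicated shard-local table could be declared; the Keeper path and the `{shard}`/`{replica}` macros are assumptions that depend on how macros are configured on each server:

```sql
-- Hypothetical replicated variant of the shard-local table. Replicas of the
-- same shard share one Keeper path; each supplies its own replica name.
-- ClickHouse Keeper then coordinates replication of inserted parts.
CREATE TABLE uk.uk_price_paid_simple_local ON CLUSTER my_cluster
(
    date   Date,
    town   LowCardinality(String),
    street LowCardinality(String),
    price  UInt32
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/uk_price_paid_simple_local', -- shared per shard
    '{replica}'                                              -- unique per replica
)
ORDER BY (town, street);
```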
+The following diagram illustrates a ClickHouse cluster with six servers, where the two table shards `Shard-1` and `Shard-2` introduced earlier each have three replicas. A query is sent to this cluster:
+
+<img src={image_05} alt='SHARDS' class='image' />
+<br/>
+
+Query processing works similarly to setups without replicas, with only a single replica from each shard executing the query.
+
+> Replicas not only ensure data integrity and failover but also improve query processing throughput by allowing multiple queries to run in parallel across different replicas.
+
+① A query targeting the distributed table is sent to the corresponding ClickHouse server, either directly or via a load balancer.
+
+② The distributed table forwards the query to one replica from each shard, where each ClickHouse server hosting the selected replica computes its local query result in parallel.
+
+The rest works the [same](#select-forwarding) as in setups without replicas and is not shown in the diagram above. The ClickHouse server hosting the initially targeted distributed table collects all local results, merges them into the final global result, and returns it to the query sender.
+
+Note that ClickHouse allows configuring the query forwarding strategy for ②. By default—unlike in the diagram above—the distributed table [prefers](/docs/operations/settings/settings#prefer_localhost_replica) a local replica if available, but other load balancing [strategies](/docs/operations/settings/settings#load_balancing) can be used.
+
+
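For illustration, both settings mentioned above can also be set per query; the table name is the assumed one from the earlier sketches and the chosen values are examples rather than recommendations:

```sql
-- Steering step ② per query: do not prefer the local replica, and pick among
-- replicas by hostname similarity instead of the default strategy.
SELECT count()
FROM uk.uk_price_paid_simple
SETTINGS
    prefer_localhost_replica = 0,
    load_balancing = 'nearest_hostname';
```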
 ## Where to find more information {#where-to-find-more-information}
 
-For more details beyond this high-level introduction to table shards, check out our [deployment and scaling guide](/docs/architecture/horizontal-scaling).
+For more details beyond this high-level introduction to table shards and replicas, check out our [deployment and scaling guide](/docs/architecture/horizontal-scaling).
 
-We also highly recommend this tutorial video for a deeper dive into ClickHouse shards:
+We also highly recommend this tutorial video for a deeper dive into ClickHouse shards and replicas:
 
 <iframe width="768" height="432" src="https://www.youtube.com/embed/vBjCJtw_Ei0?si=WqopTrnti6usCMRs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
 
5 image files changed (binary, previews not shown): 174 KB, 227 KB, 90.3 KB, 112 KB, 1.77 MB
