Commit 58194db

Merge pull request #4498 from Blargian/revise_sharding_replication_docs
Scaling docs: edits to make the guides flow easier
2 parents: e60dcac + e5c8adf

File tree: 3 files changed (+37, -24 lines)


docs/deployment-guides/replication-sharding-examples/02_2_shards_1_replica.md

Lines changed: 16 additions & 14 deletions

````diff
@@ -557,11 +557,7 @@ SHOW DATABASES;
 
 ## Create a table on the cluster {#creating-a-table}
 
-Now that the database has been created, you will create a distributed table.
-Distributed tables are tables which have access to shards located on different
-hosts and are defined using the `Distributed` table engine. The distributed table
-acts as the interface across all the shards in the cluster.
-
+Now that the database has been created, you will create a table.
 Run the following query from any of the host clients:
 
 ```sql
@@ -608,8 +604,6 @@ SHOW TABLES IN uk;
 └─────────────────────┘
 ```
 
-## Insert data into a distributed table {#inserting-data}
-
 Before we insert the UK price paid data, let's perform a quick experiment to see
 what happens when we insert data into an ordinary table from either host.
 
@@ -622,7 +616,7 @@ CREATE TABLE test.test_table ON CLUSTER cluster_2S_1R
 `id` UInt64,
 `name` String
 )
-ENGINE = ReplicatedMergeTree
+ENGINE = MergeTree()
 ORDER BY id;
 ```
 
````
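As a reviewer's aside, the revised experiment in the hunks above boils down to the following sketch. The `INSERT` statements and sample values are assumptions added for illustration; only the `CREATE TABLE` fragment is visible in this diff.

```sql
-- Ordinary, non-replicated table created on every host in the cluster.
CREATE TABLE test.test_table ON CLUSTER cluster_2S_1R
(
    `id` UInt64,
    `name` String
)
ENGINE = MergeTree()
ORDER BY id;

-- Hypothetical rows: run the first INSERT while connected to host 1
-- and the second while connected to host 2.
INSERT INTO test.test_table VALUES (1, 'inserted on host 1');
INSERT INTO test.test_table VALUES (2, 'inserted on host 2');

-- MergeTree() neither replicates nor forwards data, so each host
-- returns only the row it stored locally.
SELECT * FROM test.test_table;
```

With the engine switched to `MergeTree()`, nothing is shared between the hosts, which sets up the need for the distributed table introduced in the next hunks.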

````diff
@@ -654,16 +648,18 @@ SELECT * FROM test.test_table;
 -- └────┴────────────────────┘
 ```
 
-You will notice that only the row that was inserted into the table on that
+You will notice that unlike with a `ReplicatedMergeTree` table only the row that was inserted into the table on that
 particular host is returned and not both rows.
 
-To read the data from the two shards we need an interface which can handle queries
+To read the data across the two shards, we need an interface which can handle queries
 across all the shards, combining the data from both shards when we run select queries
-on it, and handling the insertion of data to the separate shards when we run insert queries.
+on it or inserting data to both shards when we run insert queries.
 
-In ClickHouse this interface is called a distributed table, which we create using
+In ClickHouse this interface is called a **distributed table**, which we create using
 the [`Distributed`](/engines/table-engines/special/distributed) table engine. Let's take a look at how it works.
 
+## Create a distributed table {#create-distributed-table}
+
 Create a distributed table with the following query:
 
 ```sql
@@ -674,8 +670,12 @@ ENGINE = Distributed('cluster_2S_1R', 'test', 'test_table', rand())
 In this example, the `rand()` function is chosen as the sharding key so that
 inserts are randomly distributed across the shards.
 
-Now query the distributed table from either host and you will get back
-both of the rows which were inserted on the two hosts:
+Now query the distributed table from either host, and you will get back
+both of the rows which were inserted on the two hosts, unlike in our previous example:
+
+```sql
+SELECT * FROM test.test_table_dist;
+```
 
 ```sql
 ┌─id─┬─name───────────────┐
@@ -694,6 +694,8 @@ ON CLUSTER cluster_2S_1R
 ENGINE = Distributed('cluster_2S_1R', 'uk', 'uk_price_paid_local', rand());
 ```
 
+## Insert data into a distributed table {#inserting-data-into-distributed-table}
+
 Now connect to either of the hosts and insert the data:
 
 ```sql
````
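Taken together, the added lines introduce the `Distributed` table as the query interface for the cluster. A condensed sketch of the flow follows; the `ON CLUSTER` and `AS` clauses and the `INSERT` values are assumptions, since only the engine arguments and the final `SELECT` appear in these hunks.

```sql
-- Distributed table acting as the interface across both shards.
-- rand() is the sharding key, so inserts are spread randomly across the shards.
CREATE TABLE test.test_table_dist ON CLUSTER cluster_2S_1R
AS test.test_table
ENGINE = Distributed('cluster_2S_1R', 'test', 'test_table', rand());

-- Hypothetical row: writes go through the distributed table and are routed
-- to one of the shards according to the sharding key.
INSERT INTO test.test_table_dist VALUES (3, 'inserted via the distributed table');

-- Reads through the distributed table combine the rows stored on both shards.
SELECT * FROM test.test_table_dist;
```

The last hunk then applies the same pattern to the UK price paid data, pointing a distributed table at `uk.uk_price_paid_local` just before the relocated "Insert data into a distributed table" heading.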

docs/deployment-guides/replication-sharding-examples/03_2_shards_2_replicas.md

Lines changed: 20 additions & 9 deletions

````diff
@@ -586,12 +586,9 @@ SHOW DATABASES;
 └────────────────────┘
 ```
 
-## Create a distributed table on the cluster {#creating-a-table}
+## Create a table on the cluster {#creating-a-table}
 
-Now that the database has been created, next you will create a distributed table.
-Distributed tables are tables which have access to shards located on different
-hosts and are defined using the `Distributed` table engine. The distributed table
-acts as the interface across all the shards in the cluster.
+Now that the database has been created, next you will create a table with replication.
 
 Run the following query from any of the host clients:
 
@@ -663,14 +660,16 @@ SHOW TABLES IN uk;
 
 ## Insert data into a distributed table {#inserting-data-using-distributed}
 
-To insert data into the distributed table, `ON CLUSTER` cannot be used as it does
+To insert data into the table, `ON CLUSTER` cannot be used as it does
 not apply to DML (Data Manipulation Language) queries such as `INSERT`, `UPDATE`,
 and `DELETE`. To insert data, it is necessary to make use of the
 [`Distributed`](/engines/table-engines/special/distributed) table engine.
+As you learned in the [guide](/architecture/horizontal-scaling) for setting up a cluster with 2 shards and 1 replica, distributed tables are tables which have access to shards located on different
+hosts and are defined using the `Distributed` table engine.
+The distributed table acts as the interface across all the shards in the cluster.
 
 From any of the host clients, run the following query to create a distributed table
-using the existing table we created previously with `ON CLUSTER` and use of the
-`ReplicatedMergeTree`:
+using the existing replicated table we created in the previous step:
 
 ```sql
 CREATE TABLE IF NOT EXISTS uk.uk_price_paid_distributed
````
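For anyone reviewing the end state rather than the wording, the pattern these hunks describe is a replicated local table fronted by a distributed table. The sketch below is a hedged reconstruction: the column list is abbreviated, and the cluster name `cluster_2S_2R` plus everything after the truncated `CREATE TABLE ... uk_price_paid_distributed` line are assumptions modelled on the 2-shard-1-replica guide rather than taken from this diff.

```sql
-- Replicated table on every host; each shard keeps two copies of its data.
-- The bare ReplicatedMergeTree form assumes the shard/replica macros configured
-- earlier in the guide; the columns are a shortened, illustrative schema.
CREATE TABLE uk.uk_price_paid_local ON CLUSTER cluster_2S_2R
(
    `price` UInt32,
    `date` Date,
    `town` LowCardinality(String)
)
ENGINE = ReplicatedMergeTree
ORDER BY (town, date);

-- Distributed table used as the single entry point for INSERT and SELECT,
-- since ON CLUSTER does not apply to DML queries such as INSERT.
CREATE TABLE IF NOT EXISTS uk.uk_price_paid_distributed ON CLUSTER cluster_2S_2R
AS uk.uk_price_paid_local
ENGINE = Distributed('cluster_2S_2R', 'uk', 'uk_price_paid_local', rand());
```

Inserts issued against `uk.uk_price_paid_distributed` are then split across the two shards, and the replicated engine keeps both copies within each shard in sync.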
````diff
@@ -749,4 +748,16 @@ SELECT count(*) FROM uk.uk_price_paid_local;
 └──────────┘
 ```
 
-</VerticalStepper>
+</VerticalStepper>
+
+## Conclusion {#conclusion}
+
+The advantage of this cluster topology with 2 shards and 2 replicas is that it provides both scalability and fault tolerance.
+Data is distributed across separate hosts, reducing storage and I/O requirements per node, while queries are processed in parallel across both shards for improved performance and memory efficiency.
+Critically, the cluster can tolerate the loss of one node and continue serving queries without interruption, as each shard has a backup replica available on another node.
+
+The main disadvantage of this cluster topology is the increased storage overhead—it requires twice the storage capacity compared to a setup without replicas, as each shard is duplicated.
+Additionally, while the cluster can survive a single node failure, losing two nodes simultaneously may render the cluster inoperable, depending on which nodes fail and how shards are distributed.
+This topology strikes a balance between availability and cost, making it suitable for production environments where some level of fault tolerance is required without the expense of higher replication factors.
+
+To learn how ClickHouse Cloud processes queries, offering both scalability and fault-tolerance, see the section ["Parallel Replicas"](/deployment-guides/parallel-replicas).
````
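The new conclusion's fault-tolerance claim rests on each shard having a second replica on another node. A quick way to sanity-check the layout the guide ends up with is to query `system.clusters` from any host; the cluster name `cluster_2S_2R` is an assumption here, as it never appears in this diff.

```sql
-- Lists every shard/replica pair this node knows about for the cluster.
-- For a 2-shard, 2-replica topology, expect four rows:
-- shard_num 1 and 2, each appearing with replica_num 1 and 2.
SELECT
    cluster,
    shard_num,
    replica_num,
    host_name
FROM system.clusters
WHERE cluster = 'cluster_2S_2R'
ORDER BY shard_num, replica_num;
```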

docs/deployment-guides/replication-sharding-examples/_snippets/_working_example.mdx

Lines changed: 1 addition & 1 deletion

````diff
@@ -2,5 +2,5 @@
 The following steps will walk you through setting up the cluster from
 scratch. If you prefer to skip these steps and jump straight to running the
 cluster, you can obtain the example
-files from the [examples repository](https://github.com/ClickHouse/examples/tree/main/docker-compose-recipes)
+files from the examples repository ['docker-compose-recipes' directory](https://github.com/ClickHouse/examples/tree/main/docker-compose-recipes/recipes).
 :::
````
