Commit 5e91389

Merge branch 'main' into update_functions_links
2 parents 6f8cbd0 + 00bbd82 commit 5e91389

File tree: 41 files changed, +661 −535 lines


docs/cloud/features/04_infrastructure/automatic_scaling.md renamed to docs/cloud/features/04_infrastructure/automatic_scaling/01_auto_scaling.md

Lines changed: 7 additions & 2 deletions
@@ -38,10 +38,15 @@ For Enterprise tier services scaling works as follows:
 - Custom profiles (`highMemory` and `highCPU`) do not support vertical autoscaling or manual vertical scaling. However, these services can be scaled vertically by contacting support.

 :::note
-Scaling in ClickHouse Cloud happens in what we call "Make Before Break" (MBB) approach. This adds one or more replicas of the new size before removing the old replicas, preventing any loss of capacity during scaling operations. By eliminating the gap between removing existing replicas and adding new ones, MBB creates a more seamless and less disruptive scaling process. It is especially beneficial in scale-up scenarios, where high resource utilization triggers the need for additional capacity, since removing replicas prematurely would only exacerbate the resource constraints. As part of this approach we wait up to an hour to let any existing queries complete on the older replicas before we will remove them. This balances the need for existing queries to complete, while at the same time ensuring that older replicas do not linger around for too long.
+Scaling in ClickHouse Cloud happens in what we call a ["Make Before Break" (MBB)](/cloud/features/mbb) approach.
+This adds one or more replicas of the new size before removing the old replicas, preventing any loss of capacity during scaling operations.
+By eliminating the gap between removing existing replicas and adding new ones, MBB creates a more seamless and less disruptive scaling process.
+It is especially beneficial in scale-up scenarios, where high resource utilization triggers the need for additional capacity, since removing replicas prematurely would only exacerbate the resource constraints.
+As part of this approach, we wait up to an hour to let any existing queries complete on the older replicas before removing them.
+This balances the need for existing queries to complete, while at the same time ensuring that older replicas do not linger around for too long.

 Please note that as part of this change:
-1. Historical system table data will be retained for up to a maximum of 30 days as part of scaling events. In addition, any system table data older than December 19, 2024, for services on AWS or GCP and older than January 14, 2025, for services on Azure will not be retained as part of the migration to the new organization tiers.
+1. Historical system table data is retained for up to a maximum of 30 days as part of scaling events. In addition, any system table data older than December 19, 2024, for services on AWS or GCP and older than January 14, 2025, for services on Azure will not be retained as part of the migration to the new organization tiers.
 2. For services utilizing TDE (Transparent Data Encryption) system table data is currently not maintained after MBB operations. We are working on removing this limitation.
 :::

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+---
+sidebar_position: 1
+sidebar_label: 'Make Before Break (MBB)'
+slug: /cloud/features/mbb
+description: 'Page describing Make Before Break (MBB) operations in ClickHouse Cloud'
+keywords: ['Make Before Break', 'MBB', 'Scaling', 'ClickHouse Cloud']
+title: 'Make Before Break (MBB) operations in ClickHouse Cloud'
+doc_type: 'guide'
+---
+
+import Image from '@theme/IdealImage';
+import mbb_diagram from '@site/static/images/cloud/features/mbb/vertical_scaling.png';
+
+ClickHouse Cloud performs cluster upgrades and cluster scaling using a **Make Before Break** (MBB) approach.
+In this approach, new replicas are added to the cluster before old replicas are removed from it.
+This contrasts with the break-first approach, where old replicas are removed before new ones are added.
+
+The MBB approach has several benefits:
+* Since capacity is added to the cluster before any is removed, the **overall cluster capacity does not go down**, unlike with the break-first approach. Of course, unplanned events such as node or disk failures can still happen in a cloud environment.
+* This approach is especially useful when the cluster is under heavy load, as it **prevents existing replicas from being overloaded**, as would happen with a break-first approach.
+* Because replicas can be added quickly without waiting for old replicas to be removed first, this approach leads to a **faster, more responsive** scaling experience.
+
+The image below shows how this might happen for a cluster with 3 replicas where the service is scaled vertically:
+
+<Image img={mbb_diagram} size="lg" alt="Example diagram for a cluster with 3 replicas which gets vertically scaled" />
+
+Overall, MBB leads to a more seamless, less disruptive scaling and upgrade experience compared to the break-first approach previously used.
+
+With MBB, there are some key behaviors that users need to be aware of:
+
+1. MBB operations wait for existing workloads to finish on the current replicas before those replicas are terminated.
+This waiting period is currently set to 1 hour, which means that scaling or upgrades can wait up to one hour for a long-running query on a replica before the replica is removed.
+Additionally, if a backup process is running on a replica, it is left to complete before the replica is terminated.
+2. Because there is a waiting time before a replica is terminated, a cluster can temporarily have more than its configured maximum number of replicas.
+For example, you might have a service with 6 total replicas, but with an MBB operation in progress, 3 additional replicas may be added to the cluster, leading to a total of 9 replicas while the older replicas are still serving queries.
+This means that, for a period of time, the cluster will have more than the desired number of replicas.
+Additionally, multiple MBB operations can themselves overlap, leading to replica accumulation. This can happen, for instance, when several vertical scaling requests are sent to the cluster via the API.
+ClickHouse Cloud has checks in place to restrict the number of replicas that a cluster might accumulate.
+3. With MBB operations, system table data is kept for 30 days. This means that every time an MBB operation happens on a cluster, 30 days' worth of system table data is replicated from the old replicas to the new ones.
+
+If you are interested in learning more about the mechanics of MBB operations, see this [blog post](https://clickhouse.com/blog/make-before-break-faster-scaling-mechanics-for-clickhouse-cloud) from the ClickHouse engineering team.
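To make point 2 above concrete, here is a hedged sketch of issuing a vertical scaling request through the ClickHouse Cloud API. The `replicaScaling` endpoint, field names, IDs, and key below are assumptions for illustration; verify the exact shape against the current Cloud API reference:

```shell
# Hypothetical org/service IDs and API key; memory bounds are per replica.
# Each request like this can start an MBB operation, and several sent in
# quick succession can overlap, which is how replicas temporarily accumulate.
curl -X PATCH \
  "https://api.clickhouse.cloud/v1/organizations/$ORG_ID/services/$SERVICE_ID/replicaScaling" \
  -u "$KEY_ID:$KEY_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"minReplicaMemoryGb": 16, "maxReplicaMemoryGb": 32}'
```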
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+  "label": "Automatic Scaling",
+  "collapsible": true,
+  "collapsed": true
+}

docs/cloud/features/08_backups/overview.md

Lines changed: 1 addition & 3 deletions
@@ -154,9 +154,7 @@ After you have successfully inserted the data into your original service, make s
 ## Undeleting or undropping tables {#undeleting-or-undropping-tables}

-<CloudNotSupportedBadge/>
-
-The `UNDROP` command is not supported in ClickHouse Cloud. If you accidentally `DROP` a table, the best course of action is to restore your last backup and recreate the table from the backup.
+The `UNDROP` command is supported in ClickHouse Cloud through [Shared Catalog](https://clickhouse.com/docs/cloud/reference/shared-catalog).

 To prevent users from accidentally dropping tables, you can use [`GRANT` statements](/sql-reference/statements/grant) to revoke permissions for the [`DROP TABLE` command](/sql-reference/statements/drop#drop-table) for a specific user or role.
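A brief sketch of both options from the section above; the database, table, and role names are made up for illustration:

```sql
-- Recover an accidentally dropped table (supported in ClickHouse Cloud
-- via Shared Catalog; recovery windows may apply in self-managed setups):
UNDROP TABLE my_db.events;

-- Reduce the risk up front by revoking the DROP TABLE privilege:
REVOKE DROP TABLE ON my_db.* FROM analyst_role;
```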

docs/cloud/reference/02_architecture.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,11 +6,12 @@ description: 'This page describes the architecture of ClickHouse Cloud'
66
doc_type: 'reference'
77
---
88

9-
import Architecture from '@site/static/images/cloud/reference/architecture.svg';
9+
import Image from '@theme/IdealImage';
10+
import Architecture from '@site/static/images/cloud/reference/architecture.png';
1011

1112
# ClickHouse Cloud architecture
1213

13-
<Architecture alt='ClickHouse Cloud architecture' class='image' />
14+
<Image img={Architecture} size='lg' alt='Cloud architecture'/>
1415

1516
## Storage backed by object store {#storage-backed-by-object-store}
1617
- Virtually unlimited storage

docs/faq/use-cases/key-value.md

Lines changed: 1 addition & 1 deletion
@@ -17,4 +17,4 @@ If you decide to go against recommendations and run some key-value-like queries
 - The key reason why point queries are expensive in ClickHouse is the sparse primary index of the main [MergeTree table engine family](../../engines/table-engines/mergetree-family/mergetree.md). This index can't point to each specific row of data; instead, it points to each N-th row, and the system has to scan from the neighboring N-th row to the desired one, reading excessive data along the way. In a key-value scenario, it might be useful to reduce the value of N with the `index_granularity` setting.
 - ClickHouse keeps each column in a separate set of files, so to assemble one complete row it needs to go through each of those files. Their count increases linearly with the number of columns, so in the key-value scenario, it might be worth avoiding many columns and putting all your payload in a single `String` column encoded in some serialization format like JSON, Protobuf, or whatever makes sense.
-- There's an alternative approach that uses [Join](../../engines/table-engines/special/join.md) table engine instead of normal `MergeTree` tables and [joinGet](../../sql-reference/functions/other-functions.md#joinget) function to retrieve the data. It can provide better query performance but might have some usability and reliability issues. Here's an [usage example](https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/00800_versatile_storage_join.sql#L49-L51).
+- There's an alternative approach that uses the [Join](../../engines/table-engines/special/join.md) table engine instead of normal `MergeTree` tables and the [joinGet](../../sql-reference/functions/other-functions.md#joinGet) function to retrieve the data. It can provide better query performance but might have some usability and reliability issues. Here's a [usage example](https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/00800_versatile_storage_join.sql#L49-L51).
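A minimal sketch of the Join-engine alternative described in the last bullet; the table and key names are made up for illustration:

```sql
-- A Join-engine table acts as an in-memory key-value map.
CREATE TABLE kv_store (k String, v String)
ENGINE = Join(ANY, LEFT, k);

INSERT INTO kv_store VALUES ('user:42', 'alice'), ('user:43', 'bob');

-- Point lookup without scanning MergeTree granules:
SELECT joinGet('kv_store', 'v', 'user:42');
```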

docs/guides/developer/merge-table-function.md

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ AND multiIf(
 );
 ```

-We use the [`variantType`](/docs/sql-reference/functions/other-functions#varianttype) function to check the type of `winner_seed` for each row and then [`variantElement`](/docs/sql-reference/functions/other-functions#variantelement) to extract the underlying value.
+We use the [`variantType`](/docs/sql-reference/functions/other-functions#variantType) function to check the type of `winner_seed` for each row and then [`variantElement`](/docs/sql-reference/functions/other-functions#variantElement) to extract the underlying value.
 When the type is `String`, we cast to a number and then do the comparison.
 The result of running the query is shown below:
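A self-contained sketch of the two functions mentioned above; the sample values are invented, and on older ClickHouse versions the `Variant` type may require the experimental setting shown:

```sql
SELECT
    variantType(v)              AS type,      -- name of the variant each row holds
    variantElement(v, 'String') AS as_string  -- NULL when v does not hold a String
FROM format(JSONEachRow, 'v Variant(UInt64, String)', '{"v": 1}\n{"v": "2"}')
SETTINGS allow_experimental_variant_type = 1;
```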

docs/guides/generating-test-data.md

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ Read the [Generating Random Data in ClickHouse](https://clickhouse.com/blog/gene
 
 ## Generating random tables {#generating-random-tables}
 
-The [`generateRandomStructure`](/sql-reference/functions/other-functions#generaterandomstructure) function is particularly useful when combined with the [`generateRandom`](/sql-reference/table-functions/generate) table engine for testing, benchmarking, or creating mock data with arbitrary schemas.
+The [`generateRandomStructure`](/sql-reference/functions/other-functions#generateRandomStructure) function is particularly useful when combined with the [`generateRandom`](/sql-reference/table-functions/generate) table function for testing, benchmarking, or creating mock data with arbitrary schemas.
 
 Let's start by just seeing what a random structure looks like using the `generateRandomStructure` function:

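A short sketch of the combination just described; structures and values are random, so output varies per run and none is shown:

```sql
-- Ask for a random 3-column schema as a string:
SELECT generateRandomStructure(3);

-- Feed an explicit structure to the generateRandom table function
-- to produce mock rows matching it:
SELECT *
FROM generateRandom('id UInt32, name String, score Float64')
LIMIT 3;
```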
docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md

Lines changed: 59 additions & 3 deletions
@@ -144,9 +144,41 @@ dbt relies on a read-after-insert consistency model. This is not compatible with
 | settings | A map/dictionary of "TABLE" settings to be used in DDL statements like 'CREATE TABLE' with this model | |
 | query_settings | A map/dictionary of ClickHouse user level settings to be used with `INSERT` or `DELETE` statements in conjunction with this model | |
 | ttl | A TTL expression to be used with the table. The TTL expression is a string that can be used to specify the TTL for the table. | |
-| indexes | A list of indexes to create, available only for `table` materialization. For examples look at ([#397](https://github.com/ClickHouse/dbt-clickhouse/pull/397)) | |
-| sql_security | Allow you to specify which ClickHouse user to use when executing the view's underlying query. [`SQL SECURITY`](https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security) has two legal values`definer` `invoker`. | |
+| indexes | A list of [data skipping indexes](/optimize/skipping-indexes) to create. See below for more information. | |
+| sql_security | Allows you to specify which ClickHouse user to use when executing the view's underlying query. `SQL SECURITY` [has two legal values](/sql-reference/statements/create/view#sql_security): `definer` and `invoker`. | |
 | definer | If `sql_security` was set to `definer`, you have to specify any existing user or `CURRENT_USER` in the `definer` clause. | |
+| projections | A list of [projections](/data-modeling/projections) to be created. See [About projections](#projections) for details. | |
+
+#### About data skipping indexes {#data-skipping-indexes}
+
+Data skipping indexes are only available for the `table` materialization. To add a list of data skipping indexes to a table, use the `indexes` configuration:
+
+```sql
+{{ config(
+      materialized='table',
+      indexes=[{
+        'name': 'your_index_name',
+        'definition': 'your_column TYPE minmax GRANULARITY 2'
+      }]
+) }}
+```
+
+#### About projections {#projections}
+
+You can add [projections](/data-modeling/projections) to `table` and `distributed_table` materializations using the `projections` configuration:
+
+```sql
+{{ config(
+      materialized='table',
+      projections=[
+        {
+          'name': 'your_projection_name',
+          'query': 'SELECT department, avg(age) AS avg_age GROUP BY department'
+        }
+      ]
+) }}
+```
+
+**Note**: For distributed tables, the projection is applied to the `_local` tables, not to the distributed proxy table.

 ### Supported table engines {#supported-table-engines}

@@ -191,7 +223,7 @@ should be carefully researched and tested.
 | codec | A string consisting of arguments passed to `CODEC()` in the column's DDL. For example: `codec: "Delta, ZSTD"` will be compiled as `CODEC(Delta, ZSTD)`. |
 | ttl | A string consisting of a [TTL (time-to-live) expression](https://clickhouse.com/docs/guides/developer/ttl) that defines a TTL rule in the column's DDL. For example: `ttl: ts + INTERVAL 1 DAY` will be compiled as `TTL ts + INTERVAL 1 DAY`. |

-#### Example {#example}
+#### Example of schema configuration {#example-of-schema-configuration}

 ```yaml
 models:
@@ -209,6 +241,30 @@ models:
 ttl: ts + INTERVAL 1 DAY
 ```

+#### Adding complex types {#adding-complex-types}
+
+dbt automatically determines the data type of each column by analyzing the SQL used to create the model. However, in some cases this process may not accurately determine the data type, leading to conflicts with the types specified in the contract's `data_type` property. To address this, we recommend using the `CAST()` function in the model SQL to explicitly define the desired type. For example:
+
+```sql
+{{
+    config(
+        materialized="materialized_view",
+        engine="AggregatingMergeTree",
+        order_by=["event_type"],
+    )
+}}
+
+select
+    -- event_type may be inferred as a String but we may prefer LowCardinality(String):
+    CAST(event_type, 'LowCardinality(String)') as event_type,
+    -- countState() may be inferred as `AggregateFunction(count)` but we may prefer to change the type of the argument used:
+    CAST(countState(), 'AggregateFunction(count, UInt32)') as response_count,
+    -- maxSimpleState() may be inferred as `SimpleAggregateFunction(max, String)` but we may prefer to also change the type of the argument used:
+    CAST(maxSimpleState(event_type), 'SimpleAggregateFunction(max, LowCardinality(String))') as max_event_type
+from {{ ref('user_events') }}
+group by event_type
+```

 ## Features {#features}

 ### Materialization: view {#materialization-view}
