
Commit 0535f4c

Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs into cta-component
2 parents 76f8edd + 67662e8 commit 0535f4c

File tree

61 files changed: +2622 −442 lines


docs/_snippets/_gather_your_details_http.mdx

Lines changed: 8 additions & 7 deletions

@@ -4,17 +4,18 @@ import Image from '@theme/IdealImage';
 
 To connect to ClickHouse with HTTP(S) you need this information:
 
-- The HOST and PORT: typically, the port is 8443 when using TLS or 8123 when not using TLS.
+| Parameter(s) | Description |
+|-------------------------|---------------------------------------------------------------------------------------------------------------|
+|`HOST` and `PORT` | Typically, the port is 8443 when using TLS or 8123 when not using TLS. |
+|`DATABASE NAME` | Out of the box, there is a database named `default`, use the name of the database that you want to connect to.|
+|`USERNAME` and `PASSWORD`| Out of the box, the username is `default`. Use the username appropriate for your use case. |
 
-- The DATABASE NAME: out of the box, there is a database named `default`, use the name of the database that you want to connect to.
-
-- The USERNAME and PASSWORD: out of the box, the username is `default`. Use the username appropriate for your use case.
-
-The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**:
+The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console.
+Select a service and click **Connect**:
 
 <Image img={cloud_connect_button} size="md" alt="ClickHouse Cloud service connect button" border />
 
-Choose **HTTPS**, and the details are available in an example `curl` command.
+Choose **HTTPS**. Connection details are displayed in an example `curl` command.
 
 <Image img={connection_details_https} size="md" alt="ClickHouse Cloud HTTPS connection details" border/>
 
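The connection parameters tabulated in this snippet combine into the URL that the example `curl` command requests. A minimal Python sketch of building that URL (the host below is a placeholder, not a real service):

```python
from urllib.parse import urlencode

def clickhouse_http_url(host: str, query: str, port: int = 8443, secure: bool = True) -> str:
    """Build a ClickHouse HTTP(S) interface URL: port 8443 with TLS, 8123 without."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/?{urlencode({'query': query})}"

url = clickhouse_http_url("your-service.clickhouse.cloud", "SELECT 1")
# → https://your-service.clickhouse.cloud:8443/?query=SELECT+1
```

Passing the same URL to `curl` with `--user USERNAME:PASSWORD` mirrors the command shown in the Cloud console.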

docs/_snippets/_gather_your_details_native.md

Lines changed: 8 additions & 7 deletions

@@ -4,13 +4,14 @@ import Image from '@theme/IdealImage';
 
 To connect to ClickHouse with native TCP you need this information:
 
-- The HOST and PORT: typically, the port is 9440 when using TLS, or 9000 when not using TLS.
-
-- The DATABASE NAME: out of the box there is a database named `default`, use the name of the database that you want to connect to.
-
-- The USERNAME and PASSWORD: out of the box the username is `default`. Use the username appropriate for your use case.
-
-The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**:
+| Parameter(s) | Description |
+|---------------------------|---------------------------------------------------------------------------------------------------------------|
+| `HOST` and `PORT` | Typically, the port is 9440 when using TLS, or 9000 when not using TLS. |
+| `DATABASE NAME` | Out of the box there is a database named `default`, use the name of the database that you want to connect to. |
+| `USERNAME` and `PASSWORD` | Out of the box the username is `default`. Use the username appropriate for your use case. |
+
+The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console.
+Select the service that you will connect to and click **Connect**:
 
 <Image img={cloud_connect_button} size="md" alt="ClickHouse Cloud service connect button" border/>
 
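The native-TCP parameters in this snippet map directly onto `clickhouse-client` flags. A hedged sketch of assembling the invocation (host and credentials are placeholders):

```python
def clickhouse_client_args(host: str, user: str = "default", password: str = "",
                           database: str = "default", secure: bool = True) -> list:
    """Assemble a clickhouse-client command line: port 9440 with TLS, 9000 without."""
    args = ["clickhouse-client",
            "--host", host,
            "--port", "9440" if secure else "9000",
            "--user", user,
            "--database", database]
    if secure:
        args.append("--secure")  # enable TLS for the native protocol
    if password:
        args += ["--password", password]
    return args

print(" ".join(clickhouse_client_args("your-service.clickhouse.cloud")))
```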

docs/concepts/why-clickhouse-is-so-fast.mdx

Lines changed: 0 additions & 2 deletions

@@ -26,8 +26,6 @@ To avoid that too many parts accumulate, ClickHouse runs a [merge](/merges) oper
 
 This approach has several advantages: All data processing can be [offloaded to background part merges](/concepts/why-clickhouse-is-so-fast#storage-layer-merge-time-computation), keeping data writes lightweight and highly efficient. Individual inserts are "local" in the sense that they do not need to update global, i.e. per-table data structures. As a result, multiple simultaneous inserts need no mutual synchronization or synchronization with existing table data, and thus inserts can be performed almost at the speed of disk I/O.
 
-the holistic performance optimization section of the VLDB paper.
-
 🤿 Deep dive into this in the [On-Disk Format](/docs/academic_overview#3-1-on-disk-format) section of the web version of our VLDB 2024 paper.
 
 ## Storage layer: concurrent inserts and selects are isolated {#storage-layer-concurrent-inserts-and-selects-are-isolated}
Lines changed: 111 additions & 0 deletions

@@ -0,0 +1,111 @@
+---
+sidebar_label: 'Create your first object storage ClickPipe'
+description: 'Seamlessly connect your object storage to ClickHouse Cloud.'
+slug: /integrations/clickpipes/object-storage
+title: 'Creating your first object-storage ClickPipe'
+doc_type: 'guide'
+integration:
+  - support_level: 'core'
+  - category: 'clickpipes'
+---
+
+import cp_step0 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step0.png';
+import cp_step1 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step1.png';
+import cp_step2_object_storage from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step2_object_storage.png';
+import cp_step3_object_storage from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step3_object_storage.png';
+import cp_step4a from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a.png';
+import cp_step4a3 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a3.png';
+import cp_step4b from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4b.png';
+import cp_step5 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step5.png';
+import cp_success from '@site/static/images/integrations/data-ingestion/clickpipes/cp_success.png';
+import cp_remove from '@site/static/images/integrations/data-ingestion/clickpipes/cp_remove.png';
+import cp_destination from '@site/static/images/integrations/data-ingestion/clickpipes/cp_destination.png';
+import cp_overview from '@site/static/images/integrations/data-ingestion/clickpipes/cp_overview.png';
+import Image from '@theme/IdealImage';
+
+Object Storage ClickPipes provide a simple and resilient way to ingest data from Amazon S3, Google Cloud Storage, Azure Blob Storage, and DigitalOcean Spaces into ClickHouse Cloud. Both one-time and continuous ingestion are supported with exactly-once semantics.
+
+# Creating your first object storage ClickPipe {#creating-your-first-clickpipe}
+
+## Prerequisite {#prerequisite}
+
+- You have familiarized yourself with the [ClickPipes intro](../index.md).
+
+## Navigate to data sources {#1-load-sql-console}
+
+In the cloud console, select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe"
+
+<Image img={cp_step0} alt="Select imports" size="lg" border/>
+
+## Select a data source {#2-select-data-source}
+
+Select your data source.
+
+<Image img={cp_step1} alt="Select data source type" size="lg" border/>
+
+## Configure the ClickPipe {#3-configure-clickpipe}
+
+Fill out the form by providing your ClickPipe with a name, a description (optional), your IAM role or credentials, and bucket URL.
+You can specify multiple files using bash-like wildcards.
+For more information, [see the documentation on using wildcards in path](/integrations/clickpipes/object-storage/reference/#limitations).
+
+<Image img={cp_step2_object_storage} alt="Fill out connection details" size="lg" border/>
+
+## Select data format {#4-select-format}
+
+The UI will display a list of files in the specified bucket.
+Select your data format (we currently support a subset of ClickHouse formats) and if you want to enable continuous ingestion.
+([More details below](/integrations/clickpipes/object-storage/reference/#continuous-ingest)).
+
+<Image img={cp_step3_object_storage} alt="Set data format and topic" size="lg" border/>
+
+## Configure table, schema and settings {#5-configure-table-schema-settings}
+
+In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one.
+Follow the instructions in the screen to modify your table name, schema, and settings.
+You can see a real-time preview of your changes in the sample table at the top.
+
+<Image img={cp_step4a} alt="Set table, schema, and settings" size="lg" border/>
+
+You can also customize the advanced settings using the controls provided
+
+<Image img={cp_step4a3} alt="Set advanced controls" size="lg" border/>
+
+Alternatively, you can decide to ingest your data in an existing ClickHouse table.
+In that case, the UI will allow you to map fields from the source to the ClickHouse fields in the selected destination table.
+
+<Image img={cp_step4b} alt="Use an existing table" size="lg" border/>
+
+:::info
+You can also map [virtual columns](../../sql-reference/table-functions/s3#virtual-columns), like `_path` or `_size`, to fields.
+:::
+
+## Configure permissions {#6-configure-permissions}
+
+Finally, you can configure permissions for the internal ClickPipes user.
+
+**Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role:
+- `Full access`: with the full access to the cluster. Required if you use materialized view or Dictionary with the destination table.
+- `Only destination table`: with the `INSERT` permissions to the destination table only.
+
+<Image img={cp_step5} alt="Permissions" size="lg" border/>
+
+## Complete setup {#7-complete-setup}
+
+By clicking on "Complete Setup", the system will register your ClickPipe, and you'll be able to see it listed in the summary table.
+
+<Image img={cp_success} alt="Success notice" size="sm" border/>
+
+<Image img={cp_remove} alt="Remove notice" size="lg" border/>
+
+The summary table provides controls to display sample data from the source or the destination table in ClickHouse
+
+<Image img={cp_destination} alt="View destination" size="lg" border/>
+
+As well as controls to remove the ClickPipe and display a summary of the ingest job.
+
+<Image img={cp_overview} alt="View overview" size="lg" border/>
+
+**Congratulations!** you have successfully set up your first ClickPipe.
+If this is a streaming ClickPipe, it will be continuously running, ingesting data in real-time from your remote data source.
+Otherwise, it will ingest the batch and complete.
Lines changed: 9 additions & 95 deletions

@@ -1,9 +1,10 @@
 ---
-sidebar_label: 'ClickPipes for object storage'
-description: 'Seamlessly connect your object storage to ClickHouse Cloud.'
-slug: /integrations/clickpipes/object-storage
-title: 'Integrating Object Storage with ClickHouse Cloud'
-doc_type: 'guide'
+sidebar_label: 'Reference'
+description: 'Details supported formats, exactly-once semantics, view-support, scaling, limitations, authentication with object storage ClickPipes'
+slug: /integrations/clickpipes/object-storage/reference
+sidebar_position: 1
+title: 'Reference'
+doc_type: 'reference'
 integration:
   - support_level: 'core'
   - category: 'clickpipes'
@@ -14,85 +15,8 @@ import S3svg from '@site/static/images/integrations/logos/amazon_s3_logo.svg';
 import Gcssvg from '@site/static/images/integrations/logos/gcs.svg';
 import DOsvg from '@site/static/images/integrations/logos/digitalocean.svg';
 import ABSsvg from '@site/static/images/integrations/logos/azureblobstorage.svg';
-import cp_step0 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step0.png';
-import cp_step1 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step1.png';
-import cp_step2_object_storage from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step2_object_storage.png';
-import cp_step3_object_storage from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step3_object_storage.png';
-import cp_step4a from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a.png';
-import cp_step4a3 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4a3.png';
-import cp_step4b from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step4b.png';
-import cp_step5 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step5.png';
-import cp_success from '@site/static/images/integrations/data-ingestion/clickpipes/cp_success.png';
-import cp_remove from '@site/static/images/integrations/data-ingestion/clickpipes/cp_remove.png';
-import cp_destination from '@site/static/images/integrations/data-ingestion/clickpipes/cp_destination.png';
-import cp_overview from '@site/static/images/integrations/data-ingestion/clickpipes/cp_overview.png';
 import Image from '@theme/IdealImage';
 
-# Integrating object storage with ClickHouse Cloud
-Object Storage ClickPipes provide a simple and resilient way to ingest data from Amazon S3, Google Cloud Storage, Azure Blob Storage, and DigitalOcean Spaces into ClickHouse Cloud. Both one-time and continuous ingestion are supported with exactly-once semantics.
-
-## Prerequisite {#prerequisite}
-You have familiarized yourself with the [ClickPipes intro](./index.md).
-
-## Creating your first ClickPipe {#creating-your-first-clickpipe}
-
-1. In the cloud console, select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe"
-
-<Image img={cp_step0} alt="Select imports" size="lg" border/>
-
-2. Select your data source.
-
-<Image img={cp_step1} alt="Select data source type" size="lg" border/>
-
-3. Fill out the form by providing your ClickPipe with a name, a description (optional), your IAM role or credentials, and bucket URL. You can specify multiple files using bash-like wildcards. For more information, [see the documentation on using wildcards in path](#limitations).
-
-<Image img={cp_step2_object_storage} alt="Fill out connection details" size="lg" border/>
-
-4. The UI will display a list of files in the specified bucket. Select your data format (we currently support a subset of ClickHouse formats) and if you want to enable continuous ingestion [More details below](#continuous-ingest).
-
-<Image img={cp_step3_object_storage} alt="Set data format and topic" size="lg" border/>
-
-5. In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions in the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.
-
-<Image img={cp_step4a} alt="Set table, schema, and settings" size="lg" border/>
-
-You can also customize the advanced settings using the controls provided
-
-<Image img={cp_step4a3} alt="Set advanced controls" size="lg" border/>
-
-6. Alternatively, you can decide to ingest your data in an existing ClickHouse table. In that case, the UI will allow you to map fields from the source to the ClickHouse fields in the selected destination table.
-
-<Image img={cp_step4b} alt="Use an existing table" size="lg" border/>
-
-:::info
-You can also map [virtual columns](../../sql-reference/table-functions/s3#virtual-columns), like `_path` or `_size`, to fields.
-:::
-
-7. Finally, you can configure permissions for the internal ClickPipes user.
-
-**Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role:
-- `Full access`: with the full access to the cluster. Required if you use materialized view or Dictionary with the destination table.
-- `Only destination table`: with the `INSERT` permissions to the destination table only.
-
-<Image img={cp_step5} alt="Permissions" size="lg" border/>
-
-8. By clicking on "Complete Setup", the system will register you ClickPipe, and you'll be able to see it listed in the summary table.
-
-<Image img={cp_success} alt="Success notice" size="sm" border/>
-
-<Image img={cp_remove} alt="Remove notice" size="lg" border/>
-
-The summary table provides controls to display sample data from the source or the destination table in ClickHouse
-
-<Image img={cp_destination} alt="View destination" size="lg" border/>
-
-As well as controls to remove the ClickPipe and display a summary of the ingest job.
-
-<Image img={cp_overview} alt="View overview" size="lg" border/>
-
-9. **Congratulations!** you have successfully set up your first ClickPipe. If this is a streaming ClickPipe it will be continuously running, ingesting data in real-time from your remote data source. Otherwise it will ingest the batch and complete.
-
 ## Supported data sources {#supported-data-sources}
 
 | Name |Logo|Type| Status | Description |
@@ -134,9 +58,9 @@ To increase the throughput on large ingest jobs, we recommend scaling the ClickH
 - ClickPipes will only attempt to ingest objects at 10GB or smaller in size. If a file is greater than 10GB an error will be appended to the ClickPipes dedicated error table.
 - Azure Blob Storage pipes with continuous ingest on containers with over 100k files will have a latency of around 10–15 seconds in detecting new files. Latency increases with file count.
 - Object Storage ClickPipes **does not** share a listing syntax with the [S3 Table Function](/sql-reference/table-functions/s3), nor Azure with the [AzureBlobStorage Table function](/sql-reference/table-functions/azureBlobStorage).
-- `?` Substitutes any single character
-- `*` Substitutes any number of any characters except / including empty string
-- `**` Substitutes any number of any character include / including empty string
+- `?` - Substitutes any single character
+- `*` - Substitutes any number of any characters except / including empty string
+- `**` - Substitutes any number of any character include / including empty string
 
 :::note
 This is a valid path (for S3):
@@ -179,13 +103,3 @@ Currently only protected buckets are supported for DigitalOcean spaces. You requ
 
 ### Azure Blob Storage {#azureblobstorage}
 Currently only protected buckets are supported for Azure Blob Storage. Authentication is done via a connection string, which supports access keys and shared keys. For more information, read [this guide](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string).
-
-## FAQ {#faq}
-
-- **Does ClickPipes support GCS buckets prefixed with `gs://`?**
-
-No. For interoperability reasons we ask you to replace your `gs://` bucket prefix with `https://storage.googleapis.com/`.
-
-- **What permissions does a GCS public bucket require?**
-
-`allUsers` requires appropriate role assignment. The `roles/storage.objectViewer` role must be granted at the bucket level. This role provides the `storage.objects.list` permission, which allows ClickPipes to list all objects in the bucket which is required for onboarding and ingestion. This role also includes the `storage.objects.get` permission, which is required to read or download individual objects in the bucket. See: [Google Cloud Access Control](https://cloud.google.com/storage/docs/access-control/iam-roles) for further information.
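The wildcard semantics listed in the limitations hunk above (`?` for any single character, `*` for any run that does not cross `/`, `**` for any run including `/`) can be sketched as a regex translation. This is an illustrative approximation for building intuition, not ClickPipes' actual matcher:

```python
import re

def wildcard_to_regex(pattern: str):
    """Translate the ?, *, ** wildcards described above into a compiled regex."""
    out, i = [], 0
    while i < len(pattern):
        if pattern[i:i + 2] == "**":
            out.append(".*")      # ** may cross path separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")   # * stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(".")       # ? matches any single character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")

assert wildcard_to_regex("data/*.csv").match("data/a.csv")
assert not wildcard_to_regex("data/*.csv").match("data/sub/a.csv")
assert wildcard_to_regex("data/**.csv").match("data/sub/a.csv")
```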
Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
+---
+sidebar_label: 'FAQ'
+description: 'FAQ for object storage ClickPipes'
+slug: /integrations/clickpipes/object-storage/faq
+sidebar_position: 1
+title: 'FAQ'
+doc_type: 'reference'
+integration:
+  - support_level: 'core'
+  - category: 'clickpipes'
+---
+
+## FAQ {#faq}
+
+<details>
+<summary>Does ClickPipes support GCS buckets prefixed with `gs://`?</summary>
+
+No. For interoperability reasons we ask you to replace your `gs://` bucket prefix with `https://storage.googleapis.com/`.
+
+</details>
+
+<details>
+<summary>What permissions does a GCS public bucket require?</summary>
+
+`allUsers` requires appropriate role assignment. The `roles/storage.objectViewer` role must be granted at the bucket level. This role provides the `storage.objects.list` permission, which allows ClickPipes to list all objects in the bucket which is required for onboarding and ingestion. This role also includes the `storage.objects.get` permission, which is required to read or download individual objects in the bucket. See: [Google Cloud Access Control](https://cloud.google.com/storage/docs/access-control/iam-roles) for further information.
+
+</details>
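The `gs://` answer in the FAQ above amounts to a simple prefix rewrite. A hypothetical helper illustrating it (the bucket name is an example):

```python
def gs_to_https(url: str) -> str:
    """Rewrite a gs:// bucket URL to its storage.googleapis.com equivalent."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError("expected a gs:// URL")
    return "https://storage.googleapis.com/" + url[len(prefix):]

print(gs_to_https("gs://my-bucket/events/*.parquet"))
# → https://storage.googleapis.com/my-bucket/events/*.parquet
```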
