In a real column-oriented DBMS, no extra data is stored with the values. This means that constant-length values must be supported to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed; otherwise, CPU usage is strongly affected. It is essential to store data compactly (without any "garbage") even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
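To make the size argument concrete, here is a small Python sketch (an illustration, not ClickHouse code). It packs fixed-width `UInt8` values with the standard `array` module and compares them with a hypothetical length-prefixed layout in which every value carries its own one-byte length. At one byte per value, 10^9 values take 10^9 bytes, which is about 1 GB (roughly 0.93 GiB).

```python
from array import array

def fixed_width_size(values):
    """Bytes needed when every value is a raw UInt8 with no per-value metadata."""
    return len(array("B", values).tobytes())

def length_prefixed_size(values):
    """Bytes needed in a hypothetical layout that stores a 1-byte length before each value."""
    out = bytearray()
    for v in values:
        payload = bytes([v])
        out.append(len(payload))  # the length "number" stored next to the value
        out += payload
    return len(out)

values = [i % 256 for i in range(1_000_000)]
print(fixed_width_size(values))      # 1000000
print(length_prefixed_size(values))  # 2000000

# Scaling up: one billion UInt8 values at one byte each.
n = 1_000_000_000
print(n / 10**9, "GB")             # 1.0 GB
print(round(n / 2**30, 2), "GiB")  # 0.93 GiB
```

For one-byte values the hypothetical prefix doubles the footprint, and every extra byte must be read and decoded on each scan.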
-
This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.
+
This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput of around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.

Finally, ClickHouse is a database management system, not a single database. It allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.

## Data compression {#data-compression}

Some column-oriented DBMSs do not use data compression. However, data compression plays a key role in achieving excellent performance.

-
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.
+
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allows ClickHouse to compete with and outperform more niche databases, like time-series ones.
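As an illustration of why data-aware codecs help, the Python sketch below delta-encodes a regular series of timestamps before handing it to the general-purpose `zlib` compressor. It only mimics the idea behind delta-style codecs such as ClickHouse's `Delta`; it is not ClickHouse's implementation, and the timestamp series is synthetic.

```python
import struct
import zlib

def delta_encode(values):
    """Replace each value with its difference from the previous one (the first is kept as-is)."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas):
    """Invert delta_encode with a running (prefix) sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def pack_int64(values):
    """Serialize as little-endian signed 64-bit integers, like a fixed-width column."""
    return struct.pack(f"<{len(values)}q", *values)

# Synthetic series: one reading every 15 seconds.
timestamps = [1_700_000_000 + 15 * i for i in range(100_000)]

raw = zlib.compress(pack_int64(timestamps))
delta = zlib.compress(pack_int64(delta_encode(timestamps)))
print("zlib on raw values:  ", len(raw), "bytes")
print("zlib on delta values:", len(delta), "bytes")
```

After delta encoding almost every value is the same small number, which a general-purpose compressor can squeeze far better than the raw, ever-changing timestamps.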
## Disk storage of data {#disk-storage-of-data}

@@ -41,9 +41,9 @@ In ClickHouse, data can reside on different shards. Each shard can be a group of

## SQL support {#sql-support}

-
ClickHouse supports [SQL language](/sql-reference/) that is mostly compatible with the ANSI SQL standard.
+
ClickHouse supports [a declarative query language](/sql-reference/) based on SQL that is mostly compatible with the ANSI SQL standard.

-
Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), [JOIN](../sql-reference/statements/select/join.md) clause, [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.
+
Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), the [JOIN](../sql-reference/statements/select/join.md) clause, the [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.

Correlated (dependent) subqueries are not supported at the time of writing but might become available in the future.

@@ -67,7 +67,7 @@ Unlike other database management systems, secondary indexes in ClickHouse do not

Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later").

-
In ClickHouse "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment as the user interface page is loading. In other words, online.
+
In ClickHouse, "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the moment when the user interface page is loading — in other words, *online*.

## Support for approximated calculations {#support-for-approximated-calculations}

@@ -79,7 +79,7 @@ ClickHouse provides various ways to trade accuracy for performance:
-
ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table.
+
ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash join and falling back to merge join if there's more than one large table.
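To make the two strategies concrete, here is a minimal Python sketch of textbook hash join and sort-merge join over lists of dicts. It is an illustration under stated assumptions, not ClickHouse's join implementation, and the tables and column names are made up. The hash join keeps the whole build side in memory, which is one common reason an engine might prefer a merge-based algorithm when both inputs are large.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Inner equi-join: build a hash table on the smaller input, probe with the other."""
    build_is_left = len(left) <= len(right)
    build, probe = (left, right) if build_is_left else (right, left)
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    result = []
    for row in probe:
        for match in table.get(row[key], []):
            # Always emit (left_row, right_row), whichever side was used for the build.
            result.append((match, row) if build_is_left else (row, match))
    return result

def merge_join(left, right, key):
    """Inner equi-join: sort both inputs by key, then advance through them in step."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    i = j = 0
    result = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Collect the run of equal keys on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    result.append((a, b))
            i, j = i_end, j_end
    return result

# Hypothetical inputs.
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "amount": 10}, {"id": 1, "amount": 5}, {"id": 3, "amount": 7}]

for a, b in hash_join(users, orders, "id"):
    print(a["name"], b["amount"])
# prints: alice 10, then alice 5
```

Both functions return the same pairs; they differ in what they need up front (memory for a hash table versus sorted inputs).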
## Data replication and data integrity support {#data-replication-and-data-integrity-support}

@@ -89,7 +89,7 @@ For more information, see the section [Data replication](../engines/table-engine

## Role-Based Access Control {#role-based-access-control}

-
ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in ANSI SQL standard and popular relational database management systems.
+
ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in the ANSI SQL standard and popular relational database management systems.

## Features that can be considered disadvantages {#clickhouse-features-that-can-be-considered-disadvantages}

docs/concepts/olap.md: 6 additions & 6 deletions

@@ -11,17 +11,17 @@ keywords: ['OLAP']

[OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business. At the highest level, you can just read these words backward:

-
**Processing** some source data is processed…
+
**Processing** — Some source data is processed…

-
**Analytical** …to produce some analytical reports and insights…
+
**Analytical** — …to produce some analytical reports and insights…

-
**Online** …in real-time.
+
**Online** — …in real-time.

## OLAP from the business perspective {#olap-from-the-business-perspective}

-
In recent years business people started to realize the value of data. Companies who make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms which allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in.
+
In recent years, business people have started to realize the value of data. Companies that make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms that allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in.

-
In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately conquering the market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI applications (Business Intelligence).
+
In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately capturing market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI (business intelligence) applications.

ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers, so an in-house data warehouse scenario is also viable.

@@ -35,5 +35,5 @@ Even if a DBMS started out as a pure OLAP or pure OLTP, it is forced to move in

The fundamental trade-off between OLAP and OLTP systems remains:

-
- To build analytical reports efficiently it's crucial to be able to read columns separately, thus most OLAP databases are [columnar](https://clickhouse.com/engineering-resources/what-is-columnar-database),
+
- To build analytical reports efficiently, it's crucial to be able to read columns separately, so most OLAP databases are [columnar](https://clickhouse.com/engineering-resources/what-is-columnar-database);
- On the other hand, storing columns separately increases the cost of operations on rows, like append or in-place modification, proportionally to the number of columns (which can be huge if the systems try to collect all details of an event just in case). Thus, most OLTP systems store data arranged by rows.
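The trade-off can be sketched with a toy cost model in Python. The `RowStore` and `ColumnStore` classes and the four-column event schema are hypothetical; the sketch only counts how many containers an append writes and how many values a single-column scan reads, assuming a row store must read every field of a row to get at one of them.

```python
# Hypothetical event schema; real analytical tables often have far more columns.
COLUMNS = ["ts", "user_id", "url", "duration_ms"]

def make_rows(n):
    return [
        {"ts": i, "user_id": i % 10, "url": f"/page/{i % 5}", "duration_ms": 100 + i % 7}
        for i in range(n)
    ]

class RowStore:
    """Rows kept together: cheap appends, but a scan reads every field of every row."""
    def __init__(self):
        self.rows = []

    def append(self, row):
        self.rows.append(dict(row))
        return 1  # one container written

    def sum_column(self, name):
        values_read = len(self.rows) * len(COLUMNS)  # cost model: whole rows are read
        return sum(r[name] for r in self.rows), values_read

class ColumnStore:
    """Each column kept separately: a scan reads one column, but an append touches all of them."""
    def __init__(self):
        self.cols = {c: [] for c in COLUMNS}

    def append(self, row):
        for c in COLUMNS:
            self.cols[c].append(row[c])
        return len(COLUMNS)  # one container written per column

    def sum_column(self, name):
        col = self.cols[name]
        return sum(col), len(col)

rows = make_rows(1_000)
row_store, col_store = RowStore(), ColumnStore()
row_writes = sum(row_store.append(r) for r in rows)
col_writes = sum(col_store.append(r) for r in rows)
print(row_writes, col_writes)                  # 1000 4000
print(row_store.sum_column("duration_ms")[1])  # 4000
print(col_store.sum_column("duration_ms")[1])  # 1000
```

With more columns the gap grows in both directions: scans favor the columnar layout, single-row appends favor the row layout.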
docs/concepts/why-clickhouse-is-so-fast.mdx: 2 additions & 2 deletions

@@ -135,12 +135,12 @@ Algorithms that rely on data characteristics often perform better than their gen
## VLDB 2024 paper {#vldb-2024-paper}

In August 2024, we had our first research paper accepted and published at VLDB.
-
VLDB in an international conference on very large databases, and is widely regarded as one of the leading conferences in the field of data management.
+
VLDB is an international conference on very large databases, and is widely regarded as one of the leading conferences in the field of data management.
Among the hundreds of submissions, VLDB generally has an acceptance rate of ~20%.

You can read a [PDF of the paper](https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) or our [web version](/docs/academic_overview) of it, which gives a concise description of ClickHouse's most interesting architectural and system design components that make it so fast.

Alexey Milovidov, our CTO and the creator of ClickHouse, presented the paper (slides [here](https://raw.githubusercontent.com/ClickHouse/clickhouse-presentations/master/2024-vldb/VLDB_2024_presentation.pdf)), followed by a Q&A (that quickly ran out of time!).
You can catch the recorded presentation here:

-
<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

docs/deployment-modes.md: 1 addition & 1 deletion

@@ -67,7 +67,7 @@ The combination of remote table functions and access to the local file system ma

## chDB {#chdb}

-
[chDB](/chdb) is ClickHouse embedded as an in-process database engine,, with Python being the primary implementation, though it's also available for Go, Rust, NodeJS, and Bun. This deployment option brings ClickHouse's powerful OLAP capabilities directly into your application's process, eliminating the need for a separate database installation.
+
[chDB](/chdb) is ClickHouse embedded as an in-process database engine, with Python being the primary implementation, though it's also available for Go, Rust, NodeJS, and Bun. This deployment option brings ClickHouse's powerful OLAP capabilities directly into your application's process, eliminating the need for a separate database installation.

docs/getting-started/install/advanced.md: 5 additions & 5 deletions

@@ -36,14 +36,14 @@ interesting for users.

:::note
Since ClickHouse's CI is evolving over time, the exact steps to download CI-generated builds may vary.
-
Also, CI may delete too old build artifacts, making them unavailable for download.
+
Also, CI may delete old build artifacts, making them unavailable for download.
:::

-
For example, to download a aarch64 binary for ClickHouse v23.4, follow these steps:
+
For example, to download an aarch64 binary for ClickHouse v23.4, follow these steps:

- Find the GitHub pull request for release v23.4: [Release pull request for branch 23.4](https://github.com/ClickHouse/ClickHouse/pull/49238)
-
- Click "Commits", then click a commit similar to "Update autogenerated version to 23.4.2.1 and contributors" for the particular version you like to install.
+
- Click "Commits", then click on a commit similar to "Update autogenerated version to 23.4.2.1 and contributors" for the particular version you'd like to install.
- Click the green check / yellow dot / red cross to open the list of CI checks.
-
- Click "Details" next to "Builds" in the list, it will open a page similar to [this page](https://s3.amazonaws.com/clickhouse-test-reports/46793/b460eb70bf29b19eadd19a1f959b15d186705394/clickhouse_build_check/report.html)
-
- Find the rows with compiler = "clang-*-aarch64" - there are multiple rows.
+
- Click "Details" next to "Builds" in the list; it will open a page similar to [this page](https://s3.amazonaws.com/clickhouse-test-reports/46793/b460eb70bf29b19eadd19a1f959b15d186705394/clickhouse_build_check/report.html).
+
- Find the rows with compiler = "clang-*-aarch64" — there are multiple rows.

docs/guides/developer/mutations.md: 1 addition & 1 deletion

@@ -89,7 +89,7 @@ View the [`DELETE` statement](/sql-reference/statements/delete.md) docs page for

## Lightweight deletes {#lightweight-deletes}

-
Another option for deleting rows it to use the `DELETE FROM` command, which is referred to as a **lightweight delete**. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries, so you do not have to wait for a merging of parts or use the `FINAL` keyword. Cleanup of data happens asynchronously in the background.
+
Another option for deleting rows is to use the `DELETE FROM` command, which is referred to as a **lightweight delete**. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries, so you do not have to wait for a merging of parts or use the `FINAL` keyword. Cleanup of data happens asynchronously in the background.
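A simplified mental model of a lightweight delete, written as a Python sketch: rows are masked rather than rewritten, reads filter on the mask, and a later background step drops the masked rows physically. The `Part` class and its methods are hypothetical and do not reflect ClickHouse's on-disk format or merge logic.

```python
class Part:
    """A hypothetical immutable data part with a per-row 'exists' mask."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.exists = [True] * len(self.rows)

    def lightweight_delete(self, predicate):
        # Only the mask changes; row data stays in place, so this step is cheap.
        for i, row in enumerate(self.rows):
            if predicate(row):
                self.exists[i] = False

    def scan(self):
        # Every read filters on the mask, so deleted rows disappear immediately.
        return [row for row, keep in zip(self.rows, self.exists) if keep]

    def compact(self):
        # Background cleanup: physically drop masked rows (for example, during a merge).
        self.rows = self.scan()
        self.exists = [True] * len(self.rows)

part = Part({"id": i} for i in range(5))
part.lightweight_delete(lambda r: r["id"] % 2 == 0)
print([r["id"] for r in part.scan()])  # [1, 3]
print(len(part.rows))                  # 5 (data is still physically present)
part.compact()
print(len(part.rows))                  # 2
```

The split between a cheap, immediately visible mark and a deferred physical cleanup is what lets queries see the delete without waiting for parts to be rewritten.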
``` sql
DELETE FROM [db.]table [ON CLUSTER cluster] [WHERE expr]

docs/guides/inserting-data.md: 2 additions & 2 deletions

@@ -137,7 +137,7 @@ Unlike many traditional databases, ClickHouse supports an HTTP interface.
Users can use this for both inserting and querying data, using any of the above formats.
This is often preferable to ClickHouse's native protocol as it allows traffic to be easily switched with load balancers.
We expect small differences in insert performance with the native protocol, which incurs a little less overhead.
-
Existing clients use either of these protocols (in some cases both e.g. the Go client).
+
Existing clients use either of these protocols (in some cases both, e.g., the Go client).
The native protocol does allow query progress to be easily tracked.

See [HTTP Interface](/interfaces/http) for further details.
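A minimal Python sketch of an insert over the HTTP interface, using only the standard library. It assumes a server listening on the default HTTP port 8123 on `localhost` and a hypothetical table `events` with two integer columns. The `INSERT ... FORMAT CSV` statement travels in the `query` URL parameter and the rows travel in the POST body. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

def build_insert_request(table, rows, host="localhost", port=8123):
    """Build (without sending) a CSV INSERT for ClickHouse's HTTP interface.

    The statement goes in the `query` URL parameter; the data goes in the POST body.
    """
    query = f"INSERT INTO {table} FORMAT CSV"
    url = f"http://{host}:{port}/?query={quote(query)}"
    body = "".join(",".join(str(v) for v in row) + "\n" for row in rows).encode()
    return Request(url, data=body, method="POST")

# `events` is a hypothetical table with two integer columns.
req = build_insert_request("events", [(1, 10), (2, 20)])
print(req.get_method())  # POST
print(req.full_url)      # http://localhost:8123/?query=INSERT%20INTO%20events%20FORMAT%20CSV
# Sending it requires a running server: urllib.request.urlopen(req).read()
```

Keeping the statement in the URL and the rows in the body means the data can be streamed as-is instead of being escaped into the SQL text.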
@@ -149,7 +149,7 @@ For loading data from Postgres, users can use:
- `PeerDB by ClickHouse`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
  - ClickHouse Cloud - available through our [new connector](/integrations/clickpipes/postgres) in ClickPipes, our managed ingestion service.
  - Self-managed - via the [open-source project](https://github.com/PeerDB-io/peerdb).
-
- The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., timestamp, is sufficient or if it's a one-off migration. This approach can scale to 10's millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
+
- The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. This is typically appropriate if batch replication based on a known watermark, e.g., a timestamp, is sufficient or if it's a one-off migration. This approach can scale to tens of millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
- Data can be exported from PostgreSQL in CSV format. This can then be inserted into ClickHouse from either local files or via object storage using table functions.