Commit 4cb5b0d

Merge pull request #4286 from ClickHouse/Blargian-patch-99
Improvement: add explainer about wide vs compact parts to "Compression in ClickHouse"
2 parents e43ee3d + fb28e6a commit 4cb5b0d

File tree

1 file changed: +73 −0

docs/data-compression/compression-in-clickhouse.md

@@ -62,6 +62,79 @@ GROUP BY name
└───────────────────────┴─────────────────┴───────────────────┴────────────┘
```

<details>

<summary>A note on compact versus wide parts</summary>

If you see `compressed_size` or `uncompressed_size` values equal to `0`, this may be because the parts are of type `Compact` rather than `Wide` (see the description of `part_type` in [`system.parts`](/operations/system-tables/parts)).
The part format is controlled by the settings [`min_bytes_for_wide_part`](/operations/settings/merge-tree-settings#min_bytes_for_wide_part)
and [`min_rows_for_wide_part`](/operations/settings/merge-tree-settings#min_rows_for_wide_part): if the inserted
data results in a part which exceeds neither threshold, the part will be compact rather
than wide, and no values will be reported for `compressed_size` or `uncompressed_size`.

To demonstrate:
```sql title="Query"
-- Create a table with compact parts
CREATE TABLE compact (
    number UInt32
)
ENGINE = MergeTree()
ORDER BY number
AS SELECT * FROM numbers(100000); -- Not big enough to exceed the default of min_bytes_for_wide_part = 10485760

-- Check the type of the parts
SELECT table, name, part_type FROM system.parts WHERE table = 'compact';

-- Get the compressed and uncompressed column sizes for the compact table
SELECT name,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE table = 'compact'
GROUP BY name;

-- Create a table with wide parts
CREATE TABLE wide (
    number UInt32
)
ENGINE = MergeTree()
ORDER BY number
SETTINGS min_bytes_for_wide_part = 0
AS SELECT * FROM numbers(100000);

-- Check the type of the parts
SELECT table, name, part_type FROM system.parts WHERE table = 'wide';

-- Get the compressed and uncompressed sizes for the wide table
SELECT name,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE table = 'wide'
GROUP BY name;
```

```response title="Response"
   ┌─table───┬─name──────┬─part_type─┐
1. │ compact │ all_1_1_0 │ Compact   │
   └─────────┴───────────┴───────────┘
   ┌─name───┬─compressed_size─┬─uncompressed_size─┬─ratio─┐
1. │ number │ 0.00 B          │ 0.00 B            │   nan │
   └────────┴─────────────────┴───────────────────┴───────┘
   ┌─table─┬─name──────┬─part_type─┐
1. │ wide  │ all_1_1_0 │ Wide      │
   └───────┴───────────┴───────────┘
   ┌─name───┬─compressed_size─┬─uncompressed_size─┬─ratio─┐
1. │ number │ 392.31 KiB      │ 390.63 KiB        │     1 │
   └────────┴─────────────────┴───────────────────┴───────┘
```

</details>
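
If you need size statistics for a table whose parts are already compact, one option (a sketch, not the only route) is to lower the threshold on the existing table and force a merge so the parts are rewritten in the wide format. `ALTER TABLE ... MODIFY SETTING` and `OPTIMIZE TABLE ... FINAL` are standard MergeTree operations; the table name reuses the `compact` example above:

```sql
-- Lower the wide-part threshold on the existing table
ALTER TABLE compact MODIFY SETTING min_bytes_for_wide_part = 0;

-- Force a merge so the part is rewritten, now in the wide format
OPTIMIZE TABLE compact FINAL;

-- The active part should now report as Wide
SELECT table, name, part_type
FROM system.parts
WHERE table = 'compact' AND active;
```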
We show both a compressed and an uncompressed size here. Both are important. The compressed size equates to what we will need to read off disk - something we want to minimize for query performance (and storage cost). This data will need to be decompressed prior to processing. The uncompressed size, in turn, depends on the data types used. Minimizing it will reduce the memory overhead of queries and the amount of data which has to be processed by the query, improving utilization of caches and ultimately query times.
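
For table-level rather than per-column numbers, the same byte counters are also available in `system.parts`. A minimal sketch, reusing the `wide` table from above (the `active` filter excludes parts that have been superseded by merges):

```sql
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE table = 'wide' AND active
GROUP BY table;
```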

> The per-column query above relies on the table `columns` in the system database. This database is managed by ClickHouse and is a treasure trove of useful information, from query performance metrics to background cluster logs. We recommend ["System Tables and a Window into the Internals of ClickHouse"](https://clickhouse.com/blog/clickhouse-debugging-issues-with-system-tables) and the accompanying articles[[1]](https://clickhouse.com/blog/monitoring-troubleshooting-insert-queries-clickhouse)[[2]](https://clickhouse.com/blog/monitoring-troubleshooting-select-queries-clickhouse) for the curious reader.
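
For example, one quick way into the query performance metrics mentioned above (a sketch, assuming the default configuration in which `system.query_log` is enabled):

```sql
-- Ten slowest queries that finished in the last hour
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    substring(query, 1, 60) AS query_start
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```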
