Commit d596fa9

Merge pull request #4460 from Username46786/patch-2

Add Knowledge Base tutorial: rollups with Materialized Views for time-series analytics

2 parents df69d1f + f2594ff commit d596fa9

File tree: 9 files changed, +262 −12 lines

5 files renamed without changes.
Lines changed: 251 additions & 0 deletions
---
title: 'Build a rollup with materialized views for fast time-series analytics'
slug: /knowledgebase/materialized-view-rollup-timeseries
description: 'End-to-end example creating a raw events table, a rollup table, and a materialized view for low-latency analytics.'
keywords: ['materialized view', 'rollup', 'aggregate', 'timeseries', 'tutorial']
doc_type: 'guide'
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

> This tutorial shows you how to maintain pre-aggregated rollups from a high-volume events table using [**materialized views**](/materialized-views).
> You'll create three objects: a raw table, a rollup table, and the materialized view that writes into the rollup automatically.
## When to use this pattern {#when-to-use}

Use this pattern when:

- You have an **append-only events stream** (clicks, pageviews, IoT, logs).
- Most queries are **aggregations** over time ranges (per minute/hour/day).
- You want **consistent sub-second reads** without re-scanning all raw rows.
<VerticalStepper headerLevel="h2">

## Create the raw events table {#create-raw-events-table}

```sql
CREATE TABLE events_raw
(
    event_time DateTime,
    user_id    UInt64,
    country    LowCardinality(String),
    event_type LowCardinality(String),
    value      Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id)
TTL event_time + INTERVAL 90 DAY DELETE
```
**Notes**

- `PARTITION BY toYYYYMM(event_time)` keeps partitions small and easy to drop.
- `ORDER BY (event_time, user_id)` supports time-bounded queries with a secondary filter on user.
- `LowCardinality(String)` saves memory for categorical dimensions.
- `TTL` cleans up raw data after 90 days (tune this to your retention requirements).
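Because partitions are monthly, expired history can also be removed by hand in a single cheap metadata operation. A minimal sketch of our own (the partition ID `'202409'` is illustrative):

```sql
-- Hypothetical: drop one month of raw events wholesale.
-- Partition IDs follow the toYYYYMM(event_time) partition key.
ALTER TABLE events_raw DROP PARTITION '202409';
```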
## Design the rollup (aggregated) table {#design-rollup}

We'll pre-aggregate to **hourly** granularity.
Choose your grain to match the most common analysis window.

```sql
CREATE TABLE events_rollup_1h
(
    bucket_start DateTime,              -- start of the hour
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    users_uniq   AggregateFunction(uniqExact, UInt64),
    value_sum    AggregateFunction(sum, Float64),
    value_avg    AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type)
```
We store **aggregate states** (e.g., `AggregateFunction(sum, ...)`), which compactly represent partial aggregates and can be merged or finalized later.
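To see what a state is without touching any tables, here is a small self-contained sketch of our own (not part of the tutorial's schema): `sumState` produces an opaque partial aggregate, and `finalizeAggregation` resolves it back into a plain value.

```sql
-- sumState(...) yields an AggregateFunction(sum, UInt64) state;
-- finalizeAggregation(...) turns that state into the final number.
SELECT finalizeAggregation(sumState(number)) AS total
FROM numbers(10);  -- total = 45
```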
## Create a materialized view that populates the rollup {#create-materialized-view-to-populate-rollup}

This materialized view fires automatically on inserts into `events_raw` and writes **aggregate states** into the rollup.

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1h
TO events_rollup_1h
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value)         AS value_sum,
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
GROUP BY bucket_start, country, event_type;
```
## Insert some sample data {#insert-some-sample-data}

Insert a few rows to exercise the pipeline:

```sql
INSERT INTO events_raw VALUES
    (now() - INTERVAL 4 SECOND, 101, 'US', 'view',  1),
    (now() - INTERVAL 3 SECOND, 101, 'US', 'click', 1),
    (now() - INTERVAL 2 SECOND, 202, 'DE', 'view',  1),
    (now() - INTERVAL 1 SECOND, 101, 'US', 'view',  1);
```
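To confirm the materialized view fired, a quick sanity check (our own addition, assuming the objects above were created as shown) is to count the state rows that landed in the rollup:

```sql
-- One state row per (hour, country, event_type) seen in the insert block;
-- the four sample rows above typically produce three rows here
-- (unless the inserts happen to straddle an hour boundary).
SELECT count() FROM events_rollup_1h;
```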
## Querying the rollup {#querying-the-rollup}

You can either **merge** states at read time or **finalize** them:

<Tabs groupId="finalize">
<TabItem value="merge" label="Merge at read time">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    uniqExactMerge(users_uniq) AS users,
    sumMerge(value_sum)        AS value_sum,
    avgMerge(value_avg)        AS value_avg,
    countMerge(events_count)   AS events
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 1 DAY
GROUP BY ALL
ORDER BY bucket_start, country, event_type;
```

</TabItem>
<TabItem value="finalize" label="Finalize with FINAL">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    uniqExactMerge(users_uniq) AS users,
    sumMerge(value_sum)        AS value_sum,
    avgMerge(value_avg)        AS value_avg,
    countMerge(events_count)   AS events
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 1 DAY
GROUP BY ALL
ORDER BY bucket_start, country, event_type
SETTINGS final = 1; -- or use SELECT ... FINAL
```

</TabItem>
</Tabs>
<br/>
:::tip
If you expect reads to always hit the rollup, you can create a **second materialized view** that writes *finalized* numbers to a “plain” `MergeTree` table at the same 1h grain.
States give more flexibility, while finalized numbers give slightly simpler reads.
:::
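One possible shape for that second pipeline, sketched under our own assumptions: the `_final` names are hypothetical, and we use `SummingMergeTree` rather than a plain `MergeTree` so that the partial row each insert block produces collapses at merge time. Only additive metrics fit this pattern; non-additive ones such as `uniqExact` and `avg` stay in the state table.

```sql
-- Hypothetical finalized rollup: additive metrics only, so rows sharing
-- a sorting key can simply be summed when parts merge.
CREATE TABLE events_rollup_1h_final
(
    bucket_start DateTime,
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    value_sum    Float64,
    events_count UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);

CREATE MATERIALIZED VIEW mv_events_rollup_1h_final
TO events_rollup_1h_final
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    sum(value) AS value_sum,
    count()    AS events_count
FROM events_raw
GROUP BY bucket_start, country, event_type;
```

Reads then need only a plain `sum(...)`/`GROUP BY` (to fold rows not yet merged), at the cost of the flexibility that states provide.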
## Filter on fields in the primary key for best performance {#filtering-performance}

You can use the `EXPLAIN` command to see how the index is used to prune data:

```sql title="Query"
EXPLAIN indexes=1
SELECT *
FROM events_rollup_1h
WHERE bucket_start BETWEEN now() - INTERVAL 3 DAY AND now()
  AND country = 'US';
```
```response title="Response"
    ┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 1. │ Expression ((Project names + Projection))                                                                                          │
 2. │   Expression                                                                                                                       │
 3. │     ReadFromMergeTree (default.events_rollup_1h)                                                                                   │
 4. │     Indexes:                                                                                                                       │
 5. │       MinMax                                                                                                                       │
 6. │         Keys:                                                                                                                      │
 7. │           bucket_start                                                                                                             │
 8. │         Condition: and((bucket_start in (-Inf, 1758550242]), (bucket_start in [1758291042, +Inf)))                                 │
 9. │         Parts: 1/1                                                                                                                 │
10. │         Granules: 1/1                                                                                                              │
11. │       Partition                                                                                                                    │
12. │         Keys:                                                                                                                      │
13. │           toYYYYMM(bucket_start)                                                                                                   │
14. │         Condition: and((toYYYYMM(bucket_start) in (-Inf, 202509]), (toYYYYMM(bucket_start) in [202509, +Inf)))                     │
15. │         Parts: 1/1                                                                                                                 │
16. │         Granules: 1/1                                                                                                              │
17. │       PrimaryKey                                                                                                                   │
18. │         Keys:                                                                                                                      │
19. │           bucket_start                                                                                                             │
20. │           country                                                                                                                  │
21. │         Condition: and((country in ['US', 'US']), and((bucket_start in (-Inf, 1758550242]), (bucket_start in [1758291042, +Inf)))) │
22. │         Parts: 1/1                                                                                                                 │
23. │         Granules: 1/1                                                                                                              │
    └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
The query execution plan above shows three types of indexes being used:
a MinMax index, a partition index, and a primary key index.
Each index makes use of fields specified in our primary key: `(bucket_start, country, event_type)`.
For the best filtering performance, make sure your queries filter on primary key fields to prune data.
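For contrast, a filter that skips the leading key column prunes far less effectively. A hypothetical counter-example (our own, not from the tutorial): `event_type` is last in the sorting key, so filtering on it alone cannot narrow the scan by `bucket_start` or `country`.

```sql
-- Run EXPLAIN again with a filter on the trailing key column only;
-- expect many more granules to be selected than in the query above.
EXPLAIN indexes = 1
SELECT *
FROM events_rollup_1h
WHERE event_type = 'click';
```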
## Common variations {#common-variations}

- **Different grains**: add a daily rollup:

```sql
CREATE TABLE events_rollup_1d
(
    bucket_start Date,
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    users_uniq   AggregateFunction(uniqExact, UInt64),
    value_sum    AggregateFunction(sum, Float64),
    value_avg    AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);
```
Then a second materialized view. Note the explicit aliases: the SELECT column names must match the target table's columns.

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1d
TO events_rollup_1d
AS
SELECT
    toDate(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value)         AS value_sum,
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
GROUP BY ALL;
```
- **Compression**: apply codecs to big columns (for example, `CODEC(ZSTD(3))`) on the raw table; see the first sketch after this list.
- **Cost control**: push heavy retention to the raw table and keep long-lived rollups.
- **Backfilling**: when loading historical data, insert into `events_raw` and let the materialized view build rollups automatically. For rows that existed before the view was created, use an explicit `INSERT ... SELECT` with `-State` combinators (second sketch below); note that `POPULATE` cannot be combined with the `TO` clause used here.
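Two hedged sketches for the bullets above. First, compression: adding a codec to an existing column is a plain `ALTER` (the choice of `ZSTD(3)` is illustrative, not a recommendation):

```sql
-- Hypothetical: re-declare the column with a heavier codec. New parts use
-- the new codec; existing parts are re-encoded as they merge.
ALTER TABLE events_raw MODIFY COLUMN value Float64 CODEC(ZSTD(3));
```

Second, backfilling rows that predate the materialized view, assuming an illustrative cutoff date:

```sql
-- Hypothetical backfill: aggregate old raw rows straight into the rollup
-- using the same -State expressions the view uses.
INSERT INTO events_rollup_1h
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value)         AS value_sum,
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
WHERE event_time < '2025-01-01 00:00:00'   -- illustrative cutoff
GROUP BY bucket_start, country, event_type;
```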
## Clean-up and retention {#clean-up-and-retention}

- Keep the raw table's TTL short (e.g., 30 or 90 days) but retain rollups for longer (e.g., 1 year).
- You can also use **TTL to move** old parts to cheaper storage if tiering is enabled; a sketch follows below.
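A minimal sketch of such a move rule, assuming a storage policy that defines a volume named `cold` (both the policy and the volume name are assumptions):

```sql
-- Hypothetical: after 6 months, relocate rollup parts to a slower volume.
-- Requires a storage policy on the table that defines a 'cold' volume.
ALTER TABLE events_rollup_1h
    MODIFY TTL bucket_start + INTERVAL 6 MONTH TO VOLUME 'cold';
```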
## Troubleshooting {#troubleshooting}

- Materialized view not updating? Check that inserts go to **events_raw** (not the rollup table), and that the materialized view's target is correct (`TO events_rollup_1h`).
- Slow queries? Confirm they hit the rollup (query the rollup table directly) and that the time filters align with the rollup grain.
- Backfill mismatches? Use `SYSTEM FLUSH LOGS` and check `system.query_log` / `system.parts` to confirm inserts and merges; one way to do this is sketched below.
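One way to run that check, with an illustrative filter on the query text (the `ILIKE` pattern is an assumption about how your inserts are written):

```sql
-- Flush in-memory logs, then inspect recent statements touching events_raw.
SYSTEM FLUSH LOGS;

SELECT event_time, query_duration_ms, written_rows, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%events_raw%'
ORDER BY event_time DESC
LIMIT 10;
```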
</VerticalStepper>

docs/use-cases/time-series/index.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -15,10 +15,10 @@ ClickHouse excels at handling time-series data, offering powerful features for b
 
 This guide walks you through everything you need to know about working with time-series data in ClickHouse, from basic concepts to advanced optimization techniques. You'll learn how to:
 
-* [Choose the right date and time data types for your use case](./date-time-data-types.md)
-* [Perform common time-series operations and aggregations](./basic-operations.md)
-* [Apply specialized analysis functions for time-based data](./analysis-functions.md)
-* [Optimize storage efficiency for temporal data](./storage-efficiency.md)
-* [Tune query performance for time-series workloads](./query-performance.md)
+* [Choose the right date and time data types for your use case](/use-cases/time-series/date-time-data-types)
+* [Perform common time-series operations and aggregations](/use-cases/time-series/basic-operations)
+* [Apply specialized analysis functions for time-based data](/use-cases/time-series/analysis-functions)
+* [Optimize storage efficiency for temporal data](/use-cases/time-series/storage-efficiency)
+* [Tune query performance for time-series workloads](/use-cases/time-series/query-performance)
 
 Whether you're new to time-series analysis or looking to optimize an existing implementation, this guide will help you make the most of ClickHouse's time-series capabilities.
```

scripts/aspell-dict-file.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -963,9 +963,9 @@ pecreate
 drilldown
 --docs/cloud/get-started/cloud-quick-start.md--
 createservice
---docs/use-cases/time-series/query-performance.md--
+--docs/use-cases/time-series/05_query-performance.md--
 subproject
---docs/use-cases/time-series/analysis-functions.md--
+--docs/use-cases/time-series/03_analysis-functions.md--
 Yankovic's
 --docs/migrations/bigquery/migrating-to-clickhouse-cloud.md--
 bigquery
```

sidebars.js

Lines changed: 4 additions & 5 deletions
```diff
@@ -160,11 +160,10 @@ const sidebars = {
       collapsible: true,
       link: { type: "doc", id: "use-cases/time-series/index" },
       items: [
-        "use-cases/time-series/date-time-data-types",
-        "use-cases/time-series/basic-operations",
-        "use-cases/time-series/analysis-functions",
-        "use-cases/time-series/storage-efficiency",
-        "use-cases/time-series/query-performance"
+        {
+          type: "autogenerated",
+          dirName: "use-cases/time-series",
+        }
       ]
     },
     {
```
