Commit b69c684

docs: add tutorial for rollups with Materialized Views

1 parent cba9548 commit b69c684

1 file changed: +210 −0
---
title: Build a rollup with Materialized Views for fast time-series analytics
slug: /knowledgebase/materialized-view-rollup-timeseries
description: End-to-end example creating a raw events table, a rollup table, and a materialized view for low-latency analytics.
keywords: [materialized view, rollup, aggregate, timeseries, tutorial]
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

> This tutorial shows how to maintain pre-aggregated rollups from a high-volume events table using **Materialized Views**. You’ll create three objects: a raw table, a rollup table, and the MV that writes into the rollup automatically.

## When to use this pattern

- You have an **append-only events stream** (clicks, pageviews, IoT, logs).
- Most queries are **aggregations** over time ranges (per minute/hour/day).
- You want **consistent sub-second reads** without re-scanning all raw rows.

## 1) Create a raw events table

```sql
CREATE TABLE events_raw
(
    event_time DateTime,
    user_id    UInt64,
    country    LowCardinality(String),
    event_type LowCardinality(String),
    value      Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id)
TTL event_time + INTERVAL 90 DAY DELETE
SETTINGS index_granularity = 8192;
```

**Notes**

- `PARTITION BY toYYYYMM(event_time)` keeps partitions small and easy to drop (example below).
- `ORDER BY (event_time, user_id)` supports time-bounded queries plus a secondary filter on user.
- `LowCardinality(String)` saves memory for categorical dimensions.
- `TTL` cleans up raw data after 90 days (tune to your retention).

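Partition-level retention is what makes the monthly scheme cheap. As a sketch, dropping one (hypothetical) month of raw data:

```sql
-- With toYYYYMM partitioning, the partition ID is the YYYYMM value
ALTER TABLE events_raw DROP PARTITION 202509;
```
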
## 2) Design the rollup (aggregated) table

We’ll pre-aggregate to **hourly** grain. Choose your grain to match the most common analysis window.

```sql
CREATE TABLE events_rollup_1h
(
    bucket_start DateTime,  -- start of the hour
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    users_uniq   AggregateFunction(uniqExact, UInt64),
    value_sum    AggregateFunction(sum, Float64),
    value_avg    AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type)
SETTINGS index_granularity = 8192;
```

We store **aggregate states** (e.g., `AggregateFunction(sum, ...)`) which compactly represent partial aggregates and can be merged or finalized later.
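
To build intuition, here is a minimal example using the standard `-State` combinator and `finalizeAggregation`:

```sql
-- A state is a partial aggregate; finalizeAggregation() turns it
-- back into a plain value
SELECT finalizeAggregation(sumState(number)) AS total
FROM numbers(10);  -- 0 + 1 + ... + 9 = 45
```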

## 3) Create a Materialized View that populates the rollup

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1h
TO events_rollup_1h
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value)         AS value_sum,
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
GROUP BY
    bucket_start, country, event_type;
```

This MV fires automatically on inserts into `events_raw`. It writes **aggregate states** into the rollup.

## 4) Insert some sample data

```sql
INSERT INTO events_raw VALUES
    ('2025-09-18 10:01:00', 101, 'US', 'view',  1),
    ('2025-09-18 10:02:00', 101, 'US', 'click', 1),
    ('2025-09-18 10:03:00', 202, 'DE', 'view',  1),
    ('2025-09-18 10:40:00', 101, 'US', 'view',  1);
```

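To confirm the MV fired, a quick sanity check (the expected count assumes exactly the four sample rows above, inserted as a single block):

```sql
-- 10:01 and 10:40 both fall in the 10:00 hour for US/view, so the four
-- raw rows collapse into three (hour, country, event_type) state rows
SELECT count() FROM events_rollup_1h;  -- 3
```
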
## 5) Querying the rollup

You can either **merge** states at read time or **finalize** them:

<Tabs groupId="finalize">
<TabItem value="merge" label="Merge at read time">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    uniqExactMerge(users_uniq) AS users,
    sumMerge(value_sum)        AS value_sum,
    avgMerge(value_avg)        AS value_avg,
    countMerge(events_count)   AS events
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 1 DAY
GROUP BY ALL
ORDER BY bucket_start, country, event_type;
```

</TabItem>
<TabItem value="finalize" label="Finalize with FINAL">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    finalizeAggregation(users_uniq)   AS users,
    finalizeAggregation(value_sum)    AS value_sum,
    finalizeAggregation(value_avg)    AS value_avg,
    finalizeAggregation(events_count) AS events
FROM events_rollup_1h FINAL  -- FINAL merges rows sharing a sorting key; or use SETTINGS final = 1
WHERE bucket_start >= now() - INTERVAL 1 DAY
ORDER BY bucket_start, country, event_type;
```

</TabItem>
</Tabs>

> **Tip:** If you expect reads to always hit the rollup, you can create a **second MV** that writes *finalized* numbers to a “plain” `MergeTree` table at the same 1h grain. States give more flexibility; finalized numbers give slightly simpler reads (sketched below).

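One way to do that, assuming hypothetical names `events_rollup_1h_plain` and `mv_events_rollup_1h_plain`, and restricting the rollup to additive metrics (sums and counts survive `SummingMergeTree` merges; `avg` and `uniqExact` do not, which is exactly why states are more flexible):

```sql
-- Hypothetical finalized rollup: plain numbers, additive metrics only
CREATE TABLE events_rollup_1h_plain
(
    bucket_start DateTime,
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    value_sum    Float64,
    events       UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);

CREATE MATERIALIZED VIEW mv_events_rollup_1h_plain
TO events_rollup_1h_plain
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    sum(value) AS value_sum,
    count()    AS events
FROM events_raw
GROUP BY bucket_start, country, event_type;
```

Reads then need only a plain `sum(...)`/`GROUP BY` (to fold rows from not-yet-merged parts), at the cost of losing non-additive aggregates.
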
## 6) Filtering performance: use the primary key

```sql
EXPLAIN indexes = 1
SELECT *
FROM events_rollup_1h
WHERE bucket_start BETWEEN now() - INTERVAL 3 DAY AND now()
  AND country = 'US';
```

You should see the primary key index on `(bucket_start, country, event_type)` being used to prune granules.

## 7) Common variations

- **Different grains**: add a daily rollup:

```sql
CREATE TABLE events_rollup_1d
(
    bucket_start Date,
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    users_uniq   AggregateFunction(uniqExact, UInt64),
    value_sum    AggregateFunction(sum, Float64),
    value_avg    AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);
```

Then a second MV:

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1d
TO events_rollup_1d
AS
SELECT
    toDate(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,  -- alias each state so column
    sumState(value)         AS value_sum,   -- names match the target table
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
GROUP BY ALL;
```

- **Compression**: apply codecs to big columns (for example, `CODEC(ZSTD(3))`) on the raw table.
- **Cost control**: push heavy retention to raw and keep long-lived rollups.
- **Backfilling**: when loading historical data, insert into `events_raw` and let the MVs build rollups automatically. For rows that predate the MVs, backfill the rollup with an `INSERT INTO ... SELECT` that repeats the MV's `-State` expressions (see the sketch below); note that `POPULATE` is not supported for MVs that use a `TO` clause.

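For example, a one-off backfill of the hourly rollup might look like this (the cutoff timestamp is hypothetical):

```sql
-- Rebuild states for historical rows the MV never saw
INSERT INTO events_rollup_1h
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value)         AS value_sum,
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
WHERE event_time < '2025-01-01 00:00:00'  -- hypothetical cutoff
GROUP BY bucket_start, country, event_type;
```
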
## 8) Clean-up and retention

- Keep the raw table’s TTL short (e.g., 30–90 days) and retain rollups for longer (e.g., 1 year).
- You can also use **TTL to move** old parts to cheaper storage if tiering is enabled (example below).

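As a sketch, tiering the hourly rollup could look like this (assumes a storage policy that defines a volume named `cold`):

```sql
-- Move rollup parts older than 180 days to a slower, cheaper volume
ALTER TABLE events_rollup_1h
    MODIFY TTL bucket_start + INTERVAL 180 DAY TO VOLUME 'cold';
```
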
## 9) Troubleshooting

- MV not updating? Check that inserts go to **events_raw** (not the rollup), and that the MV target is correct (`TO events_rollup_1h`).
- Slow queries? Confirm they hit the rollup (query the rollup table directly) and that the time filters align to the rollup grain.
- Backfill mismatches? Use `SYSTEM FLUSH LOGS` and check `system.query_log` / `system.parts` to confirm inserts and merges.

---