---
title: Build a rollup with Materialized Views for fast time-series analytics
slug: /knowledgebase/materialized-view-rollup-timeseries
description: End-to-end example creating a raw events table, a rollup table, and a materialized view for low-latency analytics.
keywords: [materialized view, rollup, aggregate, timeseries, tutorial]
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

> This tutorial shows how to maintain pre-aggregated rollups from a high-volume events table using **Materialized Views**. You’ll create three objects: a raw table, a rollup table, and the MV that writes into the rollup automatically.

## When to use this pattern

- You have an **append-only events stream** (clicks, pageviews, IoT, logs).
- Most queries are **aggregations** over time ranges (per minute/hour/day).
- You want **consistent sub-second reads** without re-scanning all raw rows.

## 1) Create a raw events table

```sql
CREATE TABLE events_raw
(
    event_time DateTime,
    user_id UInt64,
    country LowCardinality(String),
    event_type LowCardinality(String),
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id)
TTL event_time + INTERVAL 90 DAY DELETE
SETTINGS index_granularity = 8192;
```

**Notes**

- `PARTITION BY toYYYYMM(event_time)` keeps partitions small and easy to drop (see the example after this list).
- `ORDER BY (event_time, user_id)` supports time-bounded queries plus a secondary filter.
- `LowCardinality(String)` saves memory for categorical dimensions.
- `TTL` cleans up raw data after 90 days (tune to your retention).
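
For example, you can inspect partition sizes and drop an old month in a single cheap metadata operation (the partition ID `'202506'` below is hypothetical — substitute a month you actually want to drop):

```sql
-- List active partitions of the raw table and their row counts:
SELECT partition, sum(rows) AS rows
FROM system.parts
WHERE table = 'events_raw' AND active
GROUP BY partition
ORDER BY partition;

-- Drop an entire month of raw data at once:
ALTER TABLE events_raw DROP PARTITION '202506';
```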

## 2) Design the rollup (aggregated) table

We’ll pre-aggregate to **hourly** grain. Choose your grain to match the most common analysis window.

```sql
CREATE TABLE events_rollup_1h
(
    bucket_start DateTime, -- start of the hour
    country LowCardinality(String),
    event_type LowCardinality(String),
    users_uniq AggregateFunction(uniqExact, UInt64),
    value_sum AggregateFunction(sum, Float64),
    value_avg AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type)
SETTINGS index_granularity = 8192;
```

We store **aggregate states** (e.g., `AggregateFunction(sum, ...)`), which compactly represent partial aggregates and can be merged or finalized later.
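
To see what a state is, here is a minimal, self-contained illustration using only the built-in `numbers` table function: `sumState` produces partial states, and `sumMerge` combines them into one final value.

```sql
SELECT sumMerge(s) AS total   -- 45: the two partial states merged
FROM
(
    SELECT sumState(number) AS s
    FROM numbers(10)
    GROUP BY number % 2       -- two partial aggregates: evens (20) and odds (25)
);
```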

## 3) Create a Materialized View that populates the rollup

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1h
TO events_rollup_1h
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value) AS value_sum,
    avgState(value) AS value_avg,
    countState() AS events_count
FROM events_raw
GROUP BY
    bucket_start, country, event_type;
```

This MV fires automatically on inserts into `events_raw`. It writes **aggregate states** into the rollup.

## 4) Insert some sample data

```sql
INSERT INTO events_raw VALUES
('2025-09-18 10:01:00', 101, 'US', 'view', 1),
('2025-09-18 10:02:00', 101, 'US', 'click', 1),
('2025-09-18 10:03:00', 202, 'DE', 'view', 1),
('2025-09-18 10:40:00', 101, 'US', 'view', 1);
```
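
A quick sanity check that the MV fired. With the sample rows above, the `2025-09-18 10:00:00` bucket should show 2 `view` events for `US`, 1 `click` for `US`, and 1 `view` for `DE`:

```sql
SELECT
    bucket_start,
    country,
    event_type,
    countMerge(events_count) AS events
FROM events_rollup_1h
GROUP BY ALL
ORDER BY country, event_type;
```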

## 5) Querying the rollup

You can either **merge** states at read time, or **finalize** them:

<Tabs groupId="finalize">
  <TabItem value="merge" label="Merge at read time">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    uniqExactMerge(users_uniq) AS users,
    sumMerge(value_sum) AS value_sum,
    avgMerge(value_avg) AS value_avg,
    countMerge(events_count) AS events
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 1 DAY
GROUP BY ALL
ORDER BY bucket_start, country, event_type;
```

  </TabItem>
  <TabItem value="finalize" label="Finalize with FINAL">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    finalizeAggregation(users_uniq) AS users,
    finalizeAggregation(value_sum) AS value_sum,
    finalizeAggregation(value_avg) AS value_avg,
    finalizeAggregation(events_count) AS events
FROM events_rollup_1h FINAL
WHERE bucket_start >= now() - INTERVAL 1 DAY
ORDER BY bucket_start, country, event_type;
```

`FINAL` merges rows that share a sorting key at read time, and `finalizeAggregation` turns each merged state into its plain value. You can also write `SETTINGS final = 1` instead of putting `FINAL` after the table name.

  </TabItem>
</Tabs>

> **Tip:** If you expect reads to always hit the rollup, you can create a **second MV** that writes *finalized* numbers to a “plain” `MergeTree` table at the same 1h grain. States give more flexibility; finalized numbers give slightly simpler reads.
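
A minimal sketch of that second pattern, restricted to the additive metrics (the table and MV names here are illustrative). With `SummingMergeTree`, partial sums from different insert blocks merge safely in the background; exact uniques cannot be finalized per block without overcounting, so they are left out:

```sql
CREATE TABLE events_rollup_1h_plain
(
    bucket_start DateTime,
    country LowCardinality(String),
    event_type LowCardinality(String),
    value_sum Float64,
    events_count UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);

CREATE MATERIALIZED VIEW mv_events_rollup_1h_plain
TO events_rollup_1h_plain
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    sum(value) AS value_sum,
    count() AS events_count
FROM events_raw
GROUP BY bucket_start, country, event_type;
```

Reads should still `GROUP BY` and `sum()` the measure columns, since rows from not-yet-merged parts can coexist for the same key.
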
## 6) Filtering performance: use the primary key

```sql
EXPLAIN indexes=1
SELECT *
FROM events_rollup_1h
WHERE bucket_start BETWEEN now() - INTERVAL 3 DAY AND now()
  AND country = 'US';
```

You should see the primary key index (on `(bucket_start, country, event_type)`) used to prune data.
## 7) Common variations

- **Different grains**: add a daily rollup:

```sql
CREATE TABLE events_rollup_1d
(
    bucket_start Date,
    country LowCardinality(String),
    event_type LowCardinality(String),
    users_uniq AggregateFunction(uniqExact, UInt64),
    value_sum AggregateFunction(sum, Float64),
    value_avg AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);
```

Then a second MV (note the aliases, so the selected column names line up with the rollup table’s columns):

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1d
TO events_rollup_1d
AS
SELECT
    toDate(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value) AS value_sum,
    avgState(value) AS value_avg,
    countState() AS events_count
FROM events_raw
GROUP BY ALL;
```
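
Reading the daily rollup works exactly like the hourly one, e.g. a 30-day overview per country:

```sql
SELECT
    bucket_start,
    country,
    uniqExactMerge(users_uniq) AS users,
    sumMerge(value_sum) AS value_sum
FROM events_rollup_1d
WHERE bucket_start >= today() - 30
GROUP BY bucket_start, country
ORDER BY bucket_start, country;
```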

- **Compression**: apply codecs to large columns on the raw table (e.g., `value Float64 CODEC(ZSTD(3))`).
- **Cost control**: keep retention on the raw table short and keep long-lived rollups.
- **Backfilling**: for new data, insert into `events_raw` and let the MVs build the rollups automatically; an MV only sees rows inserted *after* it was created. For pre-existing rows, backfill the rollup with an `INSERT ... SELECT` as shown below (`POPULATE` cannot be combined with the `TO` clause used here).
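
A backfill sketch that reuses the MV’s own `SELECT`. The cutoff timestamp is hypothetical — pick one that separates pre-existing rows from rows the MV has already processed, so nothing is double-counted:

```sql
INSERT INTO events_rollup_1h
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value) AS value_sum,
    avgState(value) AS value_avg,
    countState() AS events_count
FROM events_raw
WHERE event_time < '2025-09-18 00:00:00'  -- hypothetical cutoff
GROUP BY bucket_start, country, event_type;
```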

## 8) Clean-up and retention

- Keep the raw TTL short (e.g., 30/90 days) but keep rollups for longer (e.g., 1 year).
- You can also use **TTL to move** old parts to cheaper storage if tiering is enabled (see the example below).
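
A tiered TTL might look like this — a sketch assuming a storage policy with a volume named `'cold'` is configured on the server:

```sql
ALTER TABLE events_raw
    MODIFY TTL
        event_time + INTERVAL 30 DAY TO VOLUME 'cold',  -- move older parts to cheap storage
        event_time + INTERVAL 90 DAY DELETE;            -- drop them entirely after 90 days
```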

## 9) Troubleshooting

- MV not updating? Check that inserts go to **events_raw** (not the rollup), and that the MV target is correct (`TO events_rollup_1h`).
- Slow queries? Confirm they hit the rollup (query the rollup table directly) and that the time filters align to the rollup grain.
- Backfill mismatches? Use `SYSTEM FLUSH LOGS` and check `system.query_log` / `system.parts` to confirm inserts and merges, as in the example below.
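
A starting point for that last check (assumes you have access to the system tables):

```sql
SYSTEM FLUSH LOGS;

-- Did the inserts into the raw table actually run?
SELECT event_time, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE 'INSERT INTO events_raw%'
ORDER BY event_time DESC
LIMIT 10;

-- Are parts being created and merged in the rollup?
SELECT partition, name, rows, active
FROM system.parts
WHERE table = 'events_rollup_1h'
ORDER BY modification_time DESC
LIMIT 10;
```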

---