---
title: 'Build a rollup with materialized views for fast time-series analytics'
slug: /knowledgebase/materialized-view-rollup-timeseries
description: 'End-to-end example creating a raw events table, a rollup table, and a materialized view for low-latency analytics.'
keywords: ['materialized view', 'rollup', 'aggregate', 'timeseries', 'tutorial']
doc_type: 'guide'
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

> This tutorial shows you how to maintain pre-aggregated rollups from a high-volume events table using [**materialized views**](/materialized-views).
> You’ll create three objects: a raw table, a rollup table, and the materialized view that writes into the rollup automatically.

## When to use this pattern {#when-to-use}

Use this pattern when:
- You have an **append-only events stream** (clicks, pageviews, IoT, logs).
- Most queries are **aggregations** over time ranges (per minute/hour/day).
- You want **consistent sub-second reads** without re-scanning all raw rows.

<VerticalStepper headerLevel="h2">

## Create the raw events table {#create-raw-events-table}

```sql
CREATE TABLE events_raw
(
    event_time DateTime,
    user_id UInt64,
    country LowCardinality(String),
    event_type LowCardinality(String),
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id)
TTL event_time + INTERVAL 90 DAY DELETE;
```

**Notes**

- `PARTITION BY toYYYYMM(event_time)` keeps partitions small and easy to drop (see the example below).
- `ORDER BY (event_time, user_id)` supports time-bounded queries with a secondary filter on `user_id`.
- `LowCardinality(String)` saves memory for categorical dimensions.
- `TTL` cleans up raw data after 90 days (tune this to your retention requirements).
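
Because partitions are monthly, expiring or repairing a whole month of raw data stays cheap. A minimal sketch (the partition ID below is illustrative):

```sql
-- Drop one month of raw data; existing rollup rows are unaffected
ALTER TABLE events_raw DROP PARTITION 202401;
```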

## Design the rollup (aggregated) table {#design-rollup}

We’ll pre-aggregate to **hourly** granularity.
Choose your grain to match the most common analysis window.

```sql
CREATE TABLE events_rollup_1h
(
    bucket_start DateTime, -- start of the hour
    country LowCardinality(String),
    event_type LowCardinality(String),
    users_uniq AggregateFunction(uniqExact, UInt64),
    value_sum AggregateFunction(sum, Float64),
    value_avg AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);
```

We store **aggregate states** (e.g., `AggregateFunction(sum, ...)`), which compactly represent partial aggregates and can be merged or finalized later.
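
To build intuition for states, here is a tiny self-contained illustration of the state/merge lifecycle (it does not touch the tutorial tables): `sumState` produces partial aggregate states, and `sumMerge` combines them into the final value.

```sql
SELECT sumMerge(s) AS total  -- returns 45
FROM
(
    SELECT sumState(number) AS s
    FROM numbers(10)
    GROUP BY number % 2      -- two partial states: evens and odds
);
```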

## Create a materialized view that populates the rollup {#create-materialized-view-to-populate-rollup}

This materialized view fires automatically on inserts into `events_raw` and writes **aggregate states** into the rollup.

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1h
TO events_rollup_1h
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value) AS value_sum,
    avgState(value) AS value_avg,
    countState() AS events_count
FROM events_raw
GROUP BY bucket_start, country, event_type;
```

## Insert some sample data {#insert-some-sample-data}

Insert a few rows into the raw table; the materialized view rolls them up at insert time:

```sql
INSERT INTO events_raw VALUES
    (now() - INTERVAL 4 SECOND, 101, 'US', 'view', 1),
    (now() - INTERVAL 3 SECOND, 101, 'US', 'click', 1),
    (now() - INTERVAL 2 SECOND, 202, 'DE', 'view', 1),
    (now() - INTERVAL 1 SECOND, 101, 'US', 'view', 1);
```
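
To confirm the view fired, you can count the state rows now sitting in the rollup:

```sql
-- Expect one state row per (hour, country, event_type) combination
SELECT count() FROM events_rollup_1h;
```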

## Querying the rollup {#querying-the-rollup}

You can either **merge** states at read time, or **finalize** them:

<Tabs groupId="finalize">
  <TabItem value="merge" label="Merge at read time">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    uniqExactMerge(users_uniq) AS users,
    sumMerge(value_sum) AS value_sum,
    avgMerge(value_avg) AS value_avg,
    countMerge(events_count) AS events
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 1 DAY
GROUP BY ALL
ORDER BY bucket_start, country, event_type;
```

  </TabItem>
  <TabItem value="finalize" label="Finalize with FINAL">

`FINAL` merges rows that share a sorting key at read time, so each `(bucket_start, country, event_type)` group arrives as a single row and `finalizeAggregation` can return finished values without a `GROUP BY`:

```sql
SELECT
    bucket_start,
    country,
    event_type,
    finalizeAggregation(users_uniq) AS users,
    finalizeAggregation(value_sum) AS value_sum,
    finalizeAggregation(value_avg) AS value_avg,
    finalizeAggregation(events_count) AS events
FROM events_rollup_1h FINAL
WHERE bucket_start >= now() - INTERVAL 1 DAY
ORDER BY bucket_start, country, event_type;
```

  </TabItem>
</Tabs>

<br/>
:::tip
If you expect reads to always hit the rollup, you can create a **second materialized view** that writes *finalized* numbers to a “plain” `MergeTree` table at the same 1h grain.
States give you more flexibility, while finalized numbers make reads slightly simpler. A sketch of the finalized variant follows below.
:::
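
A minimal sketch of that variant (the table and view names here are illustrative, not part of the tutorial objects). Note the caveat: each insert block produces its own pre-aggregated row, so rows for the same hour may still need summing at read time, and `uniqExact` values from different blocks cannot be combined exactly:

```sql
CREATE TABLE events_rollup_1h_plain
(
    bucket_start DateTime,
    country LowCardinality(String),
    event_type LowCardinality(String),
    users UInt64,   -- exact only if inserts align with the 1h grain
    value_sum Float64,
    events UInt64
)
ENGINE = MergeTree
ORDER BY (bucket_start, country, event_type);

CREATE MATERIALIZED VIEW mv_events_rollup_1h_plain
TO events_rollup_1h_plain
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExact(user_id) AS users,
    sum(value) AS value_sum,
    count() AS events
FROM events_raw
GROUP BY ALL;
```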

## Filter on fields in the primary key for best performance {#filtering-performance}

You can use the `EXPLAIN` command to see how the index is used to prune data:

```sql title="Query"
EXPLAIN indexes=1
SELECT *
FROM events_rollup_1h
WHERE bucket_start BETWEEN now() - INTERVAL 3 DAY AND now()
  AND country = 'US';
```

```response title="Response"
    ┌─explain──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 1. │ Expression ((Project names + Projection))                                                                                        │
 2. │   Expression                                                                                                                     │
 3. │     ReadFromMergeTree (default.events_rollup_1h)                                                                                 │
 4. │     Indexes:                                                                                                                     │
 5. │       MinMax                                                                                                                     │
 6. │         Keys:                                                                                                                    │
 7. │           bucket_start                                                                                                           │
 8. │         Condition: and((bucket_start in (-Inf, 1758550242]), (bucket_start in [1758291042, +Inf)))                               │
 9. │         Parts: 1/1                                                                                                               │
10. │         Granules: 1/1                                                                                                            │
11. │       Partition                                                                                                                  │
12. │         Keys:                                                                                                                    │
13. │           toYYYYMM(bucket_start)                                                                                                 │
14. │         Condition: and((toYYYYMM(bucket_start) in (-Inf, 202509]), (toYYYYMM(bucket_start) in [202509, +Inf)))                   │
15. │         Parts: 1/1                                                                                                               │
16. │         Granules: 1/1                                                                                                            │
17. │       PrimaryKey                                                                                                                 │
18. │         Keys:                                                                                                                    │
19. │           bucket_start                                                                                                           │
20. │           country                                                                                                                │
21. │         Condition: and((country in ['US', 'US']), and((bucket_start in (-Inf, 1758550242]), (bucket_start in [1758291042, +Inf)))) │
22. │         Parts: 1/1                                                                                                               │
23. │         Granules: 1/1                                                                                                            │
    └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

The query execution plan above shows three types of indexes being used:
a MinMax index, a partition index, and a primary key index.
Each index makes use of fields specified in our primary key: `(bucket_start, country, event_type)`.
For the best filtering performance, make sure your queries filter on primary key fields so that data can be pruned.
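
As a counter-example (a sketch you can run against the same table), filtering only on `event_type` constrains neither `bucket_start` nor `country`, so the primary key can prune very little:

```sql
EXPLAIN indexes=1
SELECT *
FROM events_rollup_1h
WHERE event_type = 'click';
```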

## Common variations {#common-variations}

- **Different grains**: add a daily rollup:

```sql
CREATE TABLE events_rollup_1d
(
    bucket_start Date,
    country LowCardinality(String),
    event_type LowCardinality(String),
    users_uniq AggregateFunction(uniqExact, UInt64),
    value_sum AggregateFunction(sum, Float64),
    value_avg AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);
```

Then a second materialized view. Note the aliases: with a `TO` table, the view's output columns are matched to the target table by name:

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1d
TO events_rollup_1d
AS
SELECT
    toDate(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value) AS value_sum,
    avgState(value) AS value_avg,
    countState() AS events_count
FROM events_raw
GROUP BY ALL;
```

- **Compression**: apply codecs to big columns (for example, `CODEC(ZSTD(3))`) on the raw table.
- **Cost control**: push heavy retention to the raw table and keep long-lived rollups.
- **Backfilling**: when loading historical data, insert into `events_raw` and let the materialized views build the rollups automatically. To rebuild the rollup from rows that were already in the table, use a manual `INSERT ... SELECT` into the rollup table (see the sketch below); note that `POPULATE` cannot be combined with the `TO` clause used in this guide.
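
A backfill sketch (the date bound is illustrative); it runs the same aggregation as the materialized view over a chosen slice of `events_raw`:

```sql
INSERT INTO events_rollup_1h
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value) AS value_sum,
    avgState(value) AS value_avg,
    countState() AS events_count
FROM events_raw
WHERE event_time < '2025-01-01 00:00:00'  -- only rows that predate the view
GROUP BY ALL;
```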

## Clean-up and retention {#clean-up-and-retention}

- Keep the raw TTL short (e.g., 30 or 90 days) but retain the rollups for longer (e.g., 1 year).
- You can also use **TTL to move** old parts to cheaper storage if tiering is enabled (see the sketch below).
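
A sketch of tiered TTL on the raw table, assuming your cluster has a storage policy with a volume named `cold` (the volume name and intervals are illustrative):

```sql
ALTER TABLE events_raw
    MODIFY TTL event_time + INTERVAL 30 DAY TO VOLUME 'cold',
               event_time + INTERVAL 90 DAY DELETE;
```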

## Troubleshooting {#troubleshooting}

- Materialized view not updating? Check that inserts go to **`events_raw`** (not the rollup table), and that the materialized view target is correct (`TO events_rollup_1h`).
- Slow queries? Confirm they hit the rollup (query the rollup table directly) and that the time filters align to the rollup grain.
- Backfill mismatches? Use `SYSTEM FLUSH LOGS` and check `system.query_log` / `system.parts` to confirm inserts and merges (see the sketch below).
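
For example, a quick way to verify that recent inserts actually reached the raw table (the 1-hour window is arbitrary):

```sql
SYSTEM FLUSH LOGS;

SELECT event_time, written_rows, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE 'INSERT INTO events_raw%'
  AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY event_time DESC
LIMIT 10;
```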

</VerticalStepper>