Commit b69c684

docs: add tutorial for rollups with Materialized Views

1 parent cba9548 commit b69c684

1 file changed: +210 −0
---
title: Build a rollup with Materialized Views for fast time-series analytics
slug: /knowledgebase/materialized-view-rollup-timeseries
description: End-to-end example creating a raw events table, a rollup table, and a materialized view for low-latency analytics.
keywords: [materialized view, rollup, aggregate, timeseries, tutorial]
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

> This tutorial shows how to maintain pre-aggregated rollups from a high-volume events table using **Materialized Views**. You’ll create three objects: a raw table, a rollup table, and the MV that writes into the rollup automatically.

## When to use this pattern

- You have an **append-only events stream** (clicks, pageviews, IoT, logs).
- Most queries are **aggregations** over time ranges (per minute/hour/day).
- You want **consistent sub-second reads** without re-scanning all raw rows.

## 1) Create a raw events table

```sql
CREATE TABLE events_raw
(
    event_time DateTime,
    user_id    UInt64,
    country    LowCardinality(String),
    event_type LowCardinality(String),
    value      Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id)
TTL event_time + INTERVAL 90 DAY DELETE
SETTINGS index_granularity = 8192;
```

**Notes**

- `PARTITION BY toYYYYMM(event_time)` keeps partitions small and easy to drop (example below).
- `ORDER BY (event_time, user_id)` supports time-bounded queries plus a secondary filter on user.
- `LowCardinality(String)` saves memory for categorical dimensions.
- `TTL` cleans up raw data after 90 days (tune to your retention).

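Partition-level retention is what makes the monthly scheme cheap. As a sketch, dropping one (hypothetical) month of raw data:

```sql
-- With toYYYYMM partitioning, the partition ID is the YYYYMM value
ALTER TABLE events_raw DROP PARTITION 202509;
```
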
## 2) Design the rollup (aggregated) table

We’ll pre-aggregate to **hourly** grain. Choose your grain to match the most common analysis window.

```sql
CREATE TABLE events_rollup_1h
(
    bucket_start DateTime,  -- start of the hour
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    users_uniq   AggregateFunction(uniqExact, UInt64),
    value_sum    AggregateFunction(sum, Float64),
    value_avg    AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type)
SETTINGS index_granularity = 8192;
```

We store **aggregate states** (e.g., `AggregateFunction(sum, ...)`) which compactly represent partial aggregates and can be merged or finalized later.
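
To build intuition, here is a minimal example using the standard `-State` combinator and `finalizeAggregation`:

```sql
-- A state is a partial aggregate; finalizeAggregation() turns it
-- back into a plain value
SELECT finalizeAggregation(sumState(number)) AS total
FROM numbers(10);  -- 0 + 1 + ... + 9 = 45
```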

## 3) Create a Materialized View that populates the rollup

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1h
TO events_rollup_1h
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value)         AS value_sum,
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
GROUP BY
    bucket_start, country, event_type;
```

This MV fires automatically on inserts into `events_raw`. It writes **aggregate states** into the rollup.

## 4) Insert some sample data

```sql
INSERT INTO events_raw VALUES
    ('2025-09-18 10:01:00', 101, 'US', 'view',  1),
    ('2025-09-18 10:02:00', 101, 'US', 'click', 1),
    ('2025-09-18 10:03:00', 202, 'DE', 'view',  1),
    ('2025-09-18 10:40:00', 101, 'US', 'view',  1);
```

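To confirm the MV fired, a quick sanity check (the expected count assumes exactly the four sample rows above, inserted as a single block):

```sql
-- 10:01 and 10:40 both fall in the 10:00 hour for US/view, so the four
-- raw rows collapse into three (hour, country, event_type) state rows
SELECT count() FROM events_rollup_1h;  -- 3
```
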
## 5) Querying the rollup

You can either **merge** states at read time or **finalize** them:

<Tabs groupId="finalize">
<TabItem value="merge" label="Merge at read time">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    uniqExactMerge(users_uniq) AS users,
    sumMerge(value_sum)        AS value_sum,
    avgMerge(value_avg)        AS value_avg,
    countMerge(events_count)   AS events
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 1 DAY
GROUP BY ALL
ORDER BY bucket_start, country, event_type;
```

</TabItem>
<TabItem value="finalize" label="Finalize with FINAL">

```sql
SELECT
    bucket_start,
    country,
    event_type,
    finalizeAggregation(users_uniq)   AS users,
    finalizeAggregation(value_sum)    AS value_sum,
    finalizeAggregation(value_avg)    AS value_avg,
    finalizeAggregation(events_count) AS events
FROM events_rollup_1h FINAL  -- FINAL merges rows sharing a sorting key; or use SETTINGS final = 1
WHERE bucket_start >= now() - INTERVAL 1 DAY
ORDER BY bucket_start, country, event_type;
```

</TabItem>
</Tabs>

> **Tip:** If you expect reads to always hit the rollup, you can create a **second MV** that writes *finalized* numbers to a “plain” `MergeTree` table at the same 1h grain. States give more flexibility; finalized numbers give slightly simpler reads (sketched below).

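One way to do that, assuming hypothetical names `events_rollup_1h_plain` and `mv_events_rollup_1h_plain`, and restricting the rollup to additive metrics (sums and counts survive `SummingMergeTree` merges; `avg` and `uniqExact` do not, which is exactly why states are more flexible):

```sql
-- Hypothetical finalized rollup: plain numbers, additive metrics only
CREATE TABLE events_rollup_1h_plain
(
    bucket_start DateTime,
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    value_sum    Float64,
    events       UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);

CREATE MATERIALIZED VIEW mv_events_rollup_1h_plain
TO events_rollup_1h_plain
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    sum(value) AS value_sum,
    count()    AS events
FROM events_raw
GROUP BY bucket_start, country, event_type;
```

Reads then need only a plain `sum(...)`/`GROUP BY` (to fold rows from not-yet-merged parts), at the cost of losing non-additive aggregates.
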
## 6) Filtering performance: use the primary key

```sql
EXPLAIN indexes = 1
SELECT *
FROM events_rollup_1h
WHERE bucket_start BETWEEN now() - INTERVAL 3 DAY AND now()
  AND country = 'US';
```

You should see the primary key index on `(bucket_start, country, event_type)` being used to prune granules.

## 7) Common variations

- **Different grains**: add a daily rollup:

```sql
CREATE TABLE events_rollup_1d
(
    bucket_start Date,
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    users_uniq   AggregateFunction(uniqExact, UInt64),
    value_sum    AggregateFunction(sum, Float64),
    value_avg    AggregateFunction(avg, Float64),
    events_count AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);
```

Then a second MV:

```sql
CREATE MATERIALIZED VIEW mv_events_rollup_1d
TO events_rollup_1d
AS
SELECT
    toDate(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,  -- alias each state so column
    sumState(value)         AS value_sum,   -- names match the target table
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
GROUP BY ALL;
```

- **Compression**: apply codecs to big columns (for example, `CODEC(ZSTD(3))`) on the raw table.
- **Cost control**: push heavy retention to raw and keep long-lived rollups.
- **Backfilling**: when loading historical data, insert into `events_raw` and let the MVs build rollups automatically. For rows that predate the MVs, backfill the rollup with an `INSERT INTO ... SELECT` that repeats the MV's `-State` expressions (see the sketch below); note that `POPULATE` is not supported for MVs that use a `TO` clause.

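For example, a one-off backfill of the hourly rollup might look like this (the cutoff timestamp is hypothetical):

```sql
-- Rebuild states for historical rows the MV never saw
INSERT INTO events_rollup_1h
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    uniqExactState(user_id) AS users_uniq,
    sumState(value)         AS value_sum,
    avgState(value)         AS value_avg,
    countState()            AS events_count
FROM events_raw
WHERE event_time < '2025-01-01 00:00:00'  -- hypothetical cutoff
GROUP BY bucket_start, country, event_type;
```
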
## 8) Clean-up and retention

- Keep the raw table’s TTL short (e.g., 30–90 days) and retain rollups for longer (e.g., 1 year).
- You can also use **TTL to move** old parts to cheaper storage if tiering is enabled (example below).

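As a sketch, tiering the hourly rollup could look like this (assumes a storage policy that defines a volume named `cold`):

```sql
-- Move rollup parts older than 180 days to a slower, cheaper volume
ALTER TABLE events_rollup_1h
    MODIFY TTL bucket_start + INTERVAL 180 DAY TO VOLUME 'cold';
```
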
## 9) Troubleshooting

- MV not updating? Check that inserts go to **events_raw** (not the rollup), and that the MV target is correct (`TO events_rollup_1h`).
- Slow queries? Confirm they hit the rollup (query the rollup table directly) and that the time filters align to the rollup grain.
- Backfill mismatches? Use `SYSTEM FLUSH LOGS` and check `system.query_log` / `system.parts` to confirm inserts and merges.

---