Testing MetricFlow, dbt exposures, and collibri - dbt on DuckDB
- Switch to the root directory of the generated sample project (e.g. `cd duckdb-dbt-metricflow`). This enables use of the tutorial project and the associated connection profile in later steps.
- Run `cd database && pwd` to get the full path to your database directory.
- Update the path in `profiles-example.yml` to point to your DuckDB database location (see the sketch below).
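For orientation, here is a minimal sketch of what the DuckDB profile might look like after the update. The profile name and database filename are assumptions (the `mf_tutorial` catalog that appears in the generated SQL later suggests a file like `mf_tutorial.duckdb`); match them to whatever your generated project actually uses:

```yaml
# profiles-example.yml (sketch; profile name, filename, and path are assumptions)
mf_tutorial:
  target: dev
  outputs:
    dev:
      type: duckdb
      # the directory printed by `cd database && pwd`, plus the database file
      path: /absolute/path/to/duckdb-dbt-metricflow/database/mf_tutorial.duckdb
      schema: main
```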
🤓 Please run the following steps:
- Run `dbt debug` to test the connection.

- Run `dbt build` to seed tables and produce artifacts.

- Try validating your data model:

  ```
  mf validate-configs
  ✔ 🎉 Successfully parsed manifest from dbt project
  ✔ 🎉 Successfully validated the semantics of built manifest (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 2)
  ✔ 🎉 Successfully validated semantic models against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
  ✔ 🎉 Successfully validated dimensions against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
  ✔ 🎉 Successfully validated entities against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
  ✔ 🎉 Successfully validated measures against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
  ✔ 🎉 Successfully validated metrics against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
  ```

- Check out your metrics:

  ```
  mf list metrics
  ✔ 🌱 We've found 13 metrics. The list below shows metrics in the format of "metric_name: list of available dimensions"
  • alterations: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • cancellation_rate: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • cancellations: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • cancellations_mx: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • cumulative_transactions: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • cumulative_transactions_in_last_7_days: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • new_customers: country__region, customer__customer_country, customer__ds, metric_time
  • quick_buy_amount_usd: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • quick_buy_transactions: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • revenue_usd: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • transaction_amount: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • transaction_usd_na: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  • transactions: customer__country__region, customer__customer_country, customer__ds, metric_time, transaction__ds and 2 more
  ```

- Check out dimensions for your metric:

  ```
  mf list dimensions --metrics transactions
  ```
- Query your first metric:

  ```
  mf query --metrics transactions --group-by metric_time --order metric_time
  ✔ Success 🦄 - query completed after 0.03 seconds
  metric_time__day       transactions
  -------------------  --------------
  2022-03-07T00:00:00               2
  2022-03-08T00:00:00               2
  2022-03-09T00:00:00               1
  2022-03-10T00:00:00               2
  2022-03-11T00:00:00               1
  2022-03-12T00:00:00               1
  2022-03-13T00:00:00               1
  2022-03-14T00:00:00               1
  2022-03-15T00:00:00               3
  2022-03-16T00:00:00               2
  2022-03-17T00:00:00               1
  2022-03-18T00:00:00               1
  2022-03-21T00:00:00               3
  2022-03-22T00:00:00               3
  2022-03-23T00:00:00               2
  2022-03-25T00:00:00               3
  2022-03-26T00:00:00               2
  2022-03-27T00:00:00               2
  2022-03-28T00:00:00               1
  2022-03-29T00:00:00               2
  2022-03-30T00:00:00               4
  2022-03-31T00:00:00               5
  2022-04-01T00:00:00               1
  2022-04-02T00:00:00               2
  2022-04-03T00:00:00               1
  2022-04-04T00:00:00               1
  ```
- Show the SQL MetricFlow generates:

  ```
  mf query --metrics transactions --group-by metric_time --order metric_time --explain
  ✔ Success 🦄 - query completed after 0.02 seconds
  🔎 SQL (remove --explain to see data or add --show-dataflow-plan to see the generated dataflow plan):
  SELECT
    metric_time__day
    , SUM(transactions) AS transactions
  FROM (
    SELECT
      DATE_TRUNC('day', ds) AS metric_time__day
      , 1 AS transactions
    FROM "mf_tutorial"."main"."transactions" transactions_src_10000
  ) subq_2
  GROUP BY
    metric_time__day
  ORDER BY metric_time__day
  ```
- Visualize the plan (if you have graphviz installed; see the README):

  ```
  mf query --metrics transactions --group-by metric_time --order metric_time --explain --display-plans
  ```
- Add another dimension:

  ```
  mf query --metrics transactions --group-by metric_time,customer__customer_country --order metric_time
  ✔ Success 🦄 - query completed after 0.04 seconds
  metric_time__day     customer__customer_country      transactions
  -------------------  ----------------------------  --------------
  2022-03-07T00:00:00  FR                                         1
  2022-03-07T00:00:00  CA                                         1
  2022-03-08T00:00:00  CA                                         1
  2022-03-08T00:00:00  MX                                         1
  2022-03-09T00:00:00  CA                                         1
  2022-03-10T00:00:00  CA                                         1
  2022-03-10T00:00:00  US                                         1
  2022-03-11T00:00:00  US                                         1
  2022-03-12T00:00:00  US                                         1
  2022-03-13T00:00:00  US                                         1
  2022-03-14T00:00:00  US                                         1
  2022-03-15T00:00:00  US                                         2
  2022-03-15T00:00:00  FR                                         1
  2022-03-16T00:00:00  CA                                         1
  2022-03-16T00:00:00  FR                                         1
  2022-03-17T00:00:00  CA                                         1
  2022-03-18T00:00:00  MX                                         1
  2022-03-21T00:00:00  CA                                         1
  2022-03-21T00:00:00  FR                                         1
  2022-03-21T00:00:00  US                                         1
  2022-03-22T00:00:00  CA                                         1
  2022-03-22T00:00:00  MX                                         1
  2022-03-22T00:00:00  FR                                         1
  2022-03-23T00:00:00  CA                                         1
  2022-03-23T00:00:00  FR                                         1
  2022-03-25T00:00:00  CA                                         1
  2022-03-25T00:00:00  FR                                         2
  2022-03-26T00:00:00  US                                         1
  2022-03-26T00:00:00  FR                                         1
  2022-03-27T00:00:00  US                                         2
  2022-03-28T00:00:00  CA                                         1
  2022-03-29T00:00:00  FR                                         2
  2022-03-30T00:00:00  US                                         1
  2022-03-30T00:00:00  CA                                         1
  2022-03-30T00:00:00  MX                                         2
  2022-03-31T00:00:00  US                                         1
  2022-03-31T00:00:00  MX                                         2
  2022-03-31T00:00:00  CA                                         2
  2022-04-01T00:00:00  US                                         1
  2022-04-02T00:00:00  CA                                         1
  2022-04-02T00:00:00  MX                                         1
  2022-04-03T00:00:00  MX                                         1
  2022-04-04T00:00:00  MX                                         1
  ```

- Add a coarser time granularity:

  ```
  mf query --metrics transactions --group-by metric_time__week --order metric_time__week
  ```
- Try a more complicated query:

  ```
  mf query \
    --metrics transactions,transaction_usd_na \
    --group-by metric_time,transaction__is_large \
    --order metric_time \
    --start-time 2022-03-20 --end-time 2022-04-01
  ✔ Success 🦄 - query completed after 0.08 seconds
  metric_time__day     transaction__is_large      transactions    transaction_usd_na
  -------------------  -----------------------  --------------  --------------------
  2022-03-21T00:00:00  True                                  3                639.87
  2022-03-22T00:00:00  True                                  3                522.68
  2022-03-23T00:00:00  True                                  2                215.14
  2022-03-25T00:00:00  True                                  3                 66.6
  2022-03-26T00:00:00  True                                  2                179.91
  2022-03-27T00:00:00  True                                  2                316.98
  2022-03-28T00:00:00  True                                  1                266.9
  2022-03-29T00:00:00  True                                  2
  2022-03-30T00:00:00  True                                  4                534.25
  2022-03-31T00:00:00  True                                  5               1170.96
  2022-04-01T00:00:00  True                                  1                100.4
  ```
- Before integrating metrics into your project, read up on adding a time spine; a sketch follows below.

If you found MetricFlow helpful, consider giving the project a star on GitHub.
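As a starting point, a time spine is just a model with one row per day. A minimal sketch using dbt's cross-database `date_spine` macro follows; the model name, date range, and granularity are assumptions, so check the MetricFlow docs for the exact convention your dbt version expects:

```sql
-- models/metricflow_time_spine.sql (sketch; name and date range are assumptions)
{{ config(materialized='table') }}

with days as (
    -- generates one row per day between the two bounds
    {{ dbt.date_spine(
        'day',
        "cast('2022-01-01' as date)",
        "cast('2027-01-01' as date)"
    ) }}
)

select cast(date_day as date) as date_day
from days
```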
dbt exposures are a feature that lets you document and track downstream use cases of your dbt project, such as dashboards, ML models, or applications that depend on your dbt models. They help maintain visibility into how your data is being used across the organization.
File Location: models/exposures.yml
exposures.yml
├── exposures: (root key)
├── exposure 1: executive_dashboard
│ ├── name: "executive_dashboard"
│ ├── description: "Executive dashboard showing key business metrics and KPIs"
│ ├── type: "dashboard"
│   ├── url: "https://company.com/dashboards/executive" (demo URL)
│ ├── maturity: "high"
│ ├── owner:
│ │ ├── name: "Data Team"
│ │ └── email: "data-team@company.com"
│ ├── depends_on:
│ │ ├── ref('transactions')
│ │ ├── ref('customers')
│ │ └── ref('countries')
│ └── tags: ['executive', 'dashboard', 'kpi']
│
├── exposure 2: transaction_analytics_report
│ ├── name: "transaction_analytics_report"
│ ├── description: "Detailed transaction analytics report for business analysts"
│ ├── type: "notebook"
│ ├── url: "https://company.com/notebooks/transaction-analytics"
│ ├── maturity: "medium"
│ ├── owner:
│ │ ├── name: "Analytics Team"
│ │ └── email: "analytics@company.com"
│ ├── depends_on:
│ │ ├── ref('transactions')
│ │ └── ref('customers')
│ └── tags: ['analytics', 'transactions', 'report']
│
└── ... (6 more exposures with similar structure)
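Rendered as actual dbt YAML, the first exposure from the tree would look roughly like this (a sketch built from the demo values above; exact tag placement can vary by dbt version):

```yaml
# models/exposures.yml (sketch using the demo values above)
version: 2

exposures:
  - name: executive_dashboard
    description: "Executive dashboard showing key business metrics and KPIs"
    type: dashboard
    maturity: high
    url: https://company.com/dashboards/executive  # demo URL
    owner:
      name: "Data Team"
      email: "data-team@company.com"
    depends_on:
      - ref('transactions')
      - ref('customers')
      - ref('countries')
    tags: ['executive', 'dashboard', 'kpi']  # tag placement may vary by dbt version
```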
- `name` - Unique identifier for the exposure
- `description` - Human-readable description of what the exposure does
- `type` - Category of downstream tool:
  - `dashboard` - BI dashboards (Tableau, Looker, etc.)
  - `application` - Web apps or APIs
  - `notebook` - Jupyter notebooks or analysis tools
  - `report` - Static reports or documents
- `url` - Link to the actual downstream tool/report
- `maturity` - Development stage:
  - `high` - Production-ready, stable
  - `medium` - In development, some stability
  - `low` - Experimental, early stage
- `owner` - Responsible team/person:
  - `name` - Team or person name
  - `email` - Contact information
- `depends_on` - Which dbt models this exposure uses:
  - `ref('model_name')` - References to your dbt models; creates data lineage tracking
- `tags` - Categorization for filtering/organization
dbt exposures help you:
- Track downstream usage of your data models
- Maintain data lineage from source to consumption
- Identify impact when models change
- Document ownership and responsibility
- Organize by maturity and team
This creates a complete picture of how your dbt models are consumed across the organization!
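Once exposures are defined, dbt's `exposure:` selector method makes that lineage actionable from the CLI. A couple of illustrative invocations, using the demo exposure name from above:

```bash
# List every model upstream of the executive_dashboard exposure
dbt ls --select +exposure:executive_dashboard

# Rebuild and re-test everything the exposure depends on before a release
dbt build --select +exposure:executive_dashboard
```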
To generate and view dbt documentation locally:
- Compile your project:

  ```
  dbt compile
  ```

  This creates the compiled SQL and manifest files.

- Generate the documentation:

  ```
  dbt docs generate
  ```

  This creates the catalog and website artifacts.

- Start the docs server:

  ```
  dbt docs serve --port 8081
  ```

  This launches the documentation site at http://localhost:8081.
The documentation includes your exposures, models, sources, and their relationships!
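If you want to verify the artifacts without opening the site, the generated files land in the project's `target/` directory (standard dbt layout, assuming the default target path):

```bash
# Standard dbt artifact locations
ls target/manifest.json   # model, source, and exposure definitions
ls target/catalog.json    # warehouse catalog produced by `dbt docs generate`
ls target/index.html      # the docs site served by `dbt docs serve`
```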