Commit 2138fa5

author
Dat Nguyen
committed
chore: better readme
1 parent a19e0d9 commit 2138fa5

1 file changed: +15 -15 lines changed

README.md

Lines changed: 15 additions & 15 deletions
@@ -28,10 +28,10 @@ Data-diff solution for dbt-ers with Snowflake ❄️ 🚀
2828
`dbt-data-diff` package provides the diff results into 3 categories or 3 levels of the diff as follows:
2929

3030
- 🥉 **Key diff** ([models](./models/01_key_diff/)): Compare the Primary Key (`pk`) only
31-
- 🥈 **Schema diff** ([models](./models/02_schema_diff/)): Compare the List of columns and their Data types
32-
- 🥇 **Content diff** (aka Data diff) ([models](./models/03_content_diff/)): Compare all column values. The columns will be filtered by each table's configuration (`include_columns` and `exclude_columns`), and the data can be also filtered by the `where` config. Behind the scenes, this operation does not require the Primary Key (PK) config, it will perform Bulk Operation (`INTERCEPT` or `MINUS`) and make an aggregation to make up the column level's match percentage
31+
- 🥈 **Schema diff** ([models](./models/02_schema_diff/)): Compare the list of column's Names and Data Types
32+
- 🥇 **Content diff** (aka Data diff) ([models](./models/03_content_diff/)): Compare all cell values. The columns are filtered by each table's configuration (`include_columns` and `exclude_columns`), and the data can also be filtered by the `where` config. Behind the scenes, this operation does not require the Primary Key (PK) config; it performs bulk set operations (`INTERSECT` or `MINUS`) and aggregates the results into a column-level match percentage
3333

34-
In behind the scenes, this package leverages the ❄️ [Scripting Stored Procedure](https://docs.snowflake.com/en/developer-guide/stored-procedure/stored-procedures-snowflake-scripting) which provides the 3 ones correspondingly with 3 diff categories. Moreover, it utilizes the [DAG of Tasks](https://docs.snowflake.com/en/user-guide/tasks-intro?utm_source=legacy&utm_medium=serp&utm_term=task+DAG#label-task-dag) to optimize the speed with the parallelism once enabled by configuration 🚀
34+
Behind the scenes, this package leverages the ❄️ [Scripting Stored Procedure](https://docs.snowflake.com/en/developer-guide/stored-procedure/stored-procedures-snowflake-scripting) feature, providing 3 procedures, one for each of the 3 categories above. Moreover, it utilizes the [DAG of Tasks](https://docs.snowflake.com/en/user-guide/tasks-intro?utm_source=legacy&utm_medium=serp&utm_term=task+DAG#label-task-dag) to speed up the run through parallelism once enabled by configuration 🚀
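
To make the set-based idea behind the content diff concrete, here is a minimal sketch. It is not the package's stored procedure. It assumes that a column's match percentage is the share of distinct values found on both sides, and it uses Python's built-in `sqlite3` in place of Snowflake, so `EXCEPT` stands in for `MINUS`. The function name `column_match_percentage` and the tables `src` and `tgt` are hypothetical.

```python
import sqlite3


def column_match_percentage(conn, source, target, column):
    """Percentage of distinct `column` values present in both tables.

    Illustrative only: table and column names are interpolated directly,
    so they must come from trusted input.
    """

    def count(set_op, left, right):
        # Count the rows produced by a set operation on a single column.
        sql = (
            f"SELECT COUNT(*) FROM ("
            f"SELECT {column} FROM {left} {set_op} SELECT {column} FROM {right})"
        )
        return conn.execute(sql).fetchone()[0]

    matched = count("INTERSECT", source, target)
    only_source = count("EXCEPT", source, target)  # MINUS in Snowflake
    only_target = count("EXCEPT", target, source)
    total = matched + only_source + only_target
    # Two empty tables are treated as a full match (an assumption).
    return 100.0 * matched / total if total else 100.0


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tgt VALUES (1, 'a'), (2, 'x'), (3, 'c');
    """
)

percentages = {
    col: column_match_percentage(conn, "src", "tgt", col) for col in ("id", "name")
}
print(percentages)  # {'id': 100.0, 'name': 50.0}
```

No primary key is involved in this sketch: each column is compared purely through set operations, which is the property the content diff description relies on.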
3535

3636
## Installation
3737

@@ -119,7 +119,7 @@ dbt run-operation data_diff__run_async --args '{is_polling_status: true}'
119119
> `use role accountadmin;`<br>
120120
> `grant execute task on account to role {{ target.role }};`</br>
121121

122-
<details> <!-- markdownlint-disable no-inline-html -->
122+
<details>
123123
<summary>📖 Or via dbt hook by default (it will run an incremental load for all models)</summary>
124124

125125
```yaml
@@ -164,17 +164,17 @@ dbt run -s data_diff --vars '{data_diff__on_run_hook: true}'
164164

165165
## Features comparison to the alternative packages
166166

167-
| Feature| Supported Package | Notes |
168-
|:-------|:------------------|:------|
169-
| Key diff | <ul><li>`dbt-data-diff`</li><li>[`data-diff`](https://github.com/datafold/data-diff)</li><li>[`dbt_audit_helper`](https://github.com/dbt-labs/dbt-audit-helper)</li></ul> | ✅ |
170-
| Schema diff | <ul><li>`dbt-data-diff`</li><li>[`data-diff`(*)](https://github.com/datafold/data-diff)</li><li>[`dbt-audit-helper`](https://github.com/dbt-labs/dbt-audit-helper)</li></ul> | (*): Only available in the paid-version 💰 |
171-
| Content diff | <ul><li>`dbt-data-diff`</li><li>[`data-diff`(*)](https://github.com/datafold/data-diff)</li><li>[`dbt-audit-helper`](https://github.com/dbt-labs/dbt-audit-helper)</li></ul> | (*): Only available in the paid-version 💰 |
172-
| Yaml Configuration | <ul><li>`dbt-data-diff`</li></ul> | `data-diff` will use the `toml` file, `dbt-audit-helper` will require to create new models for each comparison |
173-
| Query & Execution log | <ul><li>`dbt-data-diff`</li></ul> | Except for dbt's log, this package to be very transparent on which diff queries executed which are exposed in [`log_for_validation`](./models/log_for_validation.yml) model |
174-
| Snowflake-native Stored Proc | <ul><li>`dbt-data-diff`</li></ul> | Purely built as Snowflake SQL native stored procedures |
175-
| Parallelism | <ul><li>`dbt-data-diff`</li><li>[`data-diff`](https://github.com/datafold/data-diff)</li><li>[`dbt_audit_helper`](https://github.com/dbt-labs/dbt-audit-helper)</li></ul> | `dbt-data-diff` leverages Snowflake Task DAG, the others use python threading |
176-
| Asynchronous | <ul><li>`dbt-data-diff`</li></ul> | Trigger run and decide to poll the run status when needed |
177-
| Multi-warehouse supported | <ul><li>`dbt-data-diff`(*)</li><li>[`data-diff`](https://github.com/datafold/data-diff)</li><li>[`dbt-audit-helper`](https://github.com/dbt-labs/dbt-audit-helper)</li></ul> | (*): Future Consideration 🏃 |
167+
| Feature | Supported Package | Notes |
168+
|:----------------------|:-----------------------------------------------------------|:--------------------------------------|
169+
| Key diff | <ul><li>`dbt_data_diff`</li><li>[`data_diff`](https://github.com/datafold/data_diff)</li><li>[`dbt_audit_helper`](https://github.com/dbt-labs/dbt_audit_helper)</li></ul> | ✅ |
170+
| Schema diff | <ul><li>`dbt_data_diff`</li><li>[`data_diff`(*)](https://github.com/datafold/data_diff)</li><li>[`dbt_audit_helper`](https://github.com/dbt-labs/dbt_audit_helper)</li></ul> | (*): Only available in the paid-version 💰 |
171+
| Content diff | <ul><li>`dbt_data_diff`</li><li>[`data_diff`(*)](https://github.com/datafold/data_diff)</li><li>[`dbt_audit_helper`](https://github.com/dbt-labs/dbt_audit_helper)</li></ul> | (*): Only available in the paid-version 💰 |
172+
| Yaml Configuration | <ul><li>`dbt_data_diff`</li></ul> | `data_diff` uses a `toml` file; `dbt_audit_helper` requires creating new models for each comparison |
173+
| Query & Execution log | <ul><li>`dbt_data_diff`</li></ul> | In addition to dbt's log, this package is transparent about which diff queries were executed; they are exposed in the [`log_for_validation`](./models/log_for_validation.yml) model |
174+
| Snowflake-native Stored Proc | <ul><li>`dbt_data_diff`</li></ul> | Purely built as Snowflake SQL native stored procedures |
175+
| Parallelism | <ul><li>`dbt_data_diff`</li><li>[`data_diff`](https://github.com/datafold/data_diff)</li><li>[`dbt_audit_helper`](https://github.com/dbt-labs/dbt_audit_helper)</li></ul> | `dbt_data_diff` leverages Snowflake Task DAG, the others use python threading |
176+
| Asynchronous | <ul><li>`dbt_data_diff`</li></ul> | Trigger the run & go away. Optionally poll the run status continuously and wait until it finishes |
177+
| Multi-warehouse supported | <ul><li>`dbt_data_diff`(*)</li><li>[`data_diff`](https://github.com/datafold/data_diff)</li><li>[`dbt_audit_helper`](https://github.com/dbt-labs/dbt_audit_helper)</li></ul> | (*): Future Consideration 🏃 |
178178

179179
## About Infinite Lambda
180180
