|
1 | | -<p align="center"> |
2 | | - <img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="50%" /> |
| 1 | +<p align="left"> |
| 2 | + <img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="30%" /> |
3 | 3 | </p> |
4 | 4 |
|
5 | | -<h1 align="center"> |
6 | | -data-diff |
| 5 | +<h1 align="left"> |
| 6 | +data-diff: compare datasets fast, within or across SQL databases |
7 | 7 | </h1> |
8 | 8 |
|
9 | | -<h2 align="center"> |
10 | | -Develop dbt models faster by testing as you code. |
11 | | -</h2> |
12 | | -<h4 align="center"> |
13 | | -See how every change to dbt code affects the data produced in the modified model and downstream. |
14 | | -</h4> |
15 | 9 | <br> |
16 | 10 |
|
17 | | -## What is `data-diff`? |
18 | 11 |
|
19 | | -data-diff is an open source package that you can use to see the impact of your dbt code changes on your dbt models as you code. |
| 12 | +# Use cases |
20 | 13 |
|
21 | | -<div align="center"> |
| 14 | +## Data Migration & Replication Testing |
| 15 | +Compare source to target and check for discrepancies when moving data between systems: |
| 16 | +- Migrating to a new data warehouse (e.g., Oracle > Snowflake) |
| 17 | +- Converting SQL to a new transformation framework (e.g., stored procedures > dbt) |
| 18 | +- Continuously replicating data from an OLTP DB to an OLAP DWH (e.g., MySQL > Redshift) |
22 | 19 |
|
23 | | - |
24 | 20 |
|
25 | | -</div> |
| 21 | +Install `data-diff` with specific database adapters, e.g.: |
26 | 22 |
|
27 | | -<br> |
28 | | - |
29 | | -:eyes: **Watch 4-min demo video [here](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)** |
30 | | - |
31 | | -## Getting Started |
32 | | - |
33 | | -**Install `data-diff`** |
34 | | - |
35 | | -Install `data-diff` with the command that is specific to the database you use with dbt. |
36 | | - |
37 | | -### Snowflake |
38 | 23 | ``` |
39 | | -pip install data-diff 'data-diff[snowflake,dbt]' -U |
| 24 | +pip install data-diff 'data-diff[postgresql,snowflake]' -U |
40 | 25 | ``` |
41 | | - |
42 | | -### BigQuery |
| 26 | +Run `data-diff` with connection URIs to compare tables: |
43 | 27 | ``` |
44 | | -pip install data-diff 'data-diff[dbt]' google-cloud-bigquery -U |
| 28 | +data-diff \ |
| 29 | + postgresql://<username>:'<password>'@localhost:5432/<database> \ |
| 30 | + <table> \ |
| 31 | + "snowflake://<username>:<password>@<account>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \ |
| 32 | + <TABLE> \ |
| 33 | + -k activity_id \ |
| 34 | + -c activity \ |
| 35 | + -w "event_timestamp < '2022-10-10'" |
45 | 36 | ``` |
| 37 | +Check out the [documentation](https://docs.datafold.com/reference/open_source/cli) for the full command reference. |
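
The invocation above can be assembled from parts before it is run. The following is a minimal sketch, assuming credentials are supplied through environment variables. The helper names (`pg_uri`, `preview_diff`) and variable names (`PG_USER`, `PG_PASSWORD`, `PG_DATABASE`, `SNOWFLAKE_URI`) are hypothetical and not part of `data-diff`; only the `-k` flag and the URI layout come from the example above. The script prints the command for review instead of executing it.

```shell
#!/bin/sh
# Sketch only: helper and variable names are hypothetical, not part of data-diff.

# Assemble a PostgreSQL connection URI: user, password, host, port, database.
pg_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s' "$1" "$2" "$3" "$4" "$5"
}

# Print (do not run) a data-diff command for two URI/table pairs and a key column.
# Arguments: uri1, table1, uri2, table2, key column (passed to -k).
preview_diff() {
  printf "data-diff '%s' %s '%s' %s -k %s\n" "$1" "$2" "$3" "$4" "$5"
}

# Fall back to placeholder values when the environment variables are unset.
SRC_URI=$(pg_uri "${PG_USER:-user}" "${PG_PASSWORD:-pass}" localhost 5432 "${PG_DATABASE:-db}")
preview_diff "$SRC_URI" activity "${SNOWFLAKE_URI:-<snowflake-uri>}" ACTIVITY activity_id
```

The printed line is meant for human review only: the single quotes it adds do not protect URIs that themselves contain a single quote.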
46 | 38 |
|
47 | | -### Redshift |
48 | | -``` |
49 | | -pip install data-diff 'data-diff[redshift,dbt]' -U |
50 | | -``` |
| 39 | +## Data Development Testing |
| 40 | +Test SQL code and preview changes by comparing development/staging environment data to production: |
| 41 | +1. Make a change to some SQL code |
| 42 | +2. Run the SQL code to create a new dataset |
| 43 | +3. Compare the dataset with its production version or another iteration |
51 | 44 |
|
52 | | -### Postgres |
53 | | -``` |
54 | | -pip install data-diff 'data-diff[postgres,dbt]' -U |
55 | | -``` |
| 45 | + <p align="left"> |
| 46 | + <img alt="dbt" src="https://seeklogo.com/images/D/dbt-logo-E4B0ED72A2-seeklogo.com.png" width="10%" /> |
| 47 | + </p> |
| 48 | + |
| 49 | +`data-diff` integrates with dbt Core and dbt Cloud to seamlessly compare local development to production datasets. |
56 | 50 |
|
57 | | -### Databricks |
58 | | -``` |
59 | | -pip install data-diff 'data-diff[databricks,dbt]' -U |
60 | | -``` |
61 | | - |
62 | | -### DuckDB |
63 | | -``` |
64 | | -pip install data-diff 'data-diff[duckdb,dbt]' -U |
65 | | -``` |
66 | | - |
67 | | -**Update a few lines in your `dbt_project.yml`**. |
68 | | -``` |
69 | | -#dbt_project.yml |
70 | | -vars: |
71 | | - data_diff: |
72 | | - prod_database: my_database |
73 | | - prod_schema: my_default_schema |
74 | | -``` |
75 | | - |
76 | | -**Run your first data diff!** |
77 | | - |
78 | | -``` |
79 | | -dbt run && data-diff --dbt |
80 | | -``` |
| 51 | +:eyes: **Watch [4-min demo video](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)** |
81 | 52 |
|
82 | | -We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details. |
| 53 | +**[Get started with data-diff & dbt](https://docs.datafold.com/development_testing/open_source)** |
83 | 54 |
|
84 | | -Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started! |
| 55 | +Reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) for advice and support. |
85 | 56 |
|
86 | | -<br><br> |
| 57 | +## Supported databases |
87 | 58 |
|
88 | | -### Diffing between databases |
| 59 | +- PostgreSQL >=10 |
| 60 | +- MySQL |
| 61 | +- Snowflake |
| 62 | +- BigQuery |
| 63 | +- Redshift |
| 64 | +- Oracle |
| 65 | +- Presto |
| 66 | +- Databricks |
| 67 | +- Trino |
| 68 | +- ClickHouse |
| 69 | +- Vertica |
| 70 | +- DuckDB >=0.6 |
| 71 | +- SQLite (coming soon) |
89 | 72 |
|
90 | | -Check out our [documentation](https://docs.datafold.com/reference/open_source/cli) if you're looking to compare data across databases (for example, between Postgres and Snowflake). |
91 | 73 |
|
92 | 74 | <br> |
93 | 75 |
|
|