+description: This page provides an overview of using Assertion Notes
+---
+
+import FeatureAvailability from '@site/src/components/FeatureAvailability';
+
+# Assertion Notes
+
+<FeatureAvailability saasOnly />
+
+> The **Assertion Notes** feature is available as part of the **DataHub Cloud Observe** module of DataHub Cloud.
+> If you are interested in learning more about **DataHub Cloud Observe** or trying it out, please [visit our website](https://datahub.com/products/data-observability/).
+
+## Introduction
+
+The Assertion Notes feature addresses two key use cases:
+
+1. Surfacing useful tips for engineers to troubleshoot and resolve data quality failures.
+2. Documenting the purpose of a given check and the implications of its failure; for instance, some checks may circuit-break pipelines.
+
+### For Troubleshooting
+
+As you scale your data quality coverage across a large data landscape, you will often find that the engineers who are troubleshooting and resolving an assertion failure are not the same people who created the check.
+Oftentimes, it's useful to provide troubleshooting instructions or notes with context about how to resolve the problem when a check fails.
+
+- If the check was manually set up, it may be worthwhile for the creator to add notes for future on-call engineers.
+- If it was an AI check, whoever is first to investigate the failure may want to document what they did to fix it.
+
+### For Documenting
+
+Adding notes to Assertions is useful for documenting your Assertions. This is particularly relevant for Custom SQL checks, where understanding the logic from the query statements can be difficult. By adding documentation in the notes tab, others can understand exactly what is being monitored and how to resolve issues in the event of a failure.
docs/managed-datahub/observe/assertions.md
21 additions & 10 deletions
@@ -1,7 +1,9 @@
 # Assertions

-:::note Contract Monitoring Support
-Currently we support Snowflake, Redshift, BigQuery, and Databricks for out-of-the-box contract monitoring as part of DataHub Cloud Observe.
+:::note Supported Data Platforms
+Currently we support monitoring data on Snowflake, Redshift, BigQuery, and Databricks as part of DataHub Cloud Observe.
+For other data platforms, DataHub Cloud Observe can monitor assertions against dataset metrics (such as volume or column nullness) and dataset freshness by using the ingested statistics for each asset.
+Column Value and Custom SQL Assertions are not currently supported for other data platforms.
 :::

 An assertion is **a data quality test that finds data that violates a specified rule.**
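
The platform support described in the "Supported Data Platforms" note can be summarized as a small lookup. The sketch below is illustrative only: `AssertionType`, `NATIVE_PLATFORMS`, `STATS_BACKED`, and `is_supported` are hypothetical names rather than part of any DataHub API, and it assumes that all five assertion types are available on the four named platforms.

```python
from enum import Enum


class AssertionType(Enum):
    """Hypothetical enum covering the assertion kinds mentioned in the note."""
    FRESHNESS = "freshness"
    VOLUME = "volume"
    COLUMN_METRIC = "column_metric"
    COLUMN_VALUE = "column_value"
    CUSTOM_SQL = "custom_sql"


# Platforms the note lists as supported for monitoring data directly.
NATIVE_PLATFORMS = {"snowflake", "redshift", "bigquery", "databricks"}

# Assertion types that, per the note, can rely on ingested statistics
# on any other platform (dataset metrics and freshness).
STATS_BACKED = {
    AssertionType.FRESHNESS,
    AssertionType.VOLUME,
    AssertionType.COLUMN_METRIC,
}


def is_supported(platform: str, assertion_type: AssertionType) -> bool:
    """Return True if the note implies this combination can be monitored."""
    if platform.lower() in NATIVE_PLATFORMS:
        # Assumption: every assertion type is available on these platforms.
        return True
    # Other platforms: only statistics-backed assertions are available.
    return assertion_type in STATS_BACKED
```

For example, `is_supported("postgres", AssertionType.VOLUME)` is true under these assumptions, while `is_supported("postgres", AssertionType.CUSTOM_SQL)` is not.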
@@ -11,30 +13,39 @@ Assertions serve as the building blocks of [Data Contracts](/docs/managed-datahu

 Data quality tests (a.k.a. assertions) can be created and run by DataHub Cloud or ingested from a 3rd party tool.

-### DataHub Cloud Observe
+### DataHub Cloud Assertions

 For DataHub-provided assertion runners, we can deploy an agent in your environment to hit your sources and DataHub. DataHub Cloud Observe offers out-of-the-box evaluation of the following kinds of assertions:
You can bulk create Freshness and Volume [Smart Assertions](/docs/managed-datahub/observe/smart-assertions.md) (AI Anomaly Monitors) across several tables at once via the [Data Health Dashboard](/docs/managed-datahub/observe/data-health-dashboard.md):
To bulk create column metric assertions on a given dataset, follow the steps under the **Anomaly Detection** section of [Column Assertion](https://docs.datahub.com/docs/managed-datahub/observe/column-assertions#anomaly-detection-with-smart-assertions-).
+
+### AI Anomaly Detection (Smart Assertions)

 There are many cases where either you do not have the time to figure out what a good rule for an assertion is, or strict rules simply do not suffice for your data validation needs. Traditional rule-based assertions can become inadequate when dealing with complex data patterns or large-scale operations.

 **Common Scenarios**

 Here are some typical situations where manual assertion rules fall short:

-• **Seasonal data patterns** - A table whose row count changes exhibit weekly seasonality may need a different set of assertions for each day of the week, making static rules impractical to maintain.
+- **Seasonal data patterns** - A table whose row count changes exhibit weekly seasonality may need a different set of assertions for each day of the week, making static rules impractical to maintain.

-• **Statistical complexity across large datasets** - Figuring out what the expected standard deviation is for each column can be incredibly time consuming and not feasible across hundreds of tables, especially when each table has unique characteristics.
+- **Statistical complexity across large datasets** - Figuring out what the expected standard deviation is for each column can be incredibly time consuming and not feasible across hundreds of tables, especially when each table has unique characteristics.

-• **Dynamic data environments** - When data patterns evolve over time, manually updating assertion rules becomes a maintenance burden that can lead to false positives or missed anomalies.
+- **Dynamic data environments** - When data patterns evolve over time, manually updating assertion rules becomes a maintenance burden that can lead to false positives or missed anomalies.

-## The Smart Assertion Solution
+### The Smart Assertion Solution

 In these scenarios, you may want to consider creating a [Smart Assertion](./smart-assertions.md) to let machine learning automatically detect the normal patterns in your data and alert you when anomalies occur. This approach allows for more flexible and adaptive data quality monitoring without the overhead of manual rule maintenance.

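
To make the seasonal-pattern scenario concrete, here is a minimal sketch of a per-weekday baseline. It is not the model behind Smart Assertions, which the docs do not describe; it only illustrates why a single static threshold breaks down on a table with weekly seasonality. The function names and the z-score threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def weekday_baselines(history):
    """Build a (mean, stdev) baseline per weekday.

    history: iterable of (weekday, row_count_change) pairs, weekday in 0..6.
    Weekdays with fewer than two observations are skipped, because a
    sample standard deviation needs at least two points.
    """
    by_day = defaultdict(list)
    for weekday, value in history:
        by_day[weekday].append(value)
    return {
        day: (mean(values), stdev(values))
        for day, values in by_day.items()
        if len(values) >= 2
    }


def is_anomalous(weekday, value, baselines, z=3.0):
    """Flag a value more than z standard deviations from that weekday's mean."""
    mu, sigma = baselines[weekday]
    # Guard against a zero standard deviation on perfectly stable days.
    return abs(value - mu) > z * max(sigma, 1e-9)


# Four weeks of history: busy Mondays (0), quiet Saturdays (5).
history = [(0, 1000), (0, 1010), (0, 990), (0, 1005),
           (5, 100), (5, 110), (5, 90), (5, 105)]
baselines = weekday_baselines(history)
```

With this history, a change of 100 rows is normal on a Saturday but anomalous on a Monday, so a static rule such as "at least 500 new rows per day" would either fail every weekend or need a separate threshold per day.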
@@ -77,8 +88,8 @@ There are a few ways DataHub Cloud assertions can be executed:
    a. `Information Schema` tables are used by default to power cheap, fast checks on a table's freshness or row count.
    b. `Audit log` or `Operation log` tables can be used to granularly monitor table operations.
    c. The table itself can also be queried directly. This is useful for freshness checks referencing `last_updated` columns, row count checks targeting a subset of the data, and column value checks. We offer several optimizations to reduce query costs for these checks.
-2. Reference DataHub profiling information
-   a. `Operation`s that are reported via ingestion or our SDKs can power monitoring table freshness.
+2. Reference DataHub metadata
+   a. [Operations](/docs/api/tutorials/operations.md) that are reported via ingestion or our SDKs can power monitoring table freshness.
    b. `DatasetProfile` and `SchemaFieldProfile` ingested or reported via SDKs can power monitoring table metrics and column metrics.
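
The "query the table directly" option above can be sketched with two small checks. This is only an illustration using Python's built-in `sqlite3` module as a stand-in warehouse; it does not show how DataHub Cloud issues or optimizes its queries. The helper names, the `orders` table in the usage note, and the ISO-8601 text timestamps are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def is_fresh(conn, table, column, max_age, now):
    """Freshness check: was the newest `column` value written within `max_age`?

    Assumes `column` stores ISO-8601 timestamps that share one UTC offset,
    so the lexicographic MAX is also the latest timestamp.
    `table` and `column` are trusted identifiers, not user input.
    """
    row = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
    if row[0] is None:
        # An empty table has no update to measure, so treat it as stale.
        return False
    last_updated = datetime.fromisoformat(row[0])
    return now - last_updated <= max_age


def count_rows(conn, table, where):
    """Volume check on a subset of the data, selected by a trusted WHERE clause."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {where}"
    return conn.execute(query).fetchone()[0]
```

A caller would pass an open connection, for example `is_fresh(conn, "orders", "last_updated", timedelta(hours=2), datetime.now(timezone.utc))`, and compare the result of `count_rows(conn, "orders", "region = 'EU'")` against an expected range.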

 ### Privacy: Execute In-Network, avoid exposing data externally
_Coming soon: we're making it easier to create Smart Assertions for multiple fields on a table, across multiple metrics, all in one go. If you're interested in this today, please let your DataHub representative know._
+**Bulk Creating for Multiple Columns**
+
+To select several columns on a table to monitor at once, you can use the **Bulk-Create Smart Assertions** button below the column selector in the Column Metric Assertion authoring UI.
**Coming soon:** in upcoming releases we will include the filters in the URL parameters. This will make it incredibly easy for you to bookmark your specific
+## Bulk Create Smart Assertions
+
+[Smart Assertions](./smart-assertions.md) are AI Anomaly Checks that can be used to quickly 'strap a seatbelt' across your data landscape. You can hit the 'Bulk Create' button in the top right corner of the data health dashboard to quickly set up anomaly detection across your most important assets:
docs/managed-datahub/observe/freshness-assertions.md
2 additions & 0 deletions
@@ -145,6 +145,8 @@ Freshness Assertions also have an off switch: they can be started or stopped at

 Once these are in place, you're ready to create your Freshness Assertions!

+You can also **Bulk Create Smart Assertions** via the [Data Health Page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions).
+
 ### Steps

 1. Navigate to the Table that you want to monitor for freshness
You can also create Freshness & Volume Smart Assertions in bulk on the [Data Health page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions):
docs/managed-datahub/observe/volume-assertions.md
2 additions & 0 deletions
@@ -139,6 +139,8 @@ Volume Assertions also have an off switch: they can be started or stopped at any

 Once these are in place, you're ready to create your Volume Assertions!

+You can also **Bulk Create Smart Assertions** via the [Data Health Page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions).
+
 ### Steps

 1. Navigate to the Table that you want to monitor for volume