Commit 66f36c2

jayacryl and jjoyce0510 authored
feat(docs): 3.13 Observe docs (datahub-project#14265)
Co-authored-by: John Joyce <john@acryl.io>
1 parent e7ae66c commit 66f36c2

8 files changed, +83 -12 lines changed

docs-website/sidebars.js

Lines changed: 11 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -48,6 +48,11 @@ module.exports = {
4848
type: "category",
4949
link: { type: "doc", id: "docs/managed-datahub/observe/assertions" },
5050
items: [
51+
{
52+
label: "Overview",
53+
type: "doc",
54+
id: "docs/managed-datahub/observe/assertions",
55+
},
5156
{
5257
label: "Column Assertions",
5358
type: "doc",
@@ -90,6 +95,12 @@ module.exports = {
9095
id: "docs/managed-datahub/observe/data-health-dashboard",
9196
className: "saasOnly",
9297
},
98+
{
99+
label: "Assertion Notes (Troubleshooting & Documentation)",
100+
type: "doc",
101+
id: "docs/managed-datahub/observe/assertion-notes",
102+
className: "saasOnly",
103+
},
93104
{
94105
label: "Open Assertions Specification",
95106
type: "category",
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
---
2+
description: This page provides an overview of using Assertion Notes
3+
---
4+
5+
import FeatureAvailability from '@site/src/components/FeatureAvailability';
6+
7+
# Assertion Notes
8+
9+
<FeatureAvailability saasOnly />
10+
11+
> The **Assertion Notes** feature is available as part of the **DataHub Cloud Observe** module of DataHub Cloud.
12+
> If you are interested in learning more about **DataHub Cloud Observe** or trying it out, please [visit our website](https://datahub.com/products/data-observability/).
13+
14+
## Introduction
15+
16+
The Assertion Notes feature addresses two key use cases:
17+
18+
1. Surfacing useful tips for engineers to troubleshoot and resolve data quality failures
19+
2. Documenting the purpose of a given check and the implications of its failure; for instance, some checks may circuit-break pipelines.
20+
21+
### For Troubleshooting
22+
23+
As you scale your data quality coverage across a large data landscape, you will often find that the engineers who are troubleshooting and resolving an assertion failure are not the same people who created the check.
24+
Oftentimes, it's useful to provide troubleshooting instructions or notes with context about how to resolve the problem when a check fails.
25+
26+
- If the check was manually set up, it may be worthwhile for the creator to add notes for future on-call engineers
27+
- If it was an AI check, whoever is first to investigate the failure may want to document what they did to fix it.
28+
29+
### For Documenting
30+
31+
Adding notes to Assertions is useful for documenting them. This is particularly relevant for Custom SQL checks, where understanding the logic from the query statements can be difficult. By adding documentation in the notes tab, others can understand exactly what is being monitored and how to resolve issues in the event of a failure.
32+
33+
<iframe width="516" height="342" src="https://www.loom.com/embed/a6cb07d33e8440acafacea381912f904?sid=32918cd5-9ebf-4aa0-90bc-37fae84d1841" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

docs/managed-datahub/observe/assertions.md

Lines changed: 21 additions & 10 deletions
@@ -1,7 +1,9 @@
11
# Assertions
22

3-
:::note Contract Monitoring Support
4-
Currently we support Snowflake, Redshift, BigQuery, and Databricks for out-of-the-box contract monitoring as part of DataHub Cloud Observe.
3+
:::note Supported Data Platforms
4+
Currently we support monitoring data on Snowflake, Redshift, BigQuery, and Databricks as part of DataHub Cloud Observe.
5+
For other data platforms, DataHub Cloud Observe can monitor assertions against dataset metrics (such as volume or column nullness) and dataset freshness by using the ingested statistics for each asset.
6+
Column Value and Custom SQL Assertions are not currently supported for other data platforms.
57
:::
68

79
An assertion is **a data quality test that finds data that violates a specified rule.**
@@ -11,30 +13,39 @@ Assertions serve as the building blocks of [Data Contracts](/docs/managed-datahu
1113

1214
Data quality tests (a.k.a. assertions) can be created and run by DataHub Cloud or ingested from a 3rd party tool.
1315

14-
### DataHub Cloud Observe
16+
### DataHub Cloud Assertions
1517

1618
For DataHub-provided assertion runners, we can deploy an agent in your environment to hit your sources and DataHub. DataHub Cloud Observe offers out-of-the-box evaluation of the following kinds of assertions:
1719

1820
- [Freshness](/docs/managed-datahub/observe/freshness-assertions.md) (SLAs)
1921
- [Volume](/docs/managed-datahub/observe/volume-assertions.md)
2022
- [Custom SQL](/docs/managed-datahub/observe/custom-sql-assertions.md)
2123
- [Column](/docs/managed-datahub/observe/column-assertions.md)
24+
- [Schema](/docs/managed-datahub/observe/schema-assertions.md)
2225

23-
### Anomaly detection
26+
#### Bulk Creating Assertions
27+
28+
You can bulk create Freshness and Volume [Smart Assertions](/docs/managed-datahub/observe/smart-assertions.md) (AI Anomaly Monitors) across several tables at once via the [Data Health Dashboard](/docs/managed-datahub/observe/data-health-dashboard.md):
29+
30+
<div align="center"><iframe width="560" height="315" src="https://www.loom.com/embed/f6720541914645aab6b28cdff8695d9f?sid=58dff84d-bb88-4f02-b814-17fb4986ad1f" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe></div>
31+
32+
To bulk create column metric assertions on a given dataset, follow the steps under the **Anomaly Detection** section of [Column Assertion](https://docs.datahub.com/docs/managed-datahub/observe/column-assertions#anomaly-detection-with-smart-assertions-).
33+
34+
### AI Anomaly Detection (Smart Assertions)
2435

2536
There are many cases where either you do not have the time to figure out what a good rule for an assertion is, or strict rules simply do not suffice for your data validation needs. Traditional rule-based assertions can become inadequate when dealing with complex data patterns or large-scale operations.
2637

2738
**Common Scenarios**
2839

2940
Here are some typical situations where manual assertion rules fall short:
3041

31-
**Seasonal data patterns** - A table whose row count changes exhibit weekly seasonality may need a different set of assertions for each day of the week, making static rules impractical to maintain.
42+
- **Seasonal data patterns** - A table whose row count changes exhibit weekly seasonality may need a different set of assertions for each day of the week, making static rules impractical to maintain.
3243

33-
**Statistical complexity across large datasets** - Figuring out what the expected standard deviation is for each column can be incredibly time consuming and not feasible across hundreds of tables, especially when each table has unique characteristics.
44+
- **Statistical complexity across large datasets** - Figuring out what the expected standard deviation is for each column can be incredibly time consuming and not feasible across hundreds of tables, especially when each table has unique characteristics.
3445

35-
**Dynamic data environments** - When data patterns evolve over time, manually updating assertion rules becomes a maintenance burden that can lead to false positives or missed anomalies.
46+
- **Dynamic data environments** - When data patterns evolve over time, manually updating assertion rules becomes a maintenance burden that can lead to false positives or missed anomalies.
3647

37-
## The Smart Assertion Solution
48+
### The Smart Assertion Solution
3849

3950
In these scenarios, you may want to consider creating a [Smart Assertion](./smart-assertions.md) to let machine learning automatically detect the normal patterns in your data and alert you when anomalies occur. This approach allows for more flexible and adaptive data quality monitoring without the overhead of manual rule maintenance.
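
To make the weekly-seasonality scenario concrete, here is a minimal sketch of what maintaining per-weekday rules by hand might look like. It is purely illustrative and is not how Smart Assertions are implemented; the function names, the synthetic history, and the three-standard-deviation threshold are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, stdev


def weekday_bounds(history, k=3.0):
    """Build a (low, high) band per weekday from (date, row_count_change) pairs."""
    by_day = defaultdict(list)
    for day, value in history:
        by_day[day.weekday()].append(value)
    bounds = {}
    for weekday, values in by_day.items():
        if len(values) < 2:
            continue  # stdev needs at least two observations
        mu, sigma = mean(values), stdev(values)
        bounds[weekday] = (mu - k * sigma, mu + k * sigma)
    return bounds


def is_anomalous(bounds, day, value):
    """Flag a value that falls outside the band learned for that weekday."""
    low, high = bounds[day.weekday()]
    return not (low <= value <= high)


# Synthetic (hypothetical) history: about 1000 new rows on weekdays, about 200 on weekends.
start = date(2024, 1, 1)  # a Monday
history = []
for i in range(56):
    day = start + timedelta(days=i)
    base = 200 if day.weekday() >= 5 else 1000
    jitter = (i // 7) % 3 * 10  # small week-to-week variation
    history.append((day, base + jitter))

bounds = weekday_bounds(history)
saturday = date(2024, 3, 2)
monday = date(2024, 3, 4)
```

With this history, a change of 1000 rows is normal on `monday` but flagged on `saturday`, which is why a single static threshold does not work. Keeping such bands up to date across hundreds of tables is the maintenance burden described above.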
4051

@@ -77,8 +88,8 @@ There are a few ways DataHub Cloud assertions can be executed:
7788
a. `Information Schema` tables are used by default to power cheap, fast checks on a table's freshness or row count.
7889
b. `Audit log` or `Operation log` tables can be used to granularly monitor table operations.
7990
c. The table itself can also be queried directly. This is useful for freshness checks referencing `last_updated` columns, row count checks targeting a subset of the data, and column value checks. We offer several optimizations to reduce query costs for these checks.
80-
2. Reference DataHub profiling information
81-
a. `Operation`s that are reported via ingestion or our SDKs can power monitoring table freshness.
91+
2. Reference DataHub metadata
92+
a. [Operations](/docs/api/tutorials/operations.md) that are reported via ingestion or our SDKs can power monitoring table freshness.
8293
b. `DatasetProfile` and `SchemaFieldProfile` ingested or reported via SDKs can power monitoring table metrics and column metrics.
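
As an illustration of the "query the table directly" option, the sketch below checks freshness against a `last_updated` column. It uses an in-memory SQLite database as a stand-in for a warehouse; the `orders` table, the `is_fresh` helper, and the thresholds are hypothetical and do not reflect the queries DataHub Cloud actually issues.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def is_fresh(conn, table, column, max_age, now):
    """Return True if the newest value in `column` is no older than `max_age`."""
    # NOTE: table and column are interpolated directly, so pass trusted identifiers only.
    row = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
    if row[0] is None:  # empty table: nothing has ever been written
        return False
    latest = datetime.fromisoformat(row[0])
    return now - latest <= max_age


# Hypothetical table with ISO-8601 UTC timestamps stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, last_updated TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-01T10:00:00+00:00"), (2, "2024-01-01T12:00:00+00:00")],
)

now = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)
fresh_2h = is_fresh(conn, "orders", "last_updated", timedelta(hours=2), now)
fresh_30m = is_fresh(conn, "orders", "last_updated", timedelta(minutes=30), now)
```

Here the newest row is one hour old, so the check passes with a two-hour threshold and fails with a thirty-minute one. A direct query like this scans the column, which is why the cheaper `Information Schema` approach is the default.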
8394

8495
### Privacy: Execute In-Network, avoid exposing data externally

docs/managed-datahub/observe/column-assertions.md

Lines changed: 5 additions & 1 deletion
@@ -237,7 +237,11 @@ You can create smart assertions by simply selecting the column and the metric yo
237237
<img width="40%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/column/column-smart-assertion.png"/>
238238
</p>
239239

240-
_Coming soon: we're making it easier to create Smart Assertions for multiple fields on a table, across multiple metrics, all in one go. If you're interested in this today, please let your DataHub representative know._
240+
**Bulk Creating for Multiple Columns**
241+
242+
To select several columns on a table to monitor at once, you can use the **Bulk-Create Smart Assertions** button below the column selector in the Column Metric Assertion authoring UI.
243+
244+
<iframe width="560" height="343" src="https://www.loom.com/embed/e71598c4394c4d8dba0770b8fc67ff06?sid=25326338-8a72-4382-98b5-026486233ef9" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
241245

242246
## Stopping a Column Assertion
243247

docs/managed-datahub/observe/data-health-dashboard.md

Lines changed: 5 additions & 1 deletion
@@ -67,4 +67,8 @@ In addition, both the `By Tables` tab and the `Incidents` tab will apply your gl
6767
<img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/data-health/view-applied.png"/>
6868
</p>
6969

70-
**Coming soon:** in the upcoming releases we will be including the filters in the url parameters. This will make it incredibly easy for you to bookmark your specifi c
70+
## Bulk Create Smart Assertions
71+
72+
[Smart Assertions](./smart-assertions.md) are AI Anomaly Checks that can be used to quickly 'strap a seatbelt' across your data landscape. You can hit the 'Bulk Create' button in the top right corner of the data health dashboard to quickly set up anomaly detection across your most important assets:
73+
74+
<div align="center"><iframe width="560" height="315" src="https://www.loom.com/embed/f6720541914645aab6b28cdff8695d9f?sid=58dff84d-bb88-4f02-b814-17fb4986ad1f" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe></div>

docs/managed-datahub/observe/freshness-assertions.md

Lines changed: 2 additions & 0 deletions
@@ -145,6 +145,8 @@ Freshness Assertions also have an off switch: they can be started or stopped at
145145

146146
Once these are in place, you're ready to create your Freshness Assertions!
147147

148+
You can also **Bulk Create Smart Assertions** via the [Data Health Page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions)
149+
148150
### Steps
149151

150152
1. Navigate to the Table that you want to monitor for freshness

docs/managed-datahub/observe/smart-assertions.md

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,10 @@ Today, you can create Smart Assertions for 3 types of assertions. To learn more
2424
2. [Freshness](./freshness-assertions.md#anomaly-detection-with-smart-assertions-)
2525
3. [Column Metrics](./column-assertions.md#anomaly-detection-with-smart-assertions-)
2626

27+
You can also create Freshness & Volume Smart Assertions in bulk on the [Data Health page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions):
28+
29+
<div align="center"><iframe width="560" height="315" src="https://www.loom.com/embed/f6720541914645aab6b28cdff8695d9f?sid=58dff84d-bb88-4f02-b814-17fb4986ad1f" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe></div>
30+
2731
## Improving Smart assertion quality
2832

2933
You can improve predictions through two key levers:

docs/managed-datahub/observe/volume-assertions.md

Lines changed: 2 additions & 0 deletions
@@ -139,6 +139,8 @@ Volume Assertions also have an off switch: they can be started or stopped at any
139139

140140
Once these are in place, you're ready to create your Volume Assertions!
141141

142+
You can also **Bulk Create Smart Assertions** via the [Data Health Page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions)
143+
142144
### Steps
143145

144146
1. Navigate to the Table that you want to monitor for volume
