content/en/data_jobs/airflow.md (56 additions & 8 deletions)
@@ -46,7 +46,54 @@ To get started, follow the instructions below.
openlineage-airflow
```

2. Configure the `openlineage` provider. Choose one of the following configuration options and set the environment variables, making them available to pods where you run Airflow schedulers and Airflow workers:
**Option 1: Datadog Transport (Recommended)**

**Requirements**: `apache-airflow-providers-openlineage` version 2.7.3 or later and `openlineage-python` version 1.37.0 or later.

```shell
export DD_API_KEY=<DD_API_KEY>
export DD_SITE=<DD_SITE>
export OPENLINEAGE__TRANSPORT__TYPE=datadog
# OPENLINEAGE_NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value.
export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
```
* Replace `<DD_API_KEY>` with your valid [Datadog API key][4].
* Replace `<DD_SITE>` with your Datadog site (for example, {{< region-param key="dd_site" code="true" >}}).
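
One way to expose these variables to scheduler and worker pods is through your deployment tooling. Below is a minimal sketch using the community Apache Airflow Helm chart's `env` values; the release name, chart reference, and `env` layout are assumptions, not part of the original instructions:

```shell
# Hypothetical: inject the Datadog transport variables into all Airflow pods
helm upgrade --install airflow apache-airflow/airflow \
  --set "env[0].name=DD_API_KEY" --set "env[0].value=<DD_API_KEY>" \
  --set "env[1].name=DD_SITE" --set "env[1].value=<DD_SITE>" \
  --set "env[2].name=OPENLINEAGE__TRANSPORT__TYPE" --set "env[2].value=datadog" \
  --set "env[3].name=OPENLINEAGE_NAMESPACE" --set "env[3].value=<AIRFLOW_ENV_NAME>"
```

If you manage manifests directly instead of Helm, the equivalent is adding the same entries to the `env` list of your scheduler and worker containers.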
**Option 2: Composite Transport**

**Requirements**: `apache-airflow-providers-openlineage` version 1.11.0 or later and `openlineage-python` version 1.37.0 or later.

Use this option if you're already using OpenLineage with another system and want to add Datadog as an additional destination. The composite transport sends events to all configured transports.
For example, if you're using an HTTP transport to send events to another system:
```shell
export OPENLINEAGE__TRANSPORT__TYPE=composite
export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__TYPE=http
export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__URL=<YOUR_EXISTING_URL>
export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__AUTH__TYPE=api_key
export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__AUTH__API_KEY=<YOUR_EXISTING_API_KEY>
export OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=datadog
export DD_API_KEY=<DD_API_KEY>
export DD_SITE=<DD_SITE>
# OPENLINEAGE_NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value.
export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
```
* Replace `<DD_API_KEY>` with your valid [Datadog API key][4].
* Replace `<DD_SITE>` with your Datadog site (for example, {{< region-param key="dd_site" code="true" >}}).
* Replace `<YOUR_EXISTING_URL>` and `<YOUR_EXISTING_API_KEY>` with your existing OpenLineage transport configuration.
In this example, OpenLineage events are sent to both your existing system and Datadog. You can configure multiple transports by giving each one a unique name (like `EXISTING` and `DATADOG` in the example above).
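
The same composite setup can also be expressed as file-based configuration. Below is a sketch written from shell; the `openlineage.yml` location and the `apiKey` field name are assumptions based on openlineage-python's config file format, not part of the original instructions:

```shell
# Hypothetical: file-based equivalent of the composite transport above
cat > "$AIRFLOW_HOME/openlineage.yml" <<'EOF'
transport:
  type: composite
  transports:
    existing:
      type: http
      url: <YOUR_EXISTING_URL>
      auth:
        type: api_key
        apiKey: <YOUR_EXISTING_API_KEY>
    datadog:
      type: datadog
EOF
```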
**Option 3: Simple Configuration**
This option uses the URL-based configuration and works with all versions of the OpenLineage provider:
@@ -56,6 +103,7 @@ To get started, follow the instructions below.
```shell
export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
export OPENLINEAGE_API_KEY=<DD_API_KEY>
export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
```
* Replace `<DD_DATA_OBSERVABILITY_INTAKE>` with `https://data-obs-intake.`{{< region-param key="dd_site" code="true" >}}.
* Replace `<DD_API_KEY>` with your valid [Datadog API key][4].
* If you're using **Airflow v2.7 or v2.8**, also add these two environment variables along with the previous ones. They work around an OpenLineage configuration issue that was fixed in `apache-airflow-providers-openlineage` v1.7, while Airflow v2.7 and v2.8 ship with earlier provider versions.
```shell
#!/bin/sh
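# The diff collapses the remainder of this block. A sketch of the two
# variables, with values assumed from the transport settings above (not verbatim):
export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "<DD_DATA_OBSERVABILITY_INTAKE>", "auth": {"type": "api_key", "api_key": "<DD_API_KEY>"}}'
export AIRFLOW__OPENLINEAGE__CONFIG_PATH=""
```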
@@ -68,21 +116,21 @@ To get started, follow the instructions below.
3. Trigger an update to your Airflow pods and wait for them to finish.
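
   A hedged example of triggering that rollout; the Deployment names and namespace are assumptions about your cluster:

   ```shell
   # Hypothetical: restart the scheduler and workers so the new variables load
   kubectl -n <NAMESPACE> rollout restart deployment/airflow-scheduler deployment/airflow-worker
   kubectl -n <NAMESPACE> rollout status deployment/airflow-scheduler
   ```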
4. Optionally, set up log collection for correlating task logs to DAG run executions in Data Jobs Monitoring. Correlation requires the logs directory to follow the [default log filename format][6].
The `PATH_TO_AIRFLOW_LOGS` value is `$AIRFLOW_HOME/logs` in standard deployments, but may differ if customized. Add the following annotation to your pod:
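
The annotation itself is collapsed in this diff. A minimal sketch of a Datadog Autodiscovery log annotation, where `<CONTAINER_NAME>` and the exact JSON fields other than `"source": "airflow"` are assumptions:

```yaml
metadata:
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.logs: '[{"type": "file", "path": "<PATH_TO_AIRFLOW_LOGS>", "source": "airflow", "service": "airflow"}]'
```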
Adding `"source": "airflow"` enables the [Airflow integration][8] logs pipeline to extract the attributes required for correlation.
These file paths are relative to the Agent container. Mount the directory containing the log file into both the application and Agent containers so the Agent can access it. For details, see [Collect logs from a container local log file][10].
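
A minimal sketch of sharing that directory between containers; the container and volume names are assumptions:

```yaml
# Pod spec excerpt: mount the same log volume into both containers
volumes:
  - name: airflow-logs
    emptyDir: {}
containers:
  - name: airflow-worker
    volumeMounts:
      - name: airflow-logs
        mountPath: <PATH_TO_AIRFLOW_LOGS>
  - name: datadog-agent
    volumeMounts:
      - name: airflow-logs
        mountPath: <PATH_TO_AIRFLOW_LOGS>
        readOnly: true
```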
**Note**: Log collection requires the Datadog Agent to already be installed on your Kubernetes cluster. If you haven't installed it yet, see the [Kubernetes installation documentation][9].
For more methods to set up log collection on Kubernetes, see the [Kubernetes and Integrations configuration section][7].
@@ -270,18 +318,18 @@ To get started, follow the instructions below.
```text
{
  "type": "http",
  "url": "<DD_DATA_OBSERVABILITY_INTAKE>",
  "auth": {
    "type": "api_key",
    "api_key": "<DD_API_KEY>"
  }
}
```
* Replace `<DD_DATA_OBSERVABILITY_INTAKE>` fully with `https://data-obs-intake.`{{< region-param key="dd_site" code="true" >}}.
* Replace `<DD_API_KEY>` fully with your valid [Datadog API key][5].
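
One hedged way to apply this transport in Composer; the `openlineage-transport` key format for `gcloud`'s Airflow configuration overrides is an assumption:

```shell
# Hypothetical: set the transport as an Airflow configuration override.
# The ^:^ prefix switches gcloud's list delimiter so the JSON commas survive.
gcloud composer environments update <ENV_NAME> --location <LOCATION> \
  --update-airflow-configs '^:^openlineage-transport={"type": "http", "url": "<DD_DATA_OBSERVABILITY_INTAKE>", "auth": {"type": "api_key", "api_key": "<DD_API_KEY>"}}'
```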
Check the official [Airflow][4] and [Composer][3] documentation pages for other supported configurations of the `openlineage` provider in Google Cloud Composer.