Commit c8c6465

harelsdussault-antoine authored and committed
Update Kubernetes Airflow section to use Datadog Transport and add composite transport option (#32770)
1 parent a5cff15 commit c8c6465


content/en/data_jobs/airflow.md

Lines changed: 56 additions & 8 deletions
@@ -46,7 +46,54 @@ To get started, follow the instructions below.
 openlineage-airflow
 ```

-2. Configure `openlineage` provider. The simplest option is to set the following environment variables and make them available to pods where you run Airflow schedulers and Airflow workers:
+2. Configure the `openlineage` provider. Choose one of the following configuration options and set the environment variables, making them available to pods where you run Airflow schedulers and Airflow workers:
+
+**Option 1: Datadog Transport (Recommended)**
+
+**Requirements**: `apache-airflow-providers-openlineage` version 2.7.3 or later and `openlineage-python` version 1.37.0 or later.
+
+```shell
+export DD_API_KEY=<DD_API_KEY>
+export DD_SITE=<DD_SITE>
+export OPENLINEAGE__TRANSPORT__TYPE=datadog
+# OPENLINEAGE_NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value.
+export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
+```
+* Replace `<DD_API_KEY>` with your valid [Datadog API key][4].
+* Replace `<DD_SITE>` with your Datadog site (for example, {{< region-param key="dd_site" code="true" >}}).
+
+**Option 2: Composite Transport**
+
+**Requirements**: `apache-airflow-providers-openlineage` version 1.11.0 or later and `openlineage-python` version 1.37.0 or later.
+
+Use this option if you're already using OpenLineage with another system and want to add Datadog as an additional destination. The composite transport sends events to all configured transports.
+
+For example, if you're using an HTTP transport to send events to another system:
+
+```shell
+# Your existing HTTP transport configuration
+export OPENLINEAGE__TRANSPORT__TYPE=composite
+export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__TYPE=http
+export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__URL=<YOUR_EXISTING_URL>
+export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__AUTH__TYPE=api_key
+export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__AUTH__API_KEY=<YOUR_EXISTING_API_KEY>
+
+# Add Datadog as an additional transport
+export DD_API_KEY=<DD_API_KEY>
+export DD_SITE=<DD_SITE>
+export OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=datadog
+# OPENLINEAGE_NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value.
+export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
+```
+* Replace `<DD_API_KEY>` with your valid [Datadog API key][4].
+* Replace `<DD_SITE>` with your Datadog site (for example, {{< region-param key="dd_site" code="true" >}}).
+* Replace `<YOUR_EXISTING_URL>` and `<YOUR_EXISTING_API_KEY>` with your existing OpenLineage transport configuration.
+
+In this example, OpenLineage events are sent to both your existing system and Datadog. You can configure multiple transports by giving each one a unique name (like `EXISTING` and `DATADOG` in the example above).
+
+**Option 3: Simple Configuration**
+
+This option uses the URL-based configuration and works with all versions of the OpenLineage provider:

 ```shell
 export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
@@ -56,6 +103,7 @@ To get started, follow the instructions below.
 ```
 * Replace `<DD_DATA_OBSERVABILITY_INTAKE>` with `https://data-obs-intake.`{{< region-param key="dd_site" code="true" >}}.
 * Replace `<DD_API_KEY>` with your valid [Datadog API key][4].
+
 * If you're using **Airflow v2.7 or v2.8**, also add these two environment variables along with the previous ones. This works around an OpenLineage configuration issue that was fixed in `apache-airflow-providers-openlineage` v1.7, while Airflow v2.7 and v2.8 ship earlier provider versions.
 ```shell
 #!/bin/sh
@@ -68,21 +116,21 @@ To get started, follow the instructions below.

 3. Trigger an update to your Airflow pods and wait for them to finish.

-4. Optionally, set up log collection for correlating task logs to DAG run executions in Data Jobs Monitoring. Correlation requires the logs directory to follow the [default log filename format][6].
+4. Optionally, set up log collection for correlating task logs to DAG run executions in Data Jobs Monitoring. Correlation requires the logs directory to follow the [default log filename format][6].

 The `PATH_TO_AIRFLOW_LOGS` value is `$AIRFLOW_HOME/logs` in standard deployments, but may differ if customized. Add the following annotation to your pod:
 ```yaml
 ad.datadoghq.com/base.logs: '[{"type": "file", "path": "PATH_TO_AIRFLOW_LOGS/*/*/*/*.log", "source": "airflow"}]'
 ```

 Adding `"source": "airflow"` enables the [Airflow integration][8] logs pipeline to extract the attributes required for correlation.
-
+
 These file paths are relative to the Agent container. Mount the directory containing the log file into both the application and Agent containers so the Agent can access it. For details, see [Collect logs from a container local log file][10].

 **Note**: Log collection requires the Datadog Agent to already be installed on your Kubernetes cluster. If you haven't installed it yet, see the [Kubernetes installation documentation][9].

 For more methods to set up log collection on Kubernetes, see the [Kubernetes and Integrations configuration section][7].
-
+

 [1]: https://github.com/apache/airflow/releases/tag/2.5.0
 [2]: https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html
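
The step above asks you to mount the task-log directory into both the Airflow and Agent containers. Below is a minimal sketch of the Airflow worker side, assuming the default `AIRFLOW_HOME` of `/opt/airflow`, a `hostPath` volume, and an Agent deployed as a DaemonSet that mounts the same path; the pod name, volume name, hostPath, and image tag are illustrative assumptions, not part of the documented configuration.

```yaml
# Sketch: an Airflow worker pod exposing its task logs to the Datadog Agent.
# Assumes the Agent DaemonSet mounts the same hostPath so the annotation's file
# path resolves inside the Agent container. All names and paths are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker
  annotations:
    ad.datadoghq.com/base.logs: '[{"type": "file", "path": "/opt/airflow/logs/*/*/*/*.log", "source": "airflow"}]'
spec:
  containers:
    - name: base                      # KubernetesExecutor task containers are named "base"
      image: apache/airflow:2.10.4
      volumeMounts:
        - name: airflow-logs
          mountPath: /opt/airflow/logs   # default $AIRFLOW_HOME/logs location
  volumes:
    - name: airflow-logs
      hostPath:
        path: /var/log/airflow-task-logs
        type: DirectoryOrCreate
```

The Agent container needs a matching `volumeMount` for the same `hostPath` so the annotation's file path is readable from the Agent, as covered in the log collection guides linked in the hunk above.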
@@ -270,18 +318,18 @@ To get started, follow the instructions below.

 ```text
 {
-  "type": "http",
-  "url": "<DD_DATA_OBSERVABILITY_INTAKE>",
+  "type": "http",
+  "url": "<DD_DATA_OBSERVABILITY_INTAKE>",
   "auth": {
-    "type": "api_key",
+    "type": "api_key",
     "api_key": "<DD_API_KEY>"
   }
 }
 ```

 * Replace `<DD_DATA_OBSERVABILITY_INTAKE>` fully with `https://data-obs-intake.`{{< region-param key="dd_site" code="true" >}}.
 * Replace `<DD_API_KEY>` fully with your valid [Datadog API key][5].
-
+

 Check the official [Airflow][4] and [Composer][3] documentation pages for other supported configurations of the `openlineage` provider in Google Cloud Composer.
287335
