Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
171 changes: 171 additions & 0 deletions docs/use-cases/data_lake/onelake_catalog.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
---
slug: /use-cases/data-lake/onelake-catalog
sidebar_label: 'Fabric OneLake'
title: 'Fabric OneLake'
pagination_prev: null
pagination_next: null
description: 'In this guide, we will walk you through the steps to query your data in Microsoft OneLake.'
keywords: ['OneLake', 'Data Lake', 'Fabric']
show_related_blogs: true
doc_type: 'guide'
---

import BetaBadge from '@theme/badges/BetaBadge';

<BetaBadge/>

ClickHouse supports integration with multiple catalogs (OneLake, Unity, Glue, Polaris, etc.). This guide will walk you through the steps to query your data stored in Microsoft OneLake using ClickHouse and [OneLake](https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview).

Microsoft OneLake supports multiple table formats for their lakehouse. With ClickHouse, you can query Iceberg tables.

:::note
As this feature is beta, you will need to enable it using:
`SET allow_database_iceberg = 1;`
:::

## Gathering Requirements OneLake {#gathering-requirements}

Before querying your table in Microsoft Fabric, you'll need to collect the following information:

- A OneLake tenant ID (Your Entra ID)
- A client ID
- A client secret
- A warehouse ID and a data item ID

See [Microsoft OneLake's documentation](http://learn.microsoft.com/en-us/fabric/onelake/table-apis/table-apis-overview#prerequisites) for help finding these values.

## Creating a connection between OneLake and ClickHouse {#creating-a-connection-between-unity-catalog-and-clickhouse}

With the required info above you can now create a connection between Microsoft OneLake and ClickHouse, but before that you need to enable catalogs:

```sql
SET allow_database_iceberg=1
```

### Connect to OneLake {#connect-onelake}

```sql
CREATE DATABASE onelake_catalog
ENGINE = DataLakeCatalog('https://onelake.table.fabric.microsoft.com/iceberg')
SETTINGS
catalog_type = 'onelake',
warehouse = 'warehouse_id/data_item_id',
onelake_tenant_id = '<tenant_id>',
oauth_server_uri = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token',
auth_scope = 'https://storage.azure.com/.default',
onelake_client_id = '<client_id>',
onelake_client_secret = '<client_secret>'
```

## Querying OneLake using ClickHouse {#querying-onelake-using-clickhouse}

Now that the connection is in place, you can start querying OneLake:

```sql
SHOW TABLES FROM onelake_catalog

Query id: 8f6124c4-45c2-4351-b49a-89dc13e548a7

┌─name──────────────────────────┐
1. │ year_2017.green_tripdata_2017 │
2. │ year_2018.green_tripdata_2018 │
3. │ year_2019.green_tripdata_2019 │
4. │ year_2020.green_tripdata_2020 │
5. │ year_2022.green_tripdata_2022 │
└───────────────────────────────┘
```

If you're using the Iceberg client, only the Delta tables with Uniform-enabled will be shown:

To query a table:

```sql
SELECT *
FROM onelake_catalog.`year_2017.green_tripdata_2017`
LIMIT 1

Query id: db6b4bda-cc58-4ca1-8891-e0d14f02c890

Row 1:
──────
VendorID: 2
lpep_pickup_datetime: 2017-05-18 16:55:43.000000
lpep_dropoff_datetime: 2017-05-18 18:04:11.000000
store_and_fwd_flag: N
RatecodeID: 2
PULocationID: 130
DOLocationID: 48
passenger_count: 2
trip_distance: 12.43
fare_amount: 52
extra: 4.5
mta_tax: 0.5
tip_amount: 0
tolls_amount: 33
ehail_fee: ᴺᵁᴸᴸ
improvement_surcharge: 0.3
total_amount: 90.3
payment_type: 2
trip_type: 1
congestion_surcharge: ᴺᵁᴸᴸ
source_file: green_tripdata_2017-05.parquet
```

:::note Backticks required
Backticks are required because ClickHouse doesn't support more than one namespace.
:::

To inspect the table DDL:

```sql
SHOW CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017`

Query id: 8bd5bd8e-83be-453e-9a88-32de12ba7f24

┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017` ↴│
│↳( ↴│
│↳ `VendorID` Nullable(Int64), ↴│
│↳ `lpep_pickup_datetime` Nullable(DateTime64(6, 'UTC')), ↴│
│↳ `lpep_dropoff_datetime` Nullable(DateTime64(6, 'UTC')), ↴│
│↳ `store_and_fwd_flag` Nullable(String), ↴│
│↳ `RatecodeID` Nullable(Int64), ↴│
│↳ `PULocationID` Nullable(Int64), ↴│
│↳ `DOLocationID` Nullable(Int64), ↴│
│↳ `passenger_count` Nullable(Int64), ↴│
│↳ `trip_distance` Nullable(Float64), ↴│
│↳ `fare_amount` Nullable(Float64), ↴│
│↳ `extra` Nullable(Float64), ↴│
│↳ `mta_tax` Nullable(Float64), ↴│
│↳ `tip_amount` Nullable(Float64), ↴│
│↳ `tolls_amount` Nullable(Float64), ↴│
│↳ `ehail_fee` Nullable(Float64), ↴│
│↳ `improvement_surcharge` Nullable(Float64), ↴│
│↳ `total_amount` Nullable(Float64), ↴│
│↳ `payment_type` Nullable(Int64), ↴│
│↳ `trip_type` Nullable(Int64), ↴│
│↳ `congestion_surcharge` Nullable(Float64), ↴│
│↳ `source_file` Nullable(String) ↴│
│↳) ↴│
│↳ENGINE = Iceberg('abfss://<warehouse_id>@onelake.dfs.fabric.microsoft.com/<data_item_id>/Tables/year_2017/green_tripdata_2017') │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

## Loading data from your Data Lake into ClickHouse {#loading-data-from-onelake-into-clickhouse}

If you need to load data from OneLake into ClickHouse:

```sql
CREATE TABLE trips
ENGINE = MergeTree
ORDER BY coalesce(VendorID, 0)
AS SELECT *
FROM onelake_catalog.`year_2017.green_tripdata_2017`

Query id: d15983a6-ef6a-40fe-80d5-19274b9fe328

Ok.

0 rows in set. Elapsed: 32.570 sec. Processed 11.74 million rows, 275.37 MB (360.36 thousand rows/s., 8.45 MB/s.)
Peak memory usage: 1.31 GiB.
```
4 changes: 3 additions & 1 deletion scripts/aspell-ignore/en/aspell-dict.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
personal_ws-1.1 en 3818
personal_ws-1.1 en 3820
AArch
ACLs
AICPA
Expand Down Expand Up @@ -3831,3 +3831,5 @@ SpanName
lucene
TrackedLink
eventName
OneLake
OneLake's