Commit d342b0f

Merge pull request #4725 from ClickHouse/add-onelake
Create onelake_catalog.md
2 parents 16646f1 + f5d0e1f commit d342b0f

2 files changed

+174
-1
lines changed

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
1+
---
2+
slug: /use-cases/data-lake/onelake-catalog
3+
sidebar_label: 'Fabric OneLake'
4+
title: 'Fabric OneLake'
5+
pagination_prev: null
6+
pagination_next: null
7+
description: 'In this guide, we will walk you through the steps to query your data in Microsoft OneLake.'
8+
keywords: ['OneLake', 'Data Lake', 'Fabric']
9+
show_related_blogs: true
10+
doc_type: 'guide'
11+
---
12+
13+
import BetaBadge from '@theme/badges/BetaBadge';
14+
15+
<BetaBadge/>
16+
17+
ClickHouse supports integration with multiple catalogs (OneLake, Unity, Glue, Polaris, etc.). This guide walks you through querying data stored in [Microsoft OneLake](https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview) using ClickHouse.
18+
19+
Microsoft OneLake supports multiple table formats for its lakehouses. With ClickHouse, you can query Iceberg tables.
20+
21+
:::note
22+
As this feature is beta, you will need to enable it using:
23+
`SET allow_database_iceberg = 1;`
24+
:::
25+
26+
## Gathering requirements for OneLake {#gathering-requirements}
27+
28+
Before querying your table in Microsoft Fabric, you'll need to collect the following information:
29+
30+
- A OneLake tenant ID (your Microsoft Entra tenant ID)
31+
- A client ID
32+
- A client secret
33+
- A warehouse ID and a data item ID
34+
35+
See [Microsoft OneLake's documentation](https://learn.microsoft.com/en-us/fabric/onelake/table-apis/table-apis-overview#prerequisites) for help finding these values.
36+
37+
## Creating a connection between OneLake and ClickHouse {#creating-a-connection-between-unity-catalog-and-clickhouse}
38+
39+
With the values gathered above, you can create a connection between Microsoft OneLake and ClickHouse. First, enable the catalog feature:
40+
41+
```sql
42+
SET allow_database_iceberg = 1;
43+
```
44+
45+
### Connect to OneLake {#connect-onelake}
46+
47+
```sql
48+
CREATE DATABASE onelake_catalog
49+
ENGINE = DataLakeCatalog('https://onelake.table.fabric.microsoft.com/iceberg')
50+
SETTINGS
51+
catalog_type = 'onelake',
52+
warehouse = '<warehouse_id>/<data_item_id>',
53+
onelake_tenant_id = '<tenant_id>',
54+
oauth_server_uri = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token',
55+
auth_scope = 'https://storage.azure.com/.default',
56+
onelake_client_id = '<client_id>',
57+
onelake_client_secret = '<client_secret>'
58+
```
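
As a quick sanity check after running the statement above, the following sketch confirms that the database was registered. It assumes the database name `onelake_catalog` from the example; both statements are standard ClickHouse introspection commands and do not read any table data.

```sql
-- Confirm that the catalog database exists (name taken from the example above).
SHOW DATABASES LIKE 'onelake_catalog';

-- Inspect the stored definition of the catalog database.
SHOW CREATE DATABASE onelake_catalog;
```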
59+
60+
## Querying OneLake using ClickHouse {#querying-onelake-using-clickhouse}
61+
62+
Now that the connection is in place, you can start querying OneLake:
63+
64+
```sql
65+
SHOW TABLES FROM onelake_catalog
66+
67+
Query id: 8f6124c4-45c2-4351-b49a-89dc13e548a7
68+
69+
┌─name──────────────────────────┐
70+
1. │ year_2017.green_tripdata_2017 │
71+
2. │ year_2018.green_tripdata_2018 │
72+
3. │ year_2019.green_tripdata_2019 │
73+
4. │ year_2020.green_tripdata_2020 │
74+
5. │ year_2022.green_tripdata_2022 │
75+
└───────────────────────────────┘
76+
```
77+
78+
If you're using the Iceberg client, only Delta tables with Uniform enabled are shown.
79+
80+
To query a table:
81+
82+
```sql
83+
SELECT *
84+
FROM onelake_catalog.`year_2017.green_tripdata_2017`
85+
LIMIT 1
86+
87+
Query id: db6b4bda-cc58-4ca1-8891-e0d14f02c890
88+
89+
Row 1:
90+
──────
91+
VendorID: 2
92+
lpep_pickup_datetime: 2017-05-18 16:55:43.000000
93+
lpep_dropoff_datetime: 2017-05-18 18:04:11.000000
94+
store_and_fwd_flag: N
95+
RatecodeID: 2
96+
PULocationID: 130
97+
DOLocationID: 48
98+
passenger_count: 2
99+
trip_distance: 12.43
100+
fare_amount: 52
101+
extra: 4.5
102+
mta_tax: 0.5
103+
tip_amount: 0
104+
tolls_amount: 33
105+
ehail_fee: ᴺᵁᴸᴸ
106+
improvement_surcharge: 0.3
107+
total_amount: 90.3
108+
payment_type: 2
109+
trip_type: 1
110+
congestion_surcharge: ᴺᵁᴸᴸ
111+
source_file: green_tripdata_2017-05.parquet
112+
```
113+
114+
:::note Backticks required
115+
Backticks are required because the table name itself contains a dot (`namespace.table`) and ClickHouse doesn't support more than one namespace.
116+
:::
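
The same backtick quoting applies to any query shape, not only `SELECT *`. As an illustrative sketch, the aggregation below uses the table and column names from the sample output above; the grouping itself is an assumed example, not part of the original guide.

```sql
-- Count trips and average the total fare per payment type.
-- The backticks keep `year_2017.green_tripdata_2017` as a single table name.
SELECT
    payment_type,
    count() AS trips,
    round(avg(total_amount), 2) AS avg_total_amount
FROM onelake_catalog.`year_2017.green_tripdata_2017`
GROUP BY payment_type
ORDER BY trips DESC;
```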
117+
118+
To inspect the table DDL:
119+
120+
```sql
121+
SHOW CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017`
122+
123+
Query id: 8bd5bd8e-83be-453e-9a88-32de12ba7f24
124+
125+
┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
126+
1. │ CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017` ↴│
127+
│↳( ↴│
128+
│↳ `VendorID` Nullable(Int64), ↴│
129+
│↳ `lpep_pickup_datetime` Nullable(DateTime64(6, 'UTC')), ↴│
130+
│↳ `lpep_dropoff_datetime` Nullable(DateTime64(6, 'UTC')), ↴│
131+
│↳ `store_and_fwd_flag` Nullable(String), ↴│
132+
│↳ `RatecodeID` Nullable(Int64), ↴│
133+
│↳ `PULocationID` Nullable(Int64), ↴│
134+
│↳ `DOLocationID` Nullable(Int64), ↴│
135+
│↳ `passenger_count` Nullable(Int64), ↴│
136+
│↳ `trip_distance` Nullable(Float64), ↴│
137+
│↳ `fare_amount` Nullable(Float64), ↴│
138+
│↳ `extra` Nullable(Float64), ↴│
139+
│↳ `mta_tax` Nullable(Float64), ↴│
140+
│↳ `tip_amount` Nullable(Float64), ↴│
141+
│↳ `tolls_amount` Nullable(Float64), ↴│
142+
│↳ `ehail_fee` Nullable(Float64), ↴│
143+
│↳ `improvement_surcharge` Nullable(Float64), ↴│
144+
│↳ `total_amount` Nullable(Float64), ↴│
145+
│↳ `payment_type` Nullable(Int64), ↴│
146+
│↳ `trip_type` Nullable(Int64), ↴│
147+
│↳ `congestion_surcharge` Nullable(Float64), ↴│
148+
│↳ `source_file` Nullable(String) ↴│
149+
│↳) ↴│
150+
│↳ENGINE = Iceberg('abfss://<warehouse_id>@onelake.dfs.fabric.microsoft.com/<data_item_id>/Tables/year_2017/green_tripdata_2017') │
151+
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
152+
```
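
If only the column names and types are needed, `DESCRIBE TABLE` is a more compact alternative to reading the full DDL. This is a minimal sketch that assumes the same table as above.

```sql
-- List column names and types without the full CREATE TABLE statement.
DESCRIBE TABLE onelake_catalog.`year_2017.green_tripdata_2017`;
```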
153+
154+
## Loading data from OneLake into ClickHouse {#loading-data-from-onelake-into-clickhouse}
155+
156+
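To load data from OneLake into ClickHouse, create a local table from a `SELECT` over the catalog table. The sorting key wraps `VendorID` in `coalesce` because the column is `Nullable`, and MergeTree does not allow nullable sorting keys by default: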
157+
158+
```sql
159+
CREATE TABLE trips
160+
ENGINE = MergeTree
161+
ORDER BY coalesce(VendorID, 0)
162+
AS SELECT *
163+
FROM onelake_catalog.`year_2017.green_tripdata_2017`
164+
165+
Query id: d15983a6-ef6a-40fe-80d5-19274b9fe328
166+
167+
Ok.
168+
169+
0 rows in set. Elapsed: 32.570 sec. Processed 11.74 million rows, 275.37 MB (360.36 thousand rows/s., 8.45 MB/s.)
170+
Peak memory usage: 1.31 GiB.
171+
```
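
Once the table exists, a follow-up along these lines can confirm the load and append further data. This is a sketch under assumptions: it reuses the `trips` table from the example, and it assumes that `year_2018.green_tripdata_2018` has the same schema as the 2017 table, which the guide does not state.

```sql
-- Check that the local table received rows.
SELECT count() FROM trips;

-- Append another year from the catalog (assumes a schema matching the 2017 table).
INSERT INTO trips
SELECT *
FROM onelake_catalog.`year_2018.green_tripdata_2018`;
```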

scripts/aspell-ignore/en/aspell-dict.txt

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
personal_ws-1.1 en 3818
1+
personal_ws-1.1 en 3820
22
AArch
33
ACLs
44
AICPA
@@ -3831,3 +3831,5 @@ SpanName
38313831
lucene
38323832
TrackedLink
38333833
eventName
3834+
OneLake
3835+
OneLake's
