| 1 | +--- |
| 2 | +slug: /use-cases/data-lake/onelake-catalog |
| 3 | +sidebar_label: 'Fabric OneLake' |
| 4 | +title: 'Fabric OneLake' |
| 5 | +pagination_prev: null |
| 6 | +pagination_next: null |
| 7 | +description: 'In this guide, we will walk you through the steps to query your data in Microsoft OneLake.' |
| 8 | +keywords: ['OneLake', 'Data Lake', 'Fabric'] |
| 9 | +show_related_blogs: true |
| 10 | +doc_type: 'guide' |
| 11 | +--- |
| 12 | + |
| 13 | +import BetaBadge from '@theme/badges/BetaBadge'; |
| 14 | + |
| 15 | +<BetaBadge/> |
| 16 | + |
| 17 | +ClickHouse supports integration with multiple catalogs (OneLake, Unity, Glue, Polaris, etc.). This guide walks you through querying data stored in [Microsoft OneLake](https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview) using ClickHouse.
| 18 | + |
| 19 | +Microsoft OneLake supports multiple table formats for its lakehouse. With ClickHouse, you can query Iceberg tables.
| 20 | + |
| 21 | +:::note |
| 22 | +As this feature is in beta, you will need to enable it using:
| 23 | +`SET allow_database_iceberg = 1;` |
| 24 | +::: |
| 25 | + |
| 26 | +## Gathering OneLake requirements {#gathering-requirements}
| 27 | + |
| 28 | +Before querying your table in Microsoft Fabric, you'll need to collect the following information: |
| 29 | + |
| 30 | +- A OneLake tenant ID (your Microsoft Entra tenant ID)
| 31 | +- A client ID |
| 32 | +- A client secret |
| 33 | +- A warehouse ID and a data item ID |
| 34 | + |
| 35 | +See [Microsoft OneLake's documentation](http://learn.microsoft.com/en-us/fabric/onelake/table-apis/table-apis-overview#prerequisites) for help finding these values. |
| 36 | + |
| 37 | +## Creating a connection between OneLake and ClickHouse {#creating-a-connection-between-unity-catalog-and-clickhouse} |
| 38 | + |
| 39 | +With the information gathered above, you can create a connection between Microsoft OneLake and ClickHouse. First, enable the catalog feature:
| 40 | + |
| 41 | +```sql |
| 42 | +SET allow_database_iceberg=1 |
| 43 | +``` |
| 44 | + |
| 45 | +### Connect to OneLake {#connect-onelake} |
| 46 | + |
| 47 | +```sql |
| 48 | +CREATE DATABASE onelake_catalog |
| 49 | +ENGINE = DataLakeCatalog('https://onelake.table.fabric.microsoft.com/iceberg') |
| 50 | +SETTINGS |
| 51 | +catalog_type = 'onelake', |
| 52 | +warehouse = 'warehouse_id/data_item_id', |
| 53 | +onelake_tenant_id = '<tenant_id>', |
| 54 | +oauth_server_uri = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token', |
| 55 | +auth_scope = 'https://storage.azure.com/.default', |
| 56 | +onelake_client_id = '<client_id>', |
| 57 | +onelake_client_secret = '<client_secret>' |
| 58 | +``` |
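
As a quick sanity check after the `CREATE DATABASE` statement above, the catalog database can be looked up in `system.databases`. This is a minimal sketch that assumes the database name `onelake_catalog` from the example; it only confirms that the database object exists, not that the OneLake credentials are valid.

```sql
-- Check that the catalog database was registered
-- (assumes the name used in the CREATE DATABASE example above).
SELECT name, engine
FROM system.databases
WHERE name = 'onelake_catalog'
```

Credential or endpoint problems typically only surface once the catalog is actually read, for example when listing its tables.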
| 59 | + |
| 60 | +## Querying OneLake using ClickHouse {#querying-onelake-using-clickhouse} |
| 61 | + |
| 62 | +Now that the connection is in place, you can start querying OneLake: |
| 63 | + |
| 64 | +```sql |
| 65 | +SHOW TABLES FROM onelake_catalog |
| 66 | + |
| 67 | +Query id: 8f6124c4-45c2-4351-b49a-89dc13e548a7 |
| 68 | + |
| 69 | + ┌─name──────────────────────────┐ |
| 70 | +1. │ year_2017.green_tripdata_2017 │ |
| 71 | +2. │ year_2018.green_tripdata_2018 │ |
| 72 | +3. │ year_2019.green_tripdata_2019 │ |
| 73 | +4. │ year_2020.green_tripdata_2020 │ |
| 74 | +5. │ year_2022.green_tripdata_2022 │ |
| 75 | + └───────────────────────────────┘ |
| 76 | +``` |
| 77 | + |
| 78 | +If you're using the Iceberg client, only Delta tables with UniForm enabled will be shown.
| 79 | + |
| 80 | +To query a table: |
| 81 | + |
| 82 | +```sql |
| 83 | +SELECT * |
| 84 | +FROM onelake_catalog.`year_2017.green_tripdata_2017` |
| 85 | +LIMIT 1 |
| 86 | + |
| 87 | +Query id: db6b4bda-cc58-4ca1-8891-e0d14f02c890 |
| 88 | + |
| 89 | +Row 1: |
| 90 | +────── |
| 91 | +VendorID: 2 |
| 92 | +lpep_pickup_datetime: 2017-05-18 16:55:43.000000 |
| 93 | +lpep_dropoff_datetime: 2017-05-18 18:04:11.000000 |
| 94 | +store_and_fwd_flag: N |
| 95 | +RatecodeID: 2 |
| 96 | +PULocationID: 130 |
| 97 | +DOLocationID: 48 |
| 98 | +passenger_count: 2 |
| 99 | +trip_distance: 12.43 |
| 100 | +fare_amount: 52 |
| 101 | +extra: 4.5 |
| 102 | +mta_tax: 0.5 |
| 103 | +tip_amount: 0 |
| 104 | +tolls_amount: 33 |
| 105 | +ehail_fee: ᴺᵁᴸᴸ |
| 106 | +improvement_surcharge: 0.3 |
| 107 | +total_amount: 90.3 |
| 108 | +payment_type: 2 |
| 109 | +trip_type: 1 |
| 110 | +congestion_surcharge: ᴺᵁᴸᴸ |
| 111 | +source_file: green_tripdata_2017-05.parquet |
| 112 | +``` |
| 113 | + |
| 114 | +:::note Backticks required |
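| 115 | +Backticks are required because ClickHouse doesn't support more than one namespace, so the namespace and table name (for example `year_2017.green_tripdata_2017`) must be quoted together as a single identifier.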
| 116 | +::: |
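
The same quoting applies to any query against the catalog, not only `SELECT *`. A minimal sketch, assuming the `onelake_catalog` database and the `lpep_pickup_datetime` column shown in the sample row above:

```sql
-- Count trips per pickup month.
-- The namespace-qualified table name stays inside one pair of backticks.
SELECT
    toStartOfMonth(lpep_pickup_datetime) AS pickup_month,
    count() AS trip_count
FROM onelake_catalog.`year_2017.green_tripdata_2017`
GROUP BY pickup_month
ORDER BY pickup_month
```

Writing the name without backticks would make ClickHouse read `year_2017` and `green_tripdata_2017` as separate identifier parts rather than as one table name.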
| 117 | + |
| 118 | +To inspect the table DDL: |
| 119 | + |
| 120 | +```sql |
| 121 | +SHOW CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017` |
| 122 | + |
| 123 | +Query id: 8bd5bd8e-83be-453e-9a88-32de12ba7f24 |
| 124 | + |
| 125 | + ┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ |
| 126 | +1. │ CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017` ↴│ |
| 127 | + │↳( ↴│ |
| 128 | + │↳ `VendorID` Nullable(Int64), ↴│ |
| 129 | + │↳ `lpep_pickup_datetime` Nullable(DateTime64(6, 'UTC')), ↴│ |
| 130 | + │↳ `lpep_dropoff_datetime` Nullable(DateTime64(6, 'UTC')), ↴│ |
| 131 | + │↳ `store_and_fwd_flag` Nullable(String), ↴│ |
| 132 | + │↳ `RatecodeID` Nullable(Int64), ↴│ |
| 133 | + │↳ `PULocationID` Nullable(Int64), ↴│ |
| 134 | + │↳ `DOLocationID` Nullable(Int64), ↴│ |
| 135 | + │↳ `passenger_count` Nullable(Int64), ↴│ |
| 136 | + │↳ `trip_distance` Nullable(Float64), ↴│ |
| 137 | + │↳ `fare_amount` Nullable(Float64), ↴│ |
| 138 | + │↳ `extra` Nullable(Float64), ↴│ |
| 139 | + │↳ `mta_tax` Nullable(Float64), ↴│ |
| 140 | + │↳ `tip_amount` Nullable(Float64), ↴│ |
| 141 | + │↳ `tolls_amount` Nullable(Float64), ↴│ |
| 142 | + │↳ `ehail_fee` Nullable(Float64), ↴│ |
| 143 | + │↳ `improvement_surcharge` Nullable(Float64), ↴│ |
| 144 | + │↳ `total_amount` Nullable(Float64), ↴│ |
| 145 | + │↳ `payment_type` Nullable(Int64), ↴│ |
| 146 | + │↳ `trip_type` Nullable(Int64), ↴│ |
| 147 | + │↳ `congestion_surcharge` Nullable(Float64), ↴│ |
| 148 | + │↳ `source_file` Nullable(String) ↴│ |
| 149 | + │↳) ↴│ |
| 150 | + │↳ENGINE = Iceberg('abfss://<warehouse_id>@onelake.dfs.fabric.microsoft.com/<data_item_id>/Tables/year_2017/green_tripdata_2017') │ |
| 151 | + └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
| 152 | +``` |
| 153 | + |
| 154 | +## Loading data from OneLake into ClickHouse {#loading-data-from-onelake-into-clickhouse}
| 155 | + |
| 156 | +To load data from OneLake into a local ClickHouse table, use `CREATE TABLE ... AS SELECT`:
| 157 | + |
| 158 | +```sql |
| 159 | +CREATE TABLE trips |
| 160 | +ENGINE = MergeTree |
| 161 | +ORDER BY coalesce(VendorID, 0) |
| 162 | +AS SELECT * |
| 163 | +FROM onelake_catalog.`year_2017.green_tripdata_2017` |
| 164 | + |
| 165 | +Query id: d15983a6-ef6a-40fe-80d5-19274b9fe328 |
| 166 | + |
| 167 | +Ok. |
| 168 | + |
| 169 | +0 rows in set. Elapsed: 32.570 sec. Processed 11.74 million rows, 275.37 MB (360.36 thousand rows/s., 8.45 MB/s.) |
| 170 | +Peak memory usage: 1.31 GiB. |
| 171 | +``` |
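
Once the load finishes, the local copy can be queried like any other MergeTree table. The following is a sketch, assuming the `trips` table created above lives in the current database; the aliases `trip_count` and `avg_total` are illustrative names, not part of the original guide.

```sql
-- Confirm how many rows were copied into the local table.
SELECT count() FROM trips;

-- Example aggregation on the local copy: trips and average total per payment type.
SELECT
    payment_type,
    count() AS trip_count,
    round(avg(total_amount), 2) AS avg_total
FROM trips
GROUP BY payment_type
ORDER BY trip_count DESC;
```

These queries read from local storage, so they no longer depend on the OneLake connection.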