
Commit 16bd351

add page for Marimo notebooks
1 parent 1becfaa commit 16bd351


4 files changed: +172 −1 lines changed


docs/use-cases/AI_ML/jupyter-notebook.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 slug: /use-cases/AI/jupyter-notebook
-sidebar_label: 'Exploring data in Jupyter notebooks with chDB'
+sidebar_label: 'Exploring data with Jupyter notebooks and chDB'
 title: 'Exploring data in Jupyter notebooks with chDB'
 description: 'This guide explains how to set up and use chDB to explore data from ClickHouse Cloud or local files in Jupyter notebooks'
 keywords: ['ML', 'Jupyter', 'chDB', 'pandas']
Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
---
slug: /use-cases/AI/marimo-notebook
sidebar_label: 'Exploring data with Marimo notebooks and chDB'
title: 'Exploring data with Marimo notebooks and chDB'
description: 'This guide explains how to set up and use chDB to explore data from ClickHouse Cloud or local files in Marimo notebooks'
keywords: ['ML', 'Marimo', 'chDB', 'pandas']
doc_type: 'guide'
---

import Image from '@theme/IdealImage';
import image_1 from '@site/static/images/use-cases/AI_ML/jupyter/1.png';
import image_2 from '@site/static/images/use-cases/AI_ML/jupyter/2.png';
import image_3 from '@site/static/images/use-cases/AI_ML/jupyter/3.png';
import image_4 from '@site/static/images/use-cases/AI_ML/Marimo/4.png';
import image_5 from '@site/static/images/use-cases/AI_ML/Marimo/5.png';
In this guide, you will learn how to explore a dataset hosted on ClickHouse Cloud from a Marimo notebook with the help of [chDB](/docs/chdb) - a fast in-process SQL OLAP engine powered by ClickHouse.

**Prerequisites:**
- Python 3.8 or higher
- a virtual environment
- a working ClickHouse Cloud service and your [connection details](/docs/cloud/guides/sql-console/gather-connection-details)

**What you'll learn:**
- Connect to ClickHouse Cloud from Marimo notebooks using chDB
- Query remote datasets and convert results to Pandas DataFrames
- Combine cloud data with local CSV files for analysis
- Visualize data using Plotly in Marimo's reactive environment
- Leverage Marimo's reactive execution model for interactive data exploration

We'll be using the UK Property Price dataset, which is available on ClickHouse Cloud as one of the starter datasets.
It contains the prices that houses were sold for in the United Kingdom from 1995 to 2024.
## Setup {#setup}

### Loading the dataset {#loading-the-dataset}

To add this dataset to an existing ClickHouse Cloud service, log in to [console.clickhouse.cloud](https://console.clickhouse.cloud/) with your account details.

In the left-hand menu, click `Data sources`, then click `Predefined sample data`:

<Image size="md" img={image_1} alt="Add example data set"/>

Select `Get started` on the UK property price paid data (4GB) card:

<Image size="md" img={image_2} alt="Select UK price paid dataset"/>

Then click `Import dataset`:

<Image size="md" img={image_3} alt="Import UK price paid dataset"/>

ClickHouse will automatically create the `pp_complete` table in the `default` database and fill it with 28.92 million rows of price paid data.
### Setting up credentials {#setting-up-credentials}

To reduce the likelihood of exposing your credentials, we recommend adding your Cloud hostname, username, and password as environment variables on your local machine.
From a terminal, run the following commands to set them:

```bash
export CLICKHOUSE_CLOUD_HOSTNAME=<HOSTNAME>
export CLICKHOUSE_CLOUD_USER=default
export CLICKHOUSE_CLOUD_PASSWORD=your_actual_password
```

:::note
The environment variables above persist only as long as your terminal session.
To set them permanently, add them to your shell configuration file.
:::
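The connection cell later in this guide reads these values with `os.environ.get`. To confirm they are visible to Python before continuing, an optional check (run from a Python session started in the same terminal, or later from a notebook cell) might look like this:

```python
import os

# Optional sanity check: these are the variable names used later in this guide.
for name in ("CLICKHOUSE_CLOUD_HOSTNAME", "CLICKHOUSE_CLOUD_USER", "CLICKHOUSE_CLOUD_PASSWORD"):
    print(f"{name} is {'set' if os.environ.get(name) else 'NOT set'}")
```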
### Installing Marimo {#installing-marimo}

Now activate your virtual environment.
From within your virtual environment, install the packages that we will be using in this guide:

```bash
pip install chdb pandas plotly marimo
```

Create a new Marimo notebook with the following command:

```bash
marimo edit clickhouse_exploration.py
```

A new browser window should open with the Marimo interface on localhost:2718:

<Image size="md" img={image_4} alt="Marimo interface"/>

Marimo notebooks are stored as pure Python files, making them easy to version control and share with others.
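For reference, the file that `marimo edit` creates is roughly a plain Python script of the following shape (a simplified sketch; the exact header and cell names Marimo generates may differ):

```python
import marimo

app = marimo.App()

@app.cell
def _():
    # Each notebook cell becomes a decorated function; the names it returns
    # are what other cells can depend on.
    import marimo as mo
    return (mo,)

if __name__ == "__main__":
    app.run()
```

Because the notebook is plain Python, it can also be served later as a read-only app with `marimo run clickhouse_exploration.py`.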
## Importing dependencies {#importing-dependencies}

In a new cell, import the required packages:

```python
import marimo as mo
import chdb
import pandas as pd
import os
import plotly.express as px
import plotly.graph_objects as go
```

If you hover your mouse over the cell, you will see two circles with a "+" symbol appear.
You can click these to add new cells.

Add a new cell and run a simple query to check that everything is set up correctly:

```python
result = chdb.query("SELECT 'Hello ClickHouse from Marimo!'", "DataFrame")
result
```

You should see the result shown underneath the cell you just ran:

<Image size="md" img={image_5} alt="Marimo hello world"/>
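If you'd like to confirm which version of the embedded ClickHouse engine chDB is running, an optional follow-up cell could be:

```python
# Optional: chDB embeds the ClickHouse engine in-process; version() reports its version.
chdb.query("SELECT version() AS clickhouse_version", "DataFrame")
```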
## Exploring the data {#exploring-the-data}

With the UK price paid dataset loaded and chDB up and running in a Marimo notebook, we can now get started exploring our data.

Let's imagine we are interested in how prices have changed over time for a specific area of the UK, such as the capital city, London.

ClickHouse's [remoteSecure](/docs/sql-reference/table-functions/remote) function allows you to easily retrieve data from ClickHouse Cloud.
You can instruct chDB to return this data in-process as a Pandas DataFrame, which is a convenient and familiar way of working with data.

### Querying ClickHouse Cloud data {#querying-clickhouse-cloud-data}
Create a new cell with the following query to fetch the UK price paid data from your ClickHouse Cloud service and turn it into a `pandas.DataFrame`:

```python
query = f"""
SELECT
    toYear(date) AS year,
    round(avg(price)) AS price,
    bar(price, 0, 1000000, 80)
FROM remoteSecure(
    '{os.environ.get("CLICKHOUSE_CLOUD_HOSTNAME")}',
    'default.pp_complete',
    '{os.environ.get("CLICKHOUSE_CLOUD_USER")}',
    '{os.environ.get("CLICKHOUSE_CLOUD_PASSWORD")}'
)
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year
"""

df = chdb.query(query, "DataFrame")
df.head()
```
In the snippet above, `chdb.query(query, "DataFrame")` runs the specified query and returns the result as a Pandas DataFrame.
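If you want to double-check what came back before going further, a quick optional inspection cell might look like this:

```python
# Optional: inspect the DataFrame returned by chDB (assumes `df` from the previous cell).
print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred by pandas
```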
In the query, we use the `remoteSecure` function to connect to ClickHouse Cloud.

The `remoteSecure` function takes the following parameters:
- a connection string
- the name of the database and table to use
- your username
- your password

As a security best practice, you should prefer using environment variables for the username and password parameters rather than specifying them directly in the function, although this is possible if you wish.

The `remoteSecure` function connects to the remote ClickHouse Cloud service, runs the query, and returns the result.
Depending on the size of your data, this could take a few seconds.

In this case we return the average sale price per year, filtered by `town = 'LONDON'`.
The result is then stored as a DataFrame in a variable called `df`.
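With the result in a DataFrame, you could now visualize it. As a minimal sketch (using the `plotly.express` import from earlier and the `year` and `price` columns returned by the query above), a follow-up cell might look like this:

```python
# Sketch: plot the average London sale price per year from the df above.
fig = px.line(df, x="year", y="price", title="Average property price in London by year")
fig  # in Marimo, the last expression in a cell is rendered as the cell's output
```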
