
Conversation


@conormccarter conormccarter commented Nov 7, 2025

Resolves: #24 (Help wanted: Databricks)

  1. Add Databricks benchmark script
  2. Add results for most Databricks SQL warehouse sizes

@rschu1ze (Member)

Oh, that PR would have been nice to merge :-/

@conormccarter please let us know if you need support from our end to go forward with this PR.

@conormccarter (Author)

Hey @rschu1ze, I will reopen once I update the results! (I realized that I failed to turn off the query cache, which resulted in inaccurate "hot run" times.)
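As a side note, here is a minimal sketch of how result caching can be switched off per session with the Databricks SQL connector. The connection values below are placeholders, and the benchmark script's actual handling may differ:

```
# Sketch: disable Databricks SQL result caching before timing "hot" runs.
# Connection values are placeholders, not this PR's actual configuration.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxx.cloud.databricks.com",  # placeholder host
    http_path="/sql/1.0/warehouses/xxxx",             # placeholder warehouse path
    access_token="dapi-xxxx",                         # placeholder token
) as connection:
    with connection.cursor() as cursor:
        # use_cached_result controls whether the warehouse may serve
        # previously cached query results; turn it off for benchmarking.
        cursor.execute("SET use_cached_result = false")
        cursor.execute("SELECT COUNT(*) FROM hits")  # example timed query
        print(cursor.fetchone())
```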

@conormccarter conormccarter reopened this Nov 13, 2025
@rschu1ze (Member) left a comment


I got a permission error when I tried to push to this repository:

remote: Permission to prequel-co/ClickBench.git denied to rschu1ze.
fatal: unable to access 'https://github.com/prequel-co/ClickBench.git/': The requested URL returned error: 403

... therefore leaving some comments for now.

@@ -0,0 +1,4 @@
I created each warehouse in the Databricks UI.

Please move the content of this file into README.md

DATABRICKS_SCHEMA=clickbench_schema

# Parquet data location
DATABRICKS_PARQUET_LOCATION=s3://some/path/hits.parquet

Some questions here: I set my Databricks hostname, the Databricks HTTP path, the instance type (2X-Small for the free test version), and the token. I didn't touch the CATALOG and SCHEMA variables.

When I ran benchmark.sh, I got this:

Connecting to Databricks; loading the data into clickbench_catalog.clickbench_schema
[WARN] pyarrow is not installed by default since databricks-sql-connector 4.0.0, any arrow specific api (e.g. fetchmany_arrow) and cloud fetch will be disabled. If you need these features, please run pip install pyarrow or pip install databricks-sql-connector[pyarrow] to install
Creating table and loading data from s3://some/path/hits.parquet...
Traceback (most recent call last):
  File "/data/ClickBench/databricks/./benchmark.py", line 357, in <module>
    load_data(run_metadata)
  File "/data/ClickBench/databricks/./benchmark.py", line 289, in load_data
    cursor.execute(load_query)
  File "/data/ClickBench/databricks/.venv/lib/python3.12/site-packages/databricks/sql/telemetry/latency_logger.py", line 175, in wrapper
    result = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ClickBench/databricks/.venv/lib/python3.12/site-packages/databricks/sql/client.py", line 1260, in execute
    self.active_result_set = self.backend.execute_command(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ClickBench/databricks/.venv/lib/python3.12/site-packages/databricks/sql/backend/thrift_backend.py", line 1058, in execute_command
    execute_response, has_more_rows = self._handle_execute_response(
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ClickBench/databricks/.venv/lib/python3.12/site-packages/databricks/sql/backend/thrift_backend.py", line 1265, in _handle_execute_response
    final_operation_state = self._wait_until_command_done(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ClickBench/databricks/.venv/lib/python3.12/site-packages/databricks/sql/backend/thrift_backend.py", line 957, in _wait_until_command_done
    self._check_command_not_in_error_or_closed_state(op_handle, poll_resp)
  File "/data/ClickBench/databricks/.venv/lib/python3.12/site-packages/databricks/sql/backend/thrift_backend.py", line 635, in _check_command_not_in_error_or_closed_st
ate
    raise ServerOperationError(
databricks.sql.exc.ServerOperationError: [UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY] Unsupported data source type for direct query on files: parquet SQLSTATE: 0A000; lin
e 109 pos 13
Attempt to close session raised a local exception: sys.meta_path is None, Python is likely shutting down

(Line 289 ran the INSERT statement; the prior CREATE TABLE was successful.)

Do you have an idea what went wrong? Do I need to set any other variables?

Oh, I should also have mentioned that I set DATABRICKS_PARQUET_LOCATION to https://clickhouse-public-datasets.s3.eu-central-1.amazonaws.com/hits_compatible/hits.parquet. Is this correct? If yes, I think we can hard-code it.
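For what it's worth, [UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY] is the error Databricks raises when a query selects directly from files (e.g. the parquet.`path` syntax) on a warehouse/catalog combination that doesn't allow it. Below is a minimal sketch of an alternative load query using the read_files table-valued function; the table name and path are placeholders taken from .env.example, and the script's actual query may differ:

```
# Sketch: load via read_files() instead of a direct "parquet.`...`" file query,
# which some Databricks warehouse/catalog combinations reject.
# Table and path names below are placeholders, not the PR's actual values.
load_query = """
    INSERT INTO clickbench_catalog.clickbench_schema.hits
    SELECT * FROM read_files(
        's3://some/path/hits.parquet',  -- placeholder Parquet location
        format => 'parquet'
    )
"""
cursor.execute(load_query)
```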

# Edit .env with your actual credentials
```

Required environment variables:

Lines 12-19 are already covered by the comments in .env.example and are redundant here.

@@ -0,0 +1,22 @@
#!/bin/bash


Please add sudo snap install --classic astral-uv here.

@@ -0,0 +1,109 @@
-- This is not used in the setup script, but is included here for reference.

It's slightly confusing to keep this file around; let's delete it.
