Creating a new ClickHouse Connect client is an **expensive operation**; follow the best practices below to avoid paying that cost repeatedly. During initialization, the client:
1. Establishes an HTTP(S) connection to the ClickHouse server
2. Retrieves server version and timezone (`SELECT version(), timezone()`)
3. Executes a query to retrieve server settings and metadata (`SELECT name, value, readonly FROM system.settings`)
4. Parses and caches server configuration for the session
5. Sets up compression, connection pooling, and other infrastructure
This initialization overhead can add significant latency (typically 100ms-2000ms depending on network conditions) to each operation if clients are created and destroyed frequently.

#### Core principles {#core-principles}
- **Reuse clients**: Create clients once at application startup and reuse them throughout the application lifetime
- **Avoid frequent creation**: Don't create a new client for each query or request (this wastes hundreds of milliseconds per operation)
- **Clean up properly**: Always close clients when shutting down to release connection pool resources
- **Share when possible**: A single client can handle many concurrent queries through its connection pool (see threading notes below)
#### Anti-pattern: Creating a client per request {#anti-pattern-creating-a-client-per-request}
**Important:** Client instances are **NOT thread-safe** when using session IDs. By default, clients have an auto-generated session ID, and concurrent queries within the same session will raise a `ProgrammingError`.
Instead, create the client once at startup and pass it where it is needed:

```python
import clickhouse_connect

def main():
    # Create client at the start
    client = clickhouse_connect.get_client(
        host='my-host',
        username='default',
        password='password'
    )

    try:
        # Use client throughout the application lifetime
        process_data(client)
        generate_reports(client)
        cleanup_old_data(client)
    finally:
        # Always close the client when done
        client.close()

def process_data(client):
    # Client is passed as a parameter, not created
    data = client.query('SELECT * FROM events WHERE date = today()')
```
One client can be shared by many threads; disabling sessions allows the threads to query concurrently:

```python
import threading

import clickhouse_connect

# Shared client; with sessions disabled, concurrent queries are allowed
client = clickhouse_connect.get_client(
    host='my-host',
    username='default',
    password='password',
    autogenerate_session_id=False
)

def worker(worker_id):
    # All threads reuse the shared client's connection pool
    client.query(f'SELECT count() FROM events WHERE worker_id = {worker_id}')

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()

for t in threads:
    t.join()

client.close()
```
**Note on session IDs in multi-threaded environments:** By default, each client has a unique session ID, and ClickHouse does not allow concurrent queries within the same session. The client does not queue concurrent queries within a session; it raises a `ProgrammingError`. To run concurrent queries safely, either:

1. Disable sessions on the shared client by passing `autogenerate_session_id=False` to `get_client` (or set the common setting before creating clients), or
2. Provide a unique `session_id` per query via the `settings` argument, or
3. Use separate clients when you need session isolation (e.g., temporary tables).

See [Managing ClickHouse Session IDs](#managing-clickhouse-session-ids) for more details.
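Option 2 can be wrapped in a small helper; `query_with_own_session` is a name of our choosing, not a library function:

```python
import uuid

def query_with_own_session(client, sql, **kwargs):
    # Attach a fresh session ID to this query's settings so it never
    # shares a session with a concurrently running query
    settings = dict(kwargs.pop('settings', None) or {})
    settings['session_id'] = str(uuid.uuid4())
    return client.query(sql, settings=settings, **kwargs)
```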
##### Worker pools and task queues {#worker-pools-and-task-queues}
For Celery, RQ, and similar task-queue systems that run multiple worker processes, initialize exactly one ClickHouse client per worker process and reuse it for all tasks handled by that process. Do not share clients across processes. Prefer creating the client on worker-process start and closing it on shutdown; avoid per-task creation. Set sensible connection/read timeouts and, if you expect concurrent queries, either create multiple clients per process or configure the client’s HTTP connection pool accordingly. If the worker model also uses threads, don’t share a single session across concurrent queries.
**Alternative for sessions:** If you need sessions (e.g., for temporary tables), create a separate client per thread:
For task-queue workers, initialize one client per worker process — for example, lazily on the first task:

```python
# Celery example
from celery import Celery

import clickhouse_connect

app = Celery('tasks')

# Global client for this worker process
clickhouse_client = None

@app.task
def process_event(event_id):
    global clickhouse_client

    # Lazy initialization: create client on first task
    if clickhouse_client is None:
        clickhouse_client = clickhouse_connect.get_client(
            host='my-host',
            username='default',
            password='password'
        )

    # Reuse the shared client for this task's work (illustrative query)
    clickhouse_client.query(
        'SELECT count() FROM events WHERE id = {id:UInt64}',
        parameters={'id': event_id}
    )
```
Always close clients at shutdown. Note that `client.close()` disposes the client and closes pooled HTTP connections only when the client owns its pool manager (for example, when created with custom TLS/proxy options). For the default shared pool, use `client.close_connections()` to proactively clear sockets; otherwise, connections are reclaimed automatically via idle expiration and at process exit.
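A shutdown hook along these lines covers both calls; `register_client_shutdown` is a hypothetical helper, not part of the library:

```python
import atexit

def register_client_shutdown(client):
    # At process exit: proactively clear pooled sockets from the shared
    # pool, then dispose of the client itself
    def _shutdown():
        client.close_connections()
        client.close()
    atexit.register(_shutdown)
    return _shutdown
```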