
Commit c89be28

simplify and clarify
1 parent a890ea1 commit c89be28

docs/integrations/language-clients/python/index.md

Lines changed: 52 additions & 239 deletions
@@ -228,307 +228,120 @@ Out[2]: 'github'
 
 ### Client Lifecycle and Best Practices {#client-lifecycle-and-best-practices}
 
-#### Understanding client creation cost {#understanding-client-creation-cost}
+Creating a ClickHouse Connect client is an expensive operation that involves establishing a connection, retrieving server metadata, and initializing settings. Follow these best practices for optimal performance:
 
-Creating a new ClickHouse Connect client is an **expensive operation**. During initialization, the client:
-1. Establishes an HTTP(S) connection to the ClickHouse server
-2. Retrieves server version and timezone (`SELECT version(), timezone()`)
-3. Executes a query to retrieve server settings and metadata (`SELECT name, value, readonly FROM system.settings`)
-4. Parses and caches server configuration for the session
-5. Sets up compression, connection pooling, and other infrastructure
+#### Core principles {#core-principles}
 
-This initialization overhead can add significant latency (typically 100ms-2000ms depending on network conditions) to each operation if clients are created and destroyed frequently.
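The removed paragraph above puts the per-client setup cost at roughly 100ms-2000ms. A minimal sketch, not part of the diff, for checking that overhead against your own deployment; the host and credentials are placeholders rather than values from this commit:

```python
# Sketch: compare one-off client creation cost with reuse of an existing client.
# 'my-host', 'default', and 'password' are placeholder connection details.
import time
import clickhouse_connect

start = time.perf_counter()
client = clickhouse_connect.get_client(host='my-host', username='default', password='password')
print(f"get_client(): {(time.perf_counter() - start) * 1000:.0f} ms")  # includes connection + metadata queries

start = time.perf_counter()
client.query('SELECT 1')
print(f"query on reused client: {(time.perf_counter() - start) * 1000:.0f} ms")

client.close()
```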
+- **Reuse clients**: Create clients once at application startup and reuse them throughout the application lifetime
+- **Avoid frequent creation**: Don't create a new client for each query or request (this wastes hundreds of milliseconds per operation)
+- **Clean up properly**: Always close clients when shutting down to release connection pool resources
+- **Share when possible**: A single client can handle many concurrent queries through its connection pool (see threading notes below)
 
-#### Anti-pattern: Creating a client per request {#anti-pattern-creating-a-client-per-request}
+#### Basic patterns {#basic-patterns}
 
-**❌ DO NOT DO THIS:**
-
-```python
-# BAD: Creates a new client for every query
-def get_user_count():
-    client = clickhouse_connect.get_client(host='my-host', username='default', password='password')
-    result = client.query('SELECT count() FROM users')
-    client.close()
-    return result.result_rows[0][0]
-
-# This will create 1000 clients and waste significant time on initialization
-for i in range(1000):
-    count = get_user_count()
-```
-
-#### Recommended pattern: Reuse a single client {#recommended-pattern-reuse-a-single-client}
-
-**✅ DO THIS:**
+**✅ Good: Reuse a single client**
 
 ```python
 import clickhouse_connect
 
-# Create the client once at application startup
-client = clickhouse_connect.get_client(
-    host='my-host',
-    username='default',
-    password='password',
-    connect_timeout=10,
-    send_receive_timeout=300
-)
-
-# Reuse the same client for all operations
-def get_user_count():
-    result = client.query('SELECT count() FROM users')
-    return result.result_rows[0][0]
-
-def get_active_users():
-    result = client.query('SELECT count() FROM users WHERE active = 1')
-    return result.result_rows[0][0]
+# Create once at startup
+client = clickhouse_connect.get_client(host='my-host', username='default', password='password')
 
-# Use the same client many times
+# Reuse for all queries
 for i in range(1000):
-    count = get_user_count()
+    result = client.query('SELECT count() FROM users')
 
-# Close the client when the application shuts down
+# Close on shutdown
 client.close()
 ```
 
-#### Client lifecycle in different application types {#client-lifecycle-in-different-application-types}
-
-##### Web applications (Flask, FastAPI, Django) {#web-applications-flask-fastapi-django}
-
-Create a client at application startup and share it across requests:
+**❌ Bad: Creating clients repeatedly**
 
 ```python
-# FastAPI example (using lifespan)
-from contextlib import asynccontextmanager
-from fastapi import FastAPI, Request
-import clickhouse_connect
-
-
-@asynccontextmanager
-async def lifespan(app: FastAPI):
-    # Create the client once, before serving requests
-    app.state.clickhouse_client = clickhouse_connect.get_client(
-        host='my-host',
-        username='default',
-        password='password',
-        autogenerate_session_id=False
-    )
-    yield
-    # Close the client when the application is shutting down
-    app.state.clickhouse_client.close()
-
-
-app = FastAPI(lifespan=lifespan)
-
-
-@app.get("/users/count")
-def get_user_count(request: Request):
-    client = request.app.state.clickhouse_client
+# BAD: Creates 1000 clients with expensive initialization overhead
+for i in range(1000):
+    client = clickhouse_connect.get_client(host='my-host', username='default', password='password')
     result = client.query('SELECT count() FROM users')
-    return {"count": result.result_rows[0][0]}
-```
-
-```python
-# Flask example
-from flask import Flask
-import clickhouse_connect
-
-app = Flask(__name__)
-
-# Create client once when the module loads
-clickhouse_client = clickhouse_connect.get_client(
-    host='my-host',
-    username='default',
-    password='password',
-    autogenerate_session_id=False
-)
-
-@app.route('/users/count')
-def get_user_count():
-    result = clickhouse_client.query('SELECT count() FROM users')
-    return {"count": result.result_rows[0][0]}
-
-@app.teardown_appcontext
-def close_connection(exception):
-    # Global client is shared across requests; don't close per request
-    pass
+    client.close()
 ```
 
-##### Long-running applications and scripts {#long-running-applications-and-scripts}
+#### Multi-threaded applications {#multi-threaded-applications}
 
-Create the client once at the start of execution:
+**Important:** Client instances are **NOT thread-safe** when using session IDs. By default, clients have an auto-generated session ID, and concurrent queries within the same session will raise a `ProgrammingError`.
 
-```python
-import clickhouse_connect
-
-def main():
-    # Create client at the start
-    client = clickhouse_connect.get_client(
-        host='my-host',
-        username='default',
-        password='password'
-    )
-
-    try:
-        # Use client throughout the application lifetime
-        process_data(client)
-        generate_reports(client)
-        cleanup_old_data(client)
-    finally:
-        # Always close the client when done
-        client.close()
-
-def process_data(client):
-    # Client is passed as a parameter, not created
-    data = client.query('SELECT * FROM events WHERE date = today()')
-    # Process data...
-
-if __name__ == '__main__':
-    main()
-```
-
-##### Multi-threaded applications {#multi-threaded-applications}
-
-The ClickHouse Connect client is thread-safe for most operations. You can share a single client across multiple threads:
+To share a client across threads safely:
 
 ```python
 import clickhouse_connect
 import threading
 
-# Create one client shared by all threads
+# Option 1: Disable sessions (recommended for shared clients)
 client = clickhouse_connect.get_client(
     host='my-host',
     username='default',
     password='password',
-    autogenerate_session_id=False
+    autogenerate_session_id=False  # Required for thread safety
 )
 
 def worker(thread_id):
-    # All threads use the same client
+    # All threads can now safely use the same client
    result = client.query(f'SELECT count() FROM table_{thread_id}')
    print(f"Thread {thread_id}: {result.result_rows[0][0]}")
 
-# Spawn multiple threads using the same client
-threads = []
-for i in range(10):
-    t = threading.Thread(target=worker, args=(i,))
-    threads.append(t)
+threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
+for t in threads:
     t.start()
-
 for t in threads:
     t.join()
 
 client.close()
 ```
 
-**Note on session IDs in multi-threaded environments:** By default, each client has a unique session ID, and ClickHouse does not allow concurrent queries within the same session. The client does not queue concurrent queries within a session; it raises a `ProgrammingError`. To run concurrent queries safely, either:
-1. Disable sessions on the shared client by passing `autogenerate_session_id=False` to `get_client` (or set the common setting before creating clients), or
-2. Provide a unique `session_id` per query via the `settings` argument, or
-3. Use separate clients when you need session isolation (e.g., temporary tables).
-
-See [Managing ClickHouse Session IDs](#managing-clickhouse-session-ids) for more details.
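Of the three options in the removed note above, the second (a unique `session_id` per query via the `settings` argument) is the only one not demonstrated elsewhere in this diff. A minimal sketch, not part of the diff, assuming placeholder connection details:

```python
# Sketch: give each query its own session_id so concurrent queries never share a session.
# 'my-host', 'default', and 'password' are placeholder connection details.
import uuid
import clickhouse_connect

client = clickhouse_connect.get_client(host='my-host', username='default', password='password')

def count_users():
    # A fresh session_id per call avoids concurrent-query collisions on one session
    return client.query('SELECT count() FROM users',
                        settings={'session_id': str(uuid.uuid4())}).result_rows[0][0]

print(count_users())
client.close()
```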
-
-##### Worker pools and task queues {#worker-pools-and-task-queues}
-
-For Celery, RQ, and similar task-queue systems that run multiple worker processes, initialize exactly one ClickHouse client per worker process and reuse it for all tasks handled by that process. Do not share clients across processes. Prefer creating the client on worker-process start and closing it on shutdown; avoid per-task creation. Set sensible connection/read timeouts and, if you expect concurrent queries, either create multiple clients per process or configure the client’s HTTP connection pool accordingly. If the worker model also uses threads, don’t share a single session across concurrent queries.
+**Alternative for sessions:** If you need sessions (e.g., for temporary tables), create a separate client per thread:
 
 ```python
-# Celery example
-from celery import Celery
-import clickhouse_connect
-
-app = Celery('tasks')
-
-# Global client for this worker process
-clickhouse_client = None
-
-@app.task
-def process_event(event_id):
-    global clickhouse_client
-
-    # Lazy initialization: create client on first task
-    if clickhouse_client is None:
-        clickhouse_client = clickhouse_connect.get_client(
-            host='my-host',
-            username='default',
-            password='password'
-        )
-
-    # Reuse client for all tasks in this worker
-    clickhouse_client.insert('events', [[event_id, 'processed']], column_names=['id', 'status'])
+def worker(thread_id):
+    # Each thread gets its own client with isolated session
+    client = clickhouse_connect.get_client(host='my-host', username='default', password='password')
+    client.command('CREATE TEMPORARY TABLE temp (id UInt32) ENGINE = Memory')
+    # ... use temp table ...
+    client.close()
 ```
 
-#### Proper client cleanup {#proper-client-cleanup}
+#### Proper cleanup {#proper-cleanup}
 
-Always close clients to release resources:
+Always close clients at shutdown. Note that `client.close()` disposes the client and closes pooled HTTP connections only when the client owns its pool manager (for example, when created with custom TLS/proxy options). For the default shared pool, use `client.close_connections()` to proactively clear sockets; otherwise, connections are reclaimed automatically via idle expiration and at process exit.
 
 ```python
-import clickhouse_connect
-
 client = clickhouse_connect.get_client(host='my-host', username='default', password='password')
-
 try:
-    # Use the client
     result = client.query('SELECT 1')
 finally:
-    # Always close, even if an exception occurs
     client.close()
 ```
 
-Or use a context manager for automatic cleanup:
+To immediately drop HTTP connections (e.g., before forking or reconfiguring networking), call:
 
 ```python
-import clickhouse_connect
+client.close_connections()
+```
+
+Or use a context manager:
 
-def one_time_query():
-    with clickhouse_connect.get_client(host='my-host', username='default', password='password') as client:
-        return client.query('SELECT * FROM large_table LIMIT 10')
+```python
+with clickhouse_connect.get_client(host='my-host', username='default', password='password') as client:
+    result = client.query('SELECT 1')
 ```
 
-#### When multiple clients are appropriate {#when-multiple-clients-are-appropriate}
-
-There are legitimate cases where multiple clients make sense:
-
-1. **Different ClickHouse servers**: One client per server/cluster
-   ```python
-   prod_client = clickhouse_connect.get_client(host='prod-server')
-   staging_client = clickhouse_connect.get_client(host='staging-server')
-   ```
-
-2. **Different credentials or databases**: Separate clients for different access patterns
-   ```python
-   read_client = clickhouse_connect.get_client(host='my-host', username='reader', database='analytics')
-   write_client = clickhouse_connect.get_client(host='my-host', username='writer', database='logs')
-   ```
-
-3. **Isolated sessions with temporary tables**: Each session needs its own client
-   ```python
-   # Client 1 with its own session for temp tables
-   client1 = clickhouse_connect.get_client(host='my-host', settings={'session_id': 'session_1'})
-   client1.command('CREATE TEMPORARY TABLE temp1 (id UInt32) ENGINE = Memory')
-
-   # Client 2 with different session
-   client2 = clickhouse_connect.get_client(host='my-host', settings={'session_id': 'session_2'})
-   client2.command('CREATE TEMPORARY TABLE temp2 (id UInt32) ENGINE = Memory')
-   ```
-
-4. **Process pools with fork()**: Each forked process needs its own client (connections aren't fork-safe)
-
-#### Troubleshooting connection issues {#troubleshooting-connection-issues}
-
-If you experience connection timeout errors during client creation:
-
-1. **Check if you're creating clients too frequently**: Use the recommended patterns above
-2. **Verify network connectivity**: Test with `ping` or `curl` to the ClickHouse HTTP endpoint
-3. **Increase timeouts** if network latency is high:
-   ```python
-   client = clickhouse_connect.get_client(
-       host='my-host',
-       connect_timeout=30,  # Increase from default 10s
-       send_receive_timeout=600  # Increase from default 300s
-   )
-   ```
-4. **Check connection pool settings**: See [Customizing the HTTP connection pool](#customizing-the-http-connection-pool)
-5. **Monitor server load**: High server load can slow down the initialization query
-6. **Review firewall/NAT rules**: Long-lived connections may be terminated by network infrastructure
+#### When to use multiple clients {#when-to-use-multiple-clients}
+
+Multiple clients are appropriate for:
+
+- **Different servers**: One client per ClickHouse server or cluster
+- **Different credentials**: Separate clients for different users or access levels
+- **Different databases**: When you need to work with multiple databases
+- **Isolated sessions**: When you need separate sessions for temporary tables or session-specific settings
+- **Per-thread isolation**: When threads need independent sessions (as shown above)
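The section removed earlier in this diff illustrated these cases with code; a condensed sketch combining them, not part of the diff, with host names, credentials, databases, and the session ID as placeholders:

```python
# Sketch: separate clients for separate servers, credentials, and sessions.
# Host names, usernames, passwords, databases, and the session ID are placeholders.
import clickhouse_connect

# Different servers: one client per cluster
prod_client = clickhouse_connect.get_client(host='prod-server', username='default', password='password')
staging_client = clickhouse_connect.get_client(host='staging-server', username='default', password='password')

# Different credentials and databases: separate clients per access pattern
read_client = clickhouse_connect.get_client(host='my-host', username='reader', database='analytics')
write_client = clickhouse_connect.get_client(host='my-host', username='writer', database='logs')

# Isolated session: this client's temporary table is invisible to the other clients
session_client = clickhouse_connect.get_client(host='my-host', settings={'session_id': 'ingest_session_1'})
session_client.command('CREATE TEMPORARY TABLE staging_ids (id UInt32) ENGINE = Memory')

for c in (prod_client, staging_client, read_client, write_client, session_client):
    c.close()
```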
 
 ### Common method arguments {#common-method-arguments}
 