
Commit cd418c7

convert huge file to sub sections
1 parent 6bbd190 commit cd418c7

6 files changed

+1190
-1
lines changed

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
---
2+
sidebar_label: 'Additional Options'
3+
sidebar_position: 3
4+
keywords: ['clickhouse', 'python', 'options', 'settings']
5+
description: 'Additional Options for ClickHouse Connect'
6+
slug: /integrations/language-clients/python/additional-options
7+
title: 'Additional Options'
8+
---
9+
10+
# Additional options {#additional-options}
11+
12+
ClickHouse Connect provides a number of additional options for advanced use cases.
13+
14+
## Global settings {#global-settings}
15+
16+
There are a small number of settings that control ClickHouse Connect behavior globally. They are accessed from the top level `common` package:
17+
18+
```python
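from clickhouse_connect import common

# Change a global setting (do this before creating any clients)
common.set_setting('autogenerate_session_id', False)

# Read a global setting; this returns 'error' unless the default has been changed
common.get_setting('invalid_setting_action')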
```
25+
26+
:::note
27+
The common settings `autogenerate_session_id`, `product_name`, and `readonly` should _always_ be modified before creating a client with the `clickhouse_connect.get_client` method. Changing these settings after client creation does not affect the behavior of existing clients.
28+
:::
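
As a minimal sketch of that ordering, the snippet below applies the global settings first and only then creates the client. The `product_name` helper, the `my_app`/`1.0.0` values, and the `localhost` host are hypothetical and used purely for illustration; the library imports are deferred into the function so that the helper can be used without a running server.

```python
def product_name(name: str, version: str) -> str:
    """Build a '<product name>/<product version>' string (hypothetical helper)."""
    if not name or not version or '/' in name:
        raise ValueError('name and version are required, and name must not contain "/"')
    return f'{name}/{version}'


def connect_with_global_settings(host: str = 'localhost'):
    """Apply global settings first, then create the client (needs a reachable server)."""
    # Imported lazily; requires the clickhouse-connect package
    import clickhouse_connect
    from clickhouse_connect import common

    # These must be set before get_client; existing clients are not affected
    common.set_setting('autogenerate_session_id', False)
    common.set_setting('product_name', product_name('my_app', '1.0.0'))
    return clickhouse_connect.get_client(host=host)
```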
29+
30+
The following global settings are currently defined:
31+
| Setting Name                        | Default | Options                 | Description |
|-------------------------------------|---------|-------------------------|-------------|
| autogenerate_session_id             | True    | True, False             | Autogenerate a new UUID(1) session ID (if not provided) for each client session. If no session ID is provided (either at the client or query level), ClickHouse will generate a random internal ID for each query. |
| dict_parameter_format               | 'json'  | 'json', 'map'           | This controls whether parameterized queries convert a Python dictionary to JSON or ClickHouse Map syntax. `json` should be used for inserts into JSON columns, `map` for ClickHouse Map columns. |
| invalid_setting_action              | 'error' | 'drop', 'send', 'error' | Action to take when an invalid or readonly setting is provided (either for the client session or query). If `drop`, the setting will be ignored; if `send`, the setting will be sent to ClickHouse; if `error`, a client side ProgrammingError will be raised. |
| max_connection_age                  | 600     |                         | Maximum seconds that an HTTP Keep Alive connection will be kept open/reused. This prevents bunching of connections against a single ClickHouse node behind a load balancer/proxy. Defaults to 10 minutes. |
| product_name                        |         |                         | A string that is passed with the query to ClickHouse for tracking the app using ClickHouse Connect. Should be in the form `<product name>/<product version>`. |
| readonly                            | 0       | 0, 1                    | Implied "read_only" ClickHouse settings for versions prior to 19.17. Can be set to match the ClickHouse "read_only" value for settings to allow operation with very old ClickHouse versions. |
| send_os_user                        | True    | True, False             | Include the detected operating system user in client information sent to ClickHouse (HTTP User-Agent string). |
| send_integration_tags               | True    | True, False             | Include the used integration libraries/version (e.g. Pandas/SQLAlchemy/etc.) in client information sent to ClickHouse (HTTP User-Agent string). |
| use_protocol_version                | True    | True, False             | Use the client protocol version. This is needed for `DateTime` timezone columns but breaks with the current version of chproxy. |
| max_error_size                      | 1024    |                         | Maximum number of characters that will be returned in client error messages. Use 0 for this setting to get the full ClickHouse error message. Defaults to 1024 characters. |
| http_buffer_size                    | 10MB    |                         | Size (in bytes) of the "in-memory" buffer used for HTTP streaming queries. |
| preserve_pandas_datetime_resolution | False   | True, False             | When True and using pandas 2.x, preserves the datetime64/timedelta64 dtype resolution (e.g., 's', 'ms', 'us', 'ns'). If False (or on pandas <2.x), coerces to nanosecond ('ns') resolution for compatibility. |
46+
47+
## Compression {#compression}
48+
49+
ClickHouse Connect supports lz4, zstd, brotli, and gzip compression for both query results and inserts. Keep in mind that compression usually trades network bandwidth and transfer speed against CPU usage (on both the client and the server).
50+
51+
To receive compressed data, the ClickHouse server `enable_http_compression` must be set to 1, or the user must have permission to change the setting on a "per query" basis.
52+
53+
Compression is controlled by the `compress` parameter when calling the `clickhouse_connect.get_client` factory method. By default, `compress` is set to `True`, which will trigger the default compression settings. For queries executed with the `query` client method (and indirectly, `query_np` and `query_df`), ClickHouse Connect adds the `Accept-Encoding` header with the `lz4`, `zstd`, `br` (brotli, if the brotli library is installed), `gzip`, and `deflate` encodings. (For the majority of requests, the ClickHouse server will respond with a `zstd` compressed payload.) For inserts, ClickHouse Connect by default compresses insert blocks with `lz4` compression and sends the `Content-Encoding: lz4` HTTP header.
56+
57+
The `get_client` `compress` parameter can also be set to a specific compression method, one of `lz4`, `zstd`, `br`, or `gzip`. That method will then be used for both inserts and query results (if supported by the ClickHouse server). The required `zstd` and `lz4` compression libraries are installed by default with ClickHouse Connect. If `br`/brotli is specified, the brotli library must be installed separately.
58+
59+
Note that the `raw*` client methods don't use the compression specified by the client configuration.
60+
61+
We also recommend against using `gzip` compression, as it is significantly slower than the alternatives for both compressing and decompressing data.
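
The `compress` parameter described above could be wrapped as in the following sketch. The `normalize_compress` and `make_client` helpers and the `localhost` host are hypothetical; the check simply mirrors the method names listed in this section and is not part of ClickHouse Connect. The library import is deferred so that the validation can be used without a server.

```python
from typing import Union

# Compression methods named in this section for the `compress` argument
SUPPORTED_METHODS = ('lz4', 'zstd', 'br', 'gzip')


def normalize_compress(compress: Union[bool, str]) -> Union[bool, str]:
    """Validate a value intended for get_client(compress=...)."""
    if isinstance(compress, bool):
        return compress  # True selects the defaults, False disables compression
    if compress in SUPPORTED_METHODS:
        return compress
    raise ValueError(f'Unsupported compression method: {compress!r}')


def make_client(host: str = 'localhost', compress: Union[bool, str] = True):
    """Create a client with the requested compression (needs a reachable server)."""
    # Imported lazily; requires the clickhouse-connect package
    import clickhouse_connect

    return clickhouse_connect.get_client(host=host, compress=normalize_compress(compress))
```

For example, `make_client(compress='zstd')` would request `zstd` for both inserts and query results, subject to server support.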
62+
63+
## HTTP proxy support {#http-proxy-support}
64+
65+
ClickHouse Connect adds basic HTTP proxy support using the `urllib3` library. It recognizes the standard `HTTP_PROXY` and `HTTPS_PROXY` environment variables. Note that these environment variables apply to every client created with the `clickhouse_connect.get_client` method. Alternatively, to configure a proxy per client, use the `http_proxy` or `https_proxy` arguments to the `get_client` method. For details on the implementation of HTTP Proxy support, see the [urllib3](https://urllib3.readthedocs.io/en/stable/advanced-usage.html#http-and-https-proxies) documentation.
66+
67+
To use a SOCKS proxy, you can send a `urllib3` `SOCKSProxyManager` as the `pool_mgr` argument to `get_client`. Note that this will require installing the PySocks library either directly or using the `[socks]` option for the `urllib3` dependency.
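
A minimal sketch of the SOCKS setup described above follows. The helper names, the host, and the proxy URL are hypothetical; the scheme check reflects the URL schemes that `urllib3`'s SOCKS support accepts. Both third-party imports are deferred into the function, since they require `clickhouse-connect` and PySocks to be installed.

```python
from urllib.parse import urlsplit

# URL schemes understood by urllib3's SOCKSProxyManager
SOCKS_SCHEMES = ('socks4', 'socks4a', 'socks5', 'socks5h')


def check_socks_url(url: str) -> str:
    """Return the URL unchanged if it looks like a SOCKS proxy URL."""
    parts = urlsplit(url)
    if parts.scheme not in SOCKS_SCHEMES or not parts.hostname:
        raise ValueError(f'Not a SOCKS proxy URL: {url!r}')
    return url


def make_socks_client(host: str, socks_url: str):
    """Create a client that routes traffic through a SOCKS proxy (needs a reachable server)."""
    # Imported lazily; SOCKSProxyManager requires the PySocks library
    from urllib3.contrib.socks import SOCKSProxyManager
    import clickhouse_connect

    pool_mgr = SOCKSProxyManager(check_socks_url(socks_url))
    return clickhouse_connect.get_client(host=host, pool_mgr=pool_mgr)
```

A call might look like `make_socks_client('clickhouse.example.internal', 'socks5://localhost:1080')`, where both values are placeholders.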
68+
69+
## "Old" JSON data type {#old-json-data-type}
70+
71+
The experimental `Object` (or `Object('json')`) data type is deprecated and should be avoided in a production environment. ClickHouse Connect continues to provide limited support for the data type for backward compatibility. Note that this support does not include queries that are expected to return "top level" or "parent" JSON values as dictionaries or the equivalent, and such queries will result in an exception.
72+
73+
## "New" Variant/Dynamic/JSON datatypes (experimental feature) {#new-variantdynamicjson-datatypes-experimental-feature}
74+
75+
Beginning with the 0.8.0 release, `clickhouse-connect` provides experimental support for the new (also experimental) ClickHouse types Variant, Dynamic, and JSON.
76+
77+
### Usage notes {#usage-notes}
78+
- JSON data can be inserted as either a Python dictionary or a JSON string containing a JSON object `{}`. Other forms of JSON data are not supported.
79+
- Queries using subcolumns/paths for these types will return the type of the subcolumn.
80+
- See the main ClickHouse [documentation](https://clickhouse.com/docs) for other usage notes.
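
The first usage note can be checked on the client side before inserting. The sketch below uses only the standard library; `is_insertable_json` is a hypothetical helper, not a ClickHouse Connect function, and it only tests the shape of the value (a dictionary, or a string that parses to a JSON object).

```python
import json
from typing import Any


def is_insertable_json(value: Any) -> bool:
    """True for a dict, or for a string that parses to a JSON object."""
    if isinstance(value, dict):
        return True
    if isinstance(value, str):
        try:
            # Arrays, numbers, and other top-level JSON values are rejected
            return isinstance(json.loads(value), dict)
        except json.JSONDecodeError:
            return False
    return False
```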
81+
82+
### Known limitations {#known-limitations}
83+
- Each of these types must be enabled in the ClickHouse settings before using.
84+
- The "new" JSON type is available starting with the ClickHouse 24.8 release.
85+
- Due to internal format changes, `clickhouse-connect` is only compatible with Variant types beginning with the ClickHouse 24.7 release.
86+
- Returned JSON objects will only return the `max_dynamic_paths` number of elements (which defaults to 1024). This will be fixed in a future release.
87+
- Inserts into `Dynamic` columns will always be the String representation of the Python value. This will be fixed in a future release, once https://github.com/ClickHouse/ClickHouse/issues/70395 has been fixed.
88+
- The implementation for the new types has not been optimized in C code, so performance may be somewhat slower than for simpler, established data types.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
---
2+
sidebar_label: 'Advanced Inserting'
3+
sidebar_position: 5
4+
keywords: ['clickhouse', 'python', 'insert', 'advanced']
5+
description: 'Advanced Inserting with ClickHouse Connect'
6+
slug: /integrations/language-clients/python/advanced-inserting
7+
title: 'Advanced Inserting'
8+
---
9+
10+
## Inserting data with ClickHouse Connect: Advanced usage {#inserting-data-with-clickhouse-connect--advanced-usage}
11+
12+
### InsertContexts {#insertcontexts}
13+
14+
ClickHouse Connect executes all inserts within an `InsertContext`. The `InsertContext` includes all the values sent as arguments to the client `insert` method. In addition, when an `InsertContext` is originally constructed, ClickHouse Connect retrieves the data types for the insert columns required for efficient Native format inserts. By reusing the `InsertContext` for multiple inserts, this "pre-query" is avoided and inserts are executed more quickly and efficiently.
15+
16+
An `InsertContext` can be acquired using the client `create_insert_context` method. The method takes the same arguments as the `insert` method. Note that only the `data` property of an `InsertContext` should be modified for reuse. This is consistent with its intended purpose of providing a reusable object for repeated inserts of new data to the same table.
17+
18+
```python
import clickhouse_connect

# Assumes a reachable server and an existing, empty `test_table`
# whose first column is named `key`
client = clickhouse_connect.get_client(host='localhost')

test_data = [[1, 'v1', 'v2'], [2, 'v3', 'v4']]
ic = client.create_insert_context(table='test_table', data=test_data)
client.insert(context=ic)
assert client.command('SELECT count() FROM test_table') == 2

# Reuse the same context: only the data changes
new_data = [[3, 'v5', 'v6'], [4, 'v7', 'v8']]
ic.data = new_data
client.insert(context=ic)
qr = client.query('SELECT * FROM test_table ORDER BY key DESC')
assert qr.row_count == 4
assert qr.result_rows[0][0] == 4
```
30+
31+
`InsertContext`s include mutable state that is updated during the insert process, so they are not thread safe.
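
One way to respect that constraint is to give every thread its own context. The sketch below uses `threading.local` from the standard library; `PerThreadContext` is a hypothetical helper, and the factory passed to it is assumed to wrap something like `client.create_insert_context(...)`. Whether the client itself can be shared across threads is a separate question not addressed here.

```python
import threading
from typing import Any, Callable


class PerThreadContext:
    """Hands each thread its own context object, built on first use."""

    def __init__(self, factory: Callable[[], Any]):
        # The factory creates a fresh context, e.g. by calling create_insert_context
        self._factory = factory
        # threading.local keeps a separate attribute namespace per thread
        self._local = threading.local()

    def get(self) -> Any:
        ctx = getattr(self._local, 'ctx', None)
        if ctx is None:
            ctx = self._factory()
            self._local.ctx = ctx
        return ctx
```

Each worker thread would call `get()`, assign its own rows to the returned context's `data` property, and then pass the context to `insert`.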
32+
33+
### Write formats {#write-formats}
34+
Write formats are currently implemented for a limited number of types. In most cases ClickHouse Connect will attempt to automatically determine the correct write format for a column by checking the type of the first (non-null) data value. For example, if inserting into a `DateTime` column, and the first insert value of the column is a Python integer, ClickHouse Connect will directly insert the integer value under the assumption that it is actually an epoch seconds value.
35+
36+
In most cases, it is unnecessary to override the write format for a data type, but the associated methods in the `clickhouse_connect.datatypes.format` package can be used to do so at a global level.
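
The "first non-null value" rule described above can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not the library's implementation; both function names and the `'int'`/`'native'` labels are hypothetical.

```python
from typing import Any, Optional, Sequence


def first_non_null(column: Sequence[Any]) -> Optional[Any]:
    """Return the first value in the column that is not None, if any."""
    for value in column:
        if value is not None:
            return value
    return None


def guess_datetime_write_format(column: Sequence[Any]) -> str:
    """Return 'int' when the first non-null value is an integer, else 'native'."""
    sample = first_non_null(column)
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(sample, int) and not isinstance(sample, bool):
        return 'int'
    return 'native'


# A DateTime column supplied as epoch seconds, with a leading null
column = [None, 1_700_000_000, 1_700_000_060]
print(guess_datetime_write_format(column))  # int
```

Because only the first non-null value is inspected, a column that mixes integers and `datetime` objects would be classified by whichever comes first, so it is safest to keep each column homogeneous.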
37+
38+
#### Write format options {#write-format-options}
39+
| ClickHouse Type       | Native Python Type      | Write Formats     | Comments |
|-----------------------|-------------------------|-------------------|----------|
| Int[8-64], UInt[8-32] | int                     |                   | |
| UInt64                | int                     |                   | |
| [U]Int[128,256]       | int                     |                   | |
| BFloat16              | float                   |                   | |
| Float32               | float                   |                   | |
| Float64               | float                   |                   | |
| Decimal               | decimal.Decimal         |                   | |
| String                | string                  |                   | |
| FixedString           | bytes                   | string            | If inserted as a string, additional bytes will be set to zeros |
| Enum[8,16]            | string                  |                   | |
| Date                  | datetime.date           | int               | ClickHouse stores Dates as days since 01/01/1970. int types will be assumed to be this "epoch date" value |
| Date32                | datetime.date           | int               | Same as Date, but for a wider range of dates |
| DateTime              | datetime.datetime       | int               | ClickHouse stores DateTime in epoch seconds. int types will be assumed to be this "epoch second" value |
| DateTime64            | datetime.datetime       | int               | Python datetime.datetime is limited to microsecond precision. The raw 64 bit int value is available |
| Time                  | datetime.timedelta      | int, string, time | ClickHouse stores Time values as seconds. int types will be assumed to be this seconds value |
| Time64                | datetime.timedelta      | int, string, time | Python datetime.timedelta is limited to microsecond precision. The raw 64 bit int value is available |
| IPv4                  | `ipaddress.IPv4Address` | string            | Properly formatted strings can be inserted as IPv4 addresses |
| IPv6                  | `ipaddress.IPv6Address` | string            | Properly formatted strings can be inserted as IPv6 addresses |
| Tuple                 | dict or tuple           |                   | |
| Map                   | dict                    |                   | |
| Nested                | Sequence[dict]          |                   | |
| UUID                  | uuid.UUID               | string            | Properly formatted strings can be inserted as ClickHouse UUIDs |
| JSON/Object('json')   | dict                    | string            | Either dictionaries or JSON strings can be inserted into JSON columns (note `Object('json')` is deprecated) |
| Variant               | object                  |                   | At this time all variants are inserted as Strings and parsed by the ClickHouse server |
| Dynamic               | object                  |                   | Warning: at this time any inserts into a Dynamic column are persisted as a ClickHouse String |
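
The integer forms listed for `Date` and `DateTime` are plain epoch offsets, which can be computed with the standard library. The helper names below are hypothetical, and the second helper insists on timezone-aware input to avoid depending on the local timezone.

```python
from datetime import date, datetime, timezone

EPOCH_DATE = date(1970, 1, 1)


def to_epoch_days(d: date) -> int:
    """Days since 1970-01-01, the integer form accepted for Date columns."""
    return (d - EPOCH_DATE).days


def to_epoch_seconds(dt: datetime) -> int:
    """Whole seconds since the epoch, the integer form accepted for DateTime columns."""
    if dt.tzinfo is None:
        raise ValueError('expected a timezone-aware datetime')
    return int(dt.timestamp())


print(to_epoch_days(date(2024, 1, 1)))  # 19723
print(to_epoch_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200
```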
67+
68+
