- Batch size is inherited from the Kafka Consumer properties.
- When using KeeperMap for exactly-once and the offset is changed or rewound, you need to delete the content from KeeperMap for that specific topic (see the troubleshooting guide below for more details).

### Performance tuning and throughput optimization {#tuning-performance}

This section covers performance tuning strategies for the ClickHouse Kafka Connect Sink. Performance tuning is essential when dealing with high-throughput use cases or when you need to optimize resource utilization and minimize lag.

#### When is performance tuning needed? {#when-is-performance-tuning-needed}

Performance tuning is typically required in the following scenarios:

- **High-throughput workloads**: When processing millions of events per second from Kafka topics
- **Consumer lag**: When your connector can't keep up with the rate of data production, causing increasing lag
- **Resource constraints**: When you need to optimize CPU, memory, or network usage
- **Multiple topics**: When consuming from multiple high-volume topics simultaneously
- **Small message sizes**: When dealing with many small messages that would benefit from server-side batching

Performance tuning is **NOT typically needed** when:

- You're processing low to moderate volumes (< 10,000 messages/second)
- Consumer lag is stable and acceptable for your use case
- Default connector settings already meet your throughput requirements
- Your ClickHouse cluster can easily handle the incoming load

#### Understanding the data flow {#understanding-the-data-flow}

Before tuning, it's important to understand how data flows through the connector:

1. **Kafka Connect Framework** fetches messages from Kafka topics in the background
2. **Connector polls** for messages from the framework's internal buffer
3. **Connector batches** messages based on poll size
4. **ClickHouse receives** the batched insert via HTTP/S
5. **ClickHouse processes** the insert (synchronously or asynchronously)

Performance can be optimized at each of these stages.

#### Batch configuration {#batch-configuration}

The first level of optimization is controlling how much data the connector receives per batch from Kafka.

##### Fetch settings {#fetch-settings}

Kafka Connect (the framework) fetches messages from Kafka topics in the background, independent of the connector:

- **`fetch.min.bytes`**: Minimum amount of data before the framework passes values to the connector (default: 1 byte)
- **`fetch.max.bytes`**: Maximum amount of data to fetch in a single request (default: 52428800 / 50 MB)
- **`fetch.max.wait.ms`**: Maximum time to wait before returning data if `fetch.min.bytes` is not met (default: 500 ms)

##### Poll settings {#poll-settings}

The connector polls for messages from the framework's buffer:

- **`max.poll.records`**: Maximum number of records returned in a single poll (default: 500)
- **`max.partition.fetch.bytes`**: Maximum amount of data per partition (default: 1048576 / 1 MB)

##### Recommended settings for high throughput {#recommended-batch-settings}

For optimal performance with ClickHouse, aim for larger batches:

```properties
# Increase the number of records per poll
consumer.max.poll.records=5000

# Increase the partition fetch size (5 MB)
consumer.max.partition.fetch.bytes=5242880

# Optional: increase the minimum fetch size to wait for more data (1 MB)
consumer.fetch.min.bytes=1048576

# Optional: reduce the maximum wait time if latency is critical
consumer.fetch.max.wait.ms=300
```

**Important**: Kafka Connect fetch settings represent compressed data, while ClickHouse receives uncompressed data, so balance these settings based on your compression ratio. Be aware that batches that are too large risk timeouts, OutOfMemory errors, or exceeding `max.poll.interval.ms`.

More details: [Confluent documentation](https://docs.confluent.io/platform/current/connect/references/allconfigs.html#override-the-worker-configuration) | [Kafka documentation](https://kafka.apache.org/documentation/#consumerconfigs)
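
Note that the `consumer.` prefix above applies when these properties are set in the Connect worker configuration. If your worker permits per-connector overrides (`connector.client.config.override.policy=All`), the same consumer properties can be set for a single connector via Kafka Connect's `consumer.override.` prefix. A minimal sketch, as a fragment of the connector's JSON config:

```json
"consumer.override.max.poll.records": "5000",
"consumer.override.max.partition.fetch.bytes": "5242880"
```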

#### Asynchronous inserts {#asynchronous-inserts}

Asynchronous inserts are a powerful feature when the connector sends relatively small batches, or when you want to further optimize ingestion by shifting batching responsibility to ClickHouse.

##### When to use async inserts {#when-to-use-async-inserts}

Consider enabling async inserts when:

- **Many small batches**: Your connector sends frequent small batches (< 1000 rows per batch)
- **High concurrency**: Multiple connector tasks are writing to the same table
- **Distributed deployment**: Running many connector instances across different hosts
- **Part creation overhead**: You're experiencing "too many parts" errors
- **Mixed workload**: Combining real-time ingestion with query workloads

Do **NOT** use async inserts when:

- You're already sending large batches (> 10,000 rows per batch) with controlled frequency
- You require immediate data visibility (queries must see data instantly)
- Exactly-once semantics with `wait_for_async_insert=0` conflicts with your requirements
- Your use case can benefit from client-side batching improvements instead

##### How async inserts work {#how-async-inserts-work}

With asynchronous inserts enabled, ClickHouse:

1. Receives the insert query from the connector
2. Writes the data to an in-memory buffer (instead of immediately to disk)
3. Returns success to the connector (if `wait_for_async_insert=0`)
4. Flushes the buffer to disk when one of these conditions is met: the buffer reaches its size threshold (`async_insert_max_data_size`), the buffer timeout elapses (`async_insert_busy_timeout_ms`), or the maximum number of queued insert queries is reached (`async_insert_max_query_number`)

The `wait_for_async_insert` setting controls when the connector's insert is acknowledged:

- **`wait_for_async_insert=1`** (recommended): The connector waits for data to be flushed to ClickHouse storage before acknowledging. This provides delivery guarantees.
- **`wait_for_async_insert=0`**: The connector acknowledges immediately after buffering. This performs better, but data may be lost if the server crashes before the flush.

**Important**: Always use `wait_for_async_insert=1` with exactly-once semantics to ensure offset commits happen only after data is persisted.
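
Since async inserts are ClickHouse server settings, they can be passed through the connector's `clickhouseSettings` property. A minimal sketch, where the connector name, topic, hostname, and database are placeholders:

```json
{
  "name": "clickhouse-sink-async",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "topics": "events",
    "hostname": "my-clickhouse-host",
    "database": "default",
    "clickhouseSettings": "async_insert=1,wait_for_async_insert=1"
  }
}
```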

For more information about async inserts, see the [ClickHouse async inserts documentation](/best-practices/selecting-an-insert-strategy#asynchronous-inserts).

#### Connector tasks {#connector-tasks}

Each task processes a subset of topic partitions. More tasks = more parallelism, but:

- Maximum effective tasks = the number of topic partitions
- Each task maintains its own connection to ClickHouse
- More tasks = higher overhead and potential resource contention

**Recommendation**: Start with `tasks.max` equal to the number of topic partitions, then adjust based on CPU and throughput metrics.
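
For example, for a topic with six partitions (a hypothetical count), you might start with the following fragment in the connector config:

```json
"tasks.max": "6"
```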

##### Ignoring partitions when batching {#ignoring-partitions}

By default, the connector batches messages per partition. For higher throughput, you can batch across partitions:

```json
"ignorePartitionsWhenBatching": "true"
```

**Warning**: Only use this when `exactlyOnce=false`. This setting can improve throughput by creating larger batches, but it loses per-partition ordering guarantees.
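
Because of that constraint, a config fragment using this option would pair the two settings explicitly:

```json
"exactlyOnce": "false",
"ignorePartitionsWhenBatching": "true"
```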

#### Multiple high throughput topics {#multiple-high-throughput-topics}

If your connector is configured to subscribe to multiple topics, you're using `topic2TableMap` to map topics to tables, and you're experiencing a bottleneck at insertion resulting in consumer lag, consider creating one connector per topic instead.

The main reason this happens is that, currently, batches are inserted into every table [serially](https://github.com/ClickHouse/clickhouse-kafka-connect/blob/578ac07e8be1a920aaa3b26e49183595c3edd04b/src/main/java/com/clickhouse/kafka/connect/sink/ProxySinkTask.java#L95-L100).

**Recommendation**: For multiple high-volume topics, deploy one connector instance per topic to maximize parallel insert throughput, as in the sketch below.
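
A sketch of that layout with hypothetical topic and table names; each connector owns exactly one topic, so inserts for different tables no longer run serially within one task:

```json
{
  "name": "clickhouse-sink-topic-a",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "topics": "topic_a",
    "hostname": "my-clickhouse-host",
    "database": "default",
    "topic2TableMap": "topic_a=table_a"
  }
}
```

Deploy a second connector (for example `clickhouse-sink-topic-b`) with `"topics": "topic_b"` and its own `topic2TableMap` entry.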