Commit aa0c5bb

Merge pull request #4565 from amolsr/msk-doc-update
Improve Kafka MSK docs with IAM and connectivity details.
2 parents 6dfa4a9 + 65c93f4 commit aa0c5bb

2 files changed, 94 additions and 10 deletions

docs/integrations/data-ingestion/kafka/msk/index.md

Lines changed: 85 additions & 4 deletions
@@ -24,6 +24,8 @@ import ConnectionDetails from '@site/docs/_snippets/_gather_your_details_http.md
 </iframe>
 </div>

+> Note: The policy shown in the video is permissive and intended for quick start only. See least‑privilege IAM guidance below.
+
 ## Prerequisites {#prerequisites}
 We assume:
 * you are familiar with [ClickHouse Connector Sink](../kafka-clickhouse-connect-sink.md), Amazon MSK and MSK Connectors. We recommend the Amazon MSK [Getting Started guide](https://docs.aws.amazon.com/msk/latest/developerguide/getting-started.html) and [MSK Connect guide](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect.html).
@@ -61,6 +63,76 @@ username=default
 schemas.enable=false
 ```

+## Recommended IAM permissions (least privilege) {#iam-least-privilege}
+
+Use the smallest set of permissions required for your setup. Start with the baseline below and add optional services only if you use them.
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "MSKClusterAccess",
+      "Effect": "Allow",
+      "Action": [
+        "kafka:DescribeCluster",
+        "kafka:GetBootstrapBrokers",
+        "kafka:DescribeClusterV2",
+        "kafka:ListClusters",
+        "kafka:ListClustersV2"
+      ],
+      "Resource": "*"
+    },
+    {
+      "Sid": "KafkaAuthorization",
+      "Effect": "Allow",
+      "Action": [
+        "kafka-cluster:Connect",
+        "kafka-cluster:DescribeCluster",
+        "kafka-cluster:DescribeGroup",
+        "kafka-cluster:DescribeTopic",
+        "kafka-cluster:ReadData"
+      ],
+      "Resource": "*"
+    },
+    {
+      "Sid": "OptionalGlueSchemaRegistry",
+      "Effect": "Allow",
+      "Action": [
+        "glue:GetSchema*",
+        "glue:ListSchemas",
+        "glue:ListSchemaVersions"
+      ],
+      "Resource": "*"
+    },
+    {
+      "Sid": "OptionalSecretsManager",
+      "Effect": "Allow",
+      "Action": [
+        "secretsmanager:GetSecretValue"
+      ],
+      "Resource": [
+        "arn:aws:secretsmanager:<region>:<account-id>:secret:<your-secret-name>*"
+      ]
+    },
+    {
+      "Sid": "OptionalS3Read",
+      "Effect": "Allow",
+      "Action": [
+        "s3:GetObject"
+      ],
+      "Resource": "arn:aws:s3:::<your-bucket>/<optional-prefix>/*"
+    }
+  ]
+}
+```
+
+- Use the Glue block only if you use AWS Glue Schema Registry.
+- Use the Secrets Manager block only if you fetch credentials/truststores from Secrets Manager. Scope the ARN.
+- Use the S3 block only if you load artifacts (e.g., truststore) from S3. Scope to bucket/prefix.
+
+See also: [Kafka best practices – IAM](../../clickpipes/kafka/04_best_practices.md#iam).
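
Review note: the `KafkaAuthorization` statement in the baseline uses `"Resource": "*"`. The sketch below shows one possible way to narrow it to a single cluster, its topics, and its consumer groups, using the MSK resource ARN shapes for IAM access control (`cluster/...`, `topic/...`, `group/...`). The function name `scoped_kafka_statements`, the `Sid` values, and all sample identifiers are hypothetical and not part of this commit. It mirrors only the five `kafka-cluster:*` actions listed above.

```python
import json


def scoped_kafka_statements(region, account_id, cluster_name, cluster_uuid,
                            topics, groups):
    """Build kafka-cluster statements scoped to one cluster (illustrative only)."""
    base = f"arn:aws:kafka:{region}:{account_id}"
    suffix = f"{cluster_name}/{cluster_uuid}"
    return [
        {
            # Cluster-level actions apply to the cluster ARN.
            "Sid": "ClusterConnect",
            "Effect": "Allow",
            "Action": ["kafka-cluster:Connect", "kafka-cluster:DescribeCluster"],
            "Resource": [f"{base}:cluster/{suffix}"],
        },
        {
            # Topic-level actions apply to one ARN per topic.
            "Sid": "TopicRead",
            "Effect": "Allow",
            "Action": ["kafka-cluster:DescribeTopic", "kafka-cluster:ReadData"],
            "Resource": [f"{base}:topic/{suffix}/{t}" for t in topics],
        },
        {
            # Group-level actions apply to one ARN per consumer group.
            "Sid": "GroupDescribe",
            "Effect": "Allow",
            "Action": ["kafka-cluster:DescribeGroup"],
            "Resource": [f"{base}:group/{suffix}/{g}" for g in groups],
        },
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account, and cluster.
    statements = scoped_kafka_statements(
        "us-east-1", "123456789012", "demo-cluster", "abcd1234-ef56",
        topics=["events"], groups=["connect-demo"],
    )
    print(json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2))
```

Depending on the setup, a consuming connector may need additional actions (for example `kafka-cluster:AlterGroup`) or access to internal Connect topics; check the AWS documentation for your case before relying on this sketch.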
+
 ## Performance tuning {#performance-tuning}
 One way of increasing performance is to adjust the batch size and the number of records that are fetched from Kafka by adding the following to the **worker** configuration:
 ```yml
@@ -85,7 +157,16 @@ In order for MSK Connect to connect to ClickHouse, we recommend your MSK cluster
 1. **Create a Private Subnet:** Create a new subnet within your VPC, designating it as a private subnet. This subnet should not have direct access to the internet.
 1. **Create a NAT Gateway:** Create a NAT gateway in a public subnet of your VPC. The NAT gateway enables instances in your private subnet to connect to the internet or other AWS services, but prevents the internet from initiating a connection with those instances.
 1. **Update the Route Table:** Add a route that directs internet-bound traffic to the NAT gateway.
-1. **Ensure Security Group(s) and Network ACLs Configuration:** Configure your [security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html) and [network ACLs (Access Control Lists)](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html) to allow relevant traffic to and from your ClickHouse instance.
-   1. For ClickHouse Cloud, configure your security group to allow inbound traffic on ports 9440 and 8443.
-   1. For self-hosted ClickHouse, configure your security group to allow inbound traffic on the port in your config file (default is 8123).
-1. **Attach Security Group(s) to MSK:** Ensure that these new security groups routed to the NAT gateways are attached to your MSK cluster
+1. **Ensure Security Group(s) and Network ACLs Configuration:** Configure your [security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html) and [network ACLs (Access Control Lists)](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html) to allow relevant traffic.
+   1. From MSK Connect worker ENIs to MSK brokers on TLS port (commonly 9094).
+   1. From MSK Connect worker ENIs to ClickHouse endpoint: 9440 (native TLS) or 8443 (HTTPS).
+   1. Allow inbound on broker SG from the MSK Connect worker SG.
+   1. For self-hosted ClickHouse, open the port configured in your server (default 8123 for HTTP).
+1. **Attach Security Group(s) to MSK:** Ensure that these security groups are attached to your MSK cluster and MSK Connect workers.
+1. **Connectivity to ClickHouse Cloud:**
+   1. Public endpoint + IP allowlist: requires NAT egress from private subnets.
+   1. Private connectivity where available (e.g., VPC peering/PrivateLink/VPN). Ensure VPC DNS hostnames/resolution are enabled and DNS can resolve the private endpoint.
+1. **Validate connectivity (quick checklist):**
+   1. From the connector environment, resolve MSK bootstrap DNS and connect via TLS to broker port.
+   1. Establish TLS connection to ClickHouse on port 9440 (or 8443 for HTTPS).
+   1. If using AWS services (Glue/Secrets Manager), allow egress to those endpoints.
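
Review note: the first two checklist items (DNS resolution plus a TLS handshake) could be scripted roughly as below. This is a minimal sketch using only the Python standard library, meant to be run from a host in the same subnets and security groups as the connector. The helper names `parse_endpoints` and `check_tls` are hypothetical, and the script checks reachability only, not Kafka or ClickHouse authentication.

```python
import socket
import ssl
import sys


def parse_endpoints(bootstrap):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    endpoints = []
    for item in bootstrap.split(","):
        item = item.strip()
        if not item:
            continue
        host, port = item.rsplit(":", 1)
        endpoints.append((host, int(port)))
    return endpoints


def check_tls(host, port, timeout=5.0):
    """Resolve the host, complete a TLS handshake, and report what was seen.

    Raises OSError (including ssl.SSLError) if DNS, TCP, or TLS fails.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return addresses, tls.version()


if __name__ == "__main__":
    # Pass endpoints as arguments, e.g. the MSK bootstrap string and
    # the ClickHouse host with port 9440 or 8443.
    for host, port in parse_endpoints(",".join(sys.argv[1:])):
        try:
            addresses, version = check_tls(host, port)
            print(f"OK   {host}:{port} resolved to {addresses}, negotiated {version}")
        except OSError as exc:
            print(f"FAIL {host}:{port}: {exc}")
```

A failure at the DNS step usually points at VPC DNS settings, while a timeout on connect points at security groups, network ACLs, or routing. Note that the broker port depends on the authentication mode in use, so use the port from your cluster's bootstrap string.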

scripts/aspell-ignore/en/aspell-dict.txt

Lines changed: 9 additions & 6 deletions
@@ -1,4 +1,4 @@
-personal_ws-1.1 en 3736
+personal_ws-1.1 en 3746
 AArch
 ACLs
 AICPA
@@ -219,11 +219,10 @@ CloudDetails
 CloudFormation
 CloudNotSupportedBadge
 CloudStorage
+CloudTrail
 CloudWatch
 Cloudera
 Cloudflare
-CloudTrail
-CloudWatch
 CodeBlock
 CodeLLDB
 Codecs
@@ -366,6 +365,7 @@ Durre
 ECMA
 EDOT
 EMQX
+ENIs
 ETag
 EachRow
 Easypanel
@@ -379,9 +379,9 @@ EmbeddedRocksDB
 Embeddings
 Encodings
 Encrypter
+Entra
 Enum
 Enums
-Entra
 Eoan
 EphemeralNode
 EscapingRule
@@ -881,9 +881,9 @@ O'Reilly
 OAuth
 ODBCDriver
 OFNS
+OIDC
 OLAP
 OLTP
-OIDC
 OOMs
 ORCCompression
 OSContextSwitches
@@ -1416,6 +1416,7 @@ UnidirectionalEdgeIsValid
 UniqThetaSketch
 Updatable
 Uppercased
+Upsonic
 Upstash
 Uptime
 Uptrace
@@ -1527,6 +1528,7 @@ aggthrow
 aiochclient
 alloc
 allocator
+allowlist
 allowlisted
 allowlisting
 alphaTokens
@@ -3545,6 +3547,8 @@ trimLeft
 trimRight
 trunc
 truncations
+truststore
+truststores
 tryBase
 tryDecrypt
 tryIdnaEncode
@@ -3641,7 +3645,6 @@ uploaders
 upperUTF
 upsert
 upserts
-Upsonic
 uptime
 uptimes
 uptrace
