Commit 7238a75

updated Kafka Docs
1 parent da08cfb commit 7238a75

File tree: 2 files changed (+96, -22 lines)

website/guides/01-deployment-guide/02-data-management-&-storage.md

Lines changed: 95 additions & 21 deletions

## Running Kafka

Kafka serves as a distributed streaming platform, enabling high-throughput, fault-tolerant, and scalable messaging between Song and Maestro.

:::info Why Kafka?
Kafka acts as the message broker between Song and Maestro, letting the two services communicate asynchronously. It provides reliable message delivery and persistence, queuing and processing indexing requests at scale so that indexing remains fault tolerant even when many requests arrive at once.
:::

The following configuration creates a single-node Kafka broker for development use:

1. **Create an env file:** Create a file named `.env.kafka` with the following content:

```bash
# ==============================
# Kafka Environment Variables
# ==============================

# Core Kafka Configuration
KAFKA_PROCESS_ROLES=broker,controller
KAFKA_NODE_ID=1
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://localhost:29092
KAFKA_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://0.0.0.0:29092,CONTROLLER://kafka:9093
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka:9093
KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER

# Storage Configuration
KAFKA_LOG_DIRS=/var/lib/kafka/data
KAFKA_LOG_RETENTION_HOURS=168
KAFKA_LOG_RETENTION_BYTES=-1

# Topic Configuration
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_AUTO_CREATE_TOPICS_ENABLE=false
KAFKA_NUM_PARTITIONS=1
KAFKA_DEFAULT_REPLICATION_FACTOR=1
KAFKA_MIN_INSYNC_REPLICAS=1

# Performance Tuning
KAFKA_MESSAGE_MAX_BYTES=5242880
KAFKA_REPLICA_FETCH_MAX_BYTES=5242880

# Logging Configuration
KAFKA_LOG4J_LOGGERS=kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO
KAFKA_LOG4J_ROOT_LOGLEVEL=INFO

# Cluster Configuration
CLUSTER_ID=q1Sh-9_ISia_zwGINzRvyQ
```

<details>
<summary><b>Click here for a detailed breakdown</b></summary>

#### Core Kafka Configuration
- `KAFKA_PROCESS_ROLES`: Roles this node fulfills (a combined broker and controller)
- `KAFKA_NODE_ID`: Unique identifier for this broker in the cluster
- `KAFKA_ADVERTISED_LISTENERS`: Connection endpoints advertised to clients: `kafka:9092` for containers on the same Docker network, `localhost:29092` for clients on the host
- `KAFKA_LISTENERS`: Addresses and ports the broker binds to for client, external, and controller traffic
- `KAFKA_LISTENER_SECURITY_PROTOCOL_MAP`: Maps each listener name to a security protocol
- `KAFKA_INTER_BROKER_LISTENER_NAME`: Listener used for inter-broker communication
- `KAFKA_CONTROLLER_QUORUM_VOTERS`: List of controller nodes in the cluster
- `KAFKA_CONTROLLER_LISTENER_NAMES`: Names of the listeners used for controller connections

#### Storage Configuration
- `KAFKA_LOG_DIRS`: Directory where Kafka stores its log files
- `KAFKA_LOG_RETENTION_HOURS`: How long messages are kept (168 hours = 7 days)
- `KAFKA_LOG_RETENTION_BYTES`: Maximum size of the log before deletion (-1 means unlimited)

#### Topic Configuration
- `KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR`: Replication factor for the internal offsets topic
- `KAFKA_AUTO_CREATE_TOPICS_ENABLE`: Whether topics are created automatically on first use (disabled here, so topics must be created manually)
- `KAFKA_NUM_PARTITIONS`: Default number of partitions for new topics
- `KAFKA_DEFAULT_REPLICATION_FACTOR`: Default replication factor for new topics
- `KAFKA_MIN_INSYNC_REPLICAS`: Minimum number of replicas that must acknowledge a write

#### Performance Tuning
- `KAFKA_MESSAGE_MAX_BYTES`: Maximum message size the broker accepts (5 MB)
- `KAFKA_REPLICA_FETCH_MAX_BYTES`: Maximum message size a replica can fetch; should be at least `KAFKA_MESSAGE_MAX_BYTES`

#### Logging Configuration
- `KAFKA_LOG4J_LOGGERS`: Log levels for specific Kafka components
- `KAFKA_LOG4J_ROOT_LOGLEVEL`: Default logging level for all components

</details>

:::tip For more detailed information about Kafka, refer to:
- [Confluent Kafka Documentation](https://docs.confluent.io/platform/current/installation/docker/config-reference.html#confluent-ak-configuration)
- [Spring Cloud Stream Documentation](https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/)
- [Apache Kafka Documentation](https://kafka.apache.org/documentation/)
:::

2. **Run Kafka:** Use the following `docker run` command with your `.env.kafka` file (a short verification sketch follows this step):

```bash
docker run -d \
  --name kafka \
  --platform linux/amd64 \
  -p 9092:9092 \
  -p 29092:29092 \
  --env-file .env.kafka \
  confluentinc/cp-kafka:7.6.1
```
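
Once the container is up, it is worth confirming that the broker is healthy and creating any topics your deployment needs, since `KAFKA_AUTO_CREATE_TOPICS_ENABLE=false` in the env file above means topics are not created automatically. The following is a minimal verification sketch rather than part of the official setup: it assumes the container name `kafka` from step 2 and uses `song-analysis` purely as a placeholder topic name, so substitute whatever topic your Song and Maestro configuration actually expects.

```bash
# Check that the container is still running and the broker ports are published
docker ps --filter name=kafka

# List existing topics to confirm the broker is reachable over the internal listener
docker exec kafka kafka-topics --bootstrap-server kafka:9092 --list

# Create a topic by hand (automatic topic creation is disabled in .env.kafka);
# "song-analysis" is a placeholder, use the topic name your Maestro config expects
docker exec kafka kafka-topics --bootstrap-server kafka:9092 \
  --create --topic song-analysis \
  --partitions 1 --replication-factor 1
```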

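As an optional smoke test of the listener layout described in the breakdown above, you can produce a message and read it back with the console tools bundled in the image. This is only a sanity-check sketch: it reuses the placeholder topic created above and talks to the internal `PLAINTEXT` listener at `kafka:9092`; clients running directly on the host would instead use `localhost:29092`, as advertised by the `EXTERNAL` listener.

```bash
# Write a single test message over the internal listener
echo "kafka smoke test" | docker exec -i kafka kafka-console-producer \
  --bootstrap-server kafka:9092 --topic song-analysis

# Read it back; --max-messages 1 makes the consumer exit after one record
docker exec kafka kafka-console-consumer \
  --bootstrap-server kafka:9092 --topic song-analysis \
  --from-beginning --max-messages 1
```
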
## Running Song
