Commit 6228b86

Merge pull request #99791 from lahinson/osdocs-13615-etcd-performance
[OSDOCS-13615]: adding etcd performance content from kbase article
2 parents: 060814d + 35600c4

11 files changed: +504 -3 lines changed

_topic_maps/_topic_map.yml

Lines changed: 1 addition & 1 deletion

@@ -2480,7 +2480,7 @@ Topics:
   File: etcd-overview
 - Name: Recommended etcd practices
   File: etcd-practices
-- Name: Performance considerations for etcd
+- Name: Ensuring reliable etcd performance and scalability
   File: etcd-performance
 - Name: Backing up and restoring etcd data
   Dir: etcd-backup-restore

etcd/etcd-performance.adoc

Lines changed: 40 additions & 2 deletions
@@ -1,13 +1,22 @@
 :_mod-docs-content-type: ASSEMBLY
 [id="etcd-performance"]
 include::_attributes/common-attributes.adoc[]
-= Performance considerations for etcd
+= Ensuring reliable etcd performance and scalability
 :context: etcd-performance

 toc::[]

-To ensure optimal performance and scalability for etcd in {product-title}, you can complete the following practices.
+To ensure optimal performance with etcd, it's important to understand the conditions that affect performance, including node scaling, leader election, log replication, tuning, latency, network jitter, peer round trip time, database size, and Kubernetes API transaction rates.

+// Leader election and log replication
+include::modules/etcd-leader-election-log-replication.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+* link:https://etcd.io/docs/v3.5/learning/design-learner/[The etcd learner design]
+* link:https://etcd.io/docs/v3.5/op-guide/failures/[Failure modes]
+
+//Node scaling for etcd
 include::modules/etcd-node-scaling.adoc[leveloffset=+1]

 [role="_additional-resources"]
@@ -17,18 +26,47 @@ include::modules/etcd-node-scaling.adoc[leveloffset=+1]
 * link:https://docs.redhat.com/en/documentation/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/expanding-the-cluster#installing-control-plane-node-healthy-cluster_expanding-the-cluster[Expanding the cluster]
 * xref:../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]

+// Effects of disk latency on etcd
+include::modules/etcd-disk-latency.adoc[leveloffset=+1]
+
+// Monitoring consensus latency for etcd
+include::modules/etcd-consensus-latency.adoc[leveloffset=+1]
+
+//Moving etcd to a different disk
 include::modules/move-etcd-different-disk.adoc[leveloffset=+1]

 [role="_additional-resources"]
 .Additional resources
 * xref:../architecture/architecture-rhcos.adoc#architecture-rhcos[Red Hat Enterprise Linux CoreOS (RHCOS)]

+//Defragmenting etcd data
 include::modules/etcd-defrag.adoc[leveloffset=+1]

+//Setting tuning parameters for etcd
 include::modules/etcd-tuning-parameters.adoc[leveloffset=+1]

 [role="_additional-resources"]
 .Additional resources
 * xref:../nodes/clusters/nodes-cluster-enabling-features.adoc#nodes-cluster-enabling-features-about_nodes-cluster-enabling[Understanding feature gates]

+// OCP timer tunables for etcd
+include::modules/etcd-timer-tunables.adoc[leveloffset=+1]
+
+// Determining the size of the etcd database and understanding its effects
+include::modules/etcd-database-size.adoc[leveloffset=+1]
+
+//Increasing the database size for etcd
 include::modules/etcd-increase-db.adoc[leveloffset=+1]
+
+// Measuring network jitter between control plane nodes
+include::modules/etcd-network-latency-jitter.adoc[leveloffset=+1]
+
+// How etcd peer round trip time affects performance
+include::modules/etcd-peer-round-trip.adoc[leveloffset=+1]
+
+// Determining Kubernetes API transaction rate for your environment
+include::modules/etcd-determine-kube-api-transaction-rate.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+* link:https://kube-burner.github.io/kube-burner-ocp/latest/[kube-burner-ocp documentation]
modules/etcd-consensus-latency.adoc

Lines changed: 71 additions & 0 deletions

// Module included in the following assemblies:
//
// * etcd/etcd-performance.adoc

:_mod-docs-content-type: PROCEDURE
[id="etcd-consensus-latency_{context}"]
= Monitoring consensus latency for etcd

By using the `etcdctl` CLI, you can monitor the latency that etcd experiences while reaching consensus. You must identify one of the etcd pods and then retrieve the endpoint health.

This procedure, which validates and monitors cluster health, can be run only on an active cluster.

.Prerequisites

* During planning for cluster deployment, you completed the disk and network tests.

.Procedure

. Enter the following command to list the etcd pods:
+
[source,terminal]
----
# oc get pods -n openshift-etcd -l app=etcd
----
+
.Example output
[source,terminal]
----
NAME      READY   STATUS    RESTARTS   AGE
etcd-m0   4/4     Running   4          8h
etcd-m1   4/4     Running   4          8h
etcd-m2   4/4     Running   4          8h
----

. Enter the following command to check endpoint health. To better understand the etcd latency for consensus, run this command on a precise watch cycle for a few minutes and verify that the values remain below the ~66 ms threshold. The closer the consensus time gets to 100 ms, the more likely the cluster is to experience service-affecting events and instability.
+
[source,terminal]
----
# oc exec -ti etcd-m0 -- etcdctl endpoint health -w table
----
+
.Example output
[source,terminal]
----
+----------------------------+--------+-------------+-------+
|          ENDPOINT          | HEALTH |    TOOK     | ERROR |
+----------------------------+--------+-------------+-------+
| https://198.18.111.12:2379 |  true  |  3.798349ms |       |
| https://198.18.111.14:2379 |  true  |  7.389608ms |       |
| https://198.18.111.13:2379 |  true  |  6.263117ms |       |
+----------------------------+--------+-------------+-------+
----

. Enter the following command to monitor endpoint health continuously:
+
[source,terminal]
----
# oc exec -ti etcd-m0 -- watch -dp -c etcdctl endpoint health -w table
----
+
.Example output
[source,terminal]
----
+----------------------------+--------+-------------+-------+
|          ENDPOINT          | HEALTH |    TOOK     | ERROR |
+----------------------------+--------+-------------+-------+
| https://198.18.111.12:2379 |  true  |  9.533405ms |       |
| https://198.18.111.13:2379 |  true  |  4.628054ms |       |
| https://198.18.111.14:2379 |  true  |  5.803378ms |       |
+----------------------------+--------+-------------+-------+
----
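As a rough illustration of the threshold check described above, the following Python sketch parses an `etcdctl endpoint health -w table` report and flags endpoints whose consensus time exceeds the ~66 ms threshold. The `slow_endpoints` helper and its sample input are hypothetical, not part of `etcdctl`.

```python
# Sketch: flag endpoints in `etcdctl endpoint health -w table` output whose
# TOOK column exceeds the ~66 ms consensus threshold discussed in this
# procedure. The function name is illustrative, not part of etcdctl.

THRESHOLD_MS = 66.0

def slow_endpoints(table: str, threshold_ms: float = THRESHOLD_MS) -> list[str]:
    """Return endpoints whose TOOK column exceeds threshold_ms."""
    slow = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Data rows look like: ENDPOINT | HEALTH | TOOK | ERROR
        if len(cells) == 4 and cells[2].endswith("ms"):
            took_ms = float(cells[2].rstrip("ms"))
            if took_ms > threshold_ms:
                slow.append(cells[0])
    return slow

sample = """\
+----------------------------+--------+-------------+-------+
|          ENDPOINT          | HEALTH |    TOOK     | ERROR |
+----------------------------+--------+-------------+-------+
| https://198.18.111.12:2379 |  true  |  3.798349ms |       |
| https://198.18.111.14:2379 |  true  | 98.389608ms |       |
+----------------------------+--------+-------------+-------+"""

print(slow_endpoints(sample))  # only the ~98 ms endpoint is flagged
```

In a real watch loop, you would feed each refreshed table through a check like this instead of eyeballing the numbers.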

modules/etcd-database-size.adoc

Lines changed: 60 additions & 0 deletions
// Module included in the following assemblies:
//
// * etcd/etcd-performance.adoc

:_mod-docs-content-type: PROCEDURE
[id="etcd-database-size_{context}"]
= Determining the size of the etcd database and understanding its effects

The size of the etcd database has a direct impact on the time to complete the etcd defragmentation process. {product-title} automatically runs etcd defragmentation on one etcd member at a time when it detects at least 45% fragmentation. During the defragmentation process, the etcd member cannot process any requests. On small etcd databases, the defragmentation process completes in less than a second. On larger etcd databases, the disk latency directly impacts the defragmentation time, causing additional latency because operations are blocked while defragmentation happens.

The size of the etcd database is a factor to consider when network partitions isolate a control plane node for a period of time and the control plane needs to resync after communication is re-established.

Minimal options exist for controlling the size of the etcd database, because it depends on the operators and applications in the system. When you consider the latency range under which the system will operate, account for the effects of synchronization and defragmentation per size of the etcd database.

The magnitude of the effects is specific to the deployment. The time to complete a defragmentation degrades the transaction rate, because the etcd member cannot accept updates during the defragmentation process. Similarly, the time for etcd re-synchronization on large databases with a high change rate affects the transaction rate and transaction latency of the system.

Consider the following two examples for the type of impacts to plan for.

Example of the effect of etcd defragmentation based on database size:: Writing an etcd database of 1 GB to a slow 7200 RPM disk at 80 Mbit/s takes about 1 minute and 40 seconds. In such a scenario, the defragmentation process takes at least this long to complete.

Example of the effect of database size on etcd synchronization:: If 10% of the etcd database changes during the disconnection of one of the control plane nodes, the resync needs to transfer at least 100 MB. Transferring 100 MB over a 1 Gbps link takes 800 ms. On clusters with regular transactions with the Kubernetes API, the larger the etcd database size, the more likely network instability is to cause control plane instability.

You can determine the size of the etcd database by using the {product-title} console or by running commands in the `etcdctl` tool.

.Procedure

* To find the database size in the {product-title} console, go to the *etcd* dashboard to view a plot that reports the size of the etcd database.

* To find the database size by using the `etcdctl` tool, enter two commands:

.. Enter the following command to list the pods:
+
[source,terminal]
----
# oc get pods -n openshift-etcd -l app=etcd
----
+
.Example output
[source,terminal]
----
NAME      READY   STATUS    RESTARTS   AGE
etcd-m0   4/4     Running   4          22h
etcd-m1   4/4     Running   4          22h
etcd-m2   4/4     Running   4          22h
----

.. Enter the following command and view the database size in the output:
+
[source,terminal]
----
# oc exec -t etcd-m0 -- etcdctl endpoint status -w simple | cut -d, -f 1,3,4
----
+
.Example output
[source,terminal]
----
https://198.18.111.12:2379, 3.5.6, 1.1 GB
https://198.18.111.13:2379, 3.5.6, 1.1 GB
https://198.18.111.14:2379, 3.5.6, 1.1 GB
----
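The two worked examples in this module are simple bandwidth arithmetic. A minimal Python sketch, with an illustrative `transfer_seconds` helper (not part of any etcd tooling), reproduces the numbers:

```python
# Sketch: back-of-envelope transfer times for the defragmentation and
# resync examples above. Protocol overhead is ignored.

def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Time to move size_bytes over a link of the given bit rate."""
    return size_bytes * 8 / link_bits_per_s

# Defragmentation example: writing a 1 GB database at 80 Mbit/s.
defrag_s = transfer_seconds(1e9, 80e6)   # 100 s, about 1 minute 40 seconds

# Resync example: a 10% change of a 1 GB database over a 1 Gbps link.
resync_s = transfer_seconds(100e6, 1e9)  # 0.8 s, the 800 ms from the text

print(defrag_s, resync_s)
```

The same helper lets you plug in your own database size and link speed to estimate the window during which a member cannot serve requests.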
modules/etcd-determine-kube-api-transaction-rate.adoc

Lines changed: 30 additions & 0 deletions

// Module included in the following assemblies:
//
// * etcd/etcd-performance.adoc

:_mod-docs-content-type: CONCEPT
[id="etcd-determine-kube-api-transaction-rate_{context}"]
= Determining Kubernetes API transaction rate for your environment

When you are using stretched control planes, the Kubernetes API transaction rate depends on the characteristics of the particular deployment. Specifically, it depends on the following combined factors:

* The etcd disk latency
* The etcd round trip time
* The size of objects that are being written to the API

As a result, when you use stretched control planes, cluster administrators must test the environment to determine the sustained transaction rate that is possible for the environment. The `kube-burner` tool is useful for that purpose. The binary includes a wrapper for testing OpenShift clusters: `kube-burner-ocp`. You can use `kube-burner-ocp` to test cluster or node density. To test the control plane, `kube-burner-ocp` has three workload profiles: `cluster-density`, `cluster-density-v2`, and `cluster-density-ms`. Each workload profile creates a series of resources that are designed to load the control plane. For more information about each profile, see the `kube-burner-ocp` workload documentation.

.Procedure

. Enter a command to create and delete resources. The following example shows a command that creates and deletes resources within 20 minutes:
+
[source,terminal]
----
# kube-burner ocp cluster-density-ms --churn-duration 20m --churn-delay 0s --iterations 10 --timeout 30m
----

. During the run, observe the API performance dashboard in the {product-title} console by clicking *Observe* -> *Dashboards*, and from the *Dashboards* menu, clicking *API Performance*.
+
On the dashboard, notice how the control plane responds during load and the 99th percentile transaction rate that it can achieve for the execution of various verbs and request rates by read and write. Use this information and the knowledge of your organization's workload to determine the load that the organization can put on the clusters for the specific stretched control plane deployment.
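The dashboard reports 99th percentile figures. As a hedged illustration of how such a percentile is commonly derived from raw request latencies, the following sketch uses the nearest-rank method; it is not the dashboard's exact implementation, and the sample data is invented.

```python
# Sketch: nearest-rank percentile over a list of request latencies,
# illustrating why a p99 figure is dominated by the slowest requests.
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value covering p% of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# 100 requests: 98 fast ones and two slow outliers.
latencies_ms = [12.0] * 98 + [250.0, 900.0]
print(percentile(latencies_ms, 99))  # the outliers dominate the p99
```

This is why a small number of slow transactions under load can move the dashboard's 99th percentile far from the median.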

modules/etcd-disk-latency.adoc

Lines changed: 29 additions & 0 deletions
// Module included in the following assemblies:
//
// * etcd/etcd-performance.adoc

:_mod-docs-content-type: CONCEPT
[id="etcd-disk-latency_{context}"]
= Effects of disk latency on etcd

An etcd cluster is sensitive to disk latency. To understand the disk latency that etcd experiences in your control plane environment, run the suite of `fio` tests.

Make sure that the final report classifies the disk as appropriate for etcd, as shown in the following example:

[source,terminal]
----
...
99th percentile of fsync is 5865472 ns
99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd
----

When a high latency disk is used, a message states that the disk is not recommended for etcd, as shown in the following example:

[source,terminal]
----
...
99th percentile of fsync is 25865472 ns
99th percentile of the fsync is greater than the recommended value which is 20 ms, faster disks are recommended to host etcd for better performance
----

In cluster deployments that span multiple data centers, using disks for etcd that do not meet the recommended latency increases the chance of service-affecting failures and dramatically reduces the network latency that the control plane can sustain.
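The classification in the report boils down to comparing the raw nanosecond figure against the 20 ms threshold. A minimal Python sketch of that check, with an illustrative helper name (this is not part of `fio` or etcd):

```python
# Sketch: apply the 20 ms fsync p99 threshold from the fio report to a raw
# nanosecond value. Threshold taken from the example output above.

FSYNC_P99_THRESHOLD_MS = 20.0

def disk_suitable_for_etcd(fsync_p99_ns: float) -> bool:
    """True when the 99th percentile fsync latency is within 20 ms."""
    return fsync_p99_ns / 1e6 <= FSYNC_P99_THRESHOLD_MS

print(disk_suitable_for_etcd(5865472))   # ~5.9 ms: disk can host etcd
print(disk_suitable_for_etcd(25865472))  # ~25.9 ms: faster disks recommended
```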
modules/etcd-leader-election-log-replication.adoc

Lines changed: 19 additions & 0 deletions

// Module included in the following assemblies:
//
// * etcd/etcd-performance.adoc

:_mod-docs-content-type: CONCEPT
[id="etcd-leader-election-log-replication_{context}"]
= Leader election and log replication of etcd

etcd is a consistent, distributed key-value store that operates as a cluster of replicated nodes. Following the Raft algorithm, etcd operates by electing one node as the leader and the others as followers. The leader maintains the system's current state and ensures that the followers are up to date.

The leader node is responsible for log replication. It handles incoming write transactions from the client and writes a Raft log entry that it then broadcasts to the followers.

//diagram goes here

When an etcd client, such as `kube-apiserver`, requests an action that requires a quorum, such as writing a value, and the etcd member that it connects to is a follower, that member returns a message indicating that the transaction must be sent to the leader.

//second diagram goes here

When the etcd client requests an action that requires a quorum from the leader, the leader keeps the client connection open while it writes the local Raft log, broadcasts the log to the followers, and waits for a majority of the followers to acknowledge that they committed the log without failures. Only then does the leader send the acknowledgment to the etcd client and close the session. If failure notifications are received from the followers and a majority fails to reach consensus, the leader returns the error message to the client and closes the session.
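The majority rule described above can be sketched in a few lines. This toy function mirrors Raft's quorum logic only; it is an illustration, not etcd code.

```python
# Sketch: a Raft-style write commits only when a majority of cluster
# members (leader included) have committed the log entry.

def write_committed(cluster_size: int, acks: int) -> bool:
    """acks counts members that committed the entry, including the leader."""
    quorum = cluster_size // 2 + 1
    return acks >= quorum

# In a 3-member cluster, the leader plus one follower is enough:
print(write_committed(3, 2))  # True
# Losing both followers blocks writes even though the leader is up:
print(write_committed(3, 1))  # False
```

This is why a 3-member control plane tolerates the loss of one member but not two, and a 5-member cluster tolerates the loss of two.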
