= Ensuring reliable etcd performance and scalability
:context: etcd-performance
toc::[]
To ensure optimal performance with etcd, it is important to understand the conditions that affect performance, including node scaling, leader election, log replication, tuning, latency, network jitter, peer round trip time, database size, and Kubernetes API transaction rates.

* link:https://docs.redhat.com/en/documentation/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/expanding-the-cluster#installing-control-plane-node-healthy-cluster_expanding-the-cluster[Expanding the cluster]
* xref:../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]

By using the `etcdctl` CLI, you can monitor the latency for reaching consensus as experienced by etcd. You must identify one of the etcd pods and then retrieve the endpoint health.

This procedure, which validates and monitors cluster health, can be run only on an active cluster.

.Prerequisites

* During planning for cluster deployment, you completed the disk and network tests.

.Procedure
. Enter the following command:
+
[source,terminal]
----
# oc get pods -n openshift-etcd -l app=etcd
----
+
.Example output
[source,terminal]
----
NAME      READY   STATUS    RESTARTS   AGE
etcd-m0   4/4     Running   4          8h
etcd-m1   4/4     Running   4          8h
etcd-m2   4/4     Running   4          8h
----

. Enter the following command. To better understand the etcd latency for consensus, run this command in a watch cycle for a few minutes and observe that the reported values remain below the ~66 ms threshold. The closer the consensus time is to 100 ms, the more likely the cluster is to experience service-affecting events and instability.
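+
For example, a health check along the following lines reports each endpoint and the time that the check took, which reflects the latency for reaching consensus. This is a minimal sketch: the pod name `etcd-m0` is taken from the previous output and the `etcdctl` container name is an assumption, so adjust both for your cluster; you can wrap the command in `watch` to observe it over a few minutes:
+
[source,terminal]
----
# oc exec -n openshift-etcd -c etcdctl etcd-m0 -- etcdctl endpoint health -w table
----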
= Determining the size of the etcd database and understanding its effects
The size of the etcd database has a direct impact on the time to complete the etcd defragmentation process. {product-title} automatically runs the etcd defragmentation on one etcd member at a time when it detects at least 45% fragmentation. During the defragmentation process, the etcd member cannot process any requests. On small etcd databases, the defragmentation process happens in less than a second. With larger etcd databases, the disk latency directly affects the defragmentation time, causing additional latency, because operations are blocked while the defragmentation happens.

The size of the etcd database is also a factor to consider when network partitions isolate a control plane node for a period of time and the control plane needs to resynchronize after communication is re-established.

Few options exist for controlling the size of the etcd database, because it depends on the Operators and applications in the system. When you consider the latency range under which the system will operate, account for the effects of synchronization and defragmentation relative to the size of the etcd database.

The magnitude of the effects is specific to the deployment. While a defragmentation runs, the transaction rate degrades because the etcd member cannot accept updates during the defragmentation process. Similarly, the time for etcd re-synchronization of a large database with a high change rate affects both the transaction rate and the transaction latency on the system.

Consider the following two examples for the type of impacts to plan for.

Example of the effect of etcd defragmentation based on database size:: Writing an etcd database of 1 GB to a slow 7200 RPM disk at 80 Mbit/s takes about 1 minute and 40 seconds. In such a scenario, the defragmentation process takes at least this long, if not longer, to complete.

Example of the effect of database size on etcd synchronization:: If 10% of a 1 GB etcd database changes during the disconnection of one of the control plane nodes, the resync needs to transfer at least 100 MB. Transferring 100 MB over a 1 Gbps link takes 800 ms. On clusters with regular Kubernetes API transactions, the larger the etcd database size, the more network instabilities will cause control plane instabilities.

You can determine the size of an etcd database by using the {product-title} console or by running commands in the `etcdctl` tool.

.Procedure

* To find the database size in the {product-title} console, go to the *etcd* dashboard and view the plot that reports the size of the etcd database.

* To find the database size by using the `etcdctl` tool, enter two commands:

.. Enter the following command to list the pods:
+
[source,terminal]
----
# oc get pods -n openshift-etcd -l app=etcd
----
+
.Example output
[source,terminal]
----
NAME      READY   STATUS    RESTARTS   AGE
etcd-m0   4/4     Running   4          22h
etcd-m1   4/4     Running   4          22h
etcd-m2   4/4     Running   4          22h
----

.. Enter the following command and view the database size in the output:
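+
For example, a status query along the following lines reports the database size in the `DB SIZE` column. This is a minimal sketch: the pod name `etcd-m0` and the `etcdctl` container name are assumptions, so substitute the values from your cluster:
+
[source,terminal]
----
# oc exec -n openshift-etcd -c etcdctl etcd-m0 -- etcdctl endpoint status -w table
----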
= Determining the Kubernetes API transaction rate for your environment
When you are using stretched control planes, the Kubernetes API transaction rate depends on the characteristics of the particular deployment. Specifically, it depends on the following combined factors:

* The etcd disk latency
* The etcd round trip time
* The size of objects that are being written to the API

As a result, when you use stretched control planes, cluster administrators must test the environment to determine the sustained transaction rate that is possible for the environment. The `kube-burner` tool is useful for that purpose. The binary includes a wrapper for testing OpenShift clusters: `kube-burner-ocp`. You can use `kube-burner-ocp` to test cluster or node density. To test the control plane, `kube-burner-ocp` has three workload profiles: `cluster-density`, `cluster-density-v2`, and `cluster-density-ms`. Each workload profile creates a series of resources that are designed to load the control plane. For more information about each profile, see the `kube-burner-ocp` workload documentation.

.Procedure

. Enter a command to create and delete resources. The following example shows a command that creates and deletes resources within 20 minutes:
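+
The following sketch shows the general shape of such a command. It assumes that the `kube-burner-ocp` binary is installed and that you are logged in to the cluster; the workload profile, iteration count, and churn settings are example values only, and the available flags can vary between versions:
+
[source,terminal]
----
$ kube-burner-ocp cluster-density-v2 --iterations=3 --churn-duration=20m --qps=20 --burst=20
----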
. During the run, observe the API performance dashboard in the {product-title} console: click *Observe* -> *Dashboards*, and from the *Dashboards* menu, click *API Performance*.
+
On the dashboard, notice how the control plane responds during load and the 99th percentile transaction rate that it can achieve for the execution of various verbs and the request rates by read and write. Use this information, together with knowledge of your organization's workload, to determine the load that your organization can put on the clusters for the specific stretched control plane deployment.

An etcd cluster is sensitive to disk latency. To understand the disk latency that etcd experiences in your control plane environment, run the `fio` test suite.
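For example, you can run the containerized `fio` check that the cloud-bulldozer project publishes against the etcd data directory. This is a minimal sketch: it assumes that you can run `podman` as root on the control plane node and that the `quay.io/cloud-bulldozer/etcd-perf` image is reachable from your environment:

[source,terminal]
----
# podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/cloud-bulldozer/etcd-perf
----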
Make sure that the final report classifies the disk as appropriate for etcd, as shown in the following example:

[source,terminal]
----
...
99th percentile of fsync is 5865472 ns
99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd
----
When a high latency disk is used, a message states that the disk is not recommended for etcd, as shown in the following example:

[source,terminal]
----
...
99th percentile of fsync is 15865472 ns
99th percentile of the fsync is greater than the recommended value which is 20 ms, faster disks are recommended to host etcd for better performance
----
When a cluster deployment spans multiple data centers and uses etcd disks that do not meet the recommended latency, the chances of service-affecting failures increase and the amount of network latency that the control plane can sustain is dramatically reduced.

etcd is a consistent, distributed key-value store that operates as a cluster of replicated nodes. Following the Raft algorithm, etcd elects one node as the leader and the others as followers. The leader maintains the current state of the system and ensures that the followers are up to date.
The leader node is responsible for log replication. It handles incoming write transactions from the client and writes a Raft log entry that it then broadcasts to the followers.
//diagram goes here
When an etcd client, such as `kube-apiserver`, connects to an etcd member and requests an action that requires a quorum, such as writing a value, and the contacted member is a follower, the follower returns a message indicating that the transaction must be sent to the leader.
//second diagram goes here
When the etcd client requests an action that requires a quorum from the leader, the leader keeps the client connection open while it writes the entry to its local Raft log, broadcasts the entry to the followers, and waits for a majority of the followers to acknowledge that they committed the entry without failures. Only then does the leader send the acknowledgment to the etcd client and close the session. If the followers report failures and a majority fails to reach consensus, the leader returns an error message to the client and closes the session.
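To see which member currently holds the leader role in a running cluster, you can query the status of all members from one of the etcd pods. This is a minimal sketch: the pod name `etcd-m0` and the `etcdctl` container name are assumptions, and the leader is reported in the `IS LEADER` column:

[source,terminal]
----
# oc exec -n openshift-etcd -c etcdctl etcd-m0 -- etcdctl endpoint status --cluster -w table
----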