Commit cc55e76

Address review comments
1 parent 66bccdf commit cc55e76

File tree: keps/sig-node/3953-node-resource-hot-plug

1 file changed: +11 −13 lines changed


keps/sig-node/3953-node-resource-hot-plug/README.md

Lines changed: 11 additions & 13 deletions
@@ -99,8 +99,7 @@ aims to optimize resource management, improve scalability, and minimize disrupti

## Motivation
Currently, the node's resource configurations are recorded solely during the kubelet bootstrap phase and subsequently cached, assuming the node's compute capacity remains unchanged throughout the cluster's lifecycle.
-In a conventional Kubernetes environment, the cluster resources might necessitate modification because of inaccurate resource allocation during cluster initialization or escalating workload over time,
-necessitating supplementary resources within the cluster.
+In a conventional Kubernetes environment, cluster resources might need modification because of inaccurate resource allocation during cluster initialization or escalating workloads over time, requiring supplementary resources within the cluster.

Contemporarily, kernel capabilities enable the dynamic addition of CPUs and memory to a node (for example: https://docs.kernel.org/core-api/cpu_hotplug.html and https://docs.kernel.org/core-api/memory-hotplug.html).
This can occur across different architectures and compute environments such as Cloud, bare metal, or VMs, and it can leave Kubernetes unaware of the node's altered compute capacity during a live-resize,
@@ -119,7 +118,7 @@ However, this approach does carry a few drawbacks such as
- https://github.com/kubernetes/kubernetes/issues/125579
- https://github.com/kubernetes/kubernetes/issues/127793

-Hence, it is necessary to handle the updates in the compute capacity in a graceful fashion across the cluster, than adopting to reset the cluster components to achieve the same.
+Hence, it is necessary to handle capacity updates gracefully across the cluster, rather than resetting cluster components to achieve the same outcome.

Also, given that the capability to live-resize a node exists in the Linux and Windows kernels, enabling the kubelet to be aware of the underlying changes in the node's compute capacity will reduce the manual actions otherwise required
of the Kubernetes administrator.
@@ -152,7 +151,7 @@ Implementing this KEP will empower nodes to recognize and adapt to changes in th
This KEP strives to enable node resource hot plugging by making the kubelet watch and retrieve machine resource information from cAdvisor's cache as and when it changes; cAdvisor's cache is already updated periodically.
The kubelet will fetch this information, subsequently entrusting the node status updater to disseminate these updates at the node level across the cluster.
Moreover, this KEP aims to refine the initialization and reinitialization processes of resource managers, including the memory manager and CPU manager, to ensure their adaptability to changes in node configurations.
-With this proposal its also necessary to recalculate and update OOMScoreAdj and swap limit for the pods that had been existing before resize. But this carries small overhead due to recalculation of swap and OOMScoreAdj.
+With this proposal it's also necessary to recalculate and update the OOMScoreAdj and swap limit for pods that existed before the resize. This carries a small overhead due to the recalculation of swap and OOMScoreAdj.
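The watch-and-resync flow described above can be sketched roughly as follows. This is an illustrative sketch, not kubelet code: `detect_capacity_change`, `resync_on_change`, and the dict-based snapshots are hypothetical stand-ins for cAdvisor's machine info and the kubelet's resource-manager resync.

```python
def detect_capacity_change(old, new):
    """Return the capacity fields that differ between two machine-info snapshots."""
    return [k for k in ("num_cores", "memory_bytes") if old[k] != new[k]]

def resync_on_change(snapshots, resync):
    """Walk successive machine-info snapshots (as the kubelet would observe
    cAdvisor's periodically refreshed cache) and call resync(changed, snap)
    whenever capacity differs from the previous snapshot; return resync count."""
    count, prev = 0, None
    for snap in snapshots:
        if prev is not None:
            changed = detect_capacity_change(prev, snap)
            if changed:
                resync(changed, snap)  # e.g. re-initialise CPU/memory managers,
                count += 1             # then push updated node status
        prev = snap
    return count
```

In this sketch, each detected capacity change triggers exactly one resync; in the proposal that is the point where the resource managers are re-initialised and the node status updater publishes the new capacity.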

### User Stories

@@ -202,8 +201,8 @@ detect the change in compute capacity, which can bring in additional complicatio
- Post up-scale, any failure in the resync of resource managers may lead to incorrect or rejected allocations, which can lead to underperforming or rejected workloads.
  - To mitigate the risks, adequate tests should be added to avoid scenarios where a failure to resync resource managers can occur.

-- Lack of coordination about change in resource availability across kubelet/runtime/plugins.
-  - The plugins/runtime should be updated to react to change in resource information on the node.
+- Lack of coordination about change in resource availability across kubelet/runtime/NRI plugins.
+  - The runtime/NRI plugins should be updated to react to change in resource information on the node.

- Kubelet missing on processing hotplug instance(s)
  - Kubelet observes the underlying node for any hotplug of resources as and when generated,
@@ -221,7 +220,7 @@ detect the change in compute capacity, which can bring in additional complicatio

## Design Details

-Below diagram is shows the interaction between kubelet, node and cAdvisor.
+The diagram below shows the interaction between kubelet, node and cAdvisor.

```mermaid
sequenceDiagram
@@ -263,9 +262,9 @@ With increase in cluster resources the following components will be updated:
`(<containerMemoryRequest>/<nodeTotalMemory>)*<totalPodsSwapAvailable>`
* An increase in nodeTotalMemory or totalPodsSwapAvailable will result in an updated swap memory limit for pods deployed post-resize, and will also trigger recalculation of the same for existing pods.

-3. Resource managers will re-initialised.
+3. Resource managers are re-initialised.

-4. Update in Node allocatable capacity.
+4. Update in Node capacity.

5. Scheduler:
* Scheduler will automatically schedule any pending pods.
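The swap formula above can be made concrete with a minimal sketch. The function name and byte-based units are assumptions for illustration, not the kubelet's actual code:

```python
GiB = 1 << 30

def pod_swap_limit_bytes(container_memory_request, node_total_memory,
                         total_pods_swap_available):
    """Swap limit per the KEP's formula:
    (<containerMemoryRequest>/<nodeTotalMemory>)*<totalPodsSwapAvailable>."""
    return int(container_memory_request / node_total_memory
               * total_pods_swap_available)

# A 2 GiB request on a 16 GiB node with 8 GiB of pod swap gets 1 GiB of swap;
# after hot-plugging memory up to 32 GiB the same request yields only 0.5 GiB,
# which is why the limit must be recalculated for existing pods.
before = pod_swap_limit_bytes(2 * GiB, 16 * GiB, 8 * GiB)  # 1 GiB
after = pod_swap_limit_bytes(2 * GiB, 32 * GiB, 8 * GiB)   # 0.5 GiB
```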
@@ -439,7 +438,7 @@ Following scenarios need to be covered:
* Node resource information before and after resource hot plug for the following scenarios.
  * upsize -> downsize
  * upsize -> downsize -> upsize
-  * downsize- > upsize
+  * downsize -> upsize
* State of Pending pods due to lack of resources after resource hot plug.
* Resource manager states after the resync of components.
@@ -593,8 +592,7 @@ will rollout across nodes.
-->

Rollout may fail if the resource managers are not re-synced properly due to programmatic errors.
-In case of rollout failures, running workloads are not affected, If the pods are on pending state they remain
-in the pending state only.
+In case of rollout failures, running workloads are not affected. If pods are in the pending state, they remain pending.
Rollback failure should not affect running workloads.

###### What specific metrics should inform a rollback?
@@ -915,7 +913,7 @@ VMs of cluster should support hot plug of compute resources for e2e tests.
or if it has to be terminated due to resource crunch.
* Recalculate OOM adjust score and swap limits:
  * Since the total capacity of the node has changed, values associated with the node's memory capacity must be recomputed.
-* Handling unplug of reserved CPUs.
+* Handling unplug of reserved and exclusively allocated CPUs.

* Fetching machine info via CRI
  * At present, the machine data is retrieved from cAdvisor's cache through periodic checks. There is ongoing development to utilize CRI APIs for this purpose.
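The OOM-score recomputation mentioned above can be illustrated with a simplified version of the kubelet's burstable-pod formula, `1000 - (1000 * memoryRequest) / memoryCapacity`. This is a sketch under that assumption: the clamping is simplified and the function name is hypothetical.

```python
GiB = 1 << 30

def burstable_oom_score_adj(memory_request_bytes, node_capacity_bytes):
    """Simplified burstable-pod OOMScoreAdj: pods requesting a larger share
    of node memory get a lower score (less likely to be OOM-killed)."""
    adj = int(1000 - (1000 * memory_request_bytes) / node_capacity_bytes)
    return max(2, min(adj, 999))  # keep clear of system-critical scores

# Hot-plugging memory from 16 GiB to 32 GiB halves the pod's share of the
# node, so its OOMScoreAdj must be recomputed for the same request.
old_adj = burstable_oom_score_adj(4 * GiB, 16 * GiB)  # 750
new_adj = burstable_oom_score_adj(4 * GiB, 32 * GiB)  # 875
```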
