keps/sig-node/3953-node-resource-hot-plug/README.md
11 additions & 13 deletions
@@ -99,8 +99,7 @@ aims to optimize resource management, improve scalability, and minimize disrupti
## Motivation
Currently, the node's resource configurations are recorded solely during the kubelet bootstrap phase and subsequently cached, assuming the node's compute capacity remains unchanged throughout the cluster's lifecycle.
-In a conventional Kubernetes environment, the cluster resources might necessitate modification because of inaccurate resource allocation during cluster initialization or escalating workload over time,
-necessitating supplementary resources within the cluster.
+In a conventional Kubernetes environment, cluster resources might need modification because of inaccurate resource allocation or due to escalating workloads over time, requiring supplementary resources within the cluster.
Contemporarily, kernel capabilities enable the dynamic addition of CPUs and memory to a node (for example: https://docs.kernel.org/core-api/cpu_hotplug.html and https://docs.kernel.org/core-api/memory-hotplug.html).
This can be across different architectures and compute environments like Cloud, Bare metal or VM. During such an exercise, Kubernetes can be left unaware of the node's altered compute capacities during a live-resize,
@@ -119,7 +118,7 @@ However, this approach does carry a few drawbacks such as
-Hence, it is necessary to handle the updates in the compute capacity in a graceful fashion across the cluster, than adopting to reset the cluster components to achieve the same.
+Hence, it is necessary to handle capacity updates gracefully across the cluster, rather than resetting the cluster components to achieve the same outcome.
Also, given that the capability to live-resize a node exists in the Linux and Windows kernels, enabling the kubelet to be aware of the underlying changes in the node's compute capacity will mitigate any actions that are required to be made
by the Kubernetes administrator.
@@ -152,7 +151,7 @@ Implementing this KEP will empower nodes to recognize and adapt to changes in th
This KEP strives to enable node resource hot plugging by making the kubelet watch and retrieve machine resource information from cAdvisor's cache as and when it changes; cAdvisor's cache is already updated periodically.
The kubelet will fetch this information, subsequently entrusting the node status updater to disseminate these updates at the node level across the cluster.
Moreover, this KEP aims to refine the initialization and reinitialization processes of resource managers, including the memory manager and CPU manager, to ensure their adaptability to changes in node configurations.
-With this proposal its also necessary to recalculate and update OOMScoreAdj and swap limit for the pods that had been existing before resize. But this carries small overhead due to recalculation of swap and OOMScoreAdj.
+With this proposal it's also necessary to recalculate and update the OOMScoreAdj and swap limit for pods that existed before the resize. This carries a small overhead due to the recalculation of swap and OOMScoreAdj.
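To make the flow above concrete, here is a minimal, hypothetical sketch of the resync behaviour this proposal describes: the kubelet re-reads machine info from cAdvisor's cache, and on a capacity change it re-initialises the resource managers, lets the node status updater publish the new capacity, and recomputes OOMScoreAdj/swap limits for existing pods. The type and helper names (`MachineInfo`, `resyncResourceManagers`, `updateNodeStatus`, `recomputeExistingPods`) are illustrative, not the actual kubelet APIs.

```go
// Hypothetical sketch of the resync flow described above; the names are
// illustrative and not the actual kubelet APIs.
package main

import "fmt"

// MachineInfo mirrors the subset of cAdvisor's cached machine info used here.
type MachineInfo struct {
	NumCores       int
	MemoryCapacity uint64 // bytes
}

// onMachineInfoChange is what the kubelet would run whenever the machine info
// fetched from cAdvisor's cache differs from the last observed values.
func onMachineInfoChange(old, cur MachineInfo) {
	if old == cur {
		return // nothing hot plugged, nothing to do
	}
	fmt.Printf("capacity changed: cpus %d -> %d, memory %d -> %d bytes\n",
		old.NumCores, cur.NumCores, old.MemoryCapacity, cur.MemoryCapacity)

	resyncResourceManagers(cur) // re-initialise CPU and memory managers
	updateNodeStatus(cur)       // node status updater publishes the new capacity
	recomputeExistingPods(cur)  // recalculate OOMScoreAdj and swap limits for existing pods
}

// Stubs standing in for the kubelet components involved in the flow.
func resyncResourceManagers(m MachineInfo) {}
func updateNodeStatus(m MachineInfo)       {}
func recomputeExistingPods(m MachineInfo)  {}

func main() {
	before := MachineInfo{NumCores: 8, MemoryCapacity: 16 << 30}
	after := MachineInfo{NumCores: 16, MemoryCapacity: 32 << 30} // CPUs and memory hot plugged
	onMachineInfoChange(before, after)
}
```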
### User Stories
@@ -202,8 +201,8 @@ detect the change in compute capacity, which can bring in additional complicatio
- Post up-scale, any failure in the resync of resource managers may lead to incorrect or rejected allocation, which can lead to underperforming or rejected workloads.
- To mitigate the risks, adequate tests should be added to cover the scenarios where a failure to resync resource managers can occur.
-- Lack of coordination about change in resource availability across kubelet/runtime/plugins.
-- The plugins/runtime should be updated to react to change in resource information on the node.
+- Lack of coordination about changes in resource availability across kubelet/runtime/NRI plugins.
+- The runtime/NRI plugins should be updated to react to changes in resource information on the node.
- Kubelet misses processing hotplug instance(s)
- Kubelet observes the underlying node for any hotplug of resources as and when generated,
@@ -221,7 +220,7 @@ detect the change in compute capacity, which can bring in additional complicatio
## Design Details
-Below diagram is shows the interaction between kubelet, node and cAdvisor.
+The diagram below shows the interaction between kubelet, node and cAdvisor.
```mermaid
sequenceDiagram
@@ -263,9 +262,9 @@ With increase in cluster resources the following components will be updated:
* Increase in nodeTotalMemory or totalPodsSwapAvailable will result in an updated swap memory limit for pods deployed post resize, and the same is recalculated for existing pods (see the sketch after this list).
-3. Resource managers will re-initialised.
+3. Resource managers are re-initialised.

-4. Update in Node allocatable capacity.
+4. Update in Node capacity.
5. Scheduler:
* Scheduler will automatically schedule any pending pods.
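As an illustration of the swap recalculation referenced in the list above, the following is a minimal sketch assuming the proportional LimitedSwap-style formula (container swap limit = memory request / node total memory × total swappable memory on the node); the helper name and values are hypothetical, not the kubelet's exact code.

```go
// Hypothetical sketch: proportional swap-limit recalculation after a memory
// hot plug, assuming a LimitedSwap-style formula (not the exact kubelet code).
package main

import "fmt"

// swapLimit returns a per-container swap limit in bytes:
// (memory request / node total memory) * total swappable memory on the node.
func swapLimit(memRequest, nodeTotalMemory, totalPodsSwapAvailable uint64) uint64 {
	if nodeTotalMemory == 0 {
		return 0
	}
	return uint64(float64(memRequest) / float64(nodeTotalMemory) * float64(totalPodsSwapAvailable))
}

func main() {
	memRequest := uint64(2 << 30) // 2 GiB container memory request

	// Before the hot plug: 8 GiB node memory, 4 GiB swap available for pods.
	before := swapLimit(memRequest, 8<<30, 4<<30)
	// After memory is hot plugged up to 16 GiB, the container's proportional share shrinks.
	after := swapLimit(memRequest, 16<<30, 4<<30)

	fmt.Printf("swap limit before: %d bytes, after: %d bytes\n", before, after)
	// Existing pods keep the old limit unless the kubelet recalculates it,
	// which is why pre-resize pods must be revisited after a hot plug.
}
```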
@@ -439,7 +438,7 @@ Following scenarios need to be covered:
* Node resource information before and after resource hot plug for the following scenarios.
* upsize -> downsize
* upsize -> downsize -> upsize
-* downsize- > upsize
+* downsize -> upsize
* State of Pending pods due to lack of resources after resource hot plug.
* Resource manager states after the resync of components.
@@ -593,8 +592,7 @@ will rollout across nodes.
-->
Rollout may fail if the resource managers are not re-synced properly due to programmatic errors.
-In case of rollout failures, running workloads are not affected, If the pods are on pending state they remain
-in the pending state only.
+In case of rollout failures, running workloads are not affected. If pods are in the Pending state, they remain pending.
Rollback failure should not affect running workloads.
###### What specific metrics should inform a rollback?
@@ -915,7 +913,7 @@ VMs of cluster should support hot plug of compute resources for e2e tests.
or if it has to be terminated due to resource crunch.
* Recalculate OOM adjust score and Swap limits:
* Since the total capacity of the node has changed, values associated with the node's memory capacity must be recomputed (see the sketch after this list).
-* Handling unplug of reserved CPUs.
+* Handling unplug of reserved and exclusively allocated CPUs.
* Fetching machine info via CRI
* At present, the machine data is retrieved from cAdvisor's cache through periodic checks. There is ongoing development to utilize CRI APIs for this purpose.
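To illustrate why the OOM adjust scores mentioned above must be recomputed, here is a minimal sketch assuming the burstable-pod heuristic of roughly `1000 - 1000*memoryRequest/nodeMemoryCapacity`; QoS special cases and clamping are intentionally omitted, and the helper name is hypothetical.

```go
// Hypothetical sketch: OOMScoreAdj for a burstable container depends on node
// memory capacity, so it changes when memory is hot plugged. Simplified from
// the kubelet heuristic; clamping and QoS special cases are omitted.
package main

import "fmt"

// oomScoreAdj sketches the burstable-pod heuristic: the smaller the memory
// request relative to node capacity, the higher (more killable) the score.
func oomScoreAdj(memRequest, nodeMemoryCapacity int64) int64 {
	return 1000 - (1000*memRequest)/nodeMemoryCapacity
}

func main() {
	memRequest := int64(4 << 30) // 4 GiB container memory request

	fmt.Println("before hot plug (16 GiB node):", oomScoreAdj(memRequest, 16<<30))
	fmt.Println("after hot plug  (32 GiB node):", oomScoreAdj(memRequest, 32<<30))
	// The score for the same container changes once node capacity changes,
	// so containers created before the resize need their OOMScoreAdj updated.
}
```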