Commit da4e8cb

kep: adds nodedm proposal
1 parent 97cc94c commit da4e8cb

1 file changed: 311 additions & 0 deletions

---
title: Proposal EKS Support in CAPA for nodeadm
authors:
  - "@faiq"
reviewers:
creation-date: 2025-09-22
last-updated: 2025-09-22
status: proposed
see-also:
  - https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3518
replaces: []
superseded-by: []
---

## Table of Contents

- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
  - [API Design](#api-design)
- [Design Details](#design-details)
  - [Upgrade Strategy](#upgrade-strategy)
  - [Test Plan](#test-plan)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)

## Summary

EKS support in the Cluster API Provider AWS (CAPA) is currently broken for Amazon Linux 2023 (AL2023) because AL2023-based EKS AMIs no longer support the `bootstrap.sh` script. This proposal introduces a new Custom Resource Definition (CRD), `NodeadmConfig`, to handle the `nodeadm` bootstrapping method that AL2023 requires. This approach is favored over modifying the existing `EKSConfig` type because it keeps the API cleaner, avoids fields that are not cross-compatible between bootstrapping methods, and simplifies the future deprecation of the `bootstrap.sh` implementation.

-----

## Motivation

EKS support in CAPA is currently broken for AL2023 (Amazon Linux 2023) because `bootstrap.sh`, the bootstrapping method previously used to provision EKS nodes, is no longer supported. Users running AL2023 see errors like this on their worker nodes:

```bash
[root@localhost bin]# /etc/eks/bootstrap.sh default_dk-eks-133-control-plane

!!!!!!!!!!
!!!!!!!!!! ERROR: bootstrap.sh has been removed from AL2023-based EKS AMIs.
!!!!!!!!!!
!!!!!!!!!! EKS nodes are now initialized by nodeadm.
!!!!!!!!!!
!!!!!!!!!! To migrate your user data, see:
!!!!!!!!!!
!!!!!!!!!!     https://awslabs.github.io/amazon-eks-ami/nodeadm/
!!!!!!!!!!

```

In CAPA, our implementation of EKS bootstrapping is currently tied to the `bootstrap.sh` script and is implemented by the `EKSConfig` type.

Additionally, the EKS team will not publish any more Amazon Linux 2 (AL2) AMIs after November 26th, 2025, and Kubernetes 1.32 is the last version for which AL2 AMIs will be released. This makes the transition to a new bootstrapping method for AL2023 urgent.

### Goals

* Restore the ability to provision EKS nodes using CAPA with AL2023 AMIs.
* Introduce a new, clean API (`NodeadmConfig`) specifically for the `nodeadm` bootstrap method.
* Provide a clear upgrade path for users moving from `EKSConfig` (`bootstrap.sh`) to `NodeadmConfig` (`nodeadm`).
* Make future deprecation of the `bootstrap.sh` implementation in `EKSConfig` easier.

### Non-Goals

* Create a metatype that can handle both `bootstrap.sh` and `nodeadm`.
* Handle operating systems with different bootstrapping mechanisms, such as Bottlerocket.

-----

## Proposal

This KEP proposes a new type that handles bootstrapping with `nodeadm` alone. This new type, `NodeadmConfig`, will wrap the `nodeadm` configuration API and expose it as a bootstrap provider.

This approach is proposed because of the drawbacks of the alternative, modifying the existing `EKSConfig` type. That alternative would require new fields to distinguish between bootstrap methods and would lead to a confusing API in which some fields are only valid for one method.

Examples of fields in the existing API that are no longer valid with `nodeadm`:

* `ContainerRuntime`
* `DNSClusterIP`
* `DockerConfigJSON`
* `APIRetryAttempts`
* `PostBootstrapCommands`
* `BootstrapCommandOverride`

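The field list above can be turned into a simple pre-migration check. The sketch below is illustrative only: `legacySpec` is a hypothetical, trimmed-down stand-in for the `EKSConfig` spec (the real CAPA type has more fields and may differ in detail), and `unsupportedFields` is an assumed helper name, not part of this proposal.

```go
package main

import "fmt"

// legacySpec is a hypothetical stand-in for the EKSConfig spec. It models
// only the fields listed above as no longer valid with nodeadm.
type legacySpec struct {
	ContainerRuntime         *string
	DNSClusterIP             *string
	DockerConfigJSON         *string
	APIRetryAttempts         *int
	PostBootstrapCommands    []string
	BootstrapCommandOverride *string
}

// unsupportedFields returns the names of the fields that are set on the
// legacy spec and would need to be dropped or reworked before migrating.
func unsupportedFields(s legacySpec) []string {
	var out []string
	if s.ContainerRuntime != nil {
		out = append(out, "ContainerRuntime")
	}
	if s.DNSClusterIP != nil {
		out = append(out, "DNSClusterIP")
	}
	if s.DockerConfigJSON != nil {
		out = append(out, "DockerConfigJSON")
	}
	if s.APIRetryAttempts != nil {
		out = append(out, "APIRetryAttempts")
	}
	if len(s.PostBootstrapCommands) > 0 {
		out = append(out, "PostBootstrapCommands")
	}
	if s.BootstrapCommandOverride != nil {
		out = append(out, "BootstrapCommandOverride")
	}
	return out
}

func main() {
	runtime := "containerd"
	spec := legacySpec{
		ContainerRuntime:      &runtime,
		PostBootstrapCommands: []string{`echo "bye world"`},
	}
	for _, f := range unsupportedFields(spec) {
		fmt.Printf("field %s is not valid with nodeadm\n", f)
	}
}
```

A check like this could run before a user swaps templates, so that silently ignored settings are surfaced early.
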
The **pros** of this approach are:

* A cleaner API that’s more descriptive for each bootstrap method.
* A new implementation will make deprecating EKSConfig’s `bootstrap.sh` implementation easier.

The **cons** are:

* The scope of work needed to support EKS nodes grows significantly, which delays delivery.
* Users need to know that setting `AWSMachine.Spec.Ami.EKSLookupType` to `AmazonLinux` won’t work with the new `nodeadm` bootstrap method.

### User Stories

* As a cluster admin, I need to provision nodes to my EKS cluster using Kubernetes 1.33 or higher.
* As a cluster admin, I need to provision EKS worker nodes using the latest AL2023 AMIs.
* As a cluster admin, I need to upgrade my existing EKS cluster nodes from an AL2-based version (e.g., 1.32) to an AL2023-based version (e.g., 1.33) with minimal disruption.

### API Design

At a high level, the new `NodeadmConfig` type wraps the `nodeadm` configuration API and exposes it as a bootstrap provider.

```go
// NodeadmConfigSpec defines the desired state of NodeadmConfig.
type NodeadmConfigSpec struct {
	// Kubelet contains options for kubelet.
	// +optional
	Kubelet *KubeletOptions `json:"kubelet,omitempty"`

	// Containerd contains options for containerd.
	// +optional
	Containerd *ContainerdOptions `json:"containerd,omitempty"`

	// Instance contains options for the node's operating system and devices.
	// +optional
	Instance *InstanceOptions `json:"instance,omitempty"`

	// FeatureGates holds key-value pairs to enable or disable application features.
	// +optional
	FeatureGates map[Feature]bool `json:"featureGates,omitempty"`

	// PreBootstrapCommands specifies extra commands to run before bootstrapping nodes.
	// +optional
	PreBootstrapCommands []string `json:"preBootstrapCommands,omitempty"`

	// Files specifies extra files to be passed to user_data upon creation.
	// +optional
	Files []File `json:"files,omitempty"`

	// Users specifies extra users to add.
	// +optional
	Users []User `json:"users,omitempty"`

	// NTP specifies NTP configuration.
	// +optional
	NTP *NTP `json:"ntp,omitempty"`

	// DiskSetup specifies options for the creation of partition tables and file systems on devices.
	// +optional
	DiskSetup *DiskSetup `json:"diskSetup,omitempty"`

	// Mounts specifies a list of mount points to be setup.
	// +optional
	Mounts []MountPoints `json:"mounts,omitempty"`
}
```

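The proposal does not spell out how a controller would turn this spec into node user data. As a rough sketch under that caveat, the snippet below renders the minimal `NodeConfig` document that `nodeadm` reads, following the layout in the nodeadm documentation. The `clusterInfo` struct, its field names, and `renderNodeConfig` are hypothetical and exist only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// clusterInfo holds the cluster details nodeadm needs to join a node.
// Hypothetical type: not part of the proposed API.
type clusterInfo struct {
	Name                 string
	APIServerEndpoint    string
	CertificateAuthority string // base64-encoded CA bundle
	ServiceCIDR          string
}

// nodeConfigTmpl mirrors the minimal NodeConfig shown in the nodeadm docs.
var nodeConfigTmpl = template.Must(template.New("nodeconfig").Parse(`apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: {{ .Name }}
    apiServerEndpoint: {{ .APIServerEndpoint }}
    certificateAuthority: {{ .CertificateAuthority }}
    cidr: {{ .ServiceCIDR }}
`))

// renderNodeConfig fills the template with the given cluster details.
func renderNodeConfig(c clusterInfo) (string, error) {
	var buf bytes.Buffer
	if err := nodeConfigTmpl.Execute(&buf, c); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Placeholder values; a real controller would read these from the cluster.
	out, err := renderNodeConfig(clusterInfo{
		Name:                 "default",
		APIServerEndpoint:    "https://example.invalid",
		CertificateAuthority: "Q0E=",
		ServiceCIDR:          "10.96.0.0/12",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Options such as `Kubelet` or `Containerd` from `NodeadmConfigSpec` would presumably map onto further sections of the same document; that mapping is left out here because the proposal does not define it.
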

-----

## Design Details

### Upgrade Strategy

A valid concern for CAPA users is how to upgrade existing clusters to machines that use the new `NodeadmConfig` bootstrap CRD. CAPI's immutable design largely addresses this: existing machines are not modified in place, and new machines are deployed by referencing new templates. An upgrade therefore amounts to pointing the owning resource at a new bootstrap template.

#### MachineDeployment Upgrade Example

A user with a `MachineDeployment` using `EKSConfig` for Kubernetes v1.32 would upgrade to v1.33 by creating a new `NodeadmConfigTemplate` and updating the `MachineDeployment` to reference it and the new Kubernetes version. New machines are rolled out according to the `MachineDeployment` update strategy.

**Before (v1.32 with `EKSConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfigTemplate
metadata:
  name: default132
spec:
  template:
    spec:
      postBootstrapCommands:
        - "echo \"bye world\""
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfigTemplate
          name: default132
      infrastructureRef:
        kind: AWSMachineTemplate
        name: default132
      version: v1.32.0
```

**After (v1.33 with `NodeadmConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: NodeadmConfigTemplate
metadata:
  name: default
spec:
  template:
    spec:
      preBootstrapCommands:
        - "echo \"hello world\""
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: NodeadmConfigTemplate
          name: default
      infrastructureRef:
        kind: AWSMachineTemplate
        name: default
      version: v1.33.0
```

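Tooling that generates these manifests needs some rule for choosing the bootstrap template kind. The sketch below assumes one simple policy, derived from the statement that Kubernetes 1.32 is the last version with AL2 AMIs: use `EKSConfigTemplate` up to v1.32 and `NodeadmConfigTemplate` from v1.33 onward. The function name `bootstrapKindFor` is hypothetical, and the policy is an illustration rather than something the proposal mandates (AL2023 AMIs may also be used with earlier versions).

```go
package main

import "fmt"

// bootstrapKindFor picks a bootstrap template kind from a Kubernetes version
// string such as "v1.33.0". Assumed policy: v1.33 and later use nodeadm.
func bootstrapKindFor(version string) (string, error) {
	var major, minor int
	// Sscanf stops once the format is satisfied, so a trailing patch
	// component such as ".0" is ignored.
	if _, err := fmt.Sscanf(version, "v%d.%d", &major, &minor); err != nil {
		return "", fmt.Errorf("parsing version %q: %w", version, err)
	}
	if major > 1 || (major == 1 && minor >= 33) {
		return "NodeadmConfigTemplate", nil
	}
	return "EKSConfigTemplate", nil
}

func main() {
	for _, v := range []string{"v1.32.0", "v1.33.0"} {
		kind, err := bootstrapKindFor(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", v, kind)
	}
}
```
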
#### MachinePool Upgrade Example

The flow is very similar for a `MachinePool`: the user updates the `MachinePool` resource to reference a new `NodeadmConfigTemplate` and the target Kubernetes version.

**Before (v1.32 with `EKSConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfigTemplate
metadata:
  name: default-132
spec:
  template:
    spec: {}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfigTemplate
          name: default-132
      infrastructureRef:
        kind: AWSMachinePool
        name: default
      version: v1.32.0
```

**After (v1.33 with `NodeadmConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: NodeadmConfigTemplate
metadata:
  name: default-133
spec:
  template:
    spec:
      preBootstrapCommands:
        - "echo \"hello from v1.33.0\""
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: NodeadmConfigTemplate
          name: default-133
      infrastructureRef:
        kind: AWSMachinePool
        name: default
      version: v1.33.0
```

### Test Plan

* Unit tests for the new code.
* Integration tests for the new `NodeadmConfig` controller.
* E2E tests exercising the migration from `EKSConfig` to `NodeadmConfig`.

## Drawbacks

* The scope of work needed to support EKS nodes grows significantly, which delays delivery.
* Users need to know that setting `AWSMachine.Spec.Ami.EKSLookupType` to `AmazonLinux` won’t work with the new `nodeadm` bootstrap method.

-----

## Alternatives

The primary alternative considered was to modify the existing `EKSConfig` type to support `nodeadm`. Work is currently under way upstream to address this gap. At a high level, [this PR](https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5553) adds a new bootstrapping implementation to the existing `EKSConfig` type, with additional API fields to distinguish between bootstrap methods.

However, this implementation has some drawbacks in its API design:

* **Introduction of new fields to distinguish between bootstrap methods**: This complicates the API.
* **Fields that are valid for `bootstrap.sh` are not valid for `nodeadm` and vice versa**: This would lead to a confusing user experience in which users could set fields that have no effect for their chosen bootstrap method.
