Commit 86e15f7

Merge pull request #5678 from faiq/faiq/add-nodeadm-kep
📖 KEP: adds nodedm proposal
2 parents 516f342 + 3cb4cdd commit 86e15f7

1 file changed

+302
-0
lines changed

@@ -0,0 +1,302 @@
---
title: Proposal EKS Support in CAPA for nodeadm
authors:
- "@faiq"
reviewers:
creation-date: 2025-09-22
last-updated: 2025-09-22
status: proposed
see-also:
- https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3518
replaces: []
superseded-by: []
---

## Table of Contents
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
  - [API Design](#api-design)
- [Design Details](#design-details)
  - [Upgrade Strategy](#upgrade-strategy)
  - [Test Plan](#test-plan)
- [Alternatives](#alternatives)

## Summary

Currently, EKS support in the Cluster API Provider for AWS (CAPA) is broken for Amazon Linux 2023 (AL2023) because the `bootstrap.sh` script is no longer supported. This proposal introduces a new Custom Resource Definition (CRD), `NodeadmConfig`, to handle the new `nodeadm` bootstrapping method required by AL2023. This approach is favored over modifying the existing `EKSConfig` type to maintain a cleaner API, avoid fields that are not cross-compatible between bootstrapping methods, and simplify the future deprecation of the `bootstrap.sh` implementation.

-----

## Motivation

EKS support in CAPA is currently broken for AL2023 (Amazon Linux 2023) because `bootstrap.sh`, the method previously used to provision EKS nodes, is no longer supported. Users on AL2023 see errors like this on their worker nodes:

```bash
[root@localhost bin]# /etc/eks/bootstrap.sh default_dk-eks-133-control-plane


!!!!!!!!!!
!!!!!!!!!! ERROR: bootstrap.sh has been removed from AL2023-based EKS AMIs.
!!!!!!!!!!
!!!!!!!!!! EKS nodes are now initialized by nodeadm.
!!!!!!!!!!
!!!!!!!!!! To migrate your user data, see:
!!!!!!!!!!
!!!!!!!!!!     https://awslabs.github.io/amazon-eks-ami/nodeadm/
!!!!!!!!!!

```

In CAPA, the EKS bootstrapping implementation is currently tied to the `bootstrap.sh` script and is exposed through the `EKSConfig` type.

Additionally, the EKS team will not publish any more Amazon Linux 2 (AL2) AMIs after November 26th, 2025, and Kubernetes version 1.32 is the last version for which AL2 AMIs will be released. This makes the transition to a new bootstrapping method for AL2023 urgent.

### Goals

* Restore the ability to provision EKS nodes using CAPA with AL2023 AMIs.
* Introduce a new, clean API (`NodeadmConfig`) specifically for the `nodeadm` bootstrap method.
* Provide a clear upgrade path for users moving from `EKSConfig` (`bootstrap.sh`) to `NodeadmConfig` (`nodeadm`).
* Make future deprecation of the `bootstrap.sh` implementation in `EKSConfig` easier.

### Non-Goals

* Create a metatype that can handle both `bootstrap.sh` and `nodeadm`.
* Handle operating systems with different bootstrapping mechanisms, such as Bottlerocket.

-----

## Proposal

This KEP proposes a new type, `NodeadmConfig`, that handles bootstrapping with `nodeadm` alone and wraps the API implementation for the `nodeadm` option as a bootstrap provider.

This approach is proposed because the alternative, modifying the existing `EKSConfig` type, would require new fields to distinguish between bootstrap methods and would lead to a confusing API in which some fields are only valid for one method.

Examples of fields in the existing API that are no longer valid with `nodeadm`:

* `ContainerRuntime`
* `DNSClusterIP`
* `DockerConfigJSON`
* `APIRetryAttempts`
* `PostBootstrapCommands`
* `BootstrapCommandOverride`

The **pros** of this approach are:

* A cleaner API that’s more descriptive for each bootstrap method.
* A new implementation will make deprecating EKSConfig’s `bootstrap.sh` implementation easier.

The **cons** are:

* The scope of work to support EKS nodes grows significantly, which pushes out delivery.

### User Stories

* As a cluster admin, I need to provision nodes to my EKS cluster using Kubernetes 1.33 or higher.
* As a cluster admin, I need to provision EKS worker nodes using the latest AL2023 AMIs.
* As a cluster admin, I need to upgrade my existing EKS cluster nodes from an AL2-based version (e.g., 1.32) to an AL2023-based version (e.g., 1.33) with minimal disruption.

### API Design

At a high level, the new `NodeadmConfig` type wraps the API implementation for the `nodeadm` option as a bootstrap provider.

```go
// NodeadmConfigSpec defines the desired state of NodeadmConfig.
type NodeadmConfigSpec struct {
	// Kubelet contains options for kubelet.
	// +optional
	Kubelet *KubeletOptions `json:"kubelet,omitempty"`

	// Containerd contains options for containerd.
	// +optional
	Containerd *ContainerdOptions `json:"containerd,omitempty"`

	// FeatureGates holds key-value pairs to enable or disable application features.
	// +optional
	FeatureGates map[Feature]bool `json:"featureGates,omitempty"`

	// PreBootstrapCommands specifies extra commands to run before bootstrapping nodes.
	// +optional
	PreBootstrapCommands []string `json:"preBootstrapCommands,omitempty"`

	// Files specifies extra files to be passed to user_data upon creation.
	// +optional
	Files []File `json:"files,omitempty"`

	// Users specifies extra users to add.
	// +optional
	Users []User `json:"users,omitempty"`

	// NTP specifies NTP configuration.
	// +optional
	NTP *NTP `json:"ntp,omitempty"`

	// DiskSetup specifies options for the creation of partition tables and file systems on devices.
	// +optional
	DiskSetup *DiskSetup `json:"diskSetup,omitempty"`

	// Mounts specifies a list of mount points to be setup.
	// +optional
	Mounts []MountPoints `json:"mounts,omitempty"`
}
```

-----

## Design Details

### Upgrade Strategy

A valid concern for CAPA users is how to upgrade existing clusters to machines that use the new `NodeadmConfig` bootstrap CRD. This KEP does not change the process: as before, the user references a new bootstrap config template, but its kind changes from `EKSConfigTemplate` to `NodeadmConfigTemplate`.

#### MachineDeployment Upgrade Example

A user with a `MachineDeployment` using `EKSConfig` for Kubernetes v1.32 would upgrade to v1.33 by creating a new `NodeadmConfigTemplate` and updating the `MachineDeployment` to reference it and the new Kubernetes version. New machines are rolled out according to the `MachineDeployment` update strategy.

**Before (v1.32 with `EKSConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfigTemplate
metadata:
  name: default132
spec:
  template:
    spec:
      postBootstrapCommands:
        - "echo \"bye world\""
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfigTemplate
          name: default132
      infrastructureRef:
        kind: AWSMachineTemplate
        name: default132
      version: v1.32.0
```

**After (v1.33 with `NodeadmConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: NodeadmConfigTemplate
metadata:
  name: default
spec:
  template:
    spec:
      preBootstrapCommands:
        - "echo \"hello world\""
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: NodeadmConfigTemplate
          name: default
      infrastructureRef:
        kind: AWSMachineTemplate
        name: default
      version: v1.33.0
```
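
The kind switch in the example above can be expressed as a small version check. The following Go sketch is illustrative only: `templateKindFor` is a hypothetical helper, and the v1.33 cutoff is an assumption derived from the statement that 1.32 is the last version with AL2 AMIs. Clusters that already run AL2023 AMIs on earlier versions would need `NodeadmConfigTemplate` as well, which this sketch does not model.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// templateKindFor returns the bootstrap template kind to reference for a
// Kubernetes version such as "v1.33.0". Hypothetical helper; the 1.33
// cutoff assumes AL2 AMIs are used up to and including 1.32.
func templateKindFor(version string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	if major > 1 || (major == 1 && minor >= 33) {
		return "NodeadmConfigTemplate", nil
	}
	return "EKSConfigTemplate", nil
}

func main() {
	for _, v := range []string{"v1.32.0", "v1.33.0"} {
		kind, err := templateKindFor(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(v, "uses", kind)
	}
}
```

A check like this could run before a rollout to catch a `MachineDeployment` that bumps `version` without also switching its bootstrap `configRef`.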

#### MachinePool Upgrade Example

The flow is very similar for a `MachinePool`. A user would update the `MachinePool` resource to reference a new `NodeadmConfigTemplate` and the target Kubernetes version.

**Before (v1.32 with `EKSConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfigTemplate
metadata:
  name: default-132
spec:
  template:
    spec: {}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfigTemplate
          name: default-132
      infrastructureRef:
        kind: AWSMachinePool
        name: default
      version: v1.32.0
```

**After (v1.33 with `NodeadmConfigTemplate`):**

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: NodeadmConfigTemplate
metadata:
  name: default-133
spec:
  template:
    spec:
      preBootstrapCommands:
        - "echo \"hello from v1.33.0\""
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: default
spec:
  clusterName: default
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: NodeadmConfigTemplate
          name: default-133
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachinePool
        name: default
      version: v1.33.0
```

### Test Plan

* Unit tests for the new code.
* Integration tests for the new Nodeadm controller.
* E2E tests exercising the migration from `EKSConfig` to `NodeadmConfig`.

-----

## Alternatives

The primary alternative considered was to modify the existing `EKSConfig` type to support `nodeadm`. Work is currently under way upstream to address this gap: at a high level, [this PR](https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5553) adds a new bootstrapping implementation to the existing `EKSConfig` type, with some additional API fields to distinguish between bootstrap methods.

However, this implementation has some drawbacks in its API design:

* **Introduction of new fields to distinguish between bootstrap methods**: This complicates the API.
* **Fields that are valid for `bootstrap.sh` are not valid for `nodeadm` and vice versa**: This would lead to a confusing user experience where users could set fields that have no effect for their chosen bootstrap method.
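
The second drawback can be illustrated with a rough sketch of the validation a combined type would need. Everything here is hypothetical: `combinedConfig`, its `BootstrapMethod` discriminator, and the field types are invented for illustration and do not reflect the linked PR. Only the field names `DNSClusterIP`, `APIRetryAttempts`, and `PostBootstrapCommands` come from the list of fields that are no longer valid with `nodeadm`.

```go
package main

import (
	"fmt"
	"strings"
)

// combinedConfig is a hypothetical single type serving both bootstrap
// methods. Field types are assumptions made for this sketch.
type combinedConfig struct {
	BootstrapMethod       string // "bootstrap.sh" or "nodeadm" (hypothetical discriminator)
	DNSClusterIP          *string
	APIRetryAttempts      *int
	PostBootstrapCommands []string
}

// validate rejects bootstrap.sh-only fields when nodeadm is selected.
// A combined type needs checks like this for every method-specific field.
func validate(c combinedConfig) error {
	if c.BootstrapMethod != "nodeadm" {
		return nil
	}
	var invalid []string
	if c.DNSClusterIP != nil {
		invalid = append(invalid, "dnsClusterIP")
	}
	if c.APIRetryAttempts != nil {
		invalid = append(invalid, "apiRetryAttempts")
	}
	if len(c.PostBootstrapCommands) > 0 {
		invalid = append(invalid, "postBootstrapCommands")
	}
	if len(invalid) > 0 {
		return fmt.Errorf("fields not valid with nodeadm: %s", strings.Join(invalid, ", "))
	}
	return nil
}

func main() {
	ip := "10.100.0.10"
	cfg := combinedConfig{
		BootstrapMethod:       "nodeadm",
		DNSClusterIP:          &ip,
		PostBootstrapCommands: []string{"echo done"},
	}
	if err := validate(cfg); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

With a dedicated `NodeadmConfig` type, fields like these simply do not exist on the type, so no cross-field validation of this kind is required.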
