Open
Labels
- help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
- kind/feature: Categorizes issue or PR as related to a new feature.
- priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
- triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
This is a tracking issue for the implementation of in-place updates. At the moment, it only covers the work required for the initial phase of the project to reach the experimental (alpha) stage.
The design and approach are described in the in-place updates proposal.
- Initial refactorings / prereqs:
- ✨ Ensure ExtensionConfig controller can be used outside of the core provider #12754
- 🌱 Remove unused CleanUpManagedFieldsForSSAAdoption code #12788
- 🌱 Refactor BootstrapConfig/InfraMachine creation in MachineSet controller #12881
- ✨ KCP/MS: Refactor BootstrapConfig/InfraMachine managedFields for in-place #12890
- Add InPlaceUpdates feature gate: ✨ Add inplace updates featuregate #12755
- Introduce Runtime Hook API changes, see examples section of the proposal for details.
- Introduce CanUpdateMachine/CanUpdateMachineSet hooks and corresponding Request/Response types @alexander-demicev
- Introduce UpdateMachine hook and corresponding Request/Response types @alexander-demicev
- Modify core controllers
- [@sbueringer] KubeadmControlPlane controller updates
- ✨ KCP: compare ClusterConfiguration via KubeadmConfig instead of annotation on Machine #12758
- 🌱 Simplify cleanupConfigFields in KCP #12776
- 🌱 Cleanup KCP code: variable/func renames, func order #12793
- 🌱 Simplify KCP matchesKubeadmConfig #12813
- 🌱 KCP: Add current/desired objects to NotUpToDateResult & refactor object creation #12817
- ✨ KCP: Extend rollout logic for in-place updates #12840
- ✨ KCP: Implement CanUpdateMachine #12857
- ✨ KCP: implement trigger in-place update #12897
- Consider: EligibleForInPlaceUpdate computation ✨ KCP: Extend rollout logic for in-place updates #12840 (comment)
- 🌱 Adjust UpToDate condition to consider Updating, move UpToDate condition to Machine ctrl for workers #12959
- See these parts of the proposal for details:
- [@fabriziopandini] MachineDeployment controller updates
- Add rollout planner part1
- RolloutUpdate strategy 🌱 Add rollout planner #12804
- OnDelete strategy 🐛 Fix race conditions ScaleDownOldMS OnDelete #12830
- Create newMS, apply changes 🌱 Move compute and create ms to rollout planner #12841
- Preliminary refactor 🌱 Refactor MachineTemplateUpToDate #12811
- Fix issues in scale down old MS
- RolloutUpdate strategy 🐛 Fix race conditions ScaleDownOldMS #12812
- OnDelete strategy 🐛 Fix race conditions ScaleDownOldMS OnDelete #12830
- Add support for in-place in rollout planner: ✨ Add in-place to rollout planner #12865
- ✨ MD: Implement CanUpdateMachineSet #12965
- Change MS controller to handle move
- 🌱 Add in-place to machineset controller #12906
- Consider how to handle pending machines e.g. for deletion ✨ Add in-place to rollout planner #12865 (comment)
- Consider how to handle when new MS reaches desired number of machines ✨ Add in-place to rollout planner #12865 (comment)
- Align fake machine controller to the implementation in prod code, see e.g. ✨ Add in-place to rollout planner #12865 (comment)
- 🌱 Adjust UpToDate condition to consider Updating, move UpToDate condition to Machine ctrl for workers #12959
- See these parts of the proposal for details:
- [@alexander-demicev] Machine controller modifications
- [(@furkatgofurov7)] MHC
- Follow-ups
- ✨ Introduce & use wait for cache utils #12957 (@sbueringer)
- Consider whether to improve the SSA helper to add items to the cache immediately after apply: 🌱 Add items to cache immediately after apply #12877 (@lentzi90)
- Runtime Client: Use GetAllExtensions in CallAllExtensions: 🌱 Deduplicate extension filtering and response validation logic #12905
- 🌱 Add defensive response status checking in runtime client #12898
- 🌱 Cleanup getMachinesSucceeded flag from MD controller #12882
- Ensure the KCP/MS controllers do not trigger an in-place update if the Machine is not yet fully provisioned: ✨ Add in-place updates support for machine controller #12831 (comment)
- Align logging / error handling & events between KCP & MS controllers: 🌱 Add in-place to machineset controller #12906 (comment)
- [@sbueringer] E2E testing:
- Create documentation
- Feature flag configuration
- New runtime hook and its API contracts
- Extend the ControlPlane contract with an (optional) rule about in-place updates
- Updater structure and logic
- Guide to implementing extensions
- Explanation of the CAPD Kubeadm Updater
- Tutorials for usage
- Update proposal
- Consider adding PR description of ✨ KCP/MS: Refactor BootstrapConfig/InfraMachine managedFields for in-place #12890
- Consider adding parts of the implementation design doc: https://docs.google.com/document/d/1MuhwSL-1ZMsiMoEHE9fVRha9Wkb5ORUFCLSMVNK_mTA/edit?tab=t.0#heading=h.lr4ie3qnitfn
Follow-up (after everything else is done, possibly in the next release cycle)
- Improve InfraMachine controller to only add finalizer after InfraMachine has owner (avoids retries on conflict in RemoveManagedFieldsForLabelsAndAnnotations) (ask @sbueringer for more details, same in CAPV)
- [MHC] We should probably find a way to remediate based on Ready & Available Machine condition
- [MHC] More advanced MHC behavior (e.g. different behavior during in-place update)
- KCP: Optimize preflightCheck calls: ✨ KCP: Implement CanUpdateMachine #12857 (comment)
- KCP should not spam "Rolling out Control Plane machines.." while waiting for preflightChecks to pass
- Consider whether to improve the MS definition of Unhealthy when picking machines for deletion, see 🐛 Fix race conditions ScaleDownOldMS #12812 (comment)
- Improve Machine controller unit tests to share more setup and verification code across tests in table tests: ✨ Add in-place updates support for machine controller #12831 (comment)
- Improve ExtensionConfig pause behavior: updating the registry is also paused, warmup runnable ignores pause today
- Improve how to configure which “objects” a RuntimeExtension applies to
- Goal: We should avoid unnecessary CanUpdateMachine/MachineSet calls (e.g. for the wrong infra provider)
- Ideas: ExtensionConfig objectSelector (like FieldSelectorRequirement), Extend response of Discovery call (e.g. objectSelector (like FieldSelectorRequirement) or “infraProviderKind”)
- Hook ordering
- Note: It’s important that hooks of different update extensions are called in the same order for CanUpdateMachine/CanUpdateMachineSet and UpdateMachine hooks
- We are overwriting Machine / InfraMachine / BootstrapConfig in-place. This is better than trying to rotate InfraMachine / BootstrapConfig, as rotation would be very hard to handle for infra providers. With some infra providers InfraMachines are immutable; this would have to change if they want to start supporting in-place updates of InfraMachines.
- TBD: Whether the immutability check should be dropped entirely or only when the update comes from core CAPI (this could maybe be detected via the UserAgent header example)
- Probably need some docs for in-place updates with infra providers (e.g. to cover what they should or shouldn't do once an InfraMachine was updated)
- Anything to do for autoscaler?
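A minimal sketch of the "Hook ordering" item above, assuming extensions can be ordered by name. The `UpdateExtension` type and the `orderedExtensions` and `callPhase` helpers are hypothetical and not part of the Cluster API codebase; the only point is that the CanUpdateMachine/CanUpdateMachineSet phase and the UpdateMachine phase derive their call order from the same deterministic function.

```go
package main

import (
	"fmt"
	"sort"
)

// UpdateExtension is a hypothetical stand-in for a registered update extension.
// Real extensions carry more data; only the name matters for ordering here.
type UpdateExtension struct {
	Name string
}

// orderedExtensions returns a copy of exts sorted by name, so the result does
// not depend on the order in which extensions were registered or listed.
func orderedExtensions(exts []UpdateExtension) []UpdateExtension {
	out := make([]UpdateExtension, len(exts))
	copy(out, exts)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// callPhase records which extensions a hook phase would call, in call order.
// A real implementation would send the hook request instead of building strings.
func callPhase(phase string, exts []UpdateExtension) []string {
	calls := make([]string, 0, len(exts))
	for _, e := range orderedExtensions(exts) {
		calls = append(calls, phase+":"+e.Name)
	}
	return calls
}

func main() {
	// Registration order intentionally differs from the sorted order.
	exts := []UpdateExtension{{Name: "os-updater"}, {Name: "kubeadm-updater"}}
	fmt.Println(callPhase("CanUpdateMachine", exts))
	fmt.Println(callPhase("UpdateMachine", exts))
}
```

Both phases go through `orderedExtensions`, so they visit the extensions in the same sequence regardless of the input order. Sorting by name is only one possible rule; an explicit priority field would work the same way as long as both phases share it.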
Docs
- Implementation design: https://docs.google.com/document/d/1MuhwSL-1ZMsiMoEHE9fVRha9Wkb5ORUFCLSMVNK_mTA/edit?tab=t.0