@zhanggbj
Contributor

@zhanggbj zhanggbj commented Oct 28, 2025

What this PR does / why we need it:
Support Node Auto Placement and Node affinity/anti-affinity (AF/AAF)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

zhanggbj and others added 11 commits October 22, 2025 14:19
- Bump VMOP including Node AF/AAF support
- Add NodeAutoPlacement Feature Gate

(cherry picked from commit 700c8ae)
Removes the extra cases for VMG creation, such that VMG is created for:
1. Multiple zones, multiple MDs with no failureDomain
2. Multiple zones, multiple MDs with failureDomain
3. Single zone, existing cluster with no failureDomain MDs

Signed-off-by: Sagar Muchhal <sagar.muchhal@broadcom.com>
- Updates VMOP API dependency

Misc VMG fixes
- Use namingStrategy to calculate VM names
- Use MachineDeployment names for VMG placement label
- Includes all MachineDeployments when generating the node-pool -> zone
  mapping

Fixes VMG webhook validation error
- Adds cluster-name label to AF/AAF spec
- Re-adds zone topology key to anti-affinity spec

Signed-off-by: Sagar Muchhal <sagar.muchhal@broadcom.com>
Signed-off-by: Sagar Muchhal <sagar.muchhal@broadcom.com>
Signed-off-by: Sagar Muchhal <sagar.muchhal@broadcom.com>
…gs#71)

* Refine VMG controller when generating per-MD zone labels

- Skip legacy, already-placed VMs that do not have placement info
- Skip VMs that do not have zone info

* Apply suggestions from code review

---------

Co-authored-by: Sagar Muchhal <sagar.muchhal@broadcom.com>
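
The skip rules in this commit can be illustrated with a minimal sketch. It is not the controller code from this PR: the `vmInfo` type, its fields, and the "first zone seen wins" rule are assumptions made only for illustration.

```go
package main

// vmInfo is a hypothetical, simplified view of a VM as seen by the
// VMG controller. None of these field names come from the PR.
type vmInfo struct {
	Name              string
	MachineDeployment string // owning MachineDeployment (node pool)
	Placed            bool   // false for legacy VMs without placement info
	Zone              string // empty when no zone info is available
}

// zonePerMD builds a MachineDeployment -> zone mapping. VMs without
// placement info, without zone info, or without an owning
// MachineDeployment are skipped. The first zone seen for a
// MachineDeployment wins (an assumption of this sketch).
func zonePerMD(vms []vmInfo) map[string]string {
	out := map[string]string{}
	for _, vm := range vms {
		if !vm.Placed || vm.Zone == "" || vm.MachineDeployment == "" {
			continue
		}
		if _, seen := out[vm.MachineDeployment]; !seen {
			out[vm.MachineDeployment] = vm.Zone
		}
	}
	return out
}
```

Calling `zonePerMD` with one placed VM in `zone-1` and one legacy VM without placement info would yield a single entry for the placed VM's MachineDeployment.
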
- Sync VSphereMachines during day-2 operations in VMG controller
- Only wait for all intended VSphereMachines during initial Cluster creation
- Use annotations in VMG for per-md-zone info

Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
- Add VMG reconciler unit test
- Bump VMOP due to API change
- Filter out VSphereMachine events other than create/delete

Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
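
As a rough illustration of the event filtering mentioned in this commit, the sketch below lets only create and delete events through. The `eventKind` type and `shouldEnqueue` function are hypothetical stand-ins; in a real controller this would more likely be expressed with controller-runtime's `predicate.Funcs`, which this stdlib-only sketch does not use.

```go
package main

// eventKind is a hypothetical stand-in for the kinds of watch events
// a controller can receive for a watched object.
type eventKind int

const (
	eventCreate eventKind = iota
	eventUpdate
	eventDelete
	eventGeneric
)

// shouldEnqueue reports whether a VSphereMachine event should trigger
// a reconcile. Only create and delete events pass; update and generic
// events are filtered out.
func shouldEnqueue(k eventKind) bool {
	switch k {
	case eventCreate, eventDelete:
		return true
	default:
		return false
	}
}
```

Dropping update events keeps status-only changes on VSphereMachines from re-triggering the reconciler, which is presumably the motivation here.
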
Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 28, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 28, 2025
@zhanggbj zhanggbj changed the title [WIP][DoNotReview]Support Node Auto Placement and Node AF/AAF [WIP][DoNotReview]✨ Support Node Auto Placement and Node AF/AAF Oct 28, 2025
@zhanggbj zhanggbj force-pushed the node_auto_placement branch 3 times, most recently from 1160ce9 to 1cd61f9 Compare October 28, 2025 10:31
@zhanggbj
Contributor Author

/test ?

@k8s-ci-robot
Contributor

@zhanggbj: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main
/test pull-cluster-api-provider-vsphere-test-main
/test pull-cluster-api-provider-vsphere-verify-main

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-vsphere-apidiff-main
/test pull-cluster-api-provider-vsphere-janitor-main

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-vsphere-apidiff-main
pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main
pull-cluster-api-provider-vsphere-test-main
pull-cluster-api-provider-vsphere-verify-main

In response to this:

/test ?


@zhanggbj
Contributor Author

/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main
/test pull-cluster-api-provider-vsphere-test-main
/test pull-cluster-api-provider-vsphere-verify-main
/test pull-cluster-api-provider-vsphere-apidiff-main
/test pull-cluster-api-provider-vsphere-janitor-main

@zhanggbj zhanggbj force-pushed the node_auto_placement branch 7 times, most recently from 22604bf to 7c7c1a7 Compare November 3, 2025 08:55
- Refine VMG controller watch
- Handle race conditions in VMG controller by gating member update
- Refine data struct for VM Affinity config
- Refine UT
- Refine naming, logging, godoc
- Miscellaneous

Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
@zhanggbj zhanggbj force-pushed the node_auto_placement branch from ea432ed to dc441f6 Compare November 21, 2025 13:29
Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
Contributor Author

@zhanggbj zhanggbj left a comment

All comments are addressed, including the ones missed by mistake last time.

zhanggbj

This comment was marked as spam.

@zhanggbj zhanggbj force-pushed the node_auto_placement branch 2 times, most recently from 8dabe39 to 19ebccb Compare November 23, 2025 13:27
Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
@zhanggbj zhanggbj force-pushed the node_auto_placement branch from 19ebccb to b981168 Compare November 23, 2025 14:16
@zhanggbj
Contributor Author

/test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main

@zhanggbj zhanggbj force-pushed the node_auto_placement branch 3 times, most recently from 42a6d73 to 3ed2e93 Compare November 24, 2025 14:46
@zhanggbj
Contributor Author

/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main

@zhanggbj
Contributor Author

Hey @fabriziopandini @sbueringer ,
All comments are addressed, including the ones missed by mistake last time. CI is now green. Please take another look; I really appreciate it!

Regarding the concern about race conditions, I have implemented checks to mitigate them during initial placement and post-placement, and have added unit tests to cover these scenarios.

@zhanggbj zhanggbj force-pushed the node_auto_placement branch from 3ed2e93 to 9b1754a Compare November 25, 2025 06:44
@zhanggbj
Contributor Author

Pushed a small change; now checking the UT failures.

Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
@zhanggbj zhanggbj force-pushed the node_auto_placement branch from 9b1754a to 2ac47d4 Compare November 25, 2025 09:25
@zhanggbj
Contributor Author

/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main

@zhanggbj
Contributor Author

/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main

Bump the vm-operator package, since Encryption Class requires the API in this newer version

Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
@zhanggbj zhanggbj force-pushed the node_auto_placement branch from fd72f09 to c41abeb Compare November 26, 2025 07:49
zhanggbj and others added 5 commits November 27, 2025 11:46
Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
Signed-off-by: Gong Zhang <gong.zhang@broadcom.com>
@zhanggbj zhanggbj force-pushed the node_auto_placement branch from b8c569e to 08b8304 Compare November 27, 2025 09:23
Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)

7 participants