Conversation

@alcampag (Member)

What this PR does / why we need it:
This PR introduces a new field on OCIManagedCluster that allows the network to be created in a different compartment than the one specified in OCIManagedCluster.Spec.CompartmentId.

The new field is called OCIManagedCluster.Spec.NetworkSpec.CompartmentId. It is optional, and if not set it defaults to the compartment specified in OCIManagedCluster.Spec.CompartmentId.
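
A minimal sketch of the field addition, assuming the usual Cluster API conventions for optional fields (the struct layout is inferred from the PR description, not copied from the diff):

// NetworkSpec sketch; only the new field is shown, the rest of the struct is unchanged.
type NetworkSpec struct {
    // CompartmentId is the OCID of the compartment in which the network
    // resources are created. If empty, it defaults to
    // OCIManagedCluster.Spec.CompartmentId.
    // +optional
    CompartmentId string `json:"compartmentId,omitempty"`

    // ... existing NetworkSpec fields
}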

In addition, this PR fixes several bugs in the managed machine pool controller. The controller logic caused continuous updates to the node pool if the original OCIManagedCluster was modified, because not all fields were populated from the actual managed node pool state and updated correctly.

The new reconciliation logic follows this design (a sketch of the rule follows the list):

  • If the user has defined a value for a field in the OCIManagedCluster object, the controller will try to synchronize it
  • If the field is not specified at all, the controller will keep it as defined in the actual state of the node pool
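
A minimal sketch of that rule for a single optional field (the helper name and shape are illustrative, not the PR's actual code):

// reconcileOptionalField applies the design above to one optional field:
// a value set in the spec wins; an unset field inherits the actual node
// pool state so the controller does not fight the service defaults.
func reconcileOptionalField(spec, actual *string) *string {
    if spec != nil {
        return spec // user-defined: synchronize toward the spec
    }
    return actual // unspecified: keep the actual node pool value
}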

Which issue(s) this PR fixes:
Fixes #450

oracle-contributor-agreement bot added the OCA Verified (All contributors have signed the Oracle Contributor Agreement.) label Sep 26, 2025
joekr added the bug (Something isn't working) and enhancement (New feature or request) labels Sep 26, 2025
@joekr (Member) commented Sep 29, 2025

What happens if a user 1) updates the network compartment ID to a new compartment, or 2) deletes the network compartment ID?

@joekr (Member) commented Sep 29, 2025

Instead of calling out all the places tests need to be added I'll just make the blanket statement here. We need to make sure we have some unit test coverage for this update. I know we talked about adding e2e tests around this already as well, but I'm just putting this in the comment for posterity.

@alcampag (Member, Author)

I will need to think how to handle the networkCompartmentId update/deletion.

@joekr (Member) commented Sep 30, 2025

I will need to think how to handle the networkCompartmentId update/deletion.

We talked about this and we came to the conclusion that update/delete of the compartmentID won't work as it is set up now. We won't address this as part of this PR. At some point in the future we will introduce this, but it will need to leverage https://docs.oracle.com/en-us/iaas/api/#/en/iaas/latest/Vcn/ChangeVcnCompartment. This will be a bigger lift than just adding it as part of this PR.
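
For reference, a hedged sketch of what that future move could look like with the OCI Go SDK (request and field names follow the linked API reference; treat the exact wiring as an assumption, it is not part of this PR):

import (
    "context"

    "github.com/oracle/oci-go-sdk/v65/core"
)

// moveVcnCompartment sketches the future ChangeVcnCompartment call referred
// to above; not part of this PR.
func moveVcnCompartment(ctx context.Context, client core.VirtualNetworkClient, vcnId, targetCompartmentId string) error {
    _, err := client.ChangeVcnCompartment(ctx, core.ChangeVcnCompartmentRequest{
        VcnId: &vcnId,
        ChangeVcnCompartmentDetails: core.ChangeVcnCompartmentDetails{
            CompartmentId: &targetCompartmentId,
        },
    })
    return err
}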

alcampag and others added 5 commits October 2, 2025 14:37
- Introduce new cluster template for OCIManagedCluster with separate network compartment.
- Update e2e_conf.yaml to include the template and OCI_NETWORK_COMPARTMENT_ID variable.
- Modify Makefile to generate the new template (replacing node-recycling build).
- Add --load flag to docker-build for local image loading.
- Create cluster.yaml defining the managed cluster and control plane specs.
alcampag marked this pull request as ready for review October 16, 2025 07:30
alcampag requested a review from joekr October 16, 2025 07:30
@alcampag (Member, Author)

@joekr I have added the e2e template and merged with the latest release. If you see a test failing locally, it's because @vladcristi has implemented a new unit test for instance principals, which will fail in a local environment.
The only thing missing is the e2e test for NetworkCompartmentId, but I have already done my share of manual testing with different configurations.

$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-externally-managed-vcn --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-externally-managed-vcn.yaml
$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-machine-pool --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-machine-pool.yaml
$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-managed --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-managed.yaml
$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-managed-node-recycling --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-managed-node-recycling.yaml
Member:

Should this be deleted?

    if err == nil && resp.Subnet.DisplayName != nil {
        return *resp.Subnet.DisplayName
    }
}
Member:

I don't think we want to add API calls into the util methods. 🤔

I think most things in this layer are really treating things as in-memory calls. What if we added these API calls to the respective controller? Would that solve the same issues being solved here?

spec:
  compartmentId: "${OCI_COMPARTMENT_ID}"
  networkSpec:
    compartmentId: "${OCI_NETWORK_COMPARTMENT_ID}"
Member:

NIT: do you mind adding a simple comment that calls out this line since this is the major change that we are testing?

if err := r.reconcileComponent(ctx, ociManagedCluster, clusterScope.ReconcileVCN, "VCN",
    infrastructurev1beta2.VcnReconciliationFailedReason, infrastructurev1beta2.VcnEventReady); err != nil {
if err := r.reconcileComponent(ctx, ociManagedCluster, clusterScope.ReconcileDRG, "DRG",
    infrastructurev1beta2.DrgReconciliationFailedReason, infrastructurev1beta2.DrgEventReady); err != nil {
Member:

What is the purpose of reordering this?

}

func (m *ManagedMachinePoolScope) getPodSubnets(subnets []string) []string {
func (m *ManagedMachinePoolScope) getNetworkCompartmentId() string {
Member:

Do you mind adding a method comment? Something like:

// getNetworkCompartmentId returns the compartmentId of the NetworkSpec if set, otherwise defaults to the OCIManagedCluster compartmentId
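
For context, a plausible shape of that helper with the suggested comment applied (the field paths are inferred from the PR description, not copied from the diff):

// getNetworkCompartmentId returns the compartmentId of the NetworkSpec if set,
// otherwise it defaults to the OCIManagedCluster compartmentId.
func (m *ManagedMachinePoolScope) getNetworkCompartmentId() string {
    if networkCompartmentId := m.OCIManagedCluster.Spec.NetworkSpec.CompartmentId; networkCompartmentId != "" {
        return networkCompartmentId
    }
    return m.OCIManagedCluster.Spec.CompartmentId
}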

        }
        subnet.ID = &subnetId
    }
    subnetList = append(subnetList, *subnet.ID)
Member:

I'm not 100% sure what we want to do here. Originally we were pulling the spec pod subnets, which I believe get updated. So wouldn't the subnets be up to date here when reconciled?

If I'm reading this correctly, when the subnet ID in the spec is nil this will make API calls to get the subnet info. I understand that, but I'm trying to understand why the subnet ID would be nil even after the reconcile. I'm probably not understanding something.

            subnetList = podDetails.PodSubnetIds
        }
    }
    return subnetList, nil
Member:

If no subnets found and we have an existing node pool

I don't understand how this is possible. Without a subnet we couldn't have a node pool, correct? Again, probably something I'm not understanding.

if len(nsgList) == 0 && pool != nil && pool.NodeConfigDetails != nil {
    if podDetails, ok := pool.NodeConfigDetails.NodePoolPodNetworkOptionDetails.(oke.OciVcnIpNativeNodePoolPodNetworkOptionDetails); ok {
        nsgList = podDetails.PodNsgIds
    }
Member:

If no NSGs found and we have an existing node pool, use the NSGs from the node pool

We will want to make sure this change is documented for the users.

// The main reconciliation loop for the node pool is triggered by reflect.DeepEqual(spec, actual). While it's correct to update a node pool if the spec has changed,
// this can lead to an infinite update loop for some parameters that are not specified in the spec but are returned by the actual state of the node pool.
// To solve this, the spec is updated before the reflect.DeepEqual(spec, actual) check, but only for those problematic parameters.
func setSpecFromActual(spec *infrav2exp.OCIManagedMachinePoolSpec, actual *expinfra1.OCIManagedMachinePoolSpec) {
Member:

This is going to become a nightmare to keep up to date if there are any changes to machinepool. There has to be a better way to do this. This seems very brittle.

Member:

This is mutating the spec at the very least. I wonder if we should at least make a copy instead of mutating the spec 🤔

Member:

What about instead computing a minimal “desired patch” (only the fields the user specified) and triggering UpdateNodePool only if that patch has at least one field set?

I would have put this comment up in the UpdateNodePool method, but since the context is all here I thought I would put it here.
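
A minimal sketch of that idea, with a hypothetical patch type (the type and field names are illustrative, not the node pool API's):

// desiredNodePoolPatch carries only the fields the user actually set in the
// spec; everything left nil is simply not sent to the service.
type desiredNodePoolPatch struct {
    KubernetesVersion *string
    NodeShape         *string
}

// isEmpty reports whether the user specified nothing to change, in which
// case UpdateNodePool can be skipped entirely, avoiding the
// reflect.DeepEqual-driven update loop described above.
func (p desiredNodePoolPatch) isEmpty() bool {
    return p.KubernetesVersion == nil && p.NodeShape == nil
}

UpdateNodePool would then run only when the patch is non-empty, instead of whenever reflect.DeepEqual(spec, actual) reports a difference.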

s.OCIClusterAccessor.GetNetworkSpec().Vcn.ID = vcn.Id
// Someone has updated the network compartment ID, or the VCN was moved. Return an error.
if vcn.CompartmentId != nil && *vcn.CompartmentId != s.GetNetworkCompartmentId() {
    return errors.New("CompartmentId of the VCN is not the same as the one specified in the spec")
Member:

I think we should probably try to catch this at the webhook layer. We can also leave this check, but if the user tries to update the field, the webhook would catch it before the change is even applied. Might be a better UX.
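
A sketch of what that webhook check might look like, assuming controller-runtime's validating-webhook pattern (the exact ValidateUpdate signature varies across controller-runtime versions, so treat this as illustrative):

import (
    "errors"

    "k8s.io/apimachinery/pkg/runtime"
)

// ValidateUpdate rejects changes to the network compartment, surfacing the
// unsupported move to the user before the controller ever sees it.
func (c *OCIManagedCluster) ValidateUpdate(old runtime.Object) error {
    oldCluster, ok := old.(*OCIManagedCluster)
    if !ok {
        return errors.New("expected an OCIManagedCluster")
    }
    if oldCluster.Spec.NetworkSpec.CompartmentId != c.Spec.NetworkSpec.CompartmentId {
        return errors.New("networkSpec.compartmentId is immutable; moving a VCN requires the ChangeVcnCompartment API")
    }
    return nil
}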


Labels

bug: Something isn't working
enhancement: New feature or request
OCA Verified: All contributors have signed the Oracle Contributor Agreement.

Development

Successfully merging this pull request may close these issues.

Possibility to provision OCI Managed Control Plane and network on different compartments

2 participants