NetworkCompartmentId implementation + fixes for OCIManagedMachinePool #455
base: main
Conversation
What happens if a user 1) updates the network compartment ID to a new compartment or 2) deletes the network compartment ID?

Instead of calling out all the places tests need to be added, I'll just make the blanket statement here: we need to make sure we have some unit test coverage for this update. I know we already talked about adding e2e tests around this as well, but I'm putting this in the comment for posterity.

I will need to think about how to handle the networkCompartmentId update/deletion.

We talked about this and came to the conclusion that update/delete of the compartmentID won't work as it is set up now. We won't address this as part of this PR. At some point in the future we will introduce this, but it will need to leverage https://docs.oracle.com/en-us/iaas/api/#/en/iaas/latest/Vcn/ChangeVcnCompartment. This will be a bigger lift than just adding it as part of this PR.
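For reference, a minimal sketch of what leveraging that API could look like with the OCI Go SDK. The client wiring and helper name here are illustrative assumptions, not code from this PR; a real implementation would reuse the provider's existing clients and poll the returned work request.

```go
package sketch

import (
	"context"

	"github.com/oracle/oci-go-sdk/v65/common"
	"github.com/oracle/oci-go-sdk/v65/core"
)

// moveVcnToCompartment is a hypothetical helper, not part of this PR.
func moveVcnToCompartment(ctx context.Context, vcnID, targetCompartmentID string) error {
	// Client construction is simplified; the provider would reuse its existing VCN client.
	client, err := core.NewVirtualNetworkClientWithConfigurationProvider(common.DefaultConfigProvider())
	if err != nil {
		return err
	}
	// ChangeVcnCompartment is asynchronous; a real implementation would wait on
	// the returned work request before continuing reconciliation.
	_, err = client.ChangeVcnCompartment(ctx, core.ChangeVcnCompartmentRequest{
		VcnId: common.String(vcnID),
		ChangeVcnCompartmentDetails: core.ChangeVcnCompartmentDetails{
			CompartmentId: common.String(targetCompartmentID),
		},
	})
	return err
}
```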
- Introduce new cluster template for OCIManagedCluster with separate network compartment.
- Update e2e_conf.yaml to include the template and OCI_NETWORK_COMPARTMENT_ID variable.
- Modify Makefile to generate the new template (replacing the node-recycling build).
- Add --load flag to docker-build for local image loading.
- Create cluster.yaml defining the managed cluster and control plane specs.
@joekr I have added the e2e template and merged with the latest release. If you see a test failing locally, it's because @vladcristi has implemented a new unit test for instance principals, which will fail in a local environment.
$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-externally-managed-vcn --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-externally-managed-vcn.yaml
$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-machine-pool --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-machine-pool.yaml
$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-managed --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-managed.yaml
$(KUSTOMIZE) build $(OCI_TEMPLATES)/v1beta2/cluster-template-managed-node-recycling --load-restrictor LoadRestrictionsNone > $(OCI_TEMPLATES)/v1beta2/cluster-template-managed-node-recycling.yaml
Should this be deleted?
	if err == nil && resp.Subnet.DisplayName != nil {
		return *resp.Subnet.DisplayName
	}
}
I don't think we want to add API calls into the util methods. 🤔
I think most things in this layer are really treated as in-memory calls. What if we added these API calls to the respective controller? Would that solve the same issues being solved here?
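A minimal sketch of that split, with hypothetical names: the controller/scope layer owns the GetSubnet API call, and the util layer only ever sees already-resolved values.

```go
package sketch

import (
	"context"

	"github.com/oracle/oci-go-sdk/v65/core"
)

// subnetGetter is a hypothetical narrow interface for the single call the
// controller layer needs; the provider already wraps this client elsewhere.
type subnetGetter interface {
	GetSubnet(ctx context.Context, request core.GetSubnetRequest) (core.GetSubnetResponse, error)
}

// resolveSubnetName lives in the controller/scope layer: the only place that
// talks to the OCI API.
func resolveSubnetName(ctx context.Context, client subnetGetter, subnetID string) (string, error) {
	resp, err := client.GetSubnet(ctx, core.GetSubnetRequest{SubnetId: &subnetID})
	if err != nil || resp.Subnet.DisplayName == nil {
		return "", err
	}
	return *resp.Subnet.DisplayName, nil
}

// subnetNameOrDefault stays in util: purely in-memory, no API calls.
func subnetNameOrDefault(resolvedName, fallback string) string {
	if resolvedName != "" {
		return resolvedName
	}
	return fallback
}
```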
spec:
  compartmentId: "${OCI_COMPARTMENT_ID}"
  networkSpec:
    compartmentId: "${OCI_NETWORK_COMPARTMENT_ID}"
NIT: do you mind adding a simple comment that calls out this line since this is the major change that we are testing?
if err := r.reconcileComponent(ctx, ociManagedCluster, clusterScope.ReconcileVCN, "VCN",
	infrastructurev1beta2.VcnReconciliationFailedReason, infrastructurev1beta2.VcnEventReady); err != nil {
if err := r.reconcileComponent(ctx, ociManagedCluster, clusterScope.ReconcileDRG, "DRG",
	infrastructurev1beta2.DrgReconciliationFailedReason, infrastructurev1beta2.DrgEventReady); err != nil {
What is the purpose of reordering this?
}

func (m *ManagedMachinePoolScope) getPodSubnets(subnets []string) []string {
func (m *ManagedMachinePoolScope) getNetworkCompartmentId() string {
Do you mind adding a method comment? Something like:
// getNetworkCompartmentId returns the compartmentId of the NetworkSpec if set, otherwise defaults to the OCIManagedCluster compartmentId
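For illustration, a minimal sketch of that behaviour with simplified stand-in types (the real scope and spec types live in this repo, and whether the new compartment field is `string` or `*string` is an assumption here):

```go
// Simplified stand-ins for the real API types.
type networkSpecSketch struct{ CompartmentId string }

type clusterSpecSketch struct {
	CompartmentId string
	NetworkSpec   networkSpecSketch
}

// getNetworkCompartmentId returns the compartmentId of the NetworkSpec if set,
// otherwise defaults to the OCIManagedCluster compartmentId.
func getNetworkCompartmentId(spec clusterSpecSketch) string {
	if spec.NetworkSpec.CompartmentId != "" {
		return spec.NetworkSpec.CompartmentId
	}
	return spec.CompartmentId
}
```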
	}
	subnet.ID = &subnetId
}
subnetList = append(subnetList, *subnet.ID)
I'm not 100% sure what we want to do here. Originally we were pulling the spec pod subnets, which I believe get updated. So wouldn't the subnets be up to date here when reconciled?
If I'm reading this correctly, if the subnet ID in the spec is nil, this will make API calls to get the subnet info. I understand that, but I guess I'm trying to understand why the subnet ID would be nil even after the reconcile. I'm probably not understanding something.
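For readers following along, a simplified sketch of how that fallback reads (types and the lookup helper are hypothetical, not the PR's actual code): the API call only happens for subnets the spec references without an OCID.

```go
// subnetRef is a stand-in for the spec's subnet entry.
type subnetRef struct {
	Name string
	ID   *string
}

// collectSubnetIDs resolves OCIDs for name-only subnet references via the
// provided lookup (an API call in the real code) and returns the full ID list.
func collectSubnetIDs(specSubnets []subnetRef, lookupIDByName func(name string) (*string, error)) ([]string, error) {
	subnetList := make([]string, 0, len(specSubnets))
	for _, subnet := range specSubnets {
		if subnet.ID == nil {
			// The spec only names this subnet; resolve its OCID with an API call.
			id, err := lookupIDByName(subnet.Name)
			if err != nil {
				return nil, err
			}
			if id == nil {
				continue // not found; the real code would surface an error here
			}
			subnet.ID = id
		}
		subnetList = append(subnetList, *subnet.ID)
	}
	return subnetList, nil
}
```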
		subnetList = podDetails.PodSubnetIds
	}
}
return subnetList, nil
If no subnets found and we have an existing node pool
I don't understand how this is possible. Without a subnet we couldn't have a node pool, correct? Again, probably something I'm not understanding.
if len(nsgList) == 0 && pool != nil && pool.NodeConfigDetails != nil {
	if podDetails, ok := pool.NodeConfigDetails.NodePoolPodNetworkOptionDetails.(oke.OciVcnIpNativeNodePoolPodNetworkOptionDetails); ok {
		nsgList = podDetails.PodNsgIds
	}
If no NSGs found and we have an existing node pool, use the NSGs from the node pool
We will want to make sure this change is documented for the users.
// The main reconciliation loop for the node pool is triggered by reflect.DeepEqual(spec, actual). While it's correct to update a node pool if the spec has changed,
// this can lead to an infinite update loop for some parameters that are not specified in spec, but are returned by the actual state of the node pool.
// To solve this, spec is updated before the reflect.DeepEqual(spec, actual) only for those problematic parameters.
func setSpecFromActual(spec *infrav2exp.OCIManagedMachinePoolSpec, actual *expinfra1.OCIManagedMachinePoolSpec) {
This is going to become a nightmare to keep up to date if there are any changes to the machine pool spec. There has to be a better way to do this. This seems very brittle.
This is mutating the spec at the very least. I wonder if we should at least make a copy instead of mutating the spec 🤔
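A minimal sketch of that idea, reusing the names from the surrounding diff (illustrative only, not a drop-in change): run `setSpecFromActual` against a deep copy and compare the copy, leaving the reconciler's spec untouched.

```go
// Names (machinePool, actual) and types follow the surrounding diff; DeepCopy
// is the generated deepcopy for the API type.
desired := machinePool.Spec.DeepCopy()
// Fill in only the problematic, unspecified fields on the copy.
setSpecFromActual(desired, actual)
if !reflect.DeepEqual(*desired, *actual) {
	// Drift detected: proceed with UpdateNodePool using the desired spec.
}
```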
What about instead computing a minimal “desired patch” (only the fields the user specified) and triggering UpdateNodePool only if that patch has at least one field set?
I would have put this comment up in the UpdateNodePool method, but since the context is all here I thought I would put it here.
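To make the suggestion concrete, a rough sketch with deliberately hypothetical types (so as not to guess at the real SDK field names): collect only the fields the user actually set and skip the UpdateNodePool call when nothing was collected.

```go
// nodePoolPatch is a hypothetical container for user-specified fields only.
type nodePoolPatch struct {
	KubernetesVersion *string
	NodeShape         *string
	// ... one optional field per user-settable spec field ...
}

// isEmpty reports whether no user-specified field made it into the patch.
func (p nodePoolPatch) isEmpty() bool {
	return p.KubernetesVersion == nil && p.NodeShape == nil
}

// buildNodePoolPatch copies only explicitly-set spec fields into the patch.
func buildNodePoolPatch(specVersion, specShape *string) nodePoolPatch {
	return nodePoolPatch{KubernetesVersion: specVersion, NodeShape: specShape}
}
```

Usage would then look roughly like `if patch := buildNodePoolPatch(...); !patch.isEmpty() { /* call UpdateNodePool */ }`, so an empty patch never triggers an update.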
s.OCIClusterAccessor.GetNetworkSpec().Vcn.ID = vcn.Id
// Someone has updated the network compartment ID, or the VCN was moved. Return an error
if vcn.CompartmentId != nil && *vcn.CompartmentId != s.GetNetworkCompartmentId() {
	return errors.New("CompartmentId of the VCN is not the same as the one specified in the spec")
I think we should probably try to catch this at the webhook layer. We can also leave this, but if the user wants to update it, the webhook might catch it before the change is even applied. Might be a better UX.
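A sketch of what that webhook-layer check might look like (the field path and message are assumptions based on this PR, not the provider's actual webhook code); it would be called from the cluster webhook's update validation with the old and new compartment values.

```go
package webhooksketch

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateNetworkCompartmentUnchanged rejects updates that change the network
// compartment, pushing the error to apply time instead of reconcile time.
func validateNetworkCompartmentUnchanged(oldCompartmentID, newCompartmentID string) field.ErrorList {
	var allErrs field.ErrorList
	if oldCompartmentID != newCompartmentID {
		allErrs = append(allErrs, field.Forbidden(
			field.NewPath("spec", "networkSpec", "compartmentId"),
			"cannot be changed after creation; moving a VCN requires the ChangeVcnCompartment API"))
	}
	return allErrs
}
```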
What this PR does / why we need it:
This PR introduces a new field for OCIManagedCluster that allows the network to be created in a different compartment than the one specified in OCIManagedCluster.Spec.CompartmentId.

The new field is called OCIManagedCluster.Spec.NetworkSpec.CompartmentId; it is optional, and if not set it defaults to the compartment specified in OCIManagedCluster.Spec.CompartmentId.

In addition, this PR fixes many bugs in the managed machine pool controller. The controller logic caused continuous updates of the node pool if the original OCIManagedCluster was modified. This is because not all fields were set from the actual managed node pool state and updated correctly.
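For context, a sketch of how the optional field described above might sit on the API type; the json tag, `string` type, and comment wording are assumptions, not the PR's exact definition.

```go
// NetworkSpec sketch: only the new field is shown.
type NetworkSpec struct {
	// CompartmentId is the OCID of the compartment in which the network
	// resources are created. If empty, it defaults to
	// OCIManagedCluster.Spec.CompartmentId.
	// +optional
	CompartmentId string `json:"compartmentId,omitempty"`

	// ... existing NetworkSpec fields ...
}
```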
The new reconciliation logic follows this design:
Which issue(s) this PR fixes:
Fixes #450