From 53b12414dac6750baa687362c45c340cd24153cc Mon Sep 17 00:00:00 2001
From: Richard Case
Date: Fri, 7 Nov 2025 14:40:06 +0000
Subject: [PATCH 1/4] docs: machinepool contract spec

This adds documentation that details the contract for providers when
implementing an infrastructure machine pool. This has been created
retrospectively from looking at a number of providers and the
MachinePool controller.

Signed-off-by: Richard Case
---
 docs/book/src/SUMMARY.md                      |   1 +
 .../providers/contracts/infra-machinepool.md  | 525 ++++++++++++++++++
 .../developer/providers/contracts/overview.md |   2 +-
 3 files changed, 527 insertions(+), 1 deletion(-)
 create mode 100644 docs/book/src/developer/providers/contracts/infra-machinepool.md

diff --git a/docs/book/src/SUMMARY.md b/docs/book/src/SUMMARY.md
index d1f2fc3ea859..1852c6e08225 100644
--- a/docs/book/src/SUMMARY.md
+++ b/docs/book/src/SUMMARY.md
@@ -95,6 +95,7 @@
 - [Provider contracts](developer/providers/contracts/overview.md)
   - [InfraCluster](./developer/providers/contracts/infra-cluster.md)
   - [InfraMachine](developer/providers/contracts/infra-machine.md)
+  - [InfraMachinePool](developer/providers/contracts/infra-machinepool.md)
   - [BootstrapConfig](developer/providers/contracts/bootstrap-config.md)
   - [ControlPlane](developer/providers/contracts/control-plane.md)
   - [clusterctl](developer/providers/contracts/clusterctl.md)
diff --git a/docs/book/src/developer/providers/contracts/infra-machinepool.md b/docs/book/src/developer/providers/contracts/infra-machinepool.md
new file mode 100644
index 000000000000..a04db180eec1
--- /dev/null
+++ b/docs/book/src/developer/providers/contracts/infra-machinepool.md
@@ -0,0 +1,525 @@
# Contract rules for InfraMachinePool

Infrastructure providers CAN OPTIONALLY implement an InfraMachinePool resource.

The goal of an InfraMachinePool is to manage the lifecycle of a provider-specific pool of machines using a provider-specific service (like auto-scale groups in AWS & virtual machine scalesets in Azure).

The machines in the pool may be physical or virtual instances (although most likely virtual), and they represent the infastructure for Kubernetes nodes.

The [MachinePool's controller](../../core/controllers/machine-pool.md) is responsible to coordinate operations of the InfraMachinePool, and the interaction between the MachinePool's controller and the InfraMachinePool is based on the contract rules defined in this page.

Once contract rules are satisfied by an InfraMachinePool implementation, other implementation details
could be addressed according to the specific needs (Cluster API is not prescriptive).

Nevertheless, it is always recommended to take a look at Cluster API controllers,
in-tree providers, other providers and use them as a reference implementation (unless custom solutions are required
in order to address very specific needs).

## Rules (contract version v1beta2)

| Rule | Mandatory | Note |
|------|-----------|------|
| [All resources: scope] | Yes | |
| [All resources: `TypeMeta` and `ObjectMeta` field] | Yes | |
| [All resources: `APIVersion` field value] | Yes | |
| [InfraMachinePool, InfraMachinePoolList resource definition] | Yes | |
| [InfraMachinePool: infrastructureMachineKind] | No | Mandatory for MachinePoolMachines. |
| [InfraMachinePool: instances] | No | |
| [InfraMachinePool: providerID] | No | |
| [InfraMachinePool: providerIDList] | Yes | |
| [InfraMachinePool: ready] | Yes | |
| [InfraMachinePool: pausing] | No | |
| [InfraMachinePool: conditions] | No | |
| [InfraMachinePool: replicas] | Yes | |
| [InfraMachinePool: terminal failures] | No | |
| [InfraMachinePoolTemplate, InfraMachinePoolTemplateList resource definition] | No | Mandatory for ClusterClasses support |
| [InfraMachinePoolTemplate: support for SSA dry run] | No | Mandatory for ClusterClasses support |
| [MachinePoolMachines support] | No | |
| [Multi tenancy] | No | Mandatory for clusterctl CLI support |
| [Clusterctl support] | No | Mandatory for clusterctl CLI support |

Note:

- `All resources` refers to all the provider's resources "core" Cluster API interacts with;
  In the context of this page: `InfraMachinePool`, `InfraMachinePoolTemplate` and corresponding list types

### All resources: scope

All resources MUST be namespace-scoped.

### All resources: `TypeMeta` and `ObjectMeta` field

All resources MUST have the standard Kubernetes `TypeMeta` and `ObjectMeta` fields.

### All resources: `APIVersion` field value

In Kubernetes `APIVersion` is a combination of API group and version.
Special consideration MUST applies to both API group and version for all the resources Cluster API interacts with.

#### All resources: API group

The domain for Cluster API resources is `cluster.x-k8s.io`, and infrastructure providers under the Kubernetes SIGS org
generally use `infrastructure.cluster.x-k8s.io` as API group.

If your provider uses a different API group, you MUST grant full read/write RBAC permissions for resources in your API group
to the Cluster API core controllers. The canonical way to do so is via a `ClusterRole` resource with the [aggregation label]
`cluster.x-k8s.io/aggregate-to-manager: "true"`.

The following is an example ClusterRole for a `FooMachinePool` resource in the `infrastructure.foo.com` API group:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: capi-foo-clusters
  labels:
    cluster.x-k8s.io/aggregate-to-manager: "true"
rules:
- apiGroups:
  - infrastructure.foo.com
  resources:
  - foomachinepools
  - foomachinepooltemplates
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
```

Note: The write permissions are required because Cluster API manages InfraMachinePools generated from InfraMachinePoolTemplates when using ClusterClass and managed topologies; also, InfraMachinePoolTemplates are managed directly by Cluster API.

#### All resources: version

The resource Version defines the stability of the API and its backward compatibility guarantees.
Examples include `v1alpha1`, `v1beta1`, `v1`, etc. and are governed by the [Kubernetes API Deprecation Policy].

Your provider SHOULD abide by the same policies.

Note: The version of your provider does not need to be in sync with the version of core Cluster API resources.
Instead, prefer choosing a version that matches the stability of the provider API and its backward compatibility guarantees.

Additionally:

Providers MUST set the `cluster.x-k8s.io/<contract version>` label on the InfraMachinePool Custom Resource Definitions.

The label is a map from a Cluster API contract version to your Custom Resource Definition versions.
The value is an underscore-delimited (_) list of versions. Each value MUST point to an available version in your CRD Spec.

The label allows Cluster API controllers to perform automatic conversions for object references; the controllers will pick
the last available version in the list if multiple versions are found.

To apply the label to CRDs it's possible to use labels in your `kustomization.yaml` file, usually in `config/crd`:

```yaml
labels:
- pairs:
    cluster.x-k8s.io/v1beta1: v1beta1
    cluster.x-k8s.io/v1beta2: v1beta2
```

An example of this is in the [AWS infrastructure provider](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/config/crd/kustomization.yaml).

### InfraMachinePool, InfraMachinePoolList resource definition

You CAN define an InfraMachinePool resource.
The InfraMachinePool resource name must have the format produced by `sigs.k8s.io/cluster-api/util/contract.CalculateCRDName(Group, Kind)`.

Note: Cluster API is using such a naming convention to avoid an expensive CRD lookup operation when looking for labels from
the CRD definition of the InfraMachinePool resource.

It is a generally applied convention to use names in the format `${env}MachinePool`, where ${env} is a, possibly short, name
for the environment in question. For example `AWSMachinePool` is an implementation for Amazon Web Services, and `AzureMachinePool`
is one for Azure.

```go
// +kubebuilder:object:root=true
// +kubebuilder:resource:path=foomachinepools,shortName=foomp,scope=Namespaced,categories=cluster-api
// +kubebuilder:storageversion
// +kubebuilder:subresource:status

// FooMachinePool is the Schema for foomachinepools.
type FooMachinePool struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec   FooMachinePoolSpec   `json:"spec,omitempty"`
	Status FooMachinePoolStatus `json:"status,omitempty"`
}

type FooMachinePoolSpec struct {
	// See other rules for more details about mandatory/optional fields in InfraMachinePool spec.
	// Other fields SHOULD be added based on the needs of your provider.
}

type FooMachinePoolStatus struct {
	// See other rules for more details about mandatory/optional fields in InfraMachinePool status.
	// Other fields SHOULD be added based on the needs of your provider.
}
```

For each InfraMachinePool resource, you MUST also add the corresponding list resource.
The list resource MUST be named as `<InfraMachinePool>List`.

```go
// +kubebuilder:object:root=true

// FooMachinePoolList contains a list of foomachinepools.
type FooMachinePoolList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []FooMachinePool `json:"items"`
}
```

### InfraMachinePool: infrastructureMachineKind

If a provider's implementation of an InfraMachinePool supports "MachinePool Machines" (where all the replicas in a MachinePool can be represented by a Machine & InfraMachine) then specifying and supplying a value for this field signals to Cluster API that the provider has opted in to MachinePoolMachines.

If you want to adopt MachinePool Machines then you MUST have a `status.infrastructureMachineKind` field and the field must contain the resource kind of the InfraMachine that represents the replicas of the pool. For example, for the AWS provider the value would be set to `AWSMachine`.

```go
type FooMachinePoolStatus struct {
	// InfrastructureMachineKind is the kind of the infrastructure resources behind MachinePool Machines.
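	// For example, for the AWS provider this is set to AWSMachine.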
	// +optional
	InfrastructureMachineKind string `json:"infrastructureMachineKind,omitempty"`

	// See other rules for more details about mandatory/optional fields in InfraMachinePool status.
	// Other fields SHOULD be added based on the needs of your provider.
}
```

By opting into MachinePool Machines it's the responsibility of the provider to create an instance of an InfraMachine for every replica and manage their lifecycle.

### InfraMachinePool: instances

Each InfraMachinePool MAY specify a status field that is used to report information about each instance within the machine pool. This field is purely informational and is used as a convenient way for a user to get details of the instances such as provider ID and addresses.

If you implement this then create a `status.instances` field that is a slice of a struct type that contains the information you want to store and make available to users.

```go
type FooMachinePoolStatus struct {
	// Instances contains the status for each instance in the pool
	// +optional
	Instances []FooMachinePoolInstanceStatus `json:"instances,omitempty"`

	// See other rules for more details about mandatory/optional fields in InfraMachinePool status.
	// Other fields SHOULD be added based on the needs of your provider.
}

// FooMachinePoolInstanceStatus contains instance status information about a FooMachinePool.
type FooMachinePoolInstanceStatus struct {
	// Addresses contains the associated addresses for the machine.
	// +optional
	Addresses []clusterv1.MachineAddress `json:"addresses,omitempty"`

	// InstanceName is the identification of the Machine Instance within the Machine Pool
	// +optional
	InstanceName string `json:"instanceName,omitempty"`

	// ProviderID is the provider identification of the Machine Pool Instance
	// +optional
	ProviderID *string `json:"providerID,omitempty"`

	// Version defines the Kubernetes version for the Machine Instance
	// +optional
	Version *string `json:"version,omitempty"`

	// Ready denotes that the machine is ready
	// +optional
	Ready bool `json:"ready"`
}
```

### InfraMachinePool: providerID

Each InfraMachinePool MAY specify a provider ID on `spec.providerID` that can be used to identify the InfraMachinePool.

```go
type FooMachinePoolSpec struct {
	// providerID is the identification ID of the FooMachinePool.
	// +optional
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=512
	ProviderID string `json:"providerID,omitempty"`

	// See other rules for more details about mandatory/optional fields in InfraMachinePool spec.
	// Other fields SHOULD be added based on the needs of your provider.
}
```

NOTE: To align with API conventions, we recommend, starting from the v1beta2 contract, that the `ProviderID` field should be
of type `string`.

### InfraMachinePool: providerIDList

Each InfraMachinePool MUST supply a list of the identification IDs of the machine instances managed by the machine pool by storing these in `spec.providerIDList`.

```go
type FooMachinePoolSpec struct {
	// ProviderIDList is the list of identification IDs of machine instances managed by this Machine Pool
	// +optional
	ProviderIDList []string `json:"providerIDList,omitempty"`

	// See other rules for more details about mandatory/optional fields in InfraMachinePool spec.
	// Other fields SHOULD be added based on the needs of your provider.
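
	// Note: the entries in this list are node provider IDs, in the same format the
	// corresponding cloud provider reports them, e.g. on AWS something like
	// "aws:///us-east-1a/i-0123456789abcdef0" (illustrative value).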
+} +``` + +Cluster API uses this list to determine the status of the machine pool and to know when replicas have been deleted, at which point the Node will be deleted. + +### InfraMachinePool: ready + +Each provider MUST indicate when then the InfraMachinePool has been complteley provisioned by setting `staus.ready` to **true**. This indicates to Cluster API that the infrastructure is ready and that it can continue with its processing. The value retuned here is stored in the MachinePool's `status.infraStructureReady` field. + +```go +type FooMachinePoolStatus struct { + // Ready is true when the provider resource is ready. + // +optional + Ready bool `json:"ready"` + + // See other rules for more details about mandatory/optional fields in InfraMachinePool status. + // Other fields SHOULD be added based on the needs of your provider. +} +``` + +When `ready` becomes true the phase of the MachinePool changes from **provisioning** to **provisioned**. Its also the signal that the providerIDList and replica status fields should be set on the MachinePool. + +### InfraMachinePool: pausing + +Providers SHOULD implement the pause behaviour for every object with a reconciliation loop. This is done by checking if `spec.paused` is set on the Cluster object and by checking for the `cluster.x-k8s.io/paused` annotation on the InfraMachinePool object. + +If implementing the pause behavior, providers SHOULD surface the paused status of an object using the Paused condition: `Status.Conditions[Paused]`. + +### InfraMachinePool: conditions + +According to [Kubernetes API Conventions], Conditions provide a standard mechanism for higher-level +status reporting from a controller. + +Providers implementers SHOULD implement `status.conditions` for their InfraMachinePool resource. +In case conditions are implemented on a InfraMachinePool resource, Cluster API will only consider conditions providing the following information: +- `type` (required) +- `status` (required, one of True, False, Unknown) +- `reason` (optional, if omitted a default one will be used) +- `message` (optional, if omitted an empty message will be used) +- `lastTransitionTime` (optional, if omitted time.Now will be used) +- `observedGeneration` (optional, if omitted the generation of the InfraMachinePool resource will be used) + +Other fields will be ignored. + +Conditions are not currently used by the Cluster APIs MachinePool controllers for any logic or status reporting. This will likely change in the future. + +See [Improving status in CAPI resources] for more context. + + + +### InfraMachinePool: replicas + +Provider implementors MUST implement `status.replicas` to report the most recently observed number of machine instances in the pool. For example, in AWS this would be the number of replicas in a Auto Scale Group (ASG). + +```go +type FooMachinePoolStatus struct { + // Replicas is the most recently observed number of replicas. + // +optional + Replicas int32 `json:"replicas"` + + // See other rules for more details about mandatory/optional fields in InfraMachinePool status. + // Other fields SHOULD be added based on the needs of your provider. +} +``` + +The value from this field is surfaced via the MachinePool's `status.replicas` field. + +### InfraMachinePool: terminal failures + +A provider MAY report failure information via their `status.failureReason` and `status.failureMessage` fields. 

```go
type FooMachinePoolStatus struct {
	// FailureReason will be set in the event that there is a terminal problem
	// reconciling the Machine and will contain a succinct value suitable
	// for machine interpretation.
	// +optional
	FailureReason *string `json:"failureReason,omitempty"`

	// FailureMessage will be set in the event that there is a terminal problem
	// reconciling the Machine and will contain a more verbose string suitable
	// for logging and human consumption.
	// +optional
	FailureMessage *string `json:"failureMessage,omitempty"`

	// See other rules for more details about mandatory/optional fields in InfraMachinePool status.
	// Other fields SHOULD be added based on the needs of your provider.
}
```

If a provider sets these fields then their values will be populated to the same-named fields on the MachinePool.

### InfraMachinePoolTemplate, InfraMachinePoolTemplateList resource definition

For a given InfraMachinePool resource, you SHOULD also add a sorresponding InfraMachinePoolTemplate resource in order to use it in ClusterClasses. The template resource MUST be name `<InfraMachinePool>Template`.

### InfraMachinePoolTemplate: support for SSA dry run

```go
// +kubebuilder:object:root=true
// +kubebuilder:resource:path=foomachinepooltemplates,scope=Namespaced,categories=cluster-api
// +kubebuilder:storageversion

// FooMachinePoolTemplate is the Schema for the foomachinepooltemplates API.
type FooMachinePoolTemplate struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec FooMachinePoolTemplateSpec `json:"spec,omitempty"`
}

type FooMachinePoolTemplateSpec struct {
	Template FooMachinePoolTemplateResource `json:"template"`
}

type FooMachinePoolTemplateResource struct {
	// Standard object's metadata.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
	// +optional
	ObjectMeta clusterv1.ObjectMeta `json:"metadata,omitempty,omitzero"`
	Spec       FooMachinePoolSpec   `json:"spec"`
}
```

NOTE: in this example `spec.template.spec` embeds `FooMachinePoolSpec` from FooMachinePool. This might not always be
the best choice depending on if/how the InfraMachinePool's spec fields apply to many machine pools vs only one.

For each InfraMachinePoolTemplate resource, you MUST also add the corresponding list resource.
The list resource MUST be named as `<InfraMachinePoolTemplate>List`.

```go
// +kubebuilder:object:root=true

// FooMachinePoolTemplateList contains a list of FooMachinePoolTemplates.
type FooMachinePoolTemplateList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []FooMachinePoolTemplate `json:"items"`
}
```

### InfraMachinePoolTemplate: support for SSA dry run

When Cluster API's topology controller is trying to identify differences between templates defined in a ClusterClass and
the current Cluster topology, it is required to run a [Server Side Apply] (SSA) dry run call.

However, in case you have immutability checks for your InfraMachinePoolTemplate, this can lead the SSA dry run call to error.

In order to avoid this InfraMachinePoolTemplate MUST specifically implement support for SSA dry run calls from the topology controller.

The implementation requires to use controller runtime's `CustomValidator`, available in CR versions >= v0.12.3.
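
As a reference, below is a minimal sketch of what such a validator can look like. It is illustrative only: it assumes the `FooMachinePoolTemplate` type defined above and a hand-rolled immutability check on `spec.template.spec`, and it detects dry run calls coming from the topology controller using the `sigs.k8s.io/cluster-api/util/topology` helper together with `admission.RequestFromContext` (an approach also used by in-tree providers); `ValidateCreate` and `ValidateDelete` are omitted for brevity:

```go
package webhooks

import (
	"context"
	"fmt"
	"reflect"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/cluster-api/util/topology"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// FooMachinePoolTemplateWebhook is a CustomValidator for FooMachinePoolTemplate.
type FooMachinePoolTemplateWebhook struct{}

// ValidateUpdate enforces immutability of spec.template.spec, except when the
// update is an SSA dry run call made by the topology controller.
func (w *FooMachinePoolTemplateWebhook) ValidateUpdate(ctx context.Context, oldRaw, newRaw runtime.Object) (admission.Warnings, error) {
	newTemplate, ok := newRaw.(*FooMachinePoolTemplate)
	if !ok {
		return nil, apierrors.NewBadRequest(fmt.Sprintf("expected a FooMachinePoolTemplate but got a %T", newRaw))
	}
	oldTemplate, ok := oldRaw.(*FooMachinePoolTemplate)
	if !ok {
		return nil, apierrors.NewBadRequest(fmt.Sprintf("expected a FooMachinePoolTemplate but got a %T", oldRaw))
	}

	// The admission request carries the dry run flag and the annotations used to
	// recognize requests coming from the topology controller.
	req, err := admission.RequestFromContext(ctx)
	if err != nil {
		return nil, apierrors.NewBadRequest(fmt.Sprintf("expected an admission.Request inside context: %v", err))
	}

	// Skip the immutability check only while the topology controller is dry running;
	// keep enforcing it for every other update.
	if !topology.ShouldSkipImmutabilityChecks(req, newTemplate) &&
		!reflect.DeepEqual(newTemplate.Spec.Template.Spec, oldTemplate.Spec.Template.Spec) {
		return nil, apierrors.NewBadRequest("FooMachinePoolTemplate spec.template.spec field is immutable")
	}

	return nil, nil
}
```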

This will allow to skip the immutability check only when the topology controller is dry running while preserving the
validation behavior for all other cases.

### MachinePoolMachines support

A provider can opt-in to MachinePool Machines. The mechanims to opt-in is by including `status.infrastructreMachineKind` (see InfraMachinePool: infrastructureMachineKind) in the InfraMachinePool.

By opting in an infra provider is expected to create a InfraMachine for every replica in the pool. The lifecycle of these InfraMachines must be managed so that when scale up or scale down happens the list of InfraMachines is representative.

By adopting MachinePool Machines this enables common processing with Cluster API, such as draining nodes before scale down. Also it enables integration with Cluster Autoscaler.

For further information see the [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220209-machinepool-machines.md).

### Multi tenancy

Multi tenancy in Cluster API defines the capability of an infrastructure provider to manage different credentials,
each one of them corresponding to an infrastructure tenant.

See [infrastructure Provider Security Guidance] for considerations about cloud provider credential management.

Please also note that Cluster API does not support running multiple instances of the same provider, which someone could assume to be an alternative solution to implement multi tenancy; the same applies to the clusterctl CLI.

See [Support running multiple instances of the same provider] for more context.

However, if you want to make it possible for users to run multiple instances of your provider, your controllers SHOULD:

- support the `--namespace` flag.
- support the `--watch-filter` flag.

Please, read carefully the page linked above to fully understand implications and risks related to this option.

### Clusterctl support

The clusterctl command is designed to work with all the providers compliant with the rules defined in the [clusterctl provider contract].

[All resources: Scope]: #all-resources-scope
[All resources: `TypeMeta` and `ObjectMeta` field]: #all-resources-typemeta-and-objectmeta-field
[All resources: `APIVersion` field value]: #all-resources-apiversion-field-value
[InfraMachinePool, InfraMachinePoolList resource definition]: #inframachinepool-inframachinepoollist-resource-definition
[InfraMachinePool: infrastructureMachineKind]: #inframachinepool-infrastructuremachinekind
[InfraMachinePool: instances]: #inframachinepool-instances
[InfraMachinePool: providerID]: #inframachinepool-providerid
[InfraMachinePool: providerIDList]: #inframachinepool-provideridlist
[InfraMachinePool: ready]: #inframachinepool-ready
[InfraMachinePool: pausing]: #inframachinepool-pausing
[InfraMachinePool: conditions]: #inframachinepool-conditions
[InfraMachinePool: replicas]: #inframachinepool-replicas
[InfraMachinePool: terminal failures]: #inframachinepool-terminal-failures
[InfraMachinePoolTemplate, InfraMachinePoolTemplateList resource definition]: #inframachinepooltemplate-inframachinepooltemplatelist-resource-definition
[InfraMachinePoolTemplate: support for SSA dry run]: #inframachinepooltemplate-support-for-ssa-dry-run
[MachinePoolMachines support]: #machinepoolmachines-support
[Multi tenancy]: #multi-tenancy
[Clusterctl support]: #clusterctl-support
[aggregation label]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#aggregated-clusterroles
[Kubernetes API Deprecation Policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
[Server Side Apply]: https://kubernetes.io/docs/reference/using-api/server-side-apply/
[Kubernetes API Conventions]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
[Improving status in CAPI resources]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240916-improve-status-in-CAPI-resources.md
[infrastructure Provider Security Guidance]: ../security-guidelines.md
[Support running multiple instances of the same provider]: ../../core/support-multiple-instances.md
[clusterctl provider contract]: clusterctl.md
\ No newline at end of file
diff --git a/docs/book/src/developer/providers/contracts/overview.md b/docs/book/src/developer/providers/contracts/overview.md
index 08f1a27460e3..8a90030108a3 100644
--- a/docs/book/src/developer/providers/contracts/overview.md
+++ b/docs/book/src/developer/providers/contracts/overview.md
@@ -10,7 +10,7 @@ See [Cluster API release vs contract versions](../../../reference/versions.md#cl
 - Infrastructure provider
   - Contract rules for [InfraCluster](infra-cluster.md) resource
   - Contract rules for [InfraMachine](infra-machine.md) resource
-  - Contract rules for InfraMachinePool resource (TODO)
+  - Contract rules for [InfraMachinePool](infra-machinepool.md) resource
 - Bootstrap provider
   - Contract rules for [BootstrapConfig](bootstrap-config.md) resource

From 22f729ef8e978e1fd1d10f242c63a412ba92ef7a Mon Sep 17 00:00:00 2001
From: Richard Case
Date: Mon, 10 Nov 2025 15:33:17 +0000
Subject: [PATCH 2/4] doc: review changes

Changes after the first review by Fabrizio.

Signed-off-by: Richard Case
---
 .../providers/contracts/infra-machinepool.md | 207 ++++++++++--------
 1 file changed, 112 insertions(+), 95 deletions(-)

diff --git a/docs/book/src/developer/providers/contracts/infra-machinepool.md b/docs/book/src/developer/providers/contracts/infra-machinepool.md
index a04db180eec1..d621492d7d6c 100644
--- a/docs/book/src/developer/providers/contracts/infra-machinepool.md
+++ b/docs/book/src/developer/providers/contracts/infra-machinepool.md
@@ -1,10 +1,12 @@
 # Contract rules for InfraMachinePool

-Infrastructure providers CAN OPTIONALLY implement an InfraMachinePool resource.
+Infrastructure providers CAN OPTIONALLY implement an InfraMachinePool resource using Kubernetes' CustomResourceDefinition (CRD).

The goal of an InfraMachinePool is to manage the lifecycle of a provider-specific pool of machines using a provider-specific service (like auto-scale groups in AWS & virtual machine scalesets in Azure).

-The machines in the pool may be physical or virtual instances (although most likely virtual), and they represent the infastructure for Kubernetes nodes.
+The machines in the pool may be physical or virtual instances (although most likely virtual), and they represent the infrastructure for Kubernetes nodes.
+
+The InfraMachinePool resource will be referenced by one of the Cluster API core resources, MachinePool.

The [MachinePool's controller](../../core/controllers/machine-pool.md) is responsible to coordinate operations of the InfraMachinePool, and the interaction between the MachinePool's controller and the InfraMachinePool is based on the contract rules defined in this page.

Once contract rules are satisfied by an InfraMachinePool implementation, other implementation details could be addressed according to the specific needs (Cluster API is not prescriptive).

@@ -18,7 +20,7 @@ in order to address very specific needs).
-
-

### InfraMachinePoolTemplate, InfraMachinePoolTemplateList resource definition

-For a given InfraMachinePool resource, you SHOULD also add a sorresponding InfraMachinePoolTemplate resource in order to use it in ClusterClasses. The template resource MUST be name `<InfraMachinePool>Template`.
+For a given InfraMachinePool resource, you SHOULD also add a corresponding InfraMachinePoolTemplate resource in order to use it in ClusterClasses. The template resource MUST be named `<InfraMachinePool>Template`.

-### InfraMachinePoolTemplate: support for SSA dry run

```go
// +kubebuilder:object:root=true
// +kubebuilder:resource:path=foomachinepooltemplates,scope=Namespaced,categories=cluster-api
// +kubebuilder:storageversion

@@ -460,23 +488,13 @@ the current Cluster topology, it is required to run a [Server Side Apply] (SSA) dr
However, in case you have immutability checks for your InfraMachinePoolTemplate, this can lead the SSA dry run call to error.

-In order to avoid this InfraMachinePoolTemplate MUST specifically implement support for SSA dry run calls from the topology controller.
+In order to avoid this InfraMachinePoolTemplate MUST specifically implement support for SSA dry run calls from the topology controller. The implementation requires to use controller runtime's `CustomValidator`, available in CR versions >= v0.12.3.

This will allow to skip the immutability check only when the topology controller is dry running while preserving the
validation behavior for all other cases.

-### MachinePoolMachines support
-
-A provider can opt-in to MachinePool Machines. The mechanims to opt-in is by including `status.infrastructreMachineKind` (see InfraMachinePool: infrastructureMachineKind) in the InfraMachinePool.
-
-By opting in an infra provider is expected to create a InfraMachine for every replica in the pool. The lifecycle of these InfraMachines must be managed so that when scale up or scale down happens the list of InfraMachines is representative.
-
-By adopting MachinePool Machines this enables common processing with Cluster API, such as draining nodes before scale down. Also it enables integration with Cluster Autoscaler.
-
-For further information see the [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220209-machinepool-machines.md).

### Multi tenancy

Multi tenancy in Cluster API defines the capability of an infrastructure provider to manage different credentials,

@@ -504,11 +522,10 @@ The clusterctl command is designed to work with all the providers compliant with
[All resources: Scope]: #all-resources-scope
[All resources: `TypeMeta` and `ObjectMeta` field]: #all-resources-typemeta-and-objectmeta-field
[All resources: `APIVersion` field value]: #all-resources-apiversion-field-value
[InfraMachinePool, InfraMachinePoolList resource definition]: #inframachinepool-inframachinepoollist-resource-definition
-[InfraMachinePool: infrastructureMachineKind]: #inframachinepool-infrastructuremachinekind
[InfraMachinePool: instances]: #inframachinepool-instances
[InfraMachinePool: providerID]: #inframachinepool-providerid
[InfraMachinePool: providerIDList]: #inframachinepool-provideridlist
-[InfraMachinePool: ready]: #inframachinepool-ready
+[InfraMachinePool: initialization completed]: #inframachinepool-initialization-completed
[InfraMachinePool: pausing]: #inframachinepool-pausing
[InfraMachinePool: conditions]: #inframachinepool-conditions
[InfraMachinePool: replicas]: #inframachinepool-replicas

From 9e1b9fb2a3ea1ed34d309faec689f32c3e5e2f92 Mon Sep 17 00:00:00 2001
From: Richard Case
Date: Mon, 10 Nov 2025 15:33:17 +0000
Subject: [PATCH 3/4] doc: review changes

Changes after the first review by Fabrizio.

Signed-off-by: Richard Case
---
 .../book/src/developer/providers/contracts/infra-machinepool.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/book/src/developer/providers/contracts/infra-machinepool.md b/docs/book/src/developer/providers/contracts/infra-machinepool.md
index d621492d7d6c..75f58c14fc5f 100644
--- a/docs/book/src/developer/providers/contracts/infra-machinepool.md
+++ b/docs/book/src/developer/providers/contracts/infra-machinepool.md
@@ -254,6 +254,8 @@ type FooMachinePoolStatus struct {
}
```

+Note: not all InfraMachinePool implementations support MPM as it depends on whether the infrastructure service underpinning the InfraMachinePool supports operations being performed against single machines. For example, in CAPA `AWSManagedMachinePool` is used to represent an "EKS managed node group" and as a "managed" service you are expected to NOT perform operations against single nodes.
+
For further information see the [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220209-machinepool-machines.md).

### InfraMachinePool: providerID

From 3e1ec909996f6f3692c90827069bfa73a4c75ce0 Mon Sep 17 00:00:00 2001
From: Richard Case
Date: Tue, 11 Nov 2025 11:33:57 +0000
Subject: [PATCH 4/4] docs: updates after second review

Some updates after an additional review by Andreas.

Signed-off-by: Richard Case
---
 .../providers/contracts/infra-machinepool.md | 28 +++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/book/src/developer/providers/contracts/infra-machinepool.md b/docs/book/src/developer/providers/contracts/infra-machinepool.md
index 75f58c14fc5f..c5a965a31504 100644
--- a/docs/book/src/developer/providers/contracts/infra-machinepool.md
+++ b/docs/book/src/developer/providers/contracts/infra-machinepool.md
@@ -2,7 +2,7 @@

Infrastructure providers CAN OPTIONALLY implement an InfraMachinePool resource using Kubernetes' CustomResourceDefinition (CRD).

-The goal of an InfraMachinePool is to manage the lifecycle of a provider-specific pool of machines using a provider-specific service (like auto-scale groups in AWS & virtual machine scalesets in Azure).
+The goal of an InfraMachinePool is to manage the lifecycle of a provider-specific pool of machines using a provider-specific service (like Auto Scaling groups in AWS & Virtual Machine Scale Sets in Azure).

The machines in the pool may be physical or virtual instances (although most likely virtual), and they represent the infrastructure for Kubernetes nodes.

@@ -32,8 +32,7 @@ Instead, whenever you need something more from the Cluster API contract, you MUS
The Cluster API maintainers welcome feedback and contributions to the contract in order to improve how it's defined, its clarity and visibility to provider implementers and its suitability across the different kinds of Cluster API providers.

-To provide feedback or open a discussion about the provider contract please [open an issue on the Cluster API](https://github.com/kubernetes-sigs/cluster-api/issues/new?assignees=&labels=&template=feature_request.md)
-repo or add an item to the agenda in the [Cluster API community meeting](https://git.k8s.io/community/sig-cluster-lifecycle/README.md#cluster-api).
+To provide feedback or open a discussion about the provider contract please [open an issue on the Cluster API](https://github.com/kubernetes-sigs/cluster-api/issues/new?template=feature_request.yaml) repo or add an item to the agenda in the [Cluster API community meeting](https://git.k8s.io/community/sig-cluster-lifecycle/README.md#cluster-api).

@@ -75,7 +74,7 @@ All resources MUST have the standard Kubernetes `TypeMeta` and `ObjectMeta` fiel
### All resources: `APIVersion` field value

In Kubernetes `APIVersion` is a combination of API group and version.
-Special consideration MUST applies to both API group and version for all the resources Cluster API interacts with.
+Special consideration MUST apply to both API group and version for all the resources Cluster API interacts with.

@@ -147,7 +146,7 @@ An example of this is in the [AWS infrastructure provider](https://github.com/ku
### InfraMachinePool, InfraMachinePoolList resource definition

You MUST define an InfraMachinePool resource if your provider supports MachinePools.
-The InfraMachinePool resource name must have the format produced by `sigs.k8s.io/cluster-api/util/contract.CalculateCRDName(Group, Kind)`.
+The InfraMachinePool CRD name must have the format produced by [`sigs.k8s.io/cluster-api/util/contract.CalculateCRDName(Group, Kind)`](https://github.com/search?q=repo%3Akubernetes-sigs%2Fcluster-api+%22func+CalculateCRDName%22&type=code).

@@ -237,11 +236,11 @@ type FooMachinePoolInstanceStatus struct {
### MachinePoolMachines support

-A provider can opt-in to MachinePool Machines (MPM). With MPM machines all the replicas in a MachinePool are represented by a Machine & InfraMachine. This enables core CAPI to perform common operations on single machines (and their Nodes), such as draining a node before scale down, integration with Cluster Autoscaler and also machine healthchecks.
+A provider can opt-in to MachinePool Machines (MPM). With MPM machines all the replicas in a MachinePool are represented by a Machine & InfraMachine. This enables core CAPI to perform common operations on single machines (and their Nodes), such as draining a node before scale down, integration with Cluster Autoscaler and also [MachineHealthChecks].

+If you want to adopt MPM then you MUST have a `status.infrastructureMachineKind` field and the field must contain the resource kind that represents the replicas in the pool. This is usually named InfraMachine if machine pool machines are representable like regular machines, or InfraMachinePoolMachine in other cases. For example, for the AWS provider the value would be set to `AWSMachine`.

-By opting in an infra provider is expected to create a InfraMachine for every replica in the pool. The lifecycle of these InfraMachines must be managed so that when scale up or scale down happens the list of InfraMachines is representative.
+By opting in, the infra provider is expected to create an InfraMachine for every replica in the pool. The lifecycle of these InfraMachines must be managed so that when scale up or scale down happens, the list of InfraMachines is kept up to date.
For example, in AWS this would be the number of replicas in a Auto Scale Group (ASG). +Provider implementers MUST implement `status.replicas` to report the most recently observed number of machine instances in the pool. For example, in AWS this would be the number of replicas in a Auto Scaling group (ASG). ```go type FooMachinePoolStatus struct { @@ -492,7 +491,7 @@ However, in case you have immutability checks for your InfraMachinePoolTemplate, In order to avoid this InfraMachinePoolTemplate MUST specifically implement support for SSA dry run calls from the topology controller. -The implementation requires to use controller runtime's `CustomValidator`, available in CR versions >= v0.12.3. +The implementation requires to use controller runtime's `CustomValidator`, available since version v0.12.3. This will allow to skip the immutability check only when the topology controller is dry running while preserving the validation behavior for all other cases. @@ -541,4 +540,5 @@ The clusterctl command is designed to work with all the providers compliant with [Improving status in CAPI resources]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240916-improve-status-in-CAPI-resources.md [infrastructure Provider Security Guidance]: ../security-guidelines.md [Support running multiple instances of the same provider]: ../../core/support-multiple-instances.md -[clusterctl provider contract]: clusterctl.md \ No newline at end of file +[clusterctl provider contract]: clusterctl.md +[MachineHealthChecks]: ../../../tasks/automated-machine-management/healthchecking.md