// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_mod-docs-content-type: PROCEDURE

[id="cnf-configuring-nrop-on-schedulable-cp-nodes_{context}"]
= Configuring the NUMA Resources Operator on schedulable control plane nodes

[role="_abstract"]
This procedure describes how to configure the NUMA Resources Operator (NROP) to manage control plane nodes that are configured to be schedulable. This is particularly useful in compact clusters, where control plane nodes also serve as worker nodes, and in multi-node OpenShift (MNO) clusters where control plane nodes are made schedulable so that they can run workloads.

.Prerequisites

* Install the {oc-first}.
* Log in as a user with `cluster-admin` privileges.
* Install the NUMA Resources Operator.

.Procedure

. To enable Topology Aware Scheduling (TAS) on control plane nodes, first configure the nodes to be schedulable. This allows the NUMA Resources Operator to deploy and manage pods on them; without this step, the Operator cannot deploy the pods that gather NUMA topology information from these nodes. To make the control plane nodes schedulable, follow these steps:

.. Edit the `schedulers.config.openshift.io` resource by running the following command:
+
[source,terminal]
----
$ oc edit schedulers.config.openshift.io cluster
----

.. In the editor, set the `mastersSchedulable` field to `true`, then save and exit the editor.
+
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: "2019-09-10T03:04:05Z"
  generation: 1
  name: cluster
  resourceVersion: "433"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: a636d30a-d377-11e9-88d4-0a60097bee62
spec:
  mastersSchedulable: true
status: {}
#...
----
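+
Alternatively, you can make the same change without opening an editor. The following command is a minimal sketch that patches the `mastersSchedulable` field directly:
+
[source,terminal]
----
$ oc patch schedulers.config.openshift.io cluster --type merge -p '{"spec":{"mastersSchedulable":true}}'
----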

. To configure the NUMA Resources Operator, create a single `NUMAResourcesOperator` custom resource (CR) on the cluster. The `nodeGroups` configuration in this CR specifies the node pools that the Operator manages.
+
[NOTE]
====
Before configuring `nodeGroups`, ensure that the specified node pool meets all the prerequisites described in "Configuring a single NUMA node policy". The NUMA Resources Operator requires all nodes within a group to be identical. Non-compliant nodes prevent the NUMA Resources Operator from performing the expected topology-aware scheduling for the entire pool.

You can specify multiple non-overlapping node sets for the NUMA Resources Operator to manage. Each set should correspond to a different machine config pool (MCP). The NUMA Resources Operator then manages the schedulable control plane nodes within these node groups.
====
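+
Before you choose node groups, you can list the MCPs that exist on your cluster. The pools and their names vary by cluster:
+
[source,terminal]
----
$ oc get machineconfigpools
----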

.. For a compact cluster, the control plane nodes are also the schedulable nodes, so specify only the `master` pool. Create the following `nodeGroups` configuration in the `NUMAResourcesOperator` CR:
+
[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - poolName: master
----
+
[NOTE]
====
Avoid configuring a compact cluster with a `worker` pool in addition to the `master` pool. Although this setup does not break the cluster or affect Operator functionality, the `worker` pool is an empty MCP in this context, so it serves no purpose and can lead to redundant or duplicate pods and unnecessary noise in the system.
====

.. For an MNO cluster where both control plane and worker nodes are schedulable, you can configure the NUMA Resources Operator to manage multiple node groups. Specify which nodes to include by adding their corresponding MCPs to the `nodeGroups` list in the `NUMAResourcesOperator` CR. The configuration depends on your requirements. For example, to manage both the `master` and `worker-cnf` pools, create the following `nodeGroups` configuration in the `NUMAResourcesOperator` CR:
+
[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - poolName: master
  - poolName: worker-cnf
----
+
[NOTE]
====
You can customize this list to include any combination of node groups for management with Topology Aware Scheduling. To prevent duplicate, pending pods, ensure that each `poolName` in the configuration corresponds to an MCP with a unique node selector label. The label must be applied only to the nodes within that specific pool and must not overlap with labels on any other nodes in the cluster. In this example, the `worker-cnf` MCP designates a set of nodes that run telecommunications workloads.
====
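+
For reference, each `poolName` must match an existing MCP. The following is a minimal sketch of what a `worker-cnf` MCP might look like; the `node-role.kubernetes.io/worker-cnf` node selector label is an assumption and must match the label that is actually applied to the nodes in your pool:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - worker
      - worker-cnf
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""  # assumption: label applied only to the worker-cnf nodes
----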

.. After you update the `nodeGroups` field in the `NUMAResourcesOperator` CR to reflect your cluster's configuration, apply the changes by running the following command:
+
[source,terminal]
----
$ oc apply -f <filename>.yaml
----
+
[NOTE]
====
Replace `<filename>.yaml` with the name of your configuration file.
====
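+
Optionally, confirm that the CR was created before you move on to the full verification steps:
+
[source,terminal]
----
$ oc get numaresourcesoperator
----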

.Verification

After applying the configuration, verify that the NUMA Resources Operator is correctly managing the schedulable control plane nodes by performing the following checks:

. Confirm that the control plane nodes have the `worker` role and are schedulable by running the following command:
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
[source,terminal]
----
NAME       STATUS   ROLES                         AGE    VERSION
worker-0   Ready    worker,worker-cnf             100m   v1.33.3
worker-1   Ready    worker                        93m    v1.33.3
master-0   Ready    control-plane,master,worker   108m   v1.33.3
master-1   Ready    control-plane,master,worker   107m   v1.33.3
master-2   Ready    control-plane,master,worker   107m   v1.33.3
worker-2   Ready    worker                        100m   v1.33.3
----
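+
Optionally, confirm that the control plane nodes no longer carry a `NoSchedule` taint. Replace `<node_name>` with the name of a control plane node, such as `master-0` from the example output:
+
[source,terminal]
----
$ oc describe node <node_name> | grep -i taints
----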

. Verify that the NUMA Resources Operator pods are running on the intended nodes by running the following command. Expect one `numaresourcesoperator` pod for each node in the node groups that you specified in the CR:
+
[source,terminal]
----
$ oc get pods -n openshift-numaresources -o wide
----
+
.Example output
[source,terminal]
----
NAME                                               READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
numaresources-controller-manager-bdbdd574-xx6bw    1/1     Running   0          49m     10.130.0.17   master-0   <none>           <none>
numaresourcesoperator-master-lprrh                 2/2     Running   0          20m     10.130.0.20   master-0   <none>           2/2
numaresourcesoperator-master-qk6k4                 2/2     Running   0          20m     10.129.0.50   master-2   <none>           2/2
numaresourcesoperator-master-zm79n                 2/2     Running   0          20m     10.128.0.44   master-1   <none>           2/2
numaresourcesoperator-worker-cnf-gqlmd             2/2     Running   0          4m27s   10.128.2.21   worker-0   <none>           2/2
----
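+
The pod names in the example output indicate that the Operator runs one pod on each node through a daemon set per node group. Optionally, list the daemon sets to confirm this on your cluster:
+
[source,terminal]
----
$ oc get daemonsets -n openshift-numaresources
----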

. Confirm that the NUMA Resources Operator has collected and reported the NUMA topology data for all nodes in the specified groups by running the following command:
+
[source,terminal]
----
$ oc get noderesourcetopologies.topology.node.k8s.io
----
+
.Example output
[source,terminal]
----
NAME       AGE
worker-0   6m11s
master-0   22m
master-1   21m
master-2   21m
----
+
The presence of a `NodeResourceTopology` resource for a node confirms that the NUMA Resources Operator was able to schedule a pod on that node to collect the data, which enables topology-aware scheduling.

. Inspect the `NodeResourceTopology` resource for a single control plane node by running the following command:
+
[source,terminal]
----
$ oc get noderesourcetopologies <master_node_name> -o yaml
----
+
.Example output
[source,yaml]
----
apiVersion: topology.node.k8s.io/v1alpha2
attributes:
- name: nodeTopologyPodsFingerprint
  value: pfp0v001ef46db3751d8e999
- name: nodeTopologyPodsFingerprintMethod
  value: with-exclusive-resources
- name: topologyManagerScope
  value: container
- name: topologyManagerPolicy
  value: single-numa-node
kind: NodeResourceTopology
metadata:
  annotations:
    k8stopoawareschedwg/rte-update: periodic
    topology.node.k8s.io/fingerprint: pfp0v001ef46db3751d8e999
  creationTimestamp: "2025-09-23T10:18:34Z"
  generation: 1
  name: master-0
  resourceVersion: "58173"
  uid: 35c0d27e-7d9f-43d3-bab9-2ebc0d385861
zones:
- costs:
  - name: node-0
    value: 10
  name: node-0
  resources:
  - allocatable: "3"
    available: "2"
    capacity: "4"
    name: cpu
  - allocatable: "1476189952"
    available: "1378189952"
    capacity: "1576189952"
    name: memory
  type: Node
----
+
The presence of this resource for a node with a `master` role confirms that the NUMA Resources Operator was able to deploy its discovery pods onto that node. These pods gather the NUMA topology data, and they can be scheduled only on nodes that are schedulable.
+
The output confirms that the procedure to make the control plane nodes schedulable was successful, because the NUMA Resources Operator has collected and reported the NUMA-related information for that control plane node.