Merge pull request #100428 from slovern/TELCODOCS-2436

gaurav-nelson · web-flow · commit 4f4c09e06887 · 2025-10-17T07:09:11.000+10:00
TELCODOCS-2436 ARM updates - RAN RDS
diff --git a/modules/telco-ran-engineering-considerations-for-the-ran-du-use-model.adoc b/modules/telco-ran-engineering-considerations-for-the-ran-du-use-model.adoc
@@ -23,6 +23,11 @@ The recommended topology for RAN DU workloads is {sno}.
 DU workloads may be run on other cluster topologies such as 3-node compact cluster, high availability (3 control plane + n worker nodes), or SNO+1 as needed.
 Multiple SNO clusters, or a highly-available 3-node compact cluster, are recommended over the SNO+1 topology.
 
+Under the standard cluster topology case (3+n), a mixed architecture cluster is allowed only if:
+
+* All control plane nodes are x86_64.
+* All worker nodes are aarch64.
+
 Remote worker node (RWN) cluster topologies are not recommended or included under this reference design specification.
 For workloads with high service level agreement requirements such as RAN DU the following drawbacks exclude RWN from consideration:
 
@@ -35,11 +40,45 @@ For workloads with high service level agreement requirements such as RAN DU the
 
 --
 
+Supported cluster topologies for RAN DU::
++
+.Supported cluster topologies for RAN DU
+[cols="1,2,3,4,5,6", options="header"]
+|===
+|Architecture
+|SNO
+|SNO+1
+|3-node
+|Standard
+|RWN
+
+|x86_64
+|Yes
+|Yes
+|Yes
+|Yes
+|No
+
+|aarch64
+|Yes
+|No
+|No
+|No
+|No
+
+|mixed
+|N/A
+|No
+|No
+|Yes
+|No
+
+|===
+
 Workloads::
 . DU workloads are described in xref:../scalability_and_performance/telco-ran-du-rds.adoc#telco-ran-du-application-workloads_telco-ran-du[Telco RAN DU application workloads].
 . DU worker nodes are Intel 3rd Generation Xeon (IceLake) 2.20 GHz or newer with host firmware tuned for maximum performance.
 
-
 Resources::
 The maximum number of running pods in the system, inclusive of application workload and {product-title} pods, is 120.
 
diff --git a/modules/telco-ran-node-tuning-operator.adoc b/modules/telco-ran-node-tuning-operator.adoc
@@ -7,49 +7,96 @@
 = CPU partitioning and performance tuning
 
 New in this release::
-* No reference design updates in this release
+* The `PerformanceProfile` and `TunedPerformancePatch` objects have been updated to fully support the aarch64 architecture.
+** If you have previously applied additional patches to the `TunedPerformancePatch` object, you must convert those patches to a new performance profile that includes the `ran-du-performance` profile instead. See the "Engineering considerations" section.
+
 
 Description::
-The RAN DU use model includes cluster performance tuning using `PerformanceProfile` CRs for low-latency performance.
+The RAN DU use model includes cluster performance tuning using `PerformanceProfile` CRs for low-latency performance, and a `TunedPerformancePatch` CR that adds RAN-specific tuning.
+A reference `PerformanceProfile` is provided for both x86_64 and aarch64 CPU architectures.
+The single `TunedPerformancePatch` object provided automatically detects the CPU architecture and performs the required additional tuning.
 The RAN DU use case requires the cluster to be tuned for low-latency performance.
-The Node Tuning Operator reconciles the `PerformanceProfile` CRs.
+The Node Tuning Operator reconciles the `PerformanceProfile` and `TunedPerformancePatch` CRs.
+
 For more information about node tuning with the `PerformanceProfile` CR, see "Tuning nodes for low latency with the performance profile".
 
 Limits and requirements::
-The Node Tuning Operator uses the `PerformanceProfile` CR to configure the cluster.
 You must configure the following settings in the telco RAN DU profile `PerformanceProfile` CR:
 +
 --
-* Set a reserved `cpuset` of 4 or more, equating to 4 hyper-threads (2 cores) for either of the following CPUs:
+* Set a reserved `cpuset` of 4 or more, equating to 4 hyper-threads (2 cores) on x86_64, or 4 cores on aarch64 for any of the following CPUs:
 ** Intel 3rd Generation Xeon (IceLake) 2.20 GHz, or newer, CPUs with host firmware tuned for maximum performance
 ** AMD EPYC Zen 4 CPUs (Genoa, Bergamo)
+** ARM CPUs (Neoverse)
 +
 [NOTE]
 ====
-AMD EPYC Zen 4 CPUs (Genoa, Bergamo) are fully supported.
-Power consumption evaluations are ongoing.
 It is recommended to evaluate features, such as per-pod power management, to determine any potential impact on performance.
 ====
 
-* Set the reserved `cpuset` to include both hyper-thread siblings for each included core.
-Unreserved cores are available as allocatable CPU for scheduling workloads.
-* Ensure that hyper-thread siblings are not split across reserved and isolated cores.
-* Ensure that reserved and isolated CPUs include all the threads for all cores in the CPU.
-* Include Core 0 for each NUMA node in the reserved CPU set.
-* Set the huge page size to 1G.
+* x86_64:
+** Set the reserved `cpuset` to include both hyper-thread siblings for each included core.
+   Unreserved cores are available as allocatable CPU for scheduling workloads.
+** Ensure that hyper-thread siblings are not split across reserved and isolated cores.
+** Ensure that reserved and isolated CPUs include all the threads for all cores in the CPU.
+** Include Core 0 for each NUMA node in the reserved CPU set.
+** Set the hugepage size to 1G.
+* aarch64:
+** Use at least the first 4 cores for the reserved CPU set.
+** Set the hugepage size to 512M.
 * Only pin {product-title} pods that are by default configured as part of the management workload partition to reserved cores.
 * When recommended by the hardware vendor, set the maximum CPU frequency for reserved and isolated CPUs using the `hardwareTuning` section.
 --
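+
+As a minimal sketch of the x86_64 settings listed above, a `PerformanceProfile` fragment might look like the following.
+The CPU ranges, hugepage count, and node selectors are placeholder assumptions for a hypothetical single-socket host with 32 cores and 64 threads where threads N and N+32 are siblings. Check the real topology on your hardware before choosing values.
+This is an illustration, not the reference `x86_64/PerformanceProfile.yaml`.
+
+[source,yaml]
+----
+apiVersion: performance.openshift.io/v2
+kind: PerformanceProfile
+metadata:
+  name: openshift-node-performance-profile
+spec:
+  cpu:
+    # Assumed layout: cores 0-1 and their hyper-thread siblings (32-33) are reserved,
+    # so no sibling pair is split across the reserved and isolated sets.
+    reserved: "0-1,32-33"
+    # All remaining threads are isolated, so every thread of every core is covered.
+    isolated: "2-31,34-63"
+  hugepages:
+    defaultHugepagesSize: 1G
+    pages:
+      - size: 1G
+        count: 32 # placeholder, depends on the application workload
+  realTimeKernel:
+    enabled: true
+  # Assumed single-node OpenShift selectors
+  nodeSelector:
+    node-role.kubernetes.io/master: ""
+  machineConfigPoolSelector:
+    pools.operator.machineconfiguration.openshift.io/master: ""
+----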
 
 Engineering considerations::
-* Meeting the full performance metrics requires use of the RT kernel.
-If required, you can use the non-RT kernel with corresponding impact to performance.
+
+* Real-time (RT) kernel
+** Under x86_64, to reach the full performance metrics, you must use the RT kernel, which is the default in the `x86_64/PerformanceProfile.yaml` configuration.
+*** If required, you can select the non-RT kernel, with a corresponding impact on performance.
+** Under aarch64, only the 64k-pagesize non-RT kernel is recommended for RAN DU use cases, which is the default in the `aarch64/PerformanceProfile.yaml` configuration.
 * The number of hugepages you configure depends on application workload requirements.
 Variation in this parameter is expected and allowed.
 * Variation is expected in the configuration of reserved and isolated CPU sets based on selected hardware and additional components in use on the system.
 The variation must still meet the specified limits.
 * Hardware without IRQ affinity support affects isolated CPUs.
 To ensure that pods with guaranteed whole CPU QoS have full use of allocated CPUs, all hardware in the server must support IRQ affinity.
-* If you enable workload partitioning by setting `cpuPartitioningMode` to `AllNodes` during deployment, you must use the `PerformanceProfile` CR to allocate enough CPUs to support the operating system, interrupts, and {product-title} pods.
-* The reference performance profile includes additional kernel arguments settings for `vfio_pci`.
+* To enable workload partitioning, set `cpuPartitioningMode` to `AllNodes` during deployment, and then use the `PerformanceProfile` CR to allocate enough CPUs to support the operating system, interrupts, and {product-title} pods.
+* Under x86_64, the `PerformanceProfile` CR includes additional kernel arguments settings for `vfio_pci`.
 These arguments are included for support of devices such as the FEC accelerator. You can omit them if they are not required for your workload.
+* Under aarch64, the `PerformanceProfile` must be adjusted depending on the needs of the platform:
+** For Grace Hopper systems, the following kernel command-line arguments are required:
+*** `acpi_power_meter.force_cap_on=y`
+*** `module_blacklist=nouveau`
+*** `pci=realloc=off`
+*** `pci=pcie_bus_safe`
+** For other ARM platforms, you might need to enable `iommu.passthrough=1` or `pci=realloc`.
+* Extending and augmenting `TunedPerformancePatch.yaml`:
+** `TunedPerformancePatch.yaml` introduces a default top-level tuned profile named `ran-du-performance`, an architecture-aware RAN tuning profile named `ran-du-performance-architecture-common`, and additional architecture-specific child profiles that the common profile selects automatically.
+** By default, the `ran-du-performance` profile is set to `priority` level `18`, and it includes both the PerformanceProfile-created profile `openshift-node-performance-openshift-node-performance-profile` and `ran-du-performance-architecture-common`.
+** If you have customized the name of the `PerformanceProfile` object, you must create a new `Tuned` object that includes the renamed tuned profile created by the `PerformanceProfile` CR, as well as the `ran-du-performance-architecture-common` RAN tuning profile. This object must have a `priority` value less than 18.
+For example, if the PerformanceProfile object is named `change-this-name`:
++
+[source,yaml]
+----
+apiVersion: tuned.openshift.io/v1
+kind: Tuned
+metadata:
+  name: custom-performance-profile-override
+  namespace: openshift-cluster-node-tuning-operator
+spec:
+  profile:
+    - name: custom-performance-profile-x
+      data: |
+        [main]
+        summary=Override of the default ran-du performance tuning to adjust for our renamed PerformanceProfile
+        include=openshift-node-performance-change-this-name,ran-du-performance-architecture-common
+  recommend:
+    - machineConfigLabels:
+        machineconfiguration.openshift.io/role: "master"
+      priority: 15
+      profile: custom-performance-profile-x
+----
++
+** To customize further, see the optional `TunedPowerCustom.yaml` configuration file, which shows how to extend the provided `TunedPerformancePatch.yaml` without overlaying or editing it directly.
+To add settings, create an additional tuned profile that includes the top-level `ran-du-performance` profile and has a lower `priority` number in the `recommend` section.
+** For additional information on the Node Tuning Operator, see "Using the Node Tuning Operator".
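+
+The extension approach described above can be sketched as follows.
+This is a hypothetical example and not the contents of `TunedPowerCustom.yaml`: the object name, the `[sysctl]` setting, and the `priority` value of `17` are assumptions chosen for illustration.
+The only requirements taken from this section are that the profile includes `ran-du-performance` and uses a `priority` number lower than `18`.
+
+[source,yaml]
+----
+apiVersion: tuned.openshift.io/v1
+kind: Tuned
+metadata:
+  name: ran-du-performance-extension # hypothetical name
+  namespace: openshift-cluster-node-tuning-operator
+spec:
+  profile:
+    - name: ran-du-performance-extension
+      data: |
+        [main]
+        summary=Site-specific additions on top of the ran-du-performance profile
+        include=ran-du-performance
+        [sysctl]
+        # Placeholder setting for illustration only
+        vm.stat_interval=10
+  recommend:
+    - machineConfigLabels:
+        machineconfiguration.openshift.io/role: "master"
+      # Lower number than the default 18, so this profile is selected instead
+      priority: 17
+      profile: ran-du-performance-extension
+----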
diff --git a/modules/telco-ran-ptp-operator.adoc b/modules/telco-ran-ptp-operator.adoc
@@ -7,17 +7,32 @@
 = PTP Operator
 
 New in this release::
-* Dual-port NIC for PTP ordinary clock is enabled.
-* The PTP events REST API v1 and events consumer application sidecar support are removed.
-* A maximum of 3 Westport channel NIC configurations is now supported for T-GM.
+* No reference design updates in this release
 
 Description::
-Configure PTP in cluster nodes with `PTPConfig` CRs for the RAN DU use case with features like Grandmaster clock (T-GM) support using GPS, ordinary clock (OC), boundary clocks (T-BC), dual boundary clocks, high availability (HA), and optional fast event notification over HTTP.
-PTP ensures precise timing and reliability in the RAN environment.
+Configure Precision Time Protocol (PTP) in cluster nodes.
+PTP provides more precise timing and greater reliability in the RAN environment than other clock synchronization protocols, such as NTP.
+Support includes::
+* Grandmaster clock (T-GM): use GPS to sync the local clock and provide time synchronization to other devices
+* Boundary clock (T-BC): receive time from another PTP source and redistribute it to other devices
+* Ordinary clock (T-TSC): synchronize the local clock from another PTP time provider
+
+Configuration variations allow for multiple NIC configurations for greater time distribution and high availability (HA), and optional fast event notification over HTTP.
 
 Limits and requirements::
-* Limited to 2 boundary clocks for nodes with dual NICs and HA
-* Limited to 3 Westport channel NIC configurations for T-GM
+
+* Supports the PTP G.8275.1 profile for the following telco use cases:
+** T-GM use case:
+*** Limited to a maximum of 3 Westport channel NICs
+*** Requires GNSS input to one NIC, with SMA connections to synchronize additional NICs
+*** HA support is not applicable
+** T-BC use case:
+*** Limited to a maximum of 2 NICs
+*** System clock HA support is optional in a 2-NIC configuration.
+** T-TSC use case:
+*** Limited to a single NIC
+*** System clock HA support is optional in an active/standby 2-port configuration.
+* Log reduction must be enabled with `true` or `enhanced`.
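+
+As a hedged illustration of the log reduction requirement, the following `PtpConfig` fragment sets `logReduce` in the `ptpSettings` section of a profile.
+The object name, profile name, and interface are placeholder assumptions, and the fragment omits the other fields that a complete configuration needs. Use the RAN DU reference `PtpConfig` CRs as the starting point for a real deployment.
+
+[source,yaml]
+----
+apiVersion: ptp.openshift.io/v1
+kind: PtpConfig
+metadata:
+  name: du-ptp-example # hypothetical name
+  namespace: openshift-ptp
+spec:
+  profile:
+    - name: example-profile # hypothetical profile name
+      interface: ens5f0 # placeholder interface
+      ptpSettings:
+        # Use "true" or "enhanced", as required by this specification
+        logReduce: "true"
+----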
 
 Engineering considerations::
 * RAN DU RDS configurations are provided for ordinary clocks, boundary clocks, grandmaster clocks, and highly available dual NIC boundary clocks.
diff --git a/modules/telco-ref-design-overview.adoc b/modules/telco-ref-design-overview.adoc
@@ -31,3 +31,23 @@ The reference configurations in this document are deployed using a centrally man
 
 .Telco RAN DU deployment architecture
 image::474_OpenShift_OpenShift_RAN_RDS_arch_updates_1023.png[A diagram showing two distinctive network far edge deployment processes, one showing how the hub cluster uses {ztp} to install managed clusters, and the other showing how the hub cluster uses TALM to apply policies to managed clusters]
+
+== Supported CPU architectures for RAN DU
+
+.Supported CPU architectures for RAN DU
+[cols="1,2,3", options="header"]
+|===
+
+|Architecture
+|Real-time kernel
+|Non-real-time kernel
+
+|x86_64
+|Yes
+|Yes
+
+|aarch64
+|No
+|Yes
+|===
+
diff --git a/scalability_and_performance/telco-ran-du-rds.adoc b/scalability_and_performance/telco-ran-du-rds.adoc
@@ -48,6 +48,8 @@ include::modules/telco-ran-node-tuning-operator.adoc[leveloffset=+2]
 
 * xref:../scalability_and_performance/cnf-tuning-low-latency-nodes-with-perf-profile.adoc#cnf-tuning-low-latency-nodes-with-perf-profile[Tuning nodes for low latency with the performance profile]
 
+* xref:../scalability_and_performance/using-node-tuning-operator.adoc#using-node-tuning-operator[Using the Node Tuning Operator]
+
 include::modules/telco-ran-ptp-operator.adoc[leveloffset=+2]
 
 include::modules/telco-ran-sr-iov-operator.adoc[leveloffset=+2]