The telco core cluster use model is designed for clusters running on commodity hardware.
Telco core clusters support large scale telco applications including control plane functions like signaling, aggregation, session border controller (SBC), and centralized data plane functions such as 5G user plane functions (UPF).
Telco core cluster functions require scalability, complex networking support, and resilient software-defined storage, with performance requirements that are less stringent than those of far-edge RAN deployments.
Networking requirements for telco core functions vary widely across a range of networking features and performance points.
IPv6 is a requirement and dual-stack is common.
Some functions need maximum throughput and transaction rate and require support for user-plane DPDK networking.
Other functions use typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing.
Telco core clusters are configured as standard with three control plane nodes and one or more worker nodes, all configured with the stock (non-RT) kernel.
In support of workloads with varying networking and performance requirements, you can segment worker nodes by using `MachineConfigPool` custom resources (CRs), for example, for non-user data plane or high-throughput use cases; a sketch of such a pool follows this paragraph.
In support of required telco operational features, core clusters have a standard set of Day 2 OLM-managed Operators installed.
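The following is a minimal sketch of a `MachineConfigPool` CR for segmenting worker nodes into a dedicated high-throughput role. The pool name `worker-hp` and its node label are hypothetical examples, not part of the reference configuration.

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-hp                  # hypothetical pool for high-throughput workloads
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-hp]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-hp: ""   # nodes labeled into the custom pool
----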
.Telco core RDS cluster service-based architecture and networking topology
image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology]
The recommended method for telco core cluster installation is to use Red Hat Advanced Cluster Management.
The Agent-based Installer (ABI) is a separate installation flow for {product-title} in environments without existing infrastructure for running cluster deployments.
Use the ABI to install {product-title} on bare-metal servers without requiring additional servers or VMs to manage the installation; however, the ABI does not provide ongoing lifecycle management, monitoring, or automation.
The ABI can be run on any system, for example a laptop, to generate an ISO installation image.
The ISO is used as the installation media for the cluster control plane nodes.
You can monitor the installation progress by using the ABI from any system with network connectivity to the API interfaces of the control plane nodes.
The ABI supports the following:
* Installation from declarative CRs
* Installation in disconnected environments
* No additional servers required to support installation; for example, a bastion node is not needed
Limits and requirements::
* Disconnected installation requires a registry with all required content mirrored and reachable from the installed host.
Engineering considerations::
* Networking configuration should be applied as NMState configuration during installation, rather than as Day 2 configuration by using the NMState Operator. See the following example.
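The following is a minimal sketch of an ABI `agent-config.yaml` that carries host networking as NMState configuration applied during installation. The cluster name, host name, interface name, MAC address, and IP addresses are placeholder values, not reference settings.

[source,yaml]
----
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: telco-core-cluster               # placeholder cluster name
rendezvousIP: 192.0.2.10                 # placeholder; the IP of one control plane node
hosts:
  - hostname: ctrl-plane-0               # placeholder host name
    interfaces:
      - name: eno1
        macAddress: "00:11:22:33:44:55"  # placeholder MAC used to match the host
    networkConfig:                       # NMState configuration applied at install time
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          ipv4:
            enabled: true
            dhcp: false
            address:
              - ip: 192.0.2.10
                prefix-length: 24
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: 192.0.2.1
            next-hop-interface: eno1
----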
// modules/telco-core-application-workloads.adoc
Typically, pods that run high performance or latency sensitive CNFs by using user plane networking require exclusive use of dedicated CPUs.
When creating pod configurations that require exclusive CPUs, be aware of the potential implications of hyper-threaded systems.
Pods should request multiples of 2 CPUs when the entire core (2 hyper-threads) must be allocated to the pod.
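The following is a minimal sketch of a guaranteed QoS pod that requests a full core (2 hyper-threads) and carries CRI-O annotations for CPU isolation. The pod name, image, and runtime class are placeholder assumptions; confirm the required annotations against the cluster `PerformanceProfile` configuration.

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-cnf-example                     # placeholder name
  annotations:
    cpu-load-balancing.crio.io: "disable"    # disable CPU load balancing for pinned CPUs
    cpu-quota.crio.io: "disable"             # disable CFS quota on the isolated CPUs
    irq-load-balancing.crio.io: "disable"    # keep device IRQs off the isolated CPUs
spec:
  runtimeClassName: performance-telco-core   # assumes a PerformanceProfile named "telco-core"
  containers:
    - name: cnf
      image: registry.example.com/cnf:latest # placeholder image
      resources:
        requests:
          cpu: "2"      # multiple of 2 so the pod owns whole cores on hyper-threaded hosts
          memory: 1Gi
        limits:
          cpu: "2"      # requests equal to limits yields guaranteed QoS
          memory: 1Gi
----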
Pods running network functions that do not require high throughput or low latency networking should be scheduled as best-effort or burstable QoS pods and do not require dedicated or isolated CPU cores.
Engineering considerations::
Plan telco core workloads and cluster resources by using the following information:
* As of {product-title} 4.19, `cgroup v1` is no longer supported and has been removed.
All workloads must now be compatible with `cgroup v2`.
For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads].
* CNF applications should conform to the latest version of https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
* Use a mix of best-effort and burstable QoS pods as required by your applications.
** Use guaranteed QoS pods with proper configuration of reserved or isolated CPUs in the `PerformanceProfile` CR that configures the node.
** Guaranteed QoS pods must include annotations for fully isolating CPUs.
** Best effort and burstable pods are not guaranteed exclusive CPU use.
Workloads can be preempted by other workloads, operating system daemons, or kernel tasks.
* Use exec probes sparingly and only when no other suitable option is available.
** Do not use exec probes if a CNF uses CPU pinning. Use other probe implementations, for example, `httpGet` or `tcpSocket`.
** When you need to use exec probes, limit the exec probe frequency and quantity. The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds.
** You can use startup probes, because they do not use significant resources at steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes. Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking.
* Use pre-stop hooks to allow the application workload to perform required actions before pod disruption, such as during an upgrade or node maintenance. The hooks enable a pod to save state to persistent storage, offload traffic from a service, or signal other pods. The sketch that follows this list illustrates the probe and pre-stop guidance.
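The following container spec is a minimal sketch of the probe and pre-stop guidance above: a `tcpSocket` readiness probe and an `httpGet` startup probe in place of exec probes, plus a pre-stop hook. The endpoint path, port, and drain script are placeholder assumptions.

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: cnf-probe-example                    # placeholder name
spec:
  containers:
    - name: cnf
      image: registry.example.com/cnf:latest # placeholder image
      startupProbe:
        httpGet:
          path: /healthz                     # placeholder endpoint
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      readinessProbe:
        tcpSocket:                           # avoids exec probes on CPU-pinned workloads
          port: 8080
        periodSeconds: 10                    # do not probe more often than every 10 seconds
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "/opt/cnf/drain-traffic.sh"]  # placeholder drain script
----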
// modules/telco-core-cluster-common-use-model-engineering-considerations.adoc
* Cluster workloads are detailed in "Application workloads".
* Worker nodes should run on one of the following CPUs:
** Intel 3rd Generation Xeon (IceLake) CPUs or better when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off.
Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled.
** AMD EPYC Zen 4 CPUs (Genoa, Bergamo) or AMD EPYC Zen 5 CPUs (Turin) when supported by {product-title}.
** Intel Sierra Forest CPUs when supported by {product-title}.
** IRQ balancing is enabled on worker nodes.
The `PerformanceProfile` CR sets the `globallyDisableIrqLoadBalancing` parameter to a value of `false`. A sketch of a `PerformanceProfile` CR follows this list.
Guaranteed QoS pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning".
* The balance between power management and maximum performance varies between machine config pools in the cluster.
25
26
The following configurations should be consistent for all nodes in a machine config pool group.
** Cluster scaling.
See "Scalability" for more information.
** Clusters should be able to scale to at least 120 nodes.
** The NICs used for non-DPDK network traffic should be configured to use at most 32 RX/TX queues.
36
38
** Nodes with large numbers of pods or other resources might require additional reserved CPUs.
37
39
The remaining CPUs are available for user workloads.
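The following is a minimal `PerformanceProfile` sketch showing reserved and isolated CPU partitioning with IRQ load balancing left enabled. The profile name and CPU ranges are placeholder assumptions; size them for the actual worker hardware.

[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: telco-core                 # placeholder profile name
spec:
  cpu:
    reserved: "0-3,64-67"          # placeholder: CPUs kept for housekeeping and OS daemons
    isolated: "4-63,68-127"        # placeholder: CPUs available to guaranteed QoS workloads
  globallyDisableIrqLoadBalancing: false   # keep IRQ balancing enabled; isolation is per-pod via annotations
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numa:
    topologyPolicy: single-numa-node       # common for DPDK workloads; adjust to workload needs
----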
// modules/telco-core-cluster-network-operator.adoc
= Cluster Network Operator
New in this release::
* No reference design updates in this release
Description::
The Cluster Network Operator (CNO) deploys and manages the cluster network components including the default OVN-Kubernetes network plugin during cluster installation.
The CNO allows configuration of primary interface MTU settings, OVN gateway modes to use node routing tables for pod egress, and additional secondary networks such as MACVLAN.
In support of network traffic separation, multiple network interfaces are configured through the CNO.
Traffic steering to these interfaces is configured through static routes applied by using the NMState Operator.
To ensure that pod traffic is properly routed, OVN-K is configured with the `routingViaHost` option enabled.
This setting uses the kernel routing table and the applied static routes rather than OVN for pod egress traffic.
The Whereabouts CNI plugin is used to provide dynamic IPv4 and IPv6 addressing for additional pod network interfaces without the use of a DHCP server.
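The following is a sketch of a MACVLAN `NetworkAttachmentDefinition` that uses the Whereabouts IPAM plugin for dynamic addressing without a DHCP server. The network name, namespace, master interface, and address range are placeholder assumptions.

[source,yaml]
----
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: signaling-net              # placeholder secondary network name
  namespace: core-cnfs             # placeholder namespace
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "signaling-net",
      "type": "macvlan",
      "master": "ens5f0",
      "ipam": {
        "type": "whereabouts",
        "range": "198.51.100.0/24"
      }
    }
----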
Limits and requirements::
* OVN-Kubernetes is required for IPv6 support.
MTU size up to 8900 is supported.
The Linux `rx_handler` allows a third-party module to process incoming packets before the host processes them, and only one such handler can be registered per network interface.
Since both MACVLAN and IPVLAN need to register their own `rx_handler` to function, they conflict and cannot coexist on the same interface.
* Alternative NIC configurations include splitting the shared NIC into multiple NICs or using a single dual-port NIC, though they have not been tested and validated.
* Clusters with single-stack IP configuration are not validated.
* EgressIP
** EgressIP failover time depends on the `reachabilityTotalTimeoutSeconds` parameter in the `Network` CR.
This parameter determines the frequency of probes used to detect when the selected egress node is unreachable.
The recommended value of this parameter is `1` second.
** When EgressIP is configured with multiple egress nodes, the failover time is expected to be on the order of seconds or longer.
** On nodes with additional network interfaces, EgressIP traffic egresses through the interface on which the EgressIP address is assigned.
See the "Configuring an egress IP address".
* Pod-level SR-IOV bonding mode must be set to `active-backup` and the `miimon` value must be set (`100` is recommended).
Engineering considerations::
* Pod egress traffic is managed by the kernel routing table by using the `routingViaHost` option. A sketch of the relevant `Network` CR settings follows.
Appropriate static routes must be configured in the host.
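The following is a sketch of the relevant fields in the cluster `Network` CR (`operator.openshift.io/v1`) for enabling `routingViaHost` and setting the recommended EgressIP reachability timeout. Only the fields discussed in this section are shown; the remainder of the CR is omitted.

[source,yaml]
----
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      gatewayConfig:
        routingViaHost: true                 # pod egress uses the kernel routing table and static routes
      egressIPConfig:
        reachabilityTotalTimeoutSeconds: 1   # recommended EgressIP node reachability timeout
----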
// modules/telco-core-common-baseline-model.adoc
The following configurations and use models are applicable to all telco core use cases.
The telco core use cases build on this common baseline of features.
Cluster topology::
The telco core reference design supports two distinct cluster configuration variants:
* A non-schedulable control plane variant, where user workloads are strictly prohibited from running on control plane nodes.
* A schedulable control plane variant, which allows user workloads to run on control plane nodes to optimize resource utilization. This variant is applicable only to bare-metal control plane nodes and must be configured at installation time. A sketch of the corresponding scheduler setting follows.
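The following is a minimal sketch of the cluster `Scheduler` configuration that corresponds to the schedulable control plane variant. This is an assumed illustration of the end state; whether the setting is applied directly or rendered from installation-time manifests depends on the chosen installation flow.

[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: true   # allow user workloads on control plane nodes (schedulable variant)
----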
All clusters, regardless of the variant, must conform to the following requirements:
* A highly available control plane consisting of three or more nodes.
* The use of multiple machine config pools.
Storage::
Telco core use cases require highly available persistent storage as provided by an external storage solution.
{rh-storage} might be used to manage access to the external storage.
Networking::
Telco core cluster networking conforms to the following requirements:
* Dual stack IPv4/IPv6 (IPv4 primary).
* Fully disconnected: clusters do not have access to public networking at any point in their lifecycle.
* Supports multiple networks.
Segmented networking provides isolation between operations, administration and maintenance (OAM), signaling, and storage traffic.
* Cluster network type is OVN-Kubernetes as required for IPv6 support.
User plane networking runs in cloud-native network functions (CNFs).
Service Mesh::
Telco CNFs can use Service Mesh.
Telco core clusters typically include a Service Mesh implementation.
The choice of implementation and configuration is outside the scope of this specification.