Commit 29a9301

Merge pull request #99868 from lcavalle/TELCODOCS-2171bis
TELCODOCS-2171#Generalize Day2Ops for Telco
2 parents 7abf32b + 5ca4308 commit 29a9301

36 files changed (+138 -152 lines changed)

_topic_maps/_topic_map.yml

Lines changed: 9 additions & 9 deletions
@@ -3603,23 +3603,23 @@ Topics:
 File: telco-update-completing-the-y-stream-update
 - Name: Completing the z-stream update
 File: telco-update-completing-the-z-stream-update
-- Name: Troubleshooting and maintaining telco core CNF clusters
+- Name: Troubleshooting and maintaining OpenShift Container Platform clusters
 Dir: troubleshooting
 Topics:
-- Name: Troubleshooting and maintaining telco core CNF clusters
-File: telco-troubleshooting-intro
+- Name: Troubleshooting and maintaining OpenShift Container Platform clusters
+File: troubleshooting-intro
 - Name: General troubleshooting
-File: telco-troubleshooting-general-troubleshooting
+File: troubleshooting-general-troubleshooting
 - Name: Cluster maintenance
-File: telco-troubleshooting-cluster-maintenance
+File: troubleshooting-cluster-maintenance
 - Name: Security
-File: telco-troubleshooting-security
+File: troubleshooting-security
 - Name: Certificate maintenance
-File: telco-troubleshooting-cert-maintenance
+File: troubleshooting-cert-maintenance
 - Name: Machine Config Operator
-File: telco-troubleshooting-mco
+File: troubleshooting-mco
 - Name: Bare-metal node maintenance
-File: telco-troubleshooting-bmn-maintenance
+File: troubleshooting-bmn-maintenance
 - Name: Observability
 Dir: observability
 Topics:

edge_computing/day_2_core_cnf_clusters/telco-day-2-welcome.adoc

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ You can use the following Day 2 operations to manage telco core CNF clusters.
 Updating a telco core CNF cluster:: Updating your cluster is a critical task that ensures that bugs and potential security vulnerabilities are patched.
 For more information, see xref:../day_2_core_cnf_clusters/updating/telco-update-welcome.adoc#telco-update-welcome[Updating a telco core CNF cluster].
 
-Troubleshooting and maintaining telco core CNF clusters:: To maintain and troubleshoot a bare-metal environment where high-bandwidth network throughput is required, see xref:../day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-intro.adoc#telco-troubleshooting-intro[Troubleshooting and maintaining telco core CNF clusters].
+Troubleshooting and maintaining telco core CNF clusters:: To maintain and troubleshoot a bare-metal environment where high-bandwidth network throughput is required, see xref:../day_2_core_cnf_clusters/troubleshooting/troubleshooting-intro.adoc#troubleshooting-intro[Troubleshooting and maintaining {product-title} clusters].
 
 Observability in telco core CNF clusters:: {product-title} generates a large amount of data, such as performance metrics and logs from the platform and the workloads running on it.
 As an administrator, you can use tools to collect and analyze the available data.

edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-cluster-maintenance.adoc

Lines changed: 0 additions & 21 deletions
This file was deleted.

edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-mco.adoc

Lines changed: 0 additions & 20 deletions
This file was deleted.
@@ -1,29 +1,29 @@
 :_mod-docs-content-type: ASSEMBLY
-[id="telco-troubleshooting-bmn-maintenance"]
+[id="troubleshooting-bmn-maintenance"]
 = Bare-metal node maintenance
 include::_attributes/common-attributes.adoc[]
-:context: telco-troubleshooting-bmn-maintenance
+:context: troubleshooting-bmn-maintenance
 
 toc::[]
 
 You can connect to a node for general troubleshooting.
 However, in some cases, you need to perform troubleshooting or maintenance tasks on certain hardware components.
-This section discusses topics that you need to perform that hardware maintenance.
+This section discusses topics that you need to perform for hardware maintenance.
 
-include::modules/telco-troubleshooting-bmn-connect-to-node.adoc[leveloffset=+1]
-include::modules/telco-troubleshooting-bmn-move-apps-to-pods.adoc[leveloffset=+1]
+include::modules/troubleshooting-bmn-connect-to-node.adoc[leveloffset=+1]
+include::modules/troubleshooting-bmn-move-apps-to-pods.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working_nodes-nodes-working[Working with nodes]
 
-include::modules/telco-troubleshooting-bmn-replace-dimm.adoc[leveloffset=+1]
+include::modules/troubleshooting-bmn-replace-dimm.adoc[leveloffset=+1]
+include::modules/troubleshooting-bmn-replace-disk.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../storage/index.adoc#storage-overview_storage-overview[{product-title} storage overview]
 
-include::modules/telco-troubleshooting-bmn-replace-disk.adoc[leveloffset=+1]
-include::modules/telco-troubleshooting-bmn-replace-nw-card.adoc[leveloffset=+1]
+include::modules/troubleshooting-bmn-replace-nw-card.adoc[leveloffset=+1]
Lines changed: 9 additions & 9 deletions
@@ -1,8 +1,8 @@
 :_mod-docs-content-type: ASSEMBLY
-[id="telco-troubleshooting-cert-maintenance"]
+[id="troubleshooting-cert-maintenance"]
 = Certificate maintenance
 include::_attributes/common-attributes.adoc[]
-:context: telco-troubleshooting-cert-maintenance
+:context: troubleshooting-cert-maintenance
 
 toc::[]
 
@@ -14,22 +14,22 @@ Learn about certificates in {product-title} and how to maintain them by using th
 * link:https://access.redhat.com/solutions/5018231[Which OpenShift certificates do rotate automatically and which do not in Openshift 4.x?]
 * link:https://access.redhat.com/solutions/7000968[Checking etcd certificate expiry in OpenShift 4]
 
-include::modules/telco-troubleshooting-certs-manual.adoc[leveloffset=+1]
-include::modules/telco-troubleshooting-certs-manual-proxy.adoc[leveloffset=+2]
+include::modules/troubleshooting-certs-manual.adoc[leveloffset=+1]
+include::modules/troubleshooting-certs-manual-proxy.adoc[leveloffset=+2]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../security/certificate_types_descriptions/proxy-certificates.adoc#cert-types-proxy-certificates[Proxy certificates]
 
-include::modules/telco-troubleshooting-certs-manual-user-provisioned.adoc[leveloffset=+2]
+include::modules/troubleshooting-certs-manual-user-provisioned.adoc[leveloffset=+2]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../security/certificate_types_descriptions/user-provided-certificates-for-api-server.adoc#cert-types-user-provided-certificates-for-the-api-server[User-provisioned certificates for the API server]
 
-include::modules/telco-troubleshooting-certs-auto.adoc[leveloffset=+1]
+include::modules/troubleshooting-certs-auto.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
@@ -44,21 +44,21 @@ include::modules/telco-troubleshooting-certs-auto.adoc[leveloffset=+1]
 * xref:../../../security/certificate_types_descriptions/control-plane-certificates.adoc#cert-types-control-plane-certificates_cert-types-control-plane-certificates[Control plane certificates]
 * xref:../../../security/certificate_types_descriptions/ingress-certificates.adoc#cert-types-ingress-certificates_cert-types-ingress-certificates[Ingress certificates]
 
-include::modules/telco-troubleshooting-certs-auto-etcd.adoc[leveloffset=+2]
+include::modules/troubleshooting-certs-auto-etcd.adoc[leveloffset=+2]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../security/certificate_types_descriptions/etcd-certificates.adoc#cert-types-etcd-certificates_cert-types-etcd-certificates[etcd certificates]
 
-include::modules/telco-troubleshooting-certs-auto-node.adoc[leveloffset=+2]
+include::modules/troubleshooting-certs-auto-node.adoc[leveloffset=+2]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../security/certificate_types_descriptions/node-certificates.adoc#cert-types-node-certificates_cert-types-node-certificates[Node certificates]
 
-include::modules/telco-troubleshooting-certs-auto-service-ca.adoc[leveloffset=+2]
+include::modules/troubleshooting-certs-auto-service-ca.adoc[leveloffset=+2]
 
 [role="_additional-resources"]
 .Additional resources
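The Certificate maintenance assembly links to a Red Hat Knowledgebase article about checking etcd certificate expiry. As an illustration of the date arithmetic only, here is a minimal Python sketch. It assumes you already have a PEM certificate file and the `openssl` command-line tool, and converts the `notAfter=` line printed by `openssl x509 -noout -enddate` into days remaining by using the standard-library `ssl.cert_time_to_seconds` function. The function name `days_until_expiry` is hypothetical, and the sketch does not locate etcd certificates on the nodes for you.

```python
import ssl
import subprocess
import sys
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return the number of days (float) from `now` until a certificate's notAfter time.

    `not_after` uses the OpenSSL text form, for example "Jan  5 09:34:43 2018 GMT",
    optionally prefixed with "notAfter=" as printed by `openssl x509 -enddate`.
    A negative result means that the certificate has already expired.
    """
    if not_after.startswith("notAfter="):
        not_after = not_after[len("notAfter="):]
    # cert_time_to_seconds parses the OpenSSL GMT time string into epoch seconds.
    expires = ssl.cert_time_to_seconds(not_after.strip())
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


# Hypothetical usage: python3 cert_days.py /path/to/cert.pem
if __name__ == "__main__" and len(sys.argv) == 2:
    # Ask OpenSSL for the expiry date of the given PEM certificate file.
    out = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate", "-in", sys.argv[1]],
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"{sys.argv[1]}: {days_until_expiry(out):.1f} days until expiry")
```

Passing `now` explicitly keeps the calculation deterministic, which makes it easy to check against a known date before pointing it at a real certificate.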
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+:_mod-docs-content-type: ASSEMBLY
+[id="troubleshooting-cluster-maintenance"]
+= Cluster maintenance
+include::_attributes/common-attributes.adoc[]
+:context: troubleshooting-cluster-maintenance
+
+toc::[]
+
+When deploying {product-title} on bare-metal infrastructure, you must pay more attention to certain configurations which can have a significant impact on cluster stability.
+You can troubleshoot more effectively by completing these tasks:
+
+* Monitor for failed or failing hardware components
+* Periodically check the status of the cluster Operators
+
+[NOTE]
+====
+For hardware monitoring, contact your hardware vendor to find the appropriate logging tool for your specific hardware.
+====
+
+include::modules/troubleshooting-clusters-check-cluster-operators.adoc[leveloffset=+1]
+include::modules/troubleshooting-clusters-check-for-failed-pods.adoc[leveloffset=+1]
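The new Cluster maintenance assembly includes modules for checking the cluster Operators and checking for failed pods. To illustrate those two checks, the following minimal Python sketch reads JSON saved from `oc get clusteroperators -o json` or `oc get pods -A -o json`. The script name `check_cluster.py` and its function names are hypothetical, and the health rules are assumptions rather than the procedures in the included modules: an Operator is flagged when its `Available` condition is not `True` or its `Degraded` condition is `True`, and a pod is flagged when its phase is neither `Running` nor `Succeeded`.

```python
import json
import sys


def unhealthy_operators(co_list):
    """Return names of ClusterOperators that are not Available or are Degraded.

    `co_list` is the parsed JSON from `oc get clusteroperators -o json`.
    """
    bad = []
    for item in co_list.get("items", []):
        # Map each condition type (Available, Progressing, Degraded, ...) to its status string.
        conditions = {
            c.get("type"): c.get("status")
            for c in item.get("status", {}).get("conditions", [])
        }
        if conditions.get("Available") != "True" or conditions.get("Degraded") == "True":
            bad.append(item["metadata"]["name"])
    return bad


def failed_pods(pod_list):
    """Return (namespace, name, phase) for pods that are neither Running nor Succeeded.

    `pod_list` is the parsed JSON from `oc get pods -A -o json`.
    """
    result = []
    for item in pod_list.get("items", []):
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            meta = item["metadata"]
            result.append((meta.get("namespace", ""), meta["name"], phase))
    return result


def main(mode, path):
    """Print findings for `mode` ("operators" or "pods") from a saved JSON file."""
    with open(path) as f:
        data = json.load(f)
    if mode == "operators":
        for name in unhealthy_operators(data):
            print(f"clusteroperator/{name}: not Available or Degraded")
    elif mode == "pods":
        for ns, name, phase in failed_pods(data):
            print(f"{ns}/{name}: {phase}")
    else:
        raise SystemExit(f"unknown mode: {mode!r} (expected 'operators' or 'pods')")


# Hypothetical usage:
#   oc get clusteroperators -o json > co.json && python3 check_cluster.py operators co.json
#   oc get pods -A -o json > pods.json && python3 check_cluster.py pods pods.json
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Because the sketch looks only at `status.phase`, a pod that is `Running` but has a container in a restart loop is not flagged; `oc describe pod` and `oc logs` remain the places to look for that.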
Lines changed: 10 additions & 10 deletions
@@ -1,28 +1,28 @@
 :_mod-docs-content-type: ASSEMBLY
-[id="telco-troubleshooting-general-troubleshooting"]
+[id="troubleshooting-general-troubleshooting"]
 = General troubleshooting
 include::_attributes/common-attributes.adoc[]
-:context: telco-troubleshooting-general-troubleshooting
+:context: troubleshooting-general-troubleshooting
 
 toc::[]
 
 When you encounter a problem, the first step is to find the specific area where the issue is happening.
-To narrow down the potential problematic areas, complete one or more tasks:
+To narrow down the potential problematic areas, complete one or more of the following tasks:
 
 * Query your cluster
 * Check your pod logs
 * Debug a pod
 * Review events
 
-include::modules/telco-troubleshooting-general-query-cluster.adoc[leveloffset=+1]
+include::modules/troubleshooting-general-query-cluster.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-get[oc get]
 * xref:../../../support/troubleshooting/investigating-pod-issues.adoc#reviewing-pod-status_investigating-pod-issues[Reviewing pod status]
 
-include::modules/telco-troubleshooting-general-check-logs.adoc[leveloffset=+1]
+include::modules/troubleshooting-general-check-logs.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
@@ -32,37 +32,37 @@ include::modules/telco-troubleshooting-general-check-logs.adoc[leveloffset=+1]
 * xref:../../../support/troubleshooting/investigating-pod-issues.adoc#inspecting-pod-and-container-logs_investigating-pod-issues[Inspecting pod and container logs]
 
 
-include::modules/telco-troubleshooting-general-describe-pod.adoc[leveloffset=+1]
+include::modules/troubleshooting-general-describe-pod.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-describe[oc describe]
 
-include::modules/telco-troubleshooting-general-review-events.adoc[leveloffset=+1]
+include::modules/troubleshooting-general-review-events.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../security/container_security/security-monitoring.adoc#security-monitoring-events_security-monitoring[Watching cluster events]
 
-include::modules/telco-troubleshooting-general-connect-to-pod.adoc[leveloffset=+1]
+include::modules/troubleshooting-general-connect-to-pod.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-rsh[oc rsh]
 * xref:../../../support/troubleshooting/investigating-pod-issues.adoc#accessing-running-pods_investigating-pod-issues[Accessing running pods]
 
-include::modules/telco-troubleshooting-general-debug-pod.adoc[leveloffset=+1]
+include::modules/troubleshooting-general-debug-pod.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
 * xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-debug[oc debug]
 * xref:../../../support/troubleshooting/investigating-pod-issues.adoc#starting-debug-pods-with-root-access_investigating-pod-issues[Starting debug pods with root access]
 
-include::modules/telco-troubleshooting-general-run-command-on-pod.adoc[leveloffset=+1]
+include::modules/troubleshooting-general-run-command-on-pod.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
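The General troubleshooting assembly lists reviewing events as one way to narrow down a problem. A minimal Python sketch, assuming JSON saved from `oc get events -A -o json`, keeps only `Warning` events and lists them newest first. The function name `recent_warnings` is hypothetical, and falling back from `lastTimestamp` to `eventTime` and then to `metadata.creationTimestamp` is an assumption for events that do not set `lastTimestamp`.

```python
import json
import sys
from datetime import datetime, timezone


def _event_time(event):
    """Return the most relevant timestamp on an event as a timezone-aware datetime."""
    raw = (
        event.get("lastTimestamp")
        or event.get("eventTime")
        or event.get("metadata", {}).get("creationTimestamp")
    )
    if not raw:
        # Events without any timestamp sort last.
        return datetime.min.replace(tzinfo=timezone.utc)
    # Kubernetes timestamps end in "Z"; fromisoformat before Python 3.11 does not accept it.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def recent_warnings(event_list, limit=20):
    """Return up to `limit` Warning events, newest first, as one-line summaries."""
    warnings = [e for e in event_list.get("items", []) if e.get("type") == "Warning"]
    warnings.sort(key=_event_time, reverse=True)
    lines = []
    for e in warnings[:limit]:
        obj = e.get("involvedObject", {})
        lines.append(
            f"{obj.get('namespace', '')} {obj.get('kind', '')}/{obj.get('name', '')}: "
            f"{e.get('reason', '')}: {e.get('message', '')}"
        )
    return lines


# Hypothetical usage: oc get events -A -o json > events.json && python3 recent_warnings.py events.json
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for line in recent_warnings(json.load(f)):
            print(line)
```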
Lines changed: 6 additions & 7 deletions
@@ -1,24 +1,23 @@
 :_mod-docs-content-type: ASSEMBLY
-[id="telco-troubleshooting-intro"]
-= Troubleshooting and maintaining telco core CNF clusters
+[id="troubleshooting-intro"]
+= Troubleshooting and maintaining {product-title} clusters
 include::_attributes/common-attributes.adoc[]
-:context: telco-troubleshooting-intro
+:context: troubleshooting-intro
 
 toc::[]
 
 Troubleshooting and maintenance are weekly tasks that can be a challenge if you do not have the tools to reach your goal, whether you want to update a component or investigate an issue.
 Part of the challenge is knowing where and how to search for tools and answers.
 
-To maintain and troubleshoot a bare-metal environment where high-bandwidth network throughput is required, see the following procedures.
+To maintain and troubleshoot a bare-metal environment with high performance requirements, see the following procedures.
 
 [IMPORTANT]
 ====
-This troubleshooting information is not a reference for configuring {product-title} or developing Cloud-native Network Function (CNF) applications.
+This troubleshooting information is not a reference for configuring {product-title} or developing cloud-native applications.
 
-For information about developing CNF applications for telco, see link:https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
+For information about developing cloud-native applications on {product-title}, see link:https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
 ====
 
-include::modules/telco-troubleshooting-cnfs.adoc[leveloffset=+1]
 include::modules/support-getting-support.adoc[leveloffset=+1]
 include::modules/support-knowledgebase-about.adoc[leveloffset=+2]
 include::modules/support-knowledgebase-search.adoc[leveloffset=+2]
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+:_mod-docs-content-type: ASSEMBLY
+[id="troubleshooting-mco"]
+= Machine Config Operator
+include::_attributes/common-attributes.adoc[]
+:context: troubleshooting-mco
+
+toc::[]
+
+The Machine Config Operator provides useful information to cluster administrators and controls what is running directly on the bare-metal host.
+
+The Machine Config Operator differentiates between groups of nodes in the cluster, allowing control plane nodes and worker nodes to run with different configurations.
+These groups of nodes run worker or application pods, which are called `MachineConfigPool` (`mcp`) groups.
+The same machine config is applied to all nodes or only to one MCP in the cluster.
+
+For more information about the Machine Config Operator, see xref:../../../operators/operator-reference.adoc#machine-config-operator_cluster-operators-ref[Machine Config Operator].
+
+include::modules/troubleshooting-mco-purpose.adoc[leveloffset=+1]
+include::modules/troubleshooting-mco-apply-several-mcs.adoc[leveloffset=+1]
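The Machine Config Operator assembly describes how nodes are grouped into `MachineConfigPool` (`mcp`) groups. A minimal Python sketch, assuming JSON saved from `oc get machineconfigpools -o json`, summarizes each pool's rendered configuration, machine counts, and whether its `Degraded`, `Updating`, or `Updated` condition is set. The function name `summarize_pools`, the script name `mcp_summary.py`, and the order in which the three conditions are checked are illustrative choices, not part of the Machine Config Operator API.

```python
import json
import sys


def summarize_pools(mcp_list):
    """Summarize MachineConfigPool objects from `oc get machineconfigpools -o json`.

    Returns one dict per pool with its name, rendered config, machine counts,
    and a coarse state derived from the Degraded, Updating, and Updated conditions.
    """
    summary = []
    for pool in mcp_list.get("items", []):
        status = pool.get("status", {})
        conds = {c.get("type"): c.get("status") for c in status.get("conditions", [])}
        # Check the most severe condition first.
        if conds.get("Degraded") == "True":
            state = "degraded"
        elif conds.get("Updating") == "True":
            state = "updating"
        elif conds.get("Updated") == "True":
            state = "updated"
        else:
            state = "unknown"
        summary.append({
            "name": pool["metadata"]["name"],
            "rendered_config": status.get("configuration", {}).get("name", ""),
            "machines": status.get("machineCount", 0),
            "updated": status.get("updatedMachineCount", 0),
            "degraded": status.get("degradedMachineCount", 0),
            "state": state,
        })
    return summary


# Hypothetical usage: oc get mcp -o json > mcp.json && python3 mcp_summary.py mcp.json
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for p in summarize_pools(json.load(f)):
            print(f"{p['name']}: {p['state']} ({p['updated']}/{p['machines']} updated, "
                  f"{p['degraded']} degraded) {p['rendered_config']}")
```

A pool that stays in the `updating` state with a low updated count is usually the first place to look after applying a new machine config.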
