https://issues.redhat.com/browse/ACM-23917 MCOA #8364
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| [#export-metrics-external] | ||
| = Exporting metrics to external endpoints for the multicluster observability add-on | ||
|
|
||
|
Contributor
For tech debt, we need to make some changes here: multicluster observability add-on. I think we should have coded this?
||
| To configure an external metrics endpoint, add your custom `remoteWrite` specification to your `PrometheusAgent` resources. When you configure your `remoteWrite` specification, the managed cluster sends metrics directly to the external endpoint. | ||
|
|
||
| Update related multicluster observability add-on configurations by relabeling default metrics. Exporting metrics from your managed cluster helps you improve resiliency during network partitions between your managed and hub clusters for up to two hours. | ||
|
Contributor
This is the stuff that should be early in the topic.
||
|
|
||
| *Required access:* Cluster administrator | ||
|
|
||
| .Prerequisites | ||
|
|
||
| - You have installed and enabled the multicluster observability add-on. For more information, see xref:../observability/obs_enable_mcoa.adoc#enable-mcoa[Enabling the multicluster observability add-on]. | ||
|
|
||
dockerymick marked this conversation as resolved.
|
||
| .Procedure | ||
|
|
||
| Complete the following steps to export metrics to external endpoints for the `PrometheusAgent` resource: | ||
|
|
||
| . Create the TLS `Secret` resources within the `open-cluster-management-observability` namespace. | ||
| . Add the secret names to the `secrets` specification of the `PrometheusAgent` resource. | ||
| . Add the `remoteWrite` specification to the `PrometheusAgent` resource. Your `PrometheusAgent` resource might resemble the following YAML file, where the `up` metric is exported to a custom endpoint: | ||
|
|
||
| + | ||
| [source,yaml] | ||
| ---- | ||
| apiVersion: monitoring.rhobs/v1alpha1 | ||
| kind: PrometheusAgent | ||
| metadata: | ||
|   name: mcoa-default-platform-metrics-collector-global | ||
|   namespace: open-cluster-management-observability | ||
| spec: | ||
|   secrets: | ||
|   - custom-endpoint-ca | ||
|   - custom-endpoint-cert | ||
|   remoteWrite: | ||
|   - name: custom-endpoint | ||
|     tlsConfig: | ||
|       caFile: /etc/prometheus/secrets/custom-endpoint-ca/ca.crt | ||
|       certFile: /etc/prometheus/secrets/custom-endpoint-cert/tls.crt | ||
|       keyFile: /etc/prometheus/secrets/custom-endpoint-cert/tls.key | ||
|     url: 'https://my-custom-remote-write-endpoint.io/api/v1/receive' | ||
|     writeRelabelConfigs: | ||
|     - action: keep | ||
|       regex: ^up$ | ||
|       sourceLabels: | ||
|       - __name__ | ||
|   - name: acm-observability | ||
|   ... | ||
| ---- | ||
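|
|
||
| The `Secret` resources that the `secrets` specification names might resemble the following sketch. The key names `ca.crt`, `tls.crt`, and `tls.key` are taken from the file paths in the `tlsConfig` specification of the previous example. The `Opaque` and `kubernetes.io/tls` secret types and the placeholder data values are assumptions for illustration: | ||
|
|
||
| [source,yaml] | ||
| ---- | ||
| # Assumption: CA bundle stored in a generic secret under the ca.crt key | ||
| apiVersion: v1 | ||
| kind: Secret | ||
| metadata: | ||
|   name: custom-endpoint-ca | ||
|   namespace: open-cluster-management-observability | ||
| type: Opaque | ||
| data: | ||
|   ca.crt: <base64-encoded-ca-certificate> | ||
| --- | ||
| # Assumption: client certificate and key stored in a TLS secret | ||
| apiVersion: v1 | ||
| kind: Secret | ||
| metadata: | ||
|   name: custom-endpoint-cert | ||
|   namespace: open-cluster-management-observability | ||
| type: kubernetes.io/tls | ||
| data: | ||
|   tls.crt: <base64-encoded-client-certificate> | ||
|   tls.key: <base64-encoded-client-key> | ||
| ---- | ||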
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| [#enable-mcoa] | ||
| = Enabling the multicluster observability add-on | ||
|
|
||
| Enable the multicluster observability add-on on your hub cluster to configure your metrics for your platform and user workloads. You must enable platform workloads. Enabling user workloads is optional. | ||
|
|
||
| When you enable `platform` and `userWorkloads` specifications, the `MultiClusterObservability` operator stops deploying the metrics collectors to your managed clusters and deploys the `multicluster-observability-addon-manager` in the `open-cluster-management-observability` namespace. The `multicluster-observability-addon-manager` deploys the new metrics collectors based on the `PrometheusAgent` resource defined on your hub cluster. | ||
|
|
||
| *Required access:* Cluster administrator | ||
|
|
||
| .Prerequisites | ||
|
|
||
| - You must enable the Observability service on your hub cluster. For more details, see xref:../observability/observability_enable.adoc[Enabling the Observability service]. | ||
| - You must install the Red Hat OpenShift Cluster Observability Operator. For more information, see link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html-single/about_red_hat_openshift_cluster_observability_operator/index[Cluster Observability Operator overview]. | ||
|
|
||
dockerymick marked this conversation as resolved.
|
||
| .Procedure | ||
|
|
||
| Complete the following steps to enable the multicluster observability add-on on your hub cluster: | ||
|
|
||
| . To enable platform monitoring and user workload monitoring, add the `platform` and `userWorkloads` specifications to your `MultiClusterObservability` resource. Run the following command: | ||
|
|
||
| + | ||
| [source,bash] | ||
| ---- | ||
| oc patch mco observability -n open-cluster-management-observability --type=merge -p '{"spec":{"capabilities":{"platform":{"metrics":{"default":{"enabled": true}}},"userWorkloads":{"metrics":{"default":{"enabled": true}}}}}}' | ||
| ---- | ||
| + | ||
| Your `MultiClusterObservability` resource might resemble the following file example: | ||
|
|
||
| + | ||
| [source,yaml] | ||
| ---- | ||
| apiVersion: observability.open-cluster-management.io/v1beta2 | ||
| kind: MultiClusterObservability | ||
| metadata: | ||
|   name: observability | ||
| spec: | ||
|   capabilities: | ||
|     platform: | ||
|       metrics: | ||
|         default: | ||
|           enabled: true | ||
|     userWorkloads: | ||
|       metrics: | ||
|         default: | ||
|           enabled: true | ||
| ---- | ||
|
|
||
| . To verify that the default configuration resources for the multicluster observability add-on are created, list the `PrometheusAgent` resources in the `open-cluster-management-observability` namespace. Run the following command: | ||
|
|
||
| + | ||
| [source,bash] | ||
| ---- | ||
| oc get prometheusagents -n open-cluster-management-observability | ||
| ---- | ||
|
|
||
| . Verify that the default configurations are added to your placements. Run the following command: | ||
|
|
||
| + | ||
| [source,bash] | ||
| ---- | ||
| oc get cma multicluster-observability-addon -o yaml | yq '.spec.installStrategy.placements' | ||
| ---- | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| [#add-custom-metrics-mcoa] | ||
| = Adding custom metrics for the multicluster observability add-on | ||
|
|
||
| Add your own custom metrics to be collected from your managed clusters by configuring the `ScrapeConfig` resource. You must add the `ScrapeConfig` resource to the `Placement` configurations of the `ClusterManagementAddOn` resource so that it is deployed on the corresponding managed clusters. The Prometheus operator on each managed cluster adds the new `ScrapeConfig` resource to the `PrometheusAgent` resource. | ||
|
|
||
| *Required access:* Cluster administrator | ||
|
|
||
| .Prerequisites | ||
|
|
||
| - You have installed and enabled the multicluster observability add-on. For more information, see xref:../observability/obs_enable_mcoa.adoc#enable-mcoa[Enabling the multicluster observability add-on]. | ||
|
|
||
| Complete the following steps to add custom metrics for the multicluster observability add-on: | ||
|
|
||
| . Create a new `ScrapeConfig` resource in the `open-cluster-management-observability` namespace that includes values for the following required parameters: `jobName`, `metricsPath`, and `params`. | ||
|
|
||
| . Add the appropriate label for the `app.kubernetes.io/component` parameter to specify whether the metrics are for platform monitoring or user workload monitoring. Use one of the following label values: `platform-metrics-collector` or `user-workload-metrics-collector`. | ||
dockerymick marked this conversation as resolved.
|
||
|
|
||
| + | ||
| *Note:* When you use the `platform-metrics-collector` label, the multicluster observability add-on automatically sets the `scrapeClass` and `targets` parameters to enable federation from the platform Prometheus of your {ocp-short} managed cluster. You can override the `scrapeClass` and `targets` parameters by adding the value that you need. | ||
|
|
||
| . *Optional:* Manually set the `scrapeClass` and `staticConfigs` specifications for your `user-workload-metrics-collector` `ScrapeConfig` resource. | ||
|
|
||
| . Add the `ScrapeConfig` resource reference to the placements of the `ClusterManagementAddOn` resource where you want the resource to be deployed. | ||
|
|
||
| + | ||
| *Note:* Ensure that you reference the `ScrapeConfig` resource only after you create it. Otherwise, the add-on status updates to a `Deploying` status because the resource does not exist. | ||
|
|
||
| + | ||
| Your `ScrapeConfig` resource might resemble the following YAML file: | ||
|
|
||
| + | ||
| [source,yaml] | ||
| ---- | ||
| apiVersion: monitoring.rhobs/v1alpha1 | ||
| kind: ScrapeConfig | ||
| metadata: | ||
|   name: add-custom-metrics | ||
|   namespace: open-cluster-management-observability | ||
|   labels: | ||
|     app.kubernetes.io/component: platform-metrics-collector | ||
| spec: | ||
|   jobName: some-job-name | ||
|   metricsPath: /federate | ||
|   params: | ||
|     match[]: | ||
|     - '{__name__="up"}' | ||
| ---- | ||
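|
|
||
| The reference to the `ScrapeConfig` resource in the `ClusterManagementAddOn` resource might resemble the following sketch. The placement name `global` and the placement namespace `open-cluster-management-global-set` are assumptions for illustration. Use the placement that targets your managed clusters, and keep the configuration entries that already exist for that placement: | ||
|
|
||
| [source,yaml] | ||
| ---- | ||
| apiVersion: addon.open-cluster-management.io/v1alpha1 | ||
| kind: ClusterManagementAddOn | ||
| metadata: | ||
|   name: multicluster-observability-addon | ||
| spec: | ||
|   installStrategy: | ||
|     type: Placements | ||
|     placements: | ||
|     - name: global                                   # assumed placement name | ||
|       namespace: open-cluster-management-global-set  # assumed placement namespace | ||
|       configs: | ||
|       # Existing default entries stay in this list | ||
|       - group: monitoring.rhobs | ||
|         resource: scrapeconfigs | ||
|         name: add-custom-metrics | ||
|         namespace: open-cluster-management-observability | ||
| ---- | ||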
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,59 @@ | ||||||||||
| [#config-apis-mcoa] | ||||||||||
| = Configuring APIs for the multicluster observability add-on | ||||||||||
|
|
||||||||||
| Configure the default metrics-specific APIs of the multicluster observability add-on. When you reference a new placement in your `ClusterManagementAddOn` resource, the `multicluster-observability-addon-manager` automatically creates specific default `PrometheusAgent` resources. The `multicluster-observability-addon-manager` adds references to the `PrometheusAgent` resources in the related `Placement` configurations. | ||||||||||
|
|
||||||||||
| While one `PrometheusAgent` resource is created for each placement, the default `ScrapeConfigs` and `PrometheusRules` are common to all placements. The following resources are the default configuration resources for the multicluster observability add-on: `PrometheusAgent`, `ScrapeConfigs`, and `PrometheusRules`. | ||||||||||
|
|
||||||||||
| *Required access:* Cluster administrator | ||||||||||
|
|
||||||||||
| .Prerequisites | ||||||||||
|
|
||||||||||
| - You have installed and enabled the multicluster observability add-on. For more information, see xref:../observability/obs_enable_mcoa.adoc#enable-mcoa[Enabling the multicluster observability add-on]. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| .Procedure | ||||||||||
|
|
||||||||||
| Complete the following steps to configure the APIs for the multicluster observability add-on: | ||||||||||
|
|
||||||||||
| . *Optional:* Override the default scrape interval for your `PrometheusAgent` resource by changing the `scrapeInterval` parameter. The default value is `300s`. You can also override the `scrapeInterval` of the `ScrapeConfig` resource. | ||||||||||
|
|
||||||||||
| . Configure the `ScrapeConfig` resource to define a set of metrics for independent federation from Prometheus of your managed clusters. Complete the following steps: | ||||||||||
|
|
||||||||||
| .. Add the name of the job that you want to reference for the `jobName` parameter. | ||||||||||
| .. To ensure that Prometheus federates metrics, add the `/federate` URL path for the `metricsPath` parameter. | ||||||||||
| .. Add the metric name and labels that you want to collect. See the following YAML file example where the `ScrapeConfig` resource collects the `up` metric: | ||||||||||
|
|
||||||||||
| + | ||||||||||
| [source,yaml] | ||||||||||
| ---- | ||||||||||
| apiVersion: monitoring.rhobs/v1alpha1 | ||||||||||
| kind: ScrapeConfig | ||||||||||
| metadata: | ||||||||||
|   name: some-metrics-to-collect | ||||||||||
|   namespace: open-cluster-management-observability | ||||||||||
|   labels: | ||||||||||
|     app.kubernetes.io/component: <platform-metrics-collector> or <user-workload-metrics-collector> | ||||||||||
| spec: | ||||||||||
|   jobName: some-job-name | ||||||||||
|   metricsPath: /federate | ||||||||||
|   params: | ||||||||||
|     match[]: | ||||||||||
|     - '{__name__="up"}' | ||||||||||
| ---- | ||||||||||
|
|
||||||||||
| . Configure the `PrometheusRule` resource to limit the cardinality of your collected metrics on your hub cluster. Complete the following steps: | ||||||||||
|
|
||||||||||
| .. Define alerting and recording rules for platform and user workload monitoring on your managed clusters. | ||||||||||
|
|
||||||||||
| .. To target user workloads in your `PrometheusRule` resource, add the following annotation to define the namespace where you want to deploy the resource: `observability.open-cluster-management.io/target-namespace`. | ||||||||||
|
|
||||||||||
| + | ||||||||||
| *Notes:* | ||||||||||
| - When the `observability.open-cluster-management.io/target-namespace` annotation is not set in your `PrometheusRule` resource, the `PrometheusRule` resource is deployed to the default installation namespace. If `openshift.io/cluster-monitoring` is set to `true`, the `PrometheusRule` resources are monitored by the {ocp-short} user workload monitoring stack. | ||||||||||
dockerymick marked this conversation as resolved.
|
||||||||||
| - Be sure to use the `monitoring.coreos.com` group for the `PrometheusRule` resource. | ||||||||||
|
|
||||||||||
| . Configure the `AddOnDeploymentConfig` resource to customize the deployment of the multicluster observability add-on on managed clusters. *Note:* The values in the `AddOnDeploymentConfig` resource override direct modifications of other resources. Complete the following steps: | ||||||||||
|
|
||||||||||
| .. Define the namespace where you install the `PrometheusAgent` resource for the `agentInstallNamespace` parameter. The default namespace is `open-cluster-management-agent-addon`. | ||||||||||
| .. Add the `nodePlacement` specification with the associated `nodeSelector` and `tolerations` parameters. | ||||||||||
| .. Edit the `proxyConfig` specification by updating the `httpProxy` and `noProxy` configurations. | ||||||||||
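|
|
||||||||||
| An `AddOnDeploymentConfig` resource that sets the three specifications from the last step might resemble the following sketch. The resource name, the namespace of the resource, the node label, the toleration, and the proxy values are assumptions for illustration and are not defaults of the add-on: | ||||||||||
|
|
||||||||||
| [source,yaml] | ||||||||||
| ---- | ||||||||||
| apiVersion: addon.open-cluster-management.io/v1alpha1 | ||||||||||
| kind: AddOnDeploymentConfig | ||||||||||
| metadata: | ||||||||||
|   name: mcoa-deployment-config                      # hypothetical name | ||||||||||
|   namespace: open-cluster-management-observability  # assumed namespace | ||||||||||
| spec: | ||||||||||
|   # Namespace on the managed cluster where the PrometheusAgent is installed | ||||||||||
|   agentInstallNamespace: open-cluster-management-agent-addon | ||||||||||
|   nodePlacement: | ||||||||||
|     nodeSelector: | ||||||||||
|       node-role.kubernetes.io/infra: ""             # assumed node label | ||||||||||
|     tolerations: | ||||||||||
|     - key: node-role.kubernetes.io/infra            # assumed taint key | ||||||||||
|       operator: Exists | ||||||||||
|       effect: NoSchedule | ||||||||||
|   proxyConfig: | ||||||||||
|     httpProxy: http://proxy.example.com:3128        # placeholder proxy URL | ||||||||||
|     noProxy: .cluster.local,.svc                    # placeholder exclusions | ||||||||||
| ---- | ||||||||||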
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,34 @@ | ||||||||||
| [#federate-uw-coo] | ||||||||||
| = Federating user workloads from Cluster Observability Operator | ||||||||||
|
|
||||||||||
| Federate your user workloads from the Red Hat OpenShift Cluster Observability Operator on your managed clusters. By default, user workload metrics are federated from the user workload Prometheus instance on your {ocp-short} cluster. | ||||||||||
|
|
||||||||||
| *Required access:* Cluster administrator | ||||||||||
|
|
||||||||||
| .Prerequisites | ||||||||||
|
|
||||||||||
| - User workload monitoring is enabled on your managed cluster. | ||||||||||
| - User workload monitoring is enabled in the `MultiClusterObservability` custom resource. | ||||||||||
| - You have created `ScrapeConfig` resources with the `app.kubernetes.io/component: <user-workload-metrics-collector>` label. | ||||||||||
| - The `ScrapeConfig` resources are referenced in the configurations list of the `ClusterManagementAddOn` for the target placements. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| .Procedure | ||||||||||
|
|
||||||||||
| Complete the following steps to federate user workloads from the Cluster Observability Operator on your managed clusters: | ||||||||||
|
|
||||||||||
| . Update your `ScrapeConfig` resource by adding the endpoint of the Cluster Observability Operator `MonitoringStack` resource. Your resource might resemble the following YAML file: | ||||||||||
|
|
||||||||||
| + | ||||||||||
| [source,yaml] | ||||||||||
| ---- | ||||||||||
| apiVersion: monitoring.coreos.com/v1alpha1 | ||||||||||
| kind: ScrapeConfig | ||||||||||
| spec: | ||||||||||
|   scrapeClass: "" | ||||||||||
|   scheme: HTTP | ||||||||||
|   staticConfigs: | ||||||||||
|   - targets: | ||||||||||
|     - my-monitoring-stack.my-monitoring-ns.svc:9090 | ||||||||||
| ---- | ||||||||||
|
|
||||||||||
| . If you use a proxy with the Prometheus server, modify the `ScrapeConfig` resource to include your TLS configuration. Create a corresponding `scrapeClass` specification on the user workload `PrometheusAgent` resource. Then reference the `PrometheusAgent` resource within the `ScrapeConfig` resource. | ||||||||||
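|
|
||||||||||
| The proxy configuration from the previous step might resemble the following sketch, which shows only the relevant fragments of each resource. The scrape class name `coo-proxy-tls`, the secret name `proxy-ca`, and the certificate path are hypothetical. The sketch assumes that the `ScrapeConfig` resource selects the scrape class by name and that the CA certificate is mounted through the `secrets` specification of the `PrometheusAgent` resource: | ||||||||||
|
|
||||||||||
| [source,yaml] | ||||||||||
| ---- | ||||||||||
| # Fragment of the user workload PrometheusAgent resource | ||||||||||
| spec: | ||||||||||
|   secrets: | ||||||||||
|   - proxy-ca                        # hypothetical secret that holds ca.crt | ||||||||||
|   scrapeClasses: | ||||||||||
|   - name: coo-proxy-tls             # hypothetical scrape class name | ||||||||||
|     tlsConfig: | ||||||||||
|       caFile: /etc/prometheus/secrets/proxy-ca/ca.crt | ||||||||||
| --- | ||||||||||
| # Fragment of the ScrapeConfig resource that uses the scrape class | ||||||||||
| spec: | ||||||||||
|   scrapeClass: coo-proxy-tls | ||||||||||
|   scheme: HTTPS | ||||||||||
|   staticConfigs: | ||||||||||
|   - targets: | ||||||||||
|     - my-monitoring-stack.my-monitoring-ns.svc:9090 | ||||||||||
| ---- | ||||||||||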
We need a short description:
Why do this?