From 6624b04555c2fd77bb0085ca1bf86e33ac80f61e Mon Sep 17 00:00:00 2001 From: Ilian Iliev Date: Tue, 4 Nov 2025 18:00:59 +0200 Subject: [PATCH 01/10] [RDSC-4241] Improving observability documentation --- .../redis-data-integration/observability.md | 83 +++++++++++++++++-- 1 file changed, 76 insertions(+), 7 deletions(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 5235950c5f..9c8ec603a0 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -25,8 +25,11 @@ to query the metrics and plot simple graphs or with [Grafana](https://grafana.com/) to produce more complex visualizations and dashboards. -RDI exposes two endpoints, one for [CDC collector metrics](#collector-metrics) and -another for [stream processor metrics](#stream-processor-metrics). +RDI exposes metrics for two main components: + - [CDC collector](#collector-metrics) + - [Stream processor](#stream-processor-metrics) + + The sections below explain these sets of metrics in more detail. See the [architecture overview]({{< relref "/integrate/redis-data-integration/architecture#overview" >}}) @@ -37,9 +40,67 @@ RDI metrics with the RDI monitoring screen in Redis Insight or with the [`redis-di status`]({{< relref "/integrate/redis-data-integration/reference/cli/redis-di-status" >}}) command from the CLI.{{< /note >}} -## Collector metrics +### Accessing the metrics + +How you access the metrics endpoints depends on your RDI installation method. + +#### VM Installation + +For VM installations, the metrics are available by default on the following endpoints: +- Collector metrics: `https:///collector-source/metrics` +- Stream processor metrics: `https:///metrics` +- Operator metrics: `https:///operator/metrics` + +Please note that for RDI versions prior to 1.16.0 the collector metrics are not accessible. 
+ +#### Helm installation + +For Helm installations, the metrics are available via autodiscovery in the K8s cluster. To use them you need to do the following: +1. Make sure you have the Prometheus Operator installed in your K8s cluster. You can follow the + [Prometheus Operator installation guide](https://prometheus-operator.dev/docs/getting-started/installation/). + +2. Update your values.yaml file to enable metrics for the operator, collector and stream processor components. + + - For the collector, update the `collector` section, under the `dataPlane` section: + ```yaml + dataPlane: + collector: + # Enable service monitor + serviceMonitor: + enabled: true + + # Make sure to label the ServiceMonitor so that Prometheus can discover it + labels: + release: prometheus + ``` -The endpoint for the collector metrics is `https:///metrics/collector-source` + - For the stream processor, update the `rdiMetricsExporter` section: + ```yaml + rdiMetricsExporter: + # Enable service monitor + serviceMonitor: + enabled: true + + # Make sure to label the ServiceMonitor so that Prometheus can discover it + labels: + release: prometheus + ``` + + - For the operator, update the `operator` section: + ```yaml + operator: + prometheus: + enabled: true + labels: + release: prometheus + metrics: + enabled: true + ``` + +Note: please have in mind, that the Prometheus service discovery loop runs on regular intervals. Therefore, after deploying or updating RDI with the above configuration, it may take up to a few minutes for Prometheus to discover the new ServiceMonitors and start scraping metrics from the RDI components. + + +## Collector metrics These metrics are divided into three groups: @@ -89,10 +150,8 @@ The following table lists all collector metrics and their descriptions: {{< note >}} Many metrics include context labels that specify the phase (`snapshot` or `streaming`), database name, and other contextual information. 
Metrics with a value of `-1` typically indicate that the measurement is not applicable in the current state. {{< /note >}} - -## Stream processor metrics -The endpoint for the stream processor metrics is `https:///metrics/rdi` +## Stream processor metrics RDI reports metrics during the two main phases of the ingest pipeline, the *snapshot* phase and the *change data capture (CDC)* phase. (See the @@ -145,6 +204,16 @@ RDI reports with their descriptions. - **Last batch metrics**: Show real-time performance data for the most recently processed batch {{< /note >}} +## Operator metrics +Most of the metrics exposed by the RDI operator are standard controller-runtime [metrics](https://book.kubebuilder.io/reference/metrics-reference). +Those important for RDI operations are listed in the table below: + +| Metric Name | Metric Type | Metric Description | Alerting Recommendations | +|-------------|-------------|-------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------| +| `leader_election_master_status` | Gauge | USe this one to determine the current leader (for HA and DR setup) | Informational - may be used for alerting there is no leader for prolonged period of time | +| `rdi_operator_pipeline_phase` | Gauge | Use this one to determine the current phase pipeline phase | Informational - may be used for alerting if operator is stuck in the `Resetting` phase for a prolonged period of time | + + ## Recommended alerting strategy The alerting strategy described in the sections below focuses on system failures and data integrity issues that require immediate attention. Most other metrics are informational, so you should monitor them for trends rather than trigger alerts. 
From eb47bb774d0170f6411d1e33eea3926f07c1cfff Mon Sep 17 00:00:00 2001 From: Ilian Iliev Date: Wed, 5 Nov 2025 14:32:20 +0200 Subject: [PATCH 02/10] [RDSC-4241] Improving observability documentation --- content/integrate/redis-data-integration/observability.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 9c8ec603a0..986fe4b2c9 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -208,10 +208,10 @@ RDI reports with their descriptions. Most of the metrics exposed by the RDI operator are standard controller-runtime [metrics](https://book.kubebuilder.io/reference/metrics-reference). Those important for RDI operations are listed in the table below: -| Metric Name | Metric Type | Metric Description | Alerting Recommendations | -|-------------|-------------|-------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------| -| `leader_election_master_status` | Gauge | USe this one to determine the current leader (for HA and DR setup) | Informational - may be used for alerting there is no leader for prolonged period of time | -| `rdi_operator_pipeline_phase` | Gauge | Use this one to determine the current phase pipeline phase | Informational - may be used for alerting if operator is stuck in the `Resetting` phase for a prolonged period of time | +| Metric Name | Metric Type | Metric Description | Alerting Recommendations | +|-------------|-------------|--------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------| +| `rdi_operator_is_leader` | Gauge | Current leadership status 
(1 = leader, 0 = not leader) | Informational - may be used for alerting there is no leader for prolonged period of time. | +| `rdi_operator_pipeline_phase` | Gauge | Current Pipeline phase | Informational - may be used for alerting if operator is stuck in the `Resetting` phase for a prolonged period of time | ## Recommended alerting strategy From 63cbbbe47b1d8c17c8b194d77d91cb7cabd78346 Mon Sep 17 00:00:00 2001 From: ilianiliev-redis Date: Wed, 5 Nov 2025 15:48:26 +0200 Subject: [PATCH 03/10] Update content/integrate/redis-data-integration/observability.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- content/integrate/redis-data-integration/observability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 986fe4b2c9..03f67441fc 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -40,9 +40,9 @@ RDI metrics with the RDI monitoring screen in Redis Insight or with the [`redis-di status`]({{< relref "/integrate/redis-data-integration/reference/cli/redis-di-status" >}}) command from the CLI.{{< /note >}} -### Accessing the metrics +## Accessing the metrics -How you access the metrics endpoints depends on your RDI installation method. +The way you access the metrics endpoints depends on whether you are using a VM installation or a Helm installation for RDI. The sections below describe the correct approach for each installation type. 
#### VM Installation From 30287b4941c5018b7c8090557dd55d0d27f8b4b5 Mon Sep 17 00:00:00 2001 From: ilianiliev-redis Date: Wed, 5 Nov 2025 15:48:35 +0200 Subject: [PATCH 04/10] Update content/integrate/redis-data-integration/observability.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- content/integrate/redis-data-integration/observability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 03f67441fc..b912ef2bac 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -53,7 +53,7 @@ For VM installations, the metrics are available by default on the following endp Please note that for RDI versions prior to 1.16.0 the collector metrics are not accessible. -#### Helm installation +### Helm installation For Helm installations, the metrics are available via autodiscovery in the K8s cluster. To use them you need to do the following: 1. Make sure you have the Prometheus Operator installed in your K8s cluster. 
You can follow the From dfae2dc212d74a4ea804f80a5efb792761730fc8 Mon Sep 17 00:00:00 2001 From: ilianiliev-redis Date: Wed, 5 Nov 2025 15:48:42 +0200 Subject: [PATCH 05/10] Update content/integrate/redis-data-integration/observability.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- content/integrate/redis-data-integration/observability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index b912ef2bac..e15ac7a3c6 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -44,7 +44,7 @@ command from the CLI.{{< /note >}} The way you access the metrics endpoints depends on whether you are using a VM installation or a Helm installation for RDI. The sections below describe the correct approach for each installation type. -#### VM Installation +### VM Installation For VM installations, the metrics are available by default on the following endpoints: - Collector metrics: `https:///collector-source/metrics` From 47e366bee1acd8c8ff8f821c8b7f2b6b60487be1 Mon Sep 17 00:00:00 2001 From: ilianiliev-redis Date: Wed, 5 Nov 2025 15:48:54 +0200 Subject: [PATCH 06/10] Update content/integrate/redis-data-integration/observability.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- content/integrate/redis-data-integration/observability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index e15ac7a3c6..6f24996b5c 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -55,7 +55,7 @@ Please note that for RDI versions prior to 1.16.0 the collector metrics are not ### Helm 
installation -For Helm installations, the metrics are available via autodiscovery in the K8s cluster. To use them you need to do the following: +For Helm installations, the metrics are available via autodiscovery in the K8s cluster. Follow the steps below to use them: 1. Make sure you have the Prometheus Operator installed in your K8s cluster. You can follow the [Prometheus Operator installation guide](https://prometheus-operator.dev/docs/getting-started/installation/). From 2a22244c74aa7594c6e20bc5c367606e9933dfd2 Mon Sep 17 00:00:00 2001 From: ilianiliev-redis Date: Wed, 5 Nov 2025 15:49:13 +0200 Subject: [PATCH 07/10] Update content/integrate/redis-data-integration/observability.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- content/integrate/redis-data-integration/observability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 6f24996b5c..35511b94f9 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -56,8 +56,8 @@ Please note that for RDI versions prior to 1.16.0 the collector metrics are not ### Helm installation For Helm installations, the metrics are available via autodiscovery in the K8s cluster. Follow the steps below to use them: -1. Make sure you have the Prometheus Operator installed in your K8s cluster. You can follow the - [Prometheus Operator installation guide](https://prometheus-operator.dev/docs/getting-started/installation/). +1. Make sure you have the Prometheus Operator installed in your K8s cluster (see the + [Prometheus Operator installation guide](https://prometheus-operator.dev/docs/getting-started/installation/) for more information about this). 2. Update your values.yaml file to enable metrics for the operator, collector and stream processor components. 
From 3a92588a441f5fcd543b61271d78ea04996042f3 Mon Sep 17 00:00:00 2001 From: ilianiliev-redis Date: Wed, 5 Nov 2025 15:49:32 +0200 Subject: [PATCH 08/10] Update content/integrate/redis-data-integration/observability.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- content/integrate/redis-data-integration/observability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 35511b94f9..539e55d284 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -97,8 +97,8 @@ For Helm installations, the metrics are available via autodiscovery in the K8s c enabled: true ``` -Note: please have in mind, that the Prometheus service discovery loop runs on regular intervals. Therefore, after deploying or updating RDI with the above configuration, it may take up to a few minutes for Prometheus to discover the new ServiceMonitors and start scraping metrics from the RDI components. - +{{< note >}}The Prometheus service discovery loop runs at regular intervals. This means that after deploying or updating RDI with the above configuration, it may take a few minutes for Prometheus to discover the new ServiceMonitors and start scraping metrics from the RDI components. 
+{{< /note >}} ## Collector metrics From a0d9a11959a32393ce8d775c754c7308d7aeaa6c Mon Sep 17 00:00:00 2001 From: ilianiliev-redis Date: Wed, 5 Nov 2025 15:49:40 +0200 Subject: [PATCH 09/10] Update content/integrate/redis-data-integration/observability.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- content/integrate/redis-data-integration/observability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 539e55d284..4eb5207866 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -206,7 +206,7 @@ RDI reports with their descriptions. ## Operator metrics Most of the metrics exposed by the RDI operator are standard controller-runtime [metrics](https://book.kubebuilder.io/reference/metrics-reference). -Those important for RDI operations are listed in the table below: +The metrics that are relevant for RDI operations are listed in the table below: | Metric Name | Metric Type | Metric Description | Alerting Recommendations | |-------------|-------------|--------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------| From 08deb402e6b0be6b8f57664ed626568daeb1188b Mon Sep 17 00:00:00 2001 From: Ilian Iliev Date: Thu, 6 Nov 2025 17:36:01 +0200 Subject: [PATCH 10/10] [RDSC-4241] Improving observability documentation --- content/integrate/redis-data-integration/observability.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 4eb5207866..e3d7be797d 100644 --- a/content/integrate/redis-data-integration/observability.md +++ 
b/content/integrate/redis-data-integration/observability.md @@ -28,6 +28,7 @@ dashboards. RDI exposes metrics for two main components: - [CDC collector](#collector-metrics) - [Stream processor](#stream-processor-metrics) + - [Operator](#operator-metrics) The sections below explain these sets of metrics in more detail.
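
Review note on the operator metrics table introduced in patches 01 and 02: the suggestion to alert when no leader exists for a prolonged period can be sanity-checked offline against a saved scrape of the operator metrics endpoint. The sketch below is a minimal illustration, not part of the patch series. It assumes the endpoint returns the standard Prometheus text exposition format. The `pod` label and the sample payload are hypothetical, and the helper names `read_gauge` and `has_leader` are introduced here only for illustration. The only things taken from the patch are the metric name `rdi_operator_is_leader` and its meaning (1 = leader, 0 = not leader).

```python
# Hypothetical scrape output; the real label set may differ.
SAMPLE = """\
# TYPE rdi_operator_is_leader gauge
rdi_operator_is_leader{pod="operator-a"} 0
rdi_operator_is_leader{pod="operator-b"} 1
"""


def read_gauge(metrics_text, name):
    """Return every sample value reported for `name` in Prometheus text format."""
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if not line.startswith(name):
            continue
        rest = line[len(name):]
        # Reject longer metric names that merely share this prefix.
        if not rest or rest[0] not in "{ \t":
            continue
        # Drop the label block, if any, so only "value [timestamp]" remains.
        if rest[0] == "{":
            rest = rest[rest.rfind("}") + 1:]
        fields = rest.split()
        if fields:
            values.append(float(fields[0]))
    return values


def has_leader(metrics_text):
    """True if at least one operator instance reports itself as leader."""
    return any(v == 1.0 for v in read_gauge(metrics_text, "rdi_operator_is_leader"))


print(has_leader(SAMPLE))  # prints True
```

In a real deployment the equivalent check would more likely live in a Prometheus alerting rule than in a script. The point of the sketch is only that a missing series and a series whose value is `0` both count as "no leader", which is worth keeping in mind when writing the alert expression.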