diff --git a/.github/workflows/deploy-docs-staging.yaml b/.github/workflows/deploy-docs-staging.yaml
index 27dc69c9..d9a3c30b 100644
--- a/.github/workflows/deploy-docs-staging.yaml
+++ b/.github/workflows/deploy-docs-staging.yaml
@@ -49,8 +49,8 @@ jobs:
- name: Deploy to S3
run: |
- aws s3 sync ./site s3://openobserve-website-staging/docs --exclude=".git/*"
+ aws s3 sync ./site s3://openobserve-website-staging/docs --exclude=".git/*" --delete
- name: Invalidate CloudFront cache
run: |
- aws cloudfront create-invalidation --distribution-id E2GZJM0TJIDFRM --paths "/docs/*"
\ No newline at end of file
+ aws cloudfront create-invalidation --distribution-id E2GZJM0TJIDFRM --paths "/docs/*"
diff --git a/docs/images/config-remote-destination-header.png b/docs/images/config-remote-destination-header.png
new file mode 100644
index 00000000..a831376a
Binary files /dev/null and b/docs/images/config-remote-destination-header.png differ
diff --git a/docs/images/config-remote-destination-headers.png b/docs/images/config-remote-destination-headers.png
new file mode 100644
index 00000000..051e3248
Binary files /dev/null and b/docs/images/config-remote-destination-headers.png differ
diff --git a/docs/images/config-remote-destination-method.png b/docs/images/config-remote-destination-method.png
new file mode 100644
index 00000000..7f4282bf
Binary files /dev/null and b/docs/images/config-remote-destination-method.png differ
diff --git a/docs/images/config-remote-destination-output-format.png b/docs/images/config-remote-destination-output-format.png
new file mode 100644
index 00000000..08f46792
Binary files /dev/null and b/docs/images/config-remote-destination-output-format.png differ
diff --git a/docs/images/config-remote-destination.png b/docs/images/config-remote-destination.png
new file mode 100644
index 00000000..9a130e34
Binary files /dev/null and b/docs/images/config-remote-destination.png differ
diff --git a/docs/images/current-cluster-query-result.png b/docs/images/current-cluster-query-result.png
new file mode 100644
index 00000000..752c7e96
Binary files /dev/null and b/docs/images/current-cluster-query-result.png differ
diff --git a/docs/images/current-cluster-query.png b/docs/images/current-cluster-query.png
new file mode 100644
index 00000000..64137626
Binary files /dev/null and b/docs/images/current-cluster-query.png differ
diff --git a/docs/images/federated-search-multi-select.png b/docs/images/federated-search-multi-select.png
new file mode 100644
index 00000000..13dd22a5
Binary files /dev/null and b/docs/images/federated-search-multi-select.png differ
diff --git a/docs/images/federated-search-result.png b/docs/images/federated-search-result.png
new file mode 100644
index 00000000..be0e0867
Binary files /dev/null and b/docs/images/federated-search-result.png differ
diff --git a/docs/images/federated-search.png b/docs/images/federated-search.png
new file mode 100644
index 00000000..0fd3bc4e
Binary files /dev/null and b/docs/images/federated-search.png differ
diff --git a/docs/images/remote-destination-config-from-management.png b/docs/images/remote-destination-config-from-management.png
new file mode 100644
index 00000000..7893188d
Binary files /dev/null and b/docs/images/remote-destination-config-from-management.png differ
diff --git a/docs/images/remote-destination-config-from-pipeline-editor.png b/docs/images/remote-destination-config-from-pipeline-editor.png
new file mode 100644
index 00000000..4e41f927
Binary files /dev/null and b/docs/images/remote-destination-config-from-pipeline-editor.png differ
diff --git a/docs/images/remote-destination-from-pipeline-editor.png b/docs/images/remote-destination-from-pipeline-editor.png
new file mode 100644
index 00000000..a93bc4d9
Binary files /dev/null and b/docs/images/remote-destination-from-pipeline-editor.png differ
diff --git a/docs/images/use-pipeline-destination.png b/docs/images/use-pipeline-destination.png
new file mode 100644
index 00000000..99511510
Binary files /dev/null and b/docs/images/use-pipeline-destination.png differ
diff --git a/docs/user-guide/.pages b/docs/user-guide/.pages
index 16c9d0d2..acb26453 100644
--- a/docs/user-guide/.pages
+++ b/docs/user-guide/.pages
@@ -16,6 +16,7 @@ nav:
- Management: management
- Profile: profile
- Performance: performance
+ - Federated Search: federated-search
- Best Practices: best-practices
- Migration: migration
diff --git a/docs/user-guide/actions/actions-in-openobserve.md b/docs/user-guide/actions/actions-in-openobserve.md
index 20eb164e..edaa739b 100644
--- a/docs/user-guide/actions/actions-in-openobserve.md
+++ b/docs/user-guide/actions/actions-in-openobserve.md
@@ -5,8 +5,11 @@ description: >-
---
This guide explains what Actions are, their types, and use cases.
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
+
## What are Actions
-**Actions** in OpenObserve are user-defined Python scripts that support custom automation workflows. They can be applied to log data directly from the Logs UI or used as alert destinations.
+Actions in OpenObserve are user-defined Python scripts that support custom automation workflows. They can be applied to log data directly from the Logs UI or used as alert destinations.
- Previously, OpenObserve supported log transformations only through VRL (Vector Remap Language). Python scripts written for Actions expand the capabilities of log transformations in OpenObserve.
- Additionally, earlier, when an alert gets triggered, users used to get notified via email or webhook. But, with Actions as alert destinations, users can take an immediate action by adding an automation workflow using Actions.
diff --git a/docs/user-guide/actions/create-and-use-real-time-actions.md b/docs/user-guide/actions/create-and-use-real-time-actions.md
index b4ac08d4..9427ca09 100644
--- a/docs/user-guide/actions/create-and-use-real-time-actions.md
+++ b/docs/user-guide/actions/create-and-use-real-time-actions.md
@@ -8,6 +8,9 @@ description: >-
This guide provides instruction on how to create Real-time Actions.
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
+
## Create a Real-time Action
??? info "Prerequisite"
Create a Service Account and Assign a Role
diff --git a/docs/user-guide/actions/create-and-use-scheduled-actions.md b/docs/user-guide/actions/create-and-use-scheduled-actions.md
index 41436e35..b4813701 100644
--- a/docs/user-guide/actions/create-and-use-scheduled-actions.md
+++ b/docs/user-guide/actions/create-and-use-scheduled-actions.md
@@ -6,6 +6,9 @@ description: >-
---
This guide provides step-by-step instructions for creating and using Scheduled Actions in OpenObserve.
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
+
**Scheduled Actions** in OpenObserve allow you to execute custom Python scripts at a specific time, either **once** or on a **recurring schedule** defined using a cron expression.
Scheduled Actions run based on time, making them suitable for:
diff --git a/docs/user-guide/federated-search/.pages b/docs/user-guide/federated-search/.pages
new file mode 100644
index 00000000..30ec3685
--- /dev/null
+++ b/docs/user-guide/federated-search/.pages
@@ -0,0 +1,5 @@
+nav:
+
+- Federated Search Overview: index.md
+- How to Use Federated Search: how-to-use-federated-search.md
+- Federated Search Architecture: federated-search-architecture.md
diff --git a/docs/user-guide/federated-search/federated-search-architecture.md b/docs/user-guide/federated-search/federated-search-architecture.md
new file mode 100644
index 00000000..30cbddf9
--- /dev/null
+++ b/docs/user-guide/federated-search/federated-search-architecture.md
@@ -0,0 +1,147 @@
+---
+title: Federated Search in OpenObserve - Architecture
+description: Technical explanation of OpenObserve deployment modes, normal cluster query execution, and how federated search works across single and multiple clusters.
+---
+This document explains the technical architecture of OpenObserve deployments, how queries execute in normal clusters, and how [federated search](../) coordinates queries across clusters in a supercluster.
+
+!!! info "Availability"
+ This feature is available in Enterprise Edition. Not available in Open Source and Cloud.
+
+## Understanding OpenObserve deployments
+Before diving into how federated search works, you need to understand how OpenObserve can be deployed. OpenObserve scales from a single machine to a globally distributed infrastructure.
+
+## Single node deployment
+The simplest deployment: one instance of OpenObserve runs all functions on one machine. Data is stored locally, and the node processes queries directly. This setup works for testing or small deployments.
+
+## Single cluster deployment
+When you need scale, multiple specialized nodes work together as a cluster. Each node type has a specific role:
+
+- **Router**: Entry point that forwards queries to queriers
+- **Querier**: Processes queries in parallel with other queriers
+- **Ingester**: Receives and stores data in object storage
+- **Compactor**: Optimizes files and enforces retention
+- **Alertmanager**: Executes alerts and sends notifications
+
+A single cluster handles more data and provides higher availability than a single node.
+
+## Supercluster deployment
+When you need to operate across multiple geographical regions, multiple clusters connect as a supercluster. This is where federated search becomes relevant.
+
+!!! note "Key point"
+ Each cluster in a supercluster operates independently with its own data storage. Data ingested into one cluster stays in that cluster. However, configuration metadata synchronizes across all clusters, allowing unified management.
+
+## Region and cluster hierarchy
+In a supercluster, regions organize clusters geographically. A region may contain one or more clusters.
+
+**Example:**
+
+
+```bash
+Region: us-test-3
+ ├─ Cluster: dev3
+ └─ Cluster: dev3-backup
+
+Region: us-test-4
+ └─ Cluster: dev4
+```
+Each cluster has independent data storage. Data stays where it was ingested.
+
+## How queries execute
+Understanding query execution clarifies how federated search works, whether you query one cluster or many.
+
+### Normal cluster query execution
+This section explains how any OpenObserve cluster processes queries internally, regardless of whether it is a standalone cluster or part of a supercluster. Understanding this internal process is essential because:
+
+- This is how standalone clusters work
+- This is what happens when you query your current cluster in a supercluster without federated search coordination
+- During federated search, each individual cluster uses this same internal process to search its own data
+
+When a cluster receives a query:
+
+1. Router forwards the query to an available querier.
+2. That querier becomes the leader querier.
+3. Leader querier parses SQL, identifies data files, creates execution plan.
+4. Leader querier distributes work among available queriers. These queriers become worker queriers.
+5. All worker queriers search their assigned files in parallel.
+6. Worker queriers send results to the leader querier.
+7. Leader querier merges results and returns final answer.
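+
+For reference, the following sketch shows roughly what such a query looks like when it is submitted to a single cluster over the HTTP search API instead of the UI. The endpoint shape follows OpenObserve's standard `_search` API; the host, organization, stream name, credentials, and time range are placeholder values.
+
+```bash
+# Minimal sketch: submit a SQL query to the cluster you are logged into.
+# Host, organization, stream, credentials, and the time range (microseconds) are placeholders.
+curl -s -X POST "https://your-o2-instance.example.com/api/default/_search" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Basic <base64-credentials>" \
+  -d '{
+    "query": {
+      "sql": "SELECT * FROM \"app_logs\" ORDER BY _timestamp DESC LIMIT 10",
+      "start_time": 1735689600000000,
+      "end_time": 1735693200000000
+    }
+  }'
+```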
+
+### Query execution for your current cluster in a supercluster
+Your current cluster is the cluster you are logged into. When you select your current cluster from the Region dropdown, this is not federated search.
+
+For example, if you are logged into Cluster A and you select Cluster A from the Region dropdown, the query executes using the normal cluster query execution process described above. No cross-cluster communication occurs, and no federated search coordination is needed.
+
+### Federated search for one different cluster in a supercluster
+When you select a different cluster from the Region dropdown, not the cluster you are logged into, federated search coordination is used:
+
+
+**Step 1: Coordination setup**
+
+Your current cluster becomes the leader cluster.
+
+
+**Step 2: Query distribution**
+
+Leader cluster sends the query to the selected cluster via gRPC.
+
+
+**Step 3: Query processing**
+
+The selected cluster processes the query using its normal cluster query execution process.
+
+
+**Step 4: Result return**
+
+The selected cluster sends its results back to the leader cluster.
+
+
+**Step 5: Result presentation**
+
+The leader cluster displays the results.
+
+### Federated search for multiple clusters in a supercluster
+
+When you select no cluster or multiple clusters from the Region dropdown, federated search extends the query across all selected clusters:
+
+
+**Step 1: Coordination setup**
+
+Your current cluster becomes the leader cluster. The leader cluster identifies which of the selected clusters (or all clusters, if none are selected) contain data for the queried stream. These other clusters become worker clusters.
+
+
+**Step 2: Query distribution**
+
+The leader cluster sends the query to all worker clusters via gRPC. All clusters now have the same query to execute.
+
+
+**Step 3: Parallel processing**
+
+Each cluster processes the query using its normal cluster query execution process. The leader cluster searches its own data if it contains data for that stream. Worker clusters search their own data. All processing happens simultaneously.
+
+
+**Step 4: Result aggregation**
+
+Each cluster aggregates its own results internally using its leader querier and worker queriers. Worker clusters send their aggregated results to the leader cluster. The leader cluster merges all results from all clusters and returns the unified response.
+
+## Metadata synchronization
+In a supercluster, clusters share configuration and schema information in real time while keeping actual data separate. This synchronization happens via NATS, a messaging system that coordinates communication between clusters.
+
+While stream schemas are synchronized across all clusters in real time, the actual data for a stream only exists in the cluster or clusters where it was ingested.
+
+| **Synchronized across clusters** | **NOT synchronized (stays local)** |
+|----------------------------------|-----------------------------------|
+| Schema definitions | Log data |
+| User-defined functions | Metric data |
+| Dashboards and folders | Trace data |
+| Alerts and notifications | Raw ingested data |
+| Scheduled tasks and reports | Parquet files and WAL files |
+| User and organization settings | Search indices |
+| System configurations | |
+| Job metadata | |
+| Enrichment metadata | |
+
+This design maintains data residency compliance while enabling unified configuration management.
+
+## Limitations
+
+**No cluster identification in results:** Query results do not indicate which cluster provided specific data. To identify the source, query each cluster individually.
\ No newline at end of file
diff --git a/docs/user-guide/federated-search/how-to-use-federated-search.md b/docs/user-guide/federated-search/how-to-use-federated-search.md
new file mode 100644
index 00000000..a319cdee
--- /dev/null
+++ b/docs/user-guide/federated-search/how-to-use-federated-search.md
@@ -0,0 +1,78 @@
+---
+title: Federated Search in OpenObserve - How-to Guide
+description: Step-by-step instructions for querying your current cluster and performing federated searches across one or more clusters in a supercluster setup.
+---
+This document explains how to query your current cluster and how to perform [federated searches](../) across one or more different clusters in a supercluster setup.
+
+!!! info "Availability"
+ This feature is available in Enterprise Edition. Not available in Open Source and Cloud.
+
+## How to query your current cluster in a supercluster
+
+Query your current cluster when you know the data is in your cluster or when you need the fastest query performance.
+
+!!! note "What you need to know"
+
+    - This is not federated search.
+ - You are querying the current cluster.
+ - No cross-cluster communication occurs.
+ - Results will include data from the current cluster only.
+
+**Steps:**
+
+
+1. Navigate to the **Logs** page.
+2. Enter your query in the SQL Query Editor.
+3. Select a time range.
+4. Select your current cluster from the **Region** dropdown.
+5. Select **Run query**.
+
+> For a detailed explanation, see **Normal cluster query execution** in the [Federated Search Architecture](../federated-search-architecture/) page.
+
+
+**Result**: Data from your current cluster only.
+
+
+
+## How to query one or more different clusters in a supercluster
+
+Use federated search when you need data from multiple clusters.
+
+!!! note "What you need to know"
+
+ - Multiple clusters will process your query simultaneously.
+ - Results will combine data from all selected clusters.
+
+**Steps**
+
+
+
+1. Navigate to the **Logs** page.
+2. Enter your query in the SQL Query Editor.
+3. Select a time range.
+4. Leave the **Region** dropdown unselected, or select one or more clusters other than your current cluster.
+5. Select **Run query**.
+
+> For a detailed explanation, see **Federated search for one different cluster** and **Federated search for multiple clusters** in the [Federated Search Architecture](../federated-search-architecture/) page.
+
+
+**Result**: Combined data from all selected clusters.
+
+## Region selection reference
+
+Use this quick reference to understand how region selection affects query execution:
+
+| **Region/Cluster Selection** | **Behavior** | **Query Type** | **Communication** |
+|------------------------------|--------------|----------------|-------------------|
+| None selected | Queries all clusters | Federated search | Cross-cluster via gRPC |
+| Your current cluster selected | Queries only your current cluster | Normal cluster query (NOT federated) | Internal only, no cross-cluster |
+| One different cluster selected (same region) | Queries only that cluster | Federated search | Cross-cluster via gRPC |
+| One different cluster selected (different region) | Queries only that cluster | Federated search | Cross-cluster via gRPC |
+| Multiple clusters selected | Queries all selected clusters | Federated search | Cross-cluster via gRPC |
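+
+If you query through the HTTP API rather than the UI, cluster selection is expressed in the request body. The sketch below is an assumption-heavy illustration: it shows the standard `_search` request with added `regions` and `clusters` lists that mirror the Region dropdown. Treat these field names as assumptions and confirm them against the API reference for your version; an empty list corresponds to leaving the dropdown unselected.
+
+```bash
+# Hedged sketch of a federated query over the HTTP API.
+# "regions" and "clusters" mirror the Region dropdown; verify the exact field
+# names in your version's API reference before relying on them.
+curl -s -X POST "https://your-o2-instance.example.com/api/default/_search" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Basic <base64-credentials>" \
+  -d '{
+    "query": {
+      "sql": "SELECT * FROM \"app_logs\" ORDER BY _timestamp DESC LIMIT 10",
+      "start_time": 1735689600000000,
+      "end_time": 1735693200000000
+    },
+    "regions": ["us-test-3"],
+    "clusters": ["dev3", "dev3-backup"]
+  }'
+```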
+
+
+**Next step**
+
+- [Federated Search Architecture](../federated-search-architecture/)
\ No newline at end of file
diff --git a/docs/user-guide/federated-search/index.md b/docs/user-guide/federated-search/index.md
new file mode 100644
index 00000000..b25b2c13
--- /dev/null
+++ b/docs/user-guide/federated-search/index.md
@@ -0,0 +1,65 @@
+---
+title: Federated Search in OpenObserve - Overview
+description: Learn what federated search is, key concepts, prerequisites, and when to use it.
+---
+This document provides an overview of federated search in OpenObserve.
+
+!!! info "Availability"
+ This feature is available in Enterprise Edition. Not available in Open Source and Cloud.
+
+## What is federated search?
+
+Federated search enables querying across multiple OpenObserve clusters that are connected as a supercluster, all from one interface.
+
+
+Without federated search, investigating issues across regions requires logging into each cluster separately, running the same query multiple times, and manually combining results. This wastes time during critical incidents.
+With federated search, you query once and receive unified results from all clusters.
+
+!!! note "Prerequisites"
+
+ - OpenObserve Enterprise edition
+ - Multiple clusters configured as a supercluster
+
+## How to verify if your environment is in a supercluster
+Check whether the Region dropdown appears on the Logs page. If visible, your clusters are configured as a supercluster.
+
+
+## Key concepts in federated search
+
+Before using federated search, understand these core concepts:
+
+- **Node:** A single instance of OpenObserve running on one machine or server.
+- **Cluster:** A group of OpenObserve nodes working together to handle data ingestion, storage, and querying. Each cluster has its own data storage.
+- **Region:** A geographical location that contains one or more clusters. For example, Region us-east may contain cluster prod-east-1 and cluster prod-east-2.
+- **Supercluster:** Multiple OpenObserve clusters across different geographical regions connected to work as a unified system. This enables federated search capability.
+- **Data distribution:** Data ingested into a specific cluster stays in that cluster's storage. It is not replicated to other clusters. This ensures data residency compliance.
+- **Metadata synchronization:** Configuration information such as schemas, dashboards, and alerts synchronizes across all clusters in a supercluster. This allows unified management while keeping data distributed.
+- **Federated search:** The capability to query data across different clusters in a supercluster. Federated search activates when you:
+
+    - Select one or more different clusters (clusters other than your current cluster): the selected clusters' data is searched via federated coordination.
+    - Select none: all clusters are searched simultaneously via federated coordination and the results are combined.
+
+> **Important**: Querying your current cluster uses normal cluster query execution, not federated search architecture.
+
+> For detailed technical explanations of deployment modes, architecture, and how queries execute, see the [Federated Search Architecture](federated-search-architecture/) page.
+
+## When to use federated search
+
+| **Use case** | **Cluster selection** | **Reason** |
+|--------------|----------------------|------------|
+| Data is in one specific cluster other than your current cluster | Select that cluster | Access only that cluster's data via federated search |
+| Multi-region deployments | Select none or multiple clusters | Query all regions at once via federated search |
+| Centralized search across teams | Select none or multiple clusters | Unified visibility across all clusters via federated search |
+
+
+## When not to use federated search
+
+| **Use case** | **Cluster selection** | **Reason** |
+|--------------|----------------------|------------|
+| Data is in your current cluster | Select your current cluster | Uses normal cluster query without cross-cluster communication |
+
+
+**Next steps**
+
+- [How to Use Federated Search](how-to-use-federated-search/)
+- [Federated Search Architecture](federated-search-architecture/)
\ No newline at end of file
diff --git a/docs/user-guide/identity-and-access-management/role-based-access-control.md b/docs/user-guide/identity-and-access-management/role-based-access-control.md
index d4f86ef0..0aaf3876 100644
--- a/docs/user-guide/identity-and-access-management/role-based-access-control.md
+++ b/docs/user-guide/identity-and-access-management/role-based-access-control.md
@@ -5,15 +5,16 @@ description: >-
---
This guide provides an overview of Role-Based Access Control (RBAC), its features, and how it is implemented in OpenObserve.
-## Overview
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
-OpenObserve uses RBAC to manage what actions users can perform based on their assigned roles. Instead of giving all users the same level of access, RBAC ensures that each user can only access the features and data relevant to their role.
+ - **Enterprise version**: RBAC requires manual configuration using [OpenFGA](https://openfga.dev/api/service). Learn more about [enabling RBAC in OpenObserve Enterprise](enable-rbac-in-openobserve-enterprise.md).
+ - **Cloud version**: RBAC is preconfigured and does not require setup.
+ - **Open-source version**: RBAC is not supported. All users have unrestricted access to all features.
-RBAC is available in **OpenObserve Enterprise** and **Cloud** versions but is not supported in the open-source version:
+## Overview
-- **Enterprise version**: RBAC requires manual configuration using [OpenFGA](https://openfga.dev/api/service). Learn more about [enabling RBAC in OpenObserve Enterprise](enable-rbac-in-openobserve-enterprise.md).
-- **Cloud version**: RBAC is preconfigured and does not require setup.
-- **Open-source version**: RBAC is not supported. All users have unrestricted access to all features.
+OpenObserve uses RBAC to manage what actions users can perform based on their assigned roles. Instead of giving all users the same level of access, RBAC ensures that each user can only access the features and data relevant to their role.
## How OpenObserve Implements RBAC
diff --git a/docs/user-guide/identity-and-access-management/sso.md b/docs/user-guide/identity-and-access-management/sso.md
index 8e2139c4..49b3f9ba 100644
--- a/docs/user-guide/identity-and-access-management/sso.md
+++ b/docs/user-guide/identity-and-access-management/sso.md
@@ -5,11 +5,13 @@ description: >-
---
-> `Applicable to enterprise version`
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
+## SSO in OpenObserve
OpenObserve, integrates Single Sign-On (SSO) capabilities using Dex, an OpenID Connect Identity (OIDC) and OAuth 2.0 provider. Dex does not have a user database and instead uses external identity providers like LDAP, Google, GitHub, etc. for authentication.
-## Setup OpenObserve
+## Configure SSO in OpenObserve
You must set following environment variables to enable SSO in OpenObserve.
diff --git a/docs/user-guide/management/aggregation-cache.md b/docs/user-guide/management/aggregation-cache.md
index dd2c2462..5587521b 100644
--- a/docs/user-guide/management/aggregation-cache.md
+++ b/docs/user-guide/management/aggregation-cache.md
@@ -5,7 +5,6 @@ description: Learn how streaming aggregation works in OpenObserve Enterprise.
---
This page explains what streaming aggregation is and shows how to use it to improve query performance with aggregation cache in OpenObserve.
-> This is an enterprise feature.
=== "Overview"
@@ -148,7 +147,7 @@ This page explains what streaming aggregation is and shows how to use it to impr
- [approx_percentile_cont](https://datafusion.apache.org/user-guide/sql/aggregate_functions.html#approx-percentile-cont)
- [approx_percentile_cont_with_weight](https://datafusion.apache.org/user-guide/sql/aggregate_functions.html#approx-percentile-cont-with-weight)
- [approx_topk](https://openobserve.ai/docs/sql-functions/approximate-aggregate/approx-topk/)
- - [approx_topk_distinct](http://openobserve.ai/docs/sql-functions/approximate-aggregate/approx-topk-distinct/)
+ - [approx_topk_distinct](https://openobserve.ai/docs/sql-functions/approximate-aggregate/approx-topk-distinct/)
---
diff --git a/docs/user-guide/management/audit-trail.md b/docs/user-guide/management/audit-trail.md
index f83c2934..030e645e 100644
--- a/docs/user-guide/management/audit-trail.md
+++ b/docs/user-guide/management/audit-trail.md
@@ -6,8 +6,9 @@ description: >-
---
# Audit Trail
-> **Note:** This feature is applicable to the Enterprise Edition.
-
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
+
## What is Audit Trail
Audit Trail records user actions across all organizations in OpenObserve. It captures non-ingestion API calls and helps you monitor activity and improve security.
@@ -31,7 +32,7 @@ When audit logging is enabled using the `O2_AUDIT_ENABLED` environment variable,
!!! note "Example"
The following example shows a captured audit event from the `audit` stream:

-
+
!!! note "Use cases"
Because audit events are stored in a log stream, you can:
diff --git a/docs/user-guide/management/cipher-keys.md b/docs/user-guide/management/cipher-keys.md
index 0690d786..14857fdc 100644
--- a/docs/user-guide/management/cipher-keys.md
+++ b/docs/user-guide/management/cipher-keys.md
@@ -7,7 +7,8 @@ description: >-
This page explains how to create and manage **Cipher Keys** in OpenObserve and how to use them to decrypt encrypted log data during search queries.
The **Cipher Keys** feature is essential for handling sensitive data stored in encrypted formats while still enabling effective log search and analysis, without storing decrypted data on disk.
-> **Note:** This feature is applicable to the OpenObserve [Enterprise Edition](../../../openobserve-enterprise-edition-installation-guide/).
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
## Create Cipher Keys
diff --git a/docs/user-guide/management/sensitive-data-redaction.md b/docs/user-guide/management/sensitive-data-redaction.md
index 0190e1e2..195ce602 100644
--- a/docs/user-guide/management/sensitive-data-redaction.md
+++ b/docs/user-guide/management/sensitive-data-redaction.md
@@ -4,7 +4,9 @@ description: Learn how to redact or drop sensitive data using regex patterns dur
---
This document explains how to configure and manage regex patterns for redacting or dropping sensitive data in OpenObserve.
-> Note: This feature is applicable to the OpenObserve [Enterprise Edition](../../../openobserve-enterprise-edition-installation-guide/).
+
+!!! info "Availability"
+ This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
## Overview
The **Sensitive Data Redaction** feature helps prevent accidental exposure of sensitive data by applying regex-based detection to values ingested into streams and to values already stored in streams. Based on this detection, sensitive values can be either **redacted** or **dropped**. This ensures data is protected before it is stored and hidden when displayed in query results. You can configure these actions to run at ingestion time or at query time.
diff --git a/docs/user-guide/pipelines/.pages b/docs/user-guide/pipelines/.pages
index be678f61..46684ef9 100644
--- a/docs/user-guide/pipelines/.pages
+++ b/docs/user-guide/pipelines/.pages
@@ -2,6 +2,7 @@ nav:
- Pipelines Overview: index.md
- Pipelines in OpenObserve: pipelines.md
- Create and Use Pipelines: use-pipelines.md
+ - Remote Destination: remote-destination.md
- Import and Export Pipelines: import-and-export-pipelines.md
- Manage Pipelines: manage-pipelines.md
- Configurable Delay in Scheduled Pipelines: configurable-delay-in-scheduled-pipelines.md
diff --git a/docs/user-guide/pipelines/remote-destination.md b/docs/user-guide/pipelines/remote-destination.md
new file mode 100644
index 00000000..d42e2a25
--- /dev/null
+++ b/docs/user-guide/pipelines/remote-destination.md
@@ -0,0 +1,197 @@
+---
+title: Pipeline Remote Destinations
+description: Configure and manage remote destinations to send transformed pipeline data to external systems with persistent queuing, retry logic, and high-throughput performance.
+---
+This document explains how to configure remote destinations in OpenObserve pipelines to send transformed data to external systems. It covers the setup process, technical architecture of the persistent queue mechanism, Write-Ahead Log (WAL) file operations, failure handling, retry logic, and performance optimization through environment variables.
+
+=== "How to"
+ ## What is a remote destination?
+ A remote destination allows you to send transformed pipeline data to external systems outside your OpenObserve instance. When you select **Remote** as your destination type in a pipeline, the system routes data to an external endpoint of your choice while ensuring data integrity and reliability through a persistent queue mechanism.
+
+ ## Configuring a remote destination
+
+ ??? "Step 1: Access the Management page"
+ ### Step 1: Access the Management page
+
+ Navigate to the **Pipeline Destination** configuration page using either method:
+
+ - **From the pipeline editor**: While setting up your pipeline, select Remote as the destination type > click the **Create New Destination** toggle.
+ 
+ 
+ - **From Management**: Click the settings icon in the navigation menu > **Pipeline Destinations** > **Add Destination**.
+ 
+
+ ??? "Step 2: Create the destination"
+ ### Step 2: Create the destination
+ In the **Add Destination** form:
+
+ 1. **Name**: Provide a descriptive name for the external destination. For example, `remote_destination_dev`.
+ 2. **URL**: Specify the endpoint where data should be sent.
+ 
+ !!! note "To send the transformed data to another OpenObserve instance:"
+            Use the following URL format: `https://<openobserve_host>/api/<organization_name>/<stream_name>/_json`
+ **Example**: To send data to a stream called `remote_pipeline` in the `default` organization on a different OpenObserve instance: `https://your-o2-instance.example.com/api/default/remote_pipeline/_json`
+
+ After transformation, the transformed data will be sent to the `remote_pipeline` stream under the `default` organization in the destination OpenObserve instance.
+ !!! note "To send data to an external endpoint:"
+ Ensure that you provide the complete URL of your external service endpoint.
+ 3. **Method**: Select the HTTP method based on your requirement.
+ 
+ !!! note "To send the transformed data to another OpenObserve instance:"
+ Select **POST**.
+ !!! note "To send data to an external endpoint:"
+ Select the method required by your external service.
+ 4. **Output Format**: Select the data format for transmission.
+ 
+ !!! note "When to select JSON (default):"
+ Standard JSON format. Use this when the destination API requires standard JSON arrays or objects. **Use JSON, when you send the transformed data to another OpenObserve instance**.
+ !!! note "When to select NDJSON (Newline Delimited JSON):"
+ Each event is sent as a separate JSON object on its own line. Use this when sending the transformed data to observability platforms that expect NDJSON, for example, Datadog and Splunk.
+ **Important**: Always verify the data format expected by your destination system before selecting. Check the destination's API documentation or ingestion requirements to ensure compatibility.
+ 5. **Headers**: To send data to an external endpoint, you may need to provide authentication credentials, if required. In the **Header** field, enter Authorization and in the **Value** field, provide the authentication token.
+ !!! note "To send the transformed data to another OpenObserve instance:"
+ 
+ 1. Log in to the destination OpenObserve instance.
+ 2. Navigate to **Data Sources** > **Databases**.
+ 3. Copy the authorization token value displayed there.
+ 4. Paste this token in the **Value** field.
+ 
+ !!! note "To send data to an external endpoint:"
+ Add the authentication headers required by your external service. This could be API keys, bearer tokens, or other authentication methods depending on the service.
+ 6. **Skip TLS Verify**: Use this toggle to enable or disable Transport Layer Security (TLS) verification. Enable this toggle to bypass security and certificate verification checks. **Use with caution, as disabling verification may expose data to security risks.**
+ 7. Click **Save** to create the destination.
+
+ ??? "Step 3: Use in your pipeline"
+ ### Step 3: Use in your pipeline
+
+ After creating the remote destination, you can select it from the **Destination** dropdown when configuring the remote destination node in your pipeline. The dropdown displays all previously created remote destinations with their names and URLs for easy identification.
+ 
+
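+    Before wiring the destination into a pipeline, you can optionally confirm that the URL and headers are correct by sending a small test payload outside the pipeline. The sketch below assumes the destination is another OpenObserve instance's `_json` ingestion endpoint and reuses the example values from Step 2; replace the host, organization, stream, and token with your own.
+
+    ```bash
+    # Minimal connectivity test: send one record to the destination endpoint.
+    # Host, organization, stream, and token are placeholders from the Step 2 example.
+    curl -s -X POST "https://your-o2-instance.example.com/api/default/remote_pipeline/_json" \
+      -H "Content-Type: application/json" \
+      -H "Authorization: Basic <token copied from Data Sources>" \
+      -d '[{"level": "info", "message": "remote destination connectivity test"}]'
+    ```
+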
+ ## Environment variables for remote destination
+
+ | **Environment Variable** | **Description** |
+ | --- | --- |
+ | ZO_PIPELINE_REMOTE_STREAM_CONCURRENT_COUNT | • Defines the number of concurrent threads the exporter uses to send data from Write-Ahead Log (WAL) files to the remote destination. • Controls export parallelism. Higher values increase throughput but also increase CPU usage. • Set this value to match or slightly exceed the number of CPU cores. Increase when export speed lags behind ingestion, and decrease if CPU usage stays above 80 percent. |
+ | ZO_PIPELINE_FILE_PUSH_BACK_INTERVAL | • Specifies how long a reader waits before checking the queue again after catching up to the writer. • Balances latency and CPU utilization. Lower values reduce event latency but raise CPU load; higher values lower CPU usage but increase latency. • Use 1 second for low-latency pipelines. Increase to 5–10 seconds in resource-limited systems or when small delays are acceptable. |
+ | ZO_PIPELINE_SINK_TASK_SPAWN_INTERVAL_MS | • Determines how often the scheduler assigns new export tasks to reader threads, measured in milliseconds. • Controls backlog clearing speed and CPU overhead. Shorter intervals improve responsiveness but raise CPU usage. • Use 10–50 ms to clear persistent backlogs faster. Use 200–500 ms to reduce CPU load in low-throughput environments. Keep 100 ms for balanced performance. |
+ | ZO_PIPELINE_MAX_RETRY_COUNT | • Sets the maximum number of retry attempts per WAL file after export failure. • Prevents endless retries for failed exports and limits disk growth when destinations are unreachable. • Increase to 10 when the destination is unreliable or often unavailable. Keep the default of 6 for stable networks. |
+ | ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS | • Defines the longest allowed interval between retry attempts during exponential backoff. • Ensures failed files are retried at least once in the defined period and prevents retries from spacing out indefinitely. • Keep the default 24 hours for typical conditions. Increase to 48 hours if the destination experiences long outages. |
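+
+    The sketch below shows one way to set these variables, using the defaults and the tuning guidance from the table; the concrete values are illustrative assumptions for an 8-core node, not recommendations for every workload.
+
+    ```bash
+    # Illustrative values only; start from the defaults and adjust using the table above.
+    export ZO_PIPELINE_REMOTE_STREAM_CONCURRENT_COUNT=8   # match or slightly exceed CPU cores
+    export ZO_PIPELINE_FILE_PUSH_BACK_INTERVAL=1          # seconds; keep low for low-latency pipelines
+    export ZO_PIPELINE_SINK_TASK_SPAWN_INTERVAL_MS=100    # default; balanced scheduling overhead
+    export ZO_PIPELINE_MAX_RETRY_COUNT=6                  # default; raise to 10 for unreliable destinations
+    export ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS=24         # default cap on retry backoff
+    ```
+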
+=== "Overview"
+
+ This section explains the technical architecture and internal mechanisms of remote destinations. After configuring a remote destination, understanding the underlying systems helps with troubleshooting, performance optimization, and operational decisions.
+
+ ## How remote destinations work
+ The remote destination feature allows you to send pipeline data to external systems. However, the core challenge is that the data must not be lost if the system crashes, restarts, or if the destination becomes temporarily unavailable.
+
+ To resolve this issue, OpenObserve writes data to disk first, then sends it to the remote destination. This creates a safety buffer.
+
+ ## How data flows in a pipeline
+ Data moves through five stages:
+ Pipeline > Transformations > Disk Storage > Transmission > Cleanup
+
+ !!! note "Stage details:"
+
+ - **Stage 1 - Pipeline Source:** Data enters from the source stream via the source node of the pipeline.
+ - **Stage 2 - Transformations:** All configured functions and conditions are applied to process the data.
+ - **Stage 3 - Disk Storage:** After all transformations complete, the processed data is written to Write-Ahead Log files on disk. This write happens before any transmission attempt.
+ - **Stage 4 - Network Transmission:** Data is read from disk and sent to the remote destination via HTTP.
+ - **Stage 5 - Cleanup:** After successful transmission and acknowledgment from the destination, the disk files are deleted.
+ **Note**: Disk storage occurs only after all transformations finish. WAL files contain the final processed version of the data, not the original input.
+
+ ## Write-Ahead Log files
+ Write-Ahead Log files, or WAL files, are the disk storage mechanism used in stage 3. These are files written to disk that temporarily hold data between processing and transmission.
+
+ - **How many WAL files are created in advance**: The total number of WAL files created depends on how many remote destinations you have configured. For each remote destination, OpenObserve creates the number of files specified in the `ZO_MEMTABLE_BUCKET_NUM` environment variable.
+    - **Where these files are stored**: The files are written to the `/data/remote_stream_wal/` directory on disk.
+
+ !!! note "Note"
+ During normal operation, when data flows through the pipeline, these files are simultaneously being written to and read from. Files in this state are called **active files**.
+
+ ## How WAL files operate
+ The WAL file system uses a multi-threaded architecture to achieve high throughput.
+
+ ### Writer thread
+ The system uses multiple writer threads, equal to the `ZO_MEMTABLE_BUCKET_NUM` setting, to add data to WAL files.
+ Each writer thread:
+
+ - Receives transformed data from the pipeline
+ - Writes data sequentially to the current active WAL file
+ - Moves to the next file when the current file reaches capacity
+ - Operates continuously as long as data flows through the pipeline
+
+ **Important**: A file reaches capacity when either of two conditions is met: the file size reaches `ZO_PIPELINE_MAX_FILE_SIZE_ON_DISK_MB` or the file has been open for `ZO_PIPELINE_MAX_FILE_RETENTION_TIME_SECONDS`.
+
+ ### Reader threads
+ Multiple reader threads, 30 by default, handle transmission to the remote destination. Each reader thread:
+
+ - Selects a WAL file that contains unsent data
+ - Reads data from the file
+ - Sends the data to the remote destination via HTTP
+ - Tracks successful transmission progress
+
+ Multiple readers enable parallel transmission. While one reader sends data from file A, another reader can simultaneously send data from file B. This parallel processing allows the system to handle high data volumes. In production deployments, the system consistently achieves throughput of 30-40 MB per second.
+
+ ### FIFO ordering
+ The system maintains First-In-First-Out ordering. The oldest data, meaning the data that was transformed earliest, is always transmitted first. This guarantee ensures that data arrives at the destination in the same temporal order it was processed.
+
+ The reader threads coordinate to maintain this ordering even while operating in parallel. Files are assigned to readers based on age, ensuring older files are prioritized.
+
+ ## WAL file lifecycle
+ WAL files are deleted under four conditions:
+
+ **Condition 1: Successful Transmission**
+
+ All data in the file has been sent and the destination has acknowledged receipt. The file is immediately deleted. This is the normal deletion path during healthy operation.
+
+ **Condition 2: Disk Space Limit**
+
+ When remote destination WAL files consume 50% of available disk space (default), the system stops writing new files and deletes the oldest files to free space. Deletion occurs regardless of transmission status. This limit prevents remote destination operations from consuming disk space needed by other OpenObserve components like data ingestion and query processing. The disk space limit is configurable via the `ZO_PIPELINE_WAL_SIZE_LIMIT` environment variable. On a 1 TB disk with the default 50% limit, remote destination files will not exceed approximately 500 GB.
+
+ **Condition 3: Data Retention Policy**
+
+ WAL files containing data older than the stream's retention period are deleted regardless of transmission status. Each pipeline inherits retention settings from its associated stream. If a stream has 30-day retention, WAL files with data older than 31 days are deleted even if never transmitted. This aligns remote destination data lifecycle with overall retention policy.
+
+    **Condition 4: Retry Exhaustion**
+
+    After repeated transmission failures, the system stops retrying the file. By default, this happens after 6 failed attempts. The file then remains on disk but is no longer scheduled for transmission.
+
+    - This behavior can be changed using the `ZO_PIPELINE_REMOVE_WAL_FILE_AFTER_MAX_RETRY` configuration. When set to `true`, failed files are permanently deleted instead of being kept on disk.
+    - The retry limit is configurable via `ZO_PIPELINE_MAX_RETRY_COUNT`.
+
+ ## Failure handling and retry
+
+ When transmission fails, the system waits before retrying. Wait times increase with each failure: 5 minutes after the first failure, 10 minutes after the second, 20 minutes after the third, and so on, doubling each time. This is called exponential backoff. It gives a failed or overloaded destination time to recover instead of immediately retrying, which would consume bandwidth and potentially worsen the problem.
+
+    - **Maximum wait time**: Retry intervals cannot exceed 24 hours (configurable via `ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS`), ensuring files are retried at least once daily.
+    - **Random variation**: The system adds small random delays to retry times. This prevents many failed files from retrying at the exact same moment and overwhelming the destination. This is known as preventing the "thundering herd" problem, where multiple requests hitting a recovering system simultaneously can cause it to fail again.
+    - **Retry limit**: After 6 failed attempts (configurable via `ZO_PIPELINE_MAX_RETRY_COUNT`), the system stops retrying. File handling then follows the rules described in Condition 4 of the WAL file lifecycle section.
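+
+    The sketch below makes the schedule concrete by printing the wait before each retry under the default settings described above (5-minute initial wait, doubling, 6 attempts, 24-hour cap); the real exporter also adds random jitter, which is omitted here.
+
+    ```bash
+    # Print the exponential backoff schedule described above (jitter omitted).
+    wait_minutes=5
+    cap_minutes=$((24 * 60))        # ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS default
+    for attempt in $(seq 1 6); do   # ZO_PIPELINE_MAX_RETRY_COUNT default
+      echo "retry ${attempt}: wait ${wait_minutes} minutes"
+      wait_minutes=$((wait_minutes * 2))
+      [ "${wait_minutes}" -gt "${cap_minutes}" ] && wait_minutes=${cap_minutes}
+    done
+    ```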
+
+ ## Persistent queue architecture
+ The combination of disk-based storage, multi-threaded processing, FIFO ordering, and retry logic implements a system pattern known as a persistent queue. A persistent queue is a queue that stores items on disk so it survives restarts and failures, preserves order, and resumes transmission without duplication.
+
+    Internally, OpenObserve achieves this pattern through the same components described earlier. Write-Ahead Log files act as the queue storage, the exporter manages the queue, writer threads add transformed records, and multiple reader threads transmit them to the destination in order. Together, these elements ensure fault-tolerant and consistent data flow across restarts and retries.
+
+ ## Storage organization of WAL files
+
+ OpenObserve stores remote destination Write-Ahead Log (WAL) files separately from the files used in normal data ingestion. This separation ensures that export operations do not interfere with the system’s core ingestion and query processes.
+
+ Remote destination WAL files are stored in the `/data/remote_stream_wal/` directory, while the standard ingestion process uses the `/data/wal/` directory.
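+
+    The commands below are an illustrative way to inspect both locations on a node with shell access to the default data directory; the individual WAL file names are internal and not shown here.
+
+    ```bash
+    # Inspect the two WAL locations (illustrative; assumes the default data directory).
+    ls /data/wal/                    # WAL files used by normal data ingestion
+    ls /data/remote_stream_wal/      # WAL files queued for remote destinations
+    du -sh /data/remote_stream_wal/  # disk used by the remote destination queue
+    ```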
+
+
+ ## Performance
+ Remote destinations in OpenObserve support high-throughput workloads:
+
+ - **Production-validated**: 30-40 MB/second sustained throughput (tested on 4 vCPU nodes)
+ - **Peak capacity**: 80+ MB/second during traffic spikes
+ - **Mixed workloads**: Efficiently handles both low-volume streams (1-2 events/hour) and high-volume streams (30+ MB/second) simultaneously
+
+ The system prevents disk pileup on ingester nodes by matching export rates with ingestion rates under normal operating conditions.
+
+ ## Environment variables for remote destination
+
+ | **Environment Variable** | **Description** |
+ | --- | --- |
+    | ZO_PIPELINE_REMOTE_STREAM_CONCURRENT_COUNT | • Defines the number of concurrent threads the exporter uses to send data from Write-Ahead Log (WAL) files to the remote destination. • Controls export parallelism. Higher values increase throughput but also increase CPU usage. • Set this value to match or slightly exceed the number of CPU cores. Increase when export speed lags behind ingestion, and decrease if CPU usage stays above 80 percent. |
+    | ZO_PIPELINE_FILE_PUSH_BACK_INTERVAL | • Specifies how long a reader waits before checking the queue again after catching up to the writer. • Balances latency and CPU utilization. Lower values reduce event latency but raise CPU load; higher values lower CPU usage but increase latency. • Use 1 second for low-latency pipelines. Increase to 5–10 seconds in resource-limited systems or when small delays are acceptable. |
+    | ZO_PIPELINE_SINK_TASK_SPAWN_INTERVAL_MS | • Determines how often the scheduler assigns new export tasks to reader threads, measured in milliseconds. • Controls backlog clearing speed and CPU overhead. Shorter intervals improve responsiveness but raise CPU usage. • Use 10–50 ms to clear persistent backlogs faster. Use 200–500 ms to reduce CPU load in low-throughput environments. Keep 100 ms for balanced performance. |
+    | ZO_PIPELINE_MAX_RETRY_COUNT | • Sets the maximum number of retry attempts per WAL file after export failure. • Prevents endless retries for failed exports and limits disk growth when destinations are unreachable. • Increase to 10 when the destination is unreliable or often unavailable. Keep the default of 6 for stable networks. |
+    | ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS | • Defines the longest allowed interval between retry attempts during exponential backoff. • Ensures failed files are retried at least once in the defined period and prevents retries from spacing out indefinitely. • Keep the default 24 hours for typical conditions. Increase to 48 hours if the destination experiences long outages. |
diff --git a/input.css b/input.css
index 2fe164b6..050698a2 100644
--- a/input.css
+++ b/input.css
@@ -109,3 +109,7 @@ main {
-webkit-text-fill-color: transparent;
background-clip: text;
}
+
+.header-list-style {
+ list-style: none;
+}
diff --git a/overrides/css/output.css b/overrides/css/output.css
index a36955eb..e86118ca 100644
--- a/overrides/css/output.css
+++ b/overrides/css/output.css
@@ -944,6 +944,10 @@ video {
gap: 1.5rem;
}
+.tw-gap-y-2 {
+ row-gap: 0.5rem;
+}
+
.tw-space-x-0\.5 > :not([hidden]) ~ :not([hidden]) {
--tw-space-x-reverse: 0;
margin-right: calc(0.125rem * var(--tw-space-x-reverse));
@@ -1555,6 +1559,10 @@ main {
background-clip: text;
}
+.header-list-style {
+ list-style: none;
+}
+
.hover\:tw-border-gray-200:hover {
--tw-border-opacity: 1;
border-color: rgb(229 231 235 / var(--tw-border-opacity, 1));
diff --git a/overrides/partials/footer.html b/overrides/partials/footer.html
index ad633bd1..ec5d3f51 100644
--- a/overrides/partials/footer.html
+++ b/overrides/partials/footer.html
@@ -764,12 +764,12 @@
3000 Sand Hill Rd Building 1, Suite 260, Menlo Park, CA 94025