
Commit 77dc50d

DebashisBorgohainO2Monil-KTX
authored and committed
Create federated search documentation (#216)
1 parent 2539342 commit 77dc50d

File tree

10 files changed

+292
-0
lines changed


docs/images/federated-search.png

198 KB

docs/user-guide/.pages

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ nav:
1616
- Management: management
1717
- Profile: profile
1818
- Performance: performance
19+
- Federated Search: federated-search
1920
- Best Practices: best-practices
2021
- Migration: migration
2122

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
nav:
2+
3+
- Federated Search Overview: index.md
4+
- How to Use Federated Search: how-to-use-federated-search.md
5+
- Federated Search Architecture: federated-search-architecture.md
Lines changed: 146 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,146 @@
1+
---
2+
title: Federated Search in OpenObserve - Architecture
3+
description: Technical explanation of OpenObserve deployment modes, normal cluster query execution, and how federated search works across single and multiple clusters.
4+
---
5+
This document explains the technical architecture of OpenObserve deployments, how queries execute in normal clusters, and how [federated search](../) coordinates queries across clusters in a supercluster.
6+
7+
> This feature is available in Enterprise Edition.
8+
9+
## Understanding OpenObserve deployments
10+
Before diving into how federated search works, you need to understand how OpenObserve can be deployed. OpenObserve scales from a single machine to a globally distributed infrastructure.
11+
12+
## Single node deployment
13+
The simplest deployment: one instance of OpenObserve runs all functions on one machine. Data is stored locally, and the node processes queries directly. This mode suits testing or small deployments.
14+
15+
## Single cluster deployment
16+
When you need scale, multiple specialized nodes work together as a cluster. Each node type has a specific role:
17+
18+
- **Router**: Entry point that forwards queries to queriers
19+
- **Querier**: Processes queries in parallel with other queriers
20+
- **Ingester**: Receives and stores data in object storage
21+
- **Compactor**: Optimizes files and enforces retention
22+
- **Alertmanager**: Executes alerts and sends notifications
23+
24+
A single cluster handles more data and provides higher availability than a single node.
25+
26+
## Supercluster deployment
27+
When you need to operate across multiple geographical regions, multiple clusters connect as a supercluster. This is where federated search becomes relevant.
28+
29+
!!! note "Key point"
30+
Each cluster in a supercluster operates independently with its own data storage. Data ingested into one cluster stays in that cluster. However, configuration metadata synchronizes across all clusters, allowing unified management.
31+
32+
## Region and cluster hierarchy
33+
In a supercluster, regions organize clusters geographically. A region may contain one or more clusters.
34+
<br>
35+
**Example:**
36+
<br>
37+
38+
```text
39+
Region: us-test-3
40+
├─ Cluster: dev3
41+
└─ Cluster: dev3-backup
42+
43+
Region: us-test-4
44+
└─ Cluster: dev4
45+
```
46+
Each cluster has independent data storage. Data stays where it was ingested.
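The hierarchy above can be pictured as a simple mapping from regions to clusters. The following is a minimal Python sketch for illustration only: the `SUPERCLUSTER` dictionary and the helpers `region_of` and `all_clusters` are hypothetical names, not part of OpenObserve or its configuration format.

```python
# Hypothetical model of the example hierarchy: region name -> cluster names.
SUPERCLUSTER = {
    "us-test-3": ["dev3", "dev3-backup"],
    "us-test-4": ["dev4"],
}

def region_of(cluster):
    """Return the region that contains the given cluster."""
    for region, clusters in SUPERCLUSTER.items():
        if cluster in clusters:
            return region
    raise KeyError(f"unknown cluster: {cluster}")

def all_clusters():
    """Return every cluster in the supercluster, region by region."""
    return [c for clusters in SUPERCLUSTER.values() for c in clusters]
```

For example, `region_of("dev3-backup")` resolves to `us-test-3`, and `all_clusters()` lists the three clusters a query would fan out to when no cluster is selected.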
47+
48+
## How queries execute
49+
Understanding query execution helps you understand how federated search works whether querying one cluster or multiple clusters.
50+
51+
### Normal cluster query execution
52+
This section explains how any OpenObserve cluster processes queries internally, regardless of whether it is a standalone cluster or part of a supercluster. Understanding this internal process is essential because:
53+
54+
- This is how standalone clusters work
55+
- This is what happens when you query your current cluster in a supercluster without federated search coordination
56+
- During federated search, each individual cluster uses this same internal process to search its own data
57+
58+
When a cluster receives a query:
59+
60+
1. Router forwards the query to an available querier.
61+
2. That querier becomes the leader querier.
62+
3. Leader querier parses the SQL, identifies the data files, and creates an execution plan.
63+
4. Leader querier distributes work among available queriers. These queriers become worker queriers.
64+
5. All worker queriers search their assigned files in parallel.
65+
6. Worker queriers send results to the leader querier.
66+
7. Leader querier merges the results and returns the final answer.
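The steps above follow a scatter-gather pattern. The sketch below illustrates that pattern in Python under simplifying assumptions: the in-memory `DATA_FILES`, the keyword filter standing in for SQL, and the functions `worker_search` and `leader_search` are all hypothetical. It does not reflect OpenObserve's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "data files"; each holds a few log records.
DATA_FILES = {
    "file-a": [{"ts": 3, "msg": "error: disk full"}, {"ts": 1, "msg": "ok"}],
    "file-b": [{"ts": 2, "msg": "error: timeout"}],
    "file-c": [{"ts": 4, "msg": "ok"}],
}

def worker_search(file_names, keyword):
    """Worker querier: scan the assigned files and return matching records."""
    return [r for name in file_names for r in DATA_FILES[name] if keyword in r["msg"]]

def leader_search(keyword, num_workers=2):
    """Leader querier: plan, distribute work, collect partial results, merge."""
    files = sorted(DATA_FILES)  # step 3: identify data files
    # Step 4: split the files round-robin among the workers.
    assignments = [files[i::num_workers] for i in range(num_workers)]
    # Steps 5-6: workers search in parallel and hand back partial results.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda f: worker_search(f, keyword), assignments))
    # Step 7: merge partial results into one ordered answer.
    merged = [r for part in partials for r in part]
    return sorted(merged, key=lambda r: r["ts"])
```

Calling `leader_search("error")` returns the matching records from all files, ordered by timestamp, regardless of which worker scanned which file.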
67+
68+
### Query execution for your current cluster in a supercluster
69+
Your current cluster is the cluster you are logged into. When you select your current cluster from the Region dropdown, this is not federated search.
70+
<br>
71+
For example, if you are logged into Cluster A and you select Cluster A from the Region dropdown, the query executes using the normal cluster query execution process described above. No cross-cluster communication occurs, and no federated search coordination is needed.
72+
73+
### Federated search for one different cluster in a supercluster
74+
When you select a different cluster from the Region dropdown, not the cluster you are logged into, federated search coordination is used:
75+
<br>
76+
77+
**Step 1: Coordination setup**
78+
<br>
79+
Your current cluster becomes the leader cluster.
80+
<br>
81+
82+
**Step 2: Query distribution**
83+
<br>
84+
The leader cluster sends the query to the selected cluster via gRPC.
85+
<br>
86+
87+
**Step 3: Query processing**
88+
<br>
89+
The selected cluster processes the query using its normal cluster query execution process.
90+
<br>
91+
92+
**Step 4: Result return**
93+
<br>
94+
The selected cluster sends its results back to the leader cluster.
95+
<br>
96+
97+
**Step 5: Result presentation**
98+
<br>
99+
The leader cluster displays the results.
100+
101+
### Federated search for multiple clusters in a supercluster
102+
103+
When you select no cluster or multiple clusters from the Region dropdown, federated search extends the query across all selected clusters:
104+
<br>
105+
106+
**Step 1: Coordination setup**
107+
<br>
108+
Your current cluster becomes the leader cluster. The leader cluster identifies which of the selected clusters, or which of all clusters if none are selected, contain data for the queried stream. These other clusters become worker clusters.
109+
<br>
110+
111+
**Step 2: Query distribution**
112+
<br>
113+
The leader cluster sends the query to all worker clusters via gRPC. All clusters now have the same query to execute.
114+
<br>
115+
116+
**Step 3: Parallel processing**
117+
<br>
118+
Each cluster processes the query using its normal cluster query execution process. The leader cluster searches its own data if it contains data for that stream. Worker clusters search their own data. All processing happens simultaneously.
119+
<br>
120+
121+
**Step 4: Result aggregation**
122+
<br>
123+
Each cluster aggregates its own results internally using its leader querier and worker queriers. Worker clusters send their aggregated results to the leader cluster. The leader cluster merges all results from all clusters and returns the unified response.
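The four steps can be sketched in Python as a second level of scatter-gather, this time across clusters. Everything here is an illustrative assumption: `CLUSTER_DATA`, the stream name `app_logs`, and the functions `cluster_search` and `federated_search` are hypothetical, and the thread pool stands in for the gRPC calls between clusters.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-cluster storage: cluster name -> stream name -> records.
CLUSTER_DATA = {
    "dev3": {"app_logs": [{"ts": 1, "msg": "login"}]},
    "dev3-backup": {},
    "dev4": {"app_logs": [{"ts": 2, "msg": "logout"}]},
}

def cluster_search(cluster, stream):
    """Stand-in for one cluster's normal internal query execution."""
    return list(CLUSTER_DATA[cluster].get(stream, []))

def federated_search(stream, selected=None):
    """Leader cluster: pick target clusters, fan out, and merge the results."""
    # Step 1: no selection means all clusters; keep those holding the stream.
    candidates = selected or list(CLUSTER_DATA)
    targets = [c for c in candidates if stream in CLUSTER_DATA[c]]
    # Steps 2-3: send the same query to every target and run them in parallel.
    with ThreadPoolExecutor(max_workers=max(len(targets), 1)) as pool:
        partials = list(pool.map(lambda c: cluster_search(c, stream), targets))
    # Step 4: merge the per-cluster results into one unified response.
    merged = [r for part in partials for r in part]
    return sorted(merged, key=lambda r: r["ts"])
```

In this sketch, `federated_search("app_logs")` combines records from `dev3` and `dev4` and skips `dev3-backup`, which holds no data for the stream. The merged records carry no cluster label.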
124+
125+
## Metadata synchronization
126+
In a supercluster, clusters share configuration and schema information in real time while keeping actual data separate. This synchronization happens via NATS, a messaging system that coordinates communication between clusters.
127+
<br>
128+
While stream schemas are synchronized across all clusters in real time, the actual data for a stream exists only in the cluster or clusters where it was ingested.
129+
130+
| **Synchronized across clusters** | **NOT synchronized (stays local)** |
131+
|----------------------------------|-----------------------------------|
132+
| Schema definitions | Log data |
133+
| User-defined functions | Metric data |
134+
| Dashboards and folders | Trace data |
135+
| Alerts and notifications | Raw ingested data |
136+
| Scheduled tasks and reports | Parquet files and WAL files |
137+
| User and organization settings | Search indices |
138+
| System configurations | |
139+
| Job metadata | |
140+
| Enrichment metadata | |
141+
142+
This design maintains data residency compliance while enabling unified configuration management.
143+
144+
## Limitations
145+
146+
**No cluster identification in results:** Query results do not indicate which cluster provided specific data. To identify the source, query each cluster individually.
Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
---
2+
title: Federated Search in OpenObserve - How-to Guide
3+
description: Step-by-step instructions for querying your current cluster and performing federated searches across one or more clusters in a supercluster setup.
4+
---
5+
This document explains how to query your current cluster and how to perform [federated searches](../) across one or more different clusters in a supercluster setup.
6+
> This feature is available in Enterprise Edition.
7+
8+
## How to query your current cluster in a supercluster
9+
10+
Query your current cluster when you know the data is in your cluster or when you need the fastest query performance.
11+
12+
!!! note "What you need to know"
13+
14+
- This is not federated search.
15+
- You are querying the current cluster.
16+
- No cross-cluster communication occurs.
17+
- Results will include data from the current cluster only.
18+
<br>
19+
**Steps:**
20+
![current-cluster-query](current-cluster-query.png)
21+
22+
1. Navigate to the **Logs** page.
23+
2. Enter your query in the SQL Query Editor.
24+
3. Select a time range.
25+
4. Select your current cluster from the **Region** dropdown.
26+
5. Select **Run query**.
27+
28+
> For a detailed explanation, see **Normal cluster query execution** in the [Federated Search Architecture](../federated-search-architecture/) page.
29+
<br>
30+
31+
**Result**<br>
32+
Data from the selected cluster only.
33+
![current-cluster-query-result](current-cluster-query-result.png)
34+
35+
36+
## How to query one or more different clusters in a supercluster
37+
38+
Use federated search when you need data from one or more clusters other than your current cluster.
39+
40+
!!! note "What you need to know"
41+
42+
- Multiple clusters will process your query simultaneously.
43+
- Results will combine data from all selected clusters.
44+
45+
**Steps**
46+
<br>
47+
![federated-search](federated-search.png)
48+
49+
1. Navigate to the **Logs** page.
50+
2. Enter your query in the SQL Query Editor.
51+
3. Select a time range.
52+
4. Leave the **Region** dropdown unselected to query all clusters, or select one or more clusters other than your current cluster.
53+
5. Select **Run query**.
54+
55+
> For a detailed explanation, see **Federated search for one different cluster** and **Federated search for multiple clusters** in the [Federated Search Architecture](../federated-search-architecture/) page.
56+
<br>
57+
58+
**Result**<br>
59+
Combined data from all selected clusters.
60+
![federated-search-result](federated-search-result.png)
61+
## Region selection reference
62+
63+
Use this quick reference to understand how region selection affects query execution:
64+
65+
| **Region/Cluster Selection** | **Behavior** | **Query Type** | **Communication** |
66+
|------------------------------|--------------|----------------|-------------------|
67+
| None selected | Queries all clusters | Federated search | Cross-cluster via gRPC |
68+
| Your current cluster selected | Queries only your current cluster | Normal cluster query (NOT federated) | Internal only, no cross-cluster |
69+
| One different cluster selected (same region) | Queries only that cluster | Federated search | Cross-cluster via gRPC |
70+
| One different cluster selected (different region) | Queries only that cluster | Federated search | Cross-cluster via gRPC |
71+
| Multiple clusters selected | Queries all selected clusters | Federated search | Cross-cluster via gRPC |
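The rule behind this table is small enough to express as a function. The Python sketch below restates the table for illustration; `classify_selection` is a hypothetical helper, not an OpenObserve API.

```python
def classify_selection(current_cluster, selected):
    """Map a Region dropdown selection to (query type, communication).

    `selected` is the list of chosen cluster names; an empty list means
    nothing is selected.
    """
    # Only one case avoids federation: exactly the current cluster is selected.
    if list(selected) == [current_cluster]:
        return ("normal cluster query", "internal only")
    # Everything else (none, one different, or multiple) is federated.
    return ("federated search", "cross-cluster via gRPC")
```

For instance, when you are logged into `dev3`, `classify_selection("dev3", ["dev3"])` yields a normal cluster query, while an empty selection or `["dev4"]` yields a federated search.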
72+
73+
74+
**Next step**
75+
76+
- [Federated Search Architecture](../federated-search-architecture/)
Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
---
2+
title: Federated Search in OpenObserve - Overview
3+
description: Learn what federated search is, key concepts, prerequisites, and when to use it.
4+
---
5+
This document provides an overview of federated search in OpenObserve.
6+
7+
> This feature is available in Enterprise Edition.
8+
9+
## What is federated search?
10+
11+
Federated search enables querying across multiple OpenObserve clusters that are connected as a supercluster, all from one interface.
12+
<br>
13+
14+
Without federated search, investigating issues across regions requires logging into each cluster separately, running the same query multiple times, and manually combining results. This wastes time during critical incidents.
15+
With federated search, you query once and receive unified results from all clusters.
16+
17+
!!! note "Prerequisites"
18+
19+
- OpenObserve Enterprise Edition
20+
- Multiple clusters configured as a supercluster
21+
22+
## How to verify if your environment is in a supercluster
23+
Check whether the Region dropdown appears on the Logs page. If visible, your clusters are configured as a supercluster.
24+
![federated-search](../../images/federated-search.png)
25+
26+
## Key concepts in federated search
27+
28+
Before using federated search, understand these core concepts:
29+
30+
- **Node:** A single instance of OpenObserve running on one machine or server.
31+
- **Cluster:** A group of OpenObserve nodes working together to handle data ingestion, storage, and querying. Each cluster has its own data storage.
32+
- **Region:** A geographical location that contains one or more clusters. For example, Region us-east may contain cluster prod-east-1 and cluster prod-east-2.
33+
- **Supercluster:** Multiple OpenObserve clusters across different geographical regions connected to work as a unified system. This enables federated search capability.
34+
- **Data distribution:** Data ingested into a specific cluster stays in that cluster's storage. It is not replicated to other clusters. This ensures data residency compliance.
35+
- **Metadata synchronization:** Configuration information such as schemas, dashboards, and alerts synchronizes across all clusters in a supercluster. This allows unified management while keeping data distributed.
36+
- **Federated search:** The capability to query data across different clusters in a supercluster. Federated search activates when you:
37+
38+
- Select one or more different clusters, meaning clusters other than your current cluster: The selected clusters' data is searched via federated coordination.
39+
- Select none: All clusters search simultaneously via federated coordination and results are combined.
40+
41+
> **Important**: Querying your current cluster uses normal cluster query execution, not federated search.
42+
43+
> For detailed technical explanations of deployment modes, architecture, and how queries execute, see the [Federated Search Architecture](../federated-search-architecture/) page.
44+
45+
## When to use federated search
46+
47+
| **Use case** | **Cluster selection** | **Reason** |
48+
|--------------|----------------------|------------|
49+
| Data is in one specific different cluster | Select that different cluster | Access only that cluster's data via federated search |
50+
| Multi-region deployments | Select none or multiple clusters | Query all regions at once via federated search |
51+
| Centralized search across teams | Select none or multiple clusters | Unified visibility across all clusters via federated search |
52+
53+
54+
## When not to use federated search
55+
56+
| **Use case** | **Cluster selection** | **Reason** |
57+
|--------------|----------------------|------------|
58+
| Data is in your current cluster | Select your current cluster | Uses normal cluster query without cross-cluster communication |
59+
60+
61+
**Next steps**
62+
63+
- [How to Use Federated Search](../how-to-use-federated-search/)
64+
- [Federated Search Architecture](../federated-search-architecture/)
