diff --git a/_install-and-configure/configuring-opensearch/discovery-gateway-settings.md b/_install-and-configure/configuring-opensearch/discovery-gateway-settings.md index 8aded3b039b..49ee631bce4 100644 --- a/_install-and-configure/configuring-opensearch/discovery-gateway-settings.md +++ b/_install-and-configure/configuring-opensearch/discovery-gateway-settings.md @@ -13,66 +13,7 @@ To learn more about static and dynamic settings, see [Configuring OpenSearch]({{ ## Discovery settings -The discovery process is used when a cluster is formed. It consists of discovering nodes and electing a cluster manager node. - -### Static discovery settings - -The following **static** discovery settings must be configured before a cluster starts: - -- `discovery.seed_hosts` (Static, list): Provides a list of the addresses of the cluster-manager-eligible nodes in the cluster. Each address has the format `host:port` or `host`. If a hostname resolves to multiple addresses via DNS, OpenSearch uses all of them. This setting is essential in order for nodes to find each other during cluster formation. Default is `["127.0.0.1", "[::1]"]`. - -- `discovery.seed_providers` (Static, list): Specifies which types of seed hosts provider to use to obtain the addresses of the seed nodes used to start the discovery process. By default, this uses the settings-based seed hosts provider, which obtains seed node addresses from the `discovery.seed_hosts` setting. - -- `discovery.type` (Static, string): Specifies whether OpenSearch should form a multiple-node cluster or operate as a single node. When set to `single-node`, OpenSearch forms a single-node cluster and suppresses certain timeouts. This setting is useful for development and testing environments. Valid values are `multi-node` (default) and `single-node`. - -- `cluster.initial_cluster_manager_nodes` (Static, list): Establishes the initial set of cluster-manager-eligible nodes in a new cluster. This setting is required when bootstrapping a cluster for the first time and should contain the node names (as defined by `node.name`) of the initial cluster-manager-eligible nodes. This list should be empty for nodes joining an existing cluster. Default is `[]` (empty list). - - -### Dynamic discovery settings - -The following **dynamic** discovery settings can be updated while the cluster is running: - -- `cluster.auto_shrink_voting_configuration` (Dynamic, Boolean): Controls whether the voting configuration automatically shrinks when nodes are removed from the cluster. If `true`, the voting configuration adjusts to maintain optimal cluster manager election behavior by removing nodes that are no longer part of the cluster. If `false`, you must remove the nodes that are no longer part of the cluster using the [Voting Configuration Exclusions API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-voting-configuration-exclusions/). Default is `true`. - -- `cluster.max_voting_config_exclusions` (Dynamic, integer): Sets the maximum number of voting configuration exclusions that can be in place simultaneously during cluster manager node operations. This setting is used during node removal and cluster maintenance operations to temporarily exclude nodes from voting. Default is `10`. 
- -### Static cluster coordination settings - -The following cluster coordination settings control cluster formation and node joining behavior: - -- `cluster.join.timeout` (Static, time unit): The amount of time a node waits after sending a request to join a cluster before it considers the request to have failed and retries. This timeout does not apply when `discovery.type` is set to `single-node`. Default is `60s`. - -- `cluster.publish.info_timeout` (Static, time unit): The amount of time the cluster manager node waits for each cluster state update to be completely published to all nodes before logging a message indicating that some nodes are responding slowly. This setting helps identify slow-responding nodes during cluster state updates. Default is `10s`. - -### Cluster election settings - -The following settings control cluster manager election behavior: - -- `cluster.election.back_off_time` (Static, time unit): Sets the incremental delay added to election retry attempts after each failure. Uses linear backoff, in which each failed election increases the wait time by this amount before the next attempt. Default is `100ms`. **Warning**: Changing this setting from the default may cause your cluster to fail to elect a cluster manager node. - -- `cluster.election.duration` (Static, time unit): Sets how long each election is allowed to take before a node considers it to have failed and schedules a retry. This controls the maximum duration of the election process. Default is `500ms`. **Warning**: Changing this setting from the default may cause your cluster to fail to elect a cluster manager node. - -- `cluster.election.initial_timeout` (Static, time unit): Sets the upper bound for how long a node will wait initially, or after the elected cluster manager fails, before attempting its first election. This controls the initial election delay. Default is `100ms`. **Warning**: Changing this setting from the default may cause your cluster to fail to elect a cluster manager node. - -- `cluster.election.max_timeout` (Static, time unit): Sets the maximum upper bound for how long a node will wait before attempting an election, preventing excessively sparse elections during long network partitions. This caps the maximum election delay. Default is `10s`. **Warning**: Changing this setting from the default may cause your cluster to fail to elect a cluster manager node. - -### Expert-level discovery settings - -The following discovery settings are for expert-level configuration. **Warning**: Changing these settings from their defaults may cause cluster instability: - -- `discovery.cluster_formation_warning_timeout` (Static, time unit): Sets how long a node will try to form a cluster before logging a warning that the cluster did not form. If a cluster has not formed after this timeout has elapsed, the node will log a warning message that starts with the phrase "cluster manager not discovered" and describes the current state of the discovery process. Default is `10s`. - -- `discovery.find_peers_interval` (Static, time unit): Sets how long a node will wait before attempting another discovery round. This controls the frequency of peer discovery attempts during cluster formation. Default is `1s`. - -- `discovery.probe.connect_timeout` (Static, time unit): Sets how long to wait when attempting to connect to each address during node discovery. This timeout applies to the initial connection attempt to potential cluster members. Default is `3s`. 
- -- `discovery.probe.handshake_timeout` (Static, time unit): Sets how long to wait when attempting to identify the remote node via a handshake during the discovery process. This timeout applies to the node identification phase after a successful connection. Default is `1s`. - -- `discovery.request_peers_timeout` (Static, time unit): Sets how long a node will wait after asking its peers for information before considering the request to have failed. This timeout applies to peer information requests during the discovery process. Default is `3s`. - -- `discovery.seed_resolver.max_concurrent_resolvers` (Static, integer): Specifies how many concurrent DNS lookups to perform when resolving the addresses of seed nodes during cluster discovery. This setting controls the parallelism of DNS resolution for seed hosts. Default is `10`. - -- `discovery.seed_resolver.timeout` (Static, time unit): Specifies how long to wait for each DNS lookup performed when resolving the addresses of seed nodes. This timeout applies to individual DNS resolution operations during cluster discovery. Default is `5s`. +The discovery process is used when a cluster is formed. It consists of discovering nodes and electing a cluster manager node. For comprehensive information about discovery and cluster formation settings, see [Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/). ## Gateway settings diff --git a/_tuning-your-cluster/discovery-cluster-formation/bootstrapping.md b/_tuning-your-cluster/discovery-cluster-formation/bootstrapping.md new file mode 100644 index 00000000000..6a7c3971574 --- /dev/null +++ b/_tuning-your-cluster/discovery-cluster-formation/bootstrapping.md @@ -0,0 +1,222 @@ +--- +layout: default +title: Cluster bootstrapping +parent: Discovery and cluster formation +nav_order: 40 +--- + +# Cluster bootstrapping + +When starting an OpenSearch cluster for the very first time, you must explicitly define the initial set of cluster-manager-eligible nodes that will participate in the first cluster manager election. This process is called _cluster bootstrapping_ and is critical for preventing split-brain scenarios during initial cluster formation. + +Cluster bootstrapping is required in the following situations: + +- Starting a brand-new cluster for the very first time +- No existing cluster state exists on any node +- Initial cluster manager election needs to take place + +Bootstrapping is not required in the following situations: + +- Nodes joining an existing cluster - they get configuration from the current cluster manager +- Cluster restarts - nodes that have previously joined a cluster store the necessary information +- Full cluster restarts - existing cluster state is preserved and used for recovery + +## Configuring the bootstrap nodes + +Use the `cluster.initial_cluster_manager_nodes` setting to define which nodes should participate in the initial cluster manager election. Set this configuration in `opensearch.yml` on each cluster-manager-eligible node: + +```yaml +cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + - cluster-manager-3 +``` +{% include copy.html %} + +Alternatively, you can specify the bootstrap configuration when starting OpenSearch: + +```bash +./bin/opensearch -Ecluster.initial_cluster_manager_nodes=cluster-manager-1,cluster-manager-2,cluster-manager-3 +``` +{% include copy.html %} + +You can identify nodes in the bootstrap configuration using any of these methods: + +1. 
Use the value of `node.name` (recommended): + + ```yaml + cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + ``` + {% include copy.html %} + +2. Use the node's hostname if `node.name` is not explicitly set: + + ```yaml + cluster.initial_cluster_manager_nodes: + - server1.example.com + - server2.example.com + ``` + {% include copy.html %} + +3. Use the node's publish IP address: + + ```yaml + cluster.initial_cluster_manager_nodes: + - 192.168.1.10 + - 192.168.1.11 + ``` + {% include copy.html %} + +4. Use the node's IP address and port when multiple nodes share the same IP: + + ```yaml + cluster.initial_cluster_manager_nodes: + - 192.168.1.10:9300 + - 192.168.1.10:9301 + ``` + {% include copy.html %} + +## Critical bootstrapping requirements + +Proper bootstrapping ensures that all cluster-manager-eligible nodes start with a consistent and accurate configuration, preventing cluster splits and ensuring a stable initial election process. + +### Identical configuration across all nodes + +All cluster-manager-eligible nodes must have the same `cluster.initial_cluster_manager_nodes` setting. This ensures that only one cluster forms during bootstrapping. + +**Correct configuration**: + +```yaml +# Node 1 +cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + - cluster-manager-3 + +# Node 2 +cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + - cluster-manager-3 + +# Node 3 +cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + - cluster-manager-3 +``` +{% include copy.html %} + +**Incorrect configuration**: + +```yaml +# Node 1 – different list +cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + +# Node 2 – different list +cluster.initial_cluster_manager_nodes: + - cluster-manager-2 + - cluster-manager-3 +``` + +When nodes have inconsistent bootstrap lists, multiple independent clusters may form. + +### Exact name matching + +Node names in the bootstrap configuration must exactly match each node's `node.name` value. + +**Common naming issues**: + +* If a node's name is `server1.example.com`, the bootstrap list must also use `server1.example.com`, not `server1`. +* Node names are case-sensitive. +* The names must match exactly, with no added characters or whitespace. + +If a node's name does not exactly match an entry in the bootstrap configuration, the log will contain an error message. In this example, the node name `cluster-manager-1.example.com` does not match the bootstrap entry `cluster-manager-1`: + +``` +[cluster-manager-1.example.com] cluster manager not discovered yet, this node has +not previously joined a bootstrapped cluster, and this node must discover +cluster-manager-eligible nodes [cluster-manager-1, cluster-manager-2] to +bootstrap a cluster: have discovered [{cluster-manager-2.example.com}...] +``` + +## Naming your cluster + +Choose a descriptive cluster name to distinguish your cluster from others: + +```yaml +cluster.name: production-search-cluster +``` +{% include copy.html %} + +When naming your cluster, follow these guidelines: + +- Each cluster must have a unique name to avoid conflicts. + +- Ensure that all nodes verify that the cluster name matches before joining. + +- Avoid the default `opensearch` name in production environments. + +- Choose descriptive names that reflect the cluster's purpose. 
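+
+For a quick check that a running node is using the node name and cluster name you configured, query the node's root endpoint (adjust the host and port to match your environment):
+
+```bash
+curl -X GET "localhost:9200/"
+```
+{% include copy.html %}
+
+The response includes the node's `name` and the `cluster_name`, which should match the values set in `opensearch.yml`.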
+ +## Development mode auto-bootstrapping + +OpenSearch can automatically bootstrap clusters in development environments under the following conditions: + +- No discovery settings are explicitly configured. +- Multiple nodes are running on the same machine. +- OpenSearch detects that it is running in a development environment. + +### Settings that disable auto-bootstrapping + +If any of these settings are configured, you must explicitly configure `cluster.initial_cluster_manager_nodes`: + +- `discovery.seed_providers` +- `discovery.seed_hosts` +- `cluster.initial_cluster_manager_nodes` + +### Auto-bootstrapping limitations + +Auto-bootstrapping is intended only for development. Do not use it in production because: + +- Nodes may not discover each other quickly enough, leading to delays. + +- Network conditions can cause discovery to fail. + +- Behavior can be unpredictable and is not guaranteed. + +- There is a risk of forming multiple clusters, resulting in split-brain scenarios. + +## Troubleshooting bootstrap issues + +If you accidentally start nodes on different hosts without proper configuration, they may form separate clusters. You can detect this by checking cluster UUIDs: + +```bash +curl -X GET "localhost:9200/" +``` +{% include copy.html %} + +If each node reports a different `cluster_uuid`, they belong to separate clusters. To correct this and form a single cluster, use the following steps: + +1. Stop all nodes. +2. Delete all data from each node's data directory. +3. Configure proper bootstrap settings. +4. Restart all nodes and verify single cluster formation. + +## Bootstrap verification + +After starting your cluster, verify successful bootstrap using the [monitoring commands]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/#monitoring-discovery-and-cluster-formation) for checking cluster health and formation: + +- Verify cluster health status and node count. +- Confirm that one node is elected as cluster manager. +- Ensure that all nodes report the same cluster UUID. + +## Related documentation + +- [Voting configuration management]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/voting-configuration/): How OpenSearch manages voting after bootstrap +- [Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/): Complete settings reference +- [Creating a cluster]({{site.url}}{{site.baseurl}}/tuning-your-cluster/): Step-by-step cluster setup guide \ No newline at end of file diff --git a/_tuning-your-cluster/discovery-cluster-formation/discovery.md b/_tuning-your-cluster/discovery-cluster-formation/discovery.md new file mode 100644 index 00000000000..6f648275ee3 --- /dev/null +++ b/_tuning-your-cluster/discovery-cluster-formation/discovery.md @@ -0,0 +1,161 @@ +--- +layout: default +title: Node discovery and seed hosts +parent: Discovery and cluster formation +nav_order: 10 +--- + +# Node discovery and seed hosts + +Node discovery is the process by which OpenSearch nodes locate and connect to other nodes to form or join a cluster. This process is essential when starting a node for the first time or when a node loses connection to the cluster manager and needs to rejoin the cluster. + +The discovery process operates in two distinct phases: + +1. **Initial seed discovery**: Each starting node connects to a predefined list of seed addresses and attempts to identify whether the nodes at those addresses are cluster-manager-eligible. + +2. 
**Peer discovery**: Once connected to seed nodes, the node exchanges lists of known cluster-manager-eligible peers. This creates a cascading discovery process where each newly discovered node provides additional peer information. + +The discovery process continues until one of the following conditions is met: + +- **For non-cluster-manager-eligible nodes**: Discovery continues until an elected cluster manager is found. +- **For cluster-manager-eligible nodes**: Discovery continues until either an elected cluster manager is found or enough cluster-manager-eligible nodes are discovered to complete a cluster manager election. + +If neither condition is met within the configured time, the node retries the discovery process after the interval specified by `discovery.find_peers_interval` (default is `1s`). + +## Seed hosts providers + +OpenSearch uses _seed hosts providers_ to supply the initial list of addresses for node discovery. These providers define how nodes obtain the seed addresses needed to start the discovery process. + +You can configure seed hosts providers using the `discovery.seed_providers` setting, which accepts a list of provider types. This allows you to combine multiple discovery methods for your cluster. The default provider is `settings`, which uses static configuration. + +### Settings-based seed hosts provider + +The settings-based provider uses static configuration to define a list of seed node addresses. This is the most common approach for on-premises deployments with known node addresses. + +Configure seed hosts using the `discovery.seed_hosts` setting in `opensearch.yml`: + +```yaml +discovery.seed_hosts: + - 192.168.1.10:9300 + - 192.168.1.11 # Port defaults to transport.port + - seeds.example.com # DNS hostnames are resolved +``` +{% include copy.html %} + +Each seed host address can be specified in the following ways. + +| Format | Example | Notes | +| ----------------------- | ------------------------ | ---------------------------------------- | +| IP address with port | `192.168.1.10:9300` | Specifies a custom transport port | +| IP address without port | `192.168.1.11` | Uses the default transport port | +| Hostname with port | `node1.example.com:9300` | Specifies a custom transport port | +| Hostname without port | `node1.example.com` | Uses the default transport port | +| IPv6 address | `[2001:db8::1]:9300` | Brackets are required for IPv6 addresses | + +When no port is specified, OpenSearch uses the first port from these settings in order: + +1. `transport.profiles.default.port` +2. `transport.port` + +If neither setting is configured, the default port `9300` is used. + +When you specify hostnames as seed addresses OpenSearch performs the following DNS resolution steps: + +- OpenSearch performs DNS lookups to resolve hostnames to IP addresses. +- If a hostname resolves to multiple IP addresses, OpenSearch attempts to connect to all resolved addresses. +- DNS lookups are subject to JVM DNS caching settings. +- Resolution occurs during each discovery round, allowing for dynamic IP changes. + +The DNS resolution behavior is controlled by these settings: + +- `discovery.seed_resolver.max_concurrent_resolvers`: Maximum concurrent DNS lookups (default is `10`). +- `discovery.seed_resolver.timeout`: Timeout for each DNS lookup (default is `5s`). + +### File-based seed hosts provider + +The file-based provider reads seed host addresses from an external file, allowing for dynamic updates without restarting nodes. 
This is particularly useful in containerized environments where IP addresses may not be known at startup. + +Enable the file-based provider in `opensearch.yml`: + +```yaml +discovery.seed_providers: file +``` +{% include copy.html %} + +You can also combine it with the settings-based provider: + +```yaml +discovery.seed_providers: [settings, file] +``` +{% include copy.html %} + +Create a file named `unicast_hosts.txt` in your OpenSearch configuration directory (`$OPENSEARCH_PATH_CONF/unicast_hosts.txt`). The file should follow this format: + +``` +# Static IP addresses +10.0.1.10 +10.0.1.11:9305 + +# Hostnames +node1.example.com +node2.example.com:9301 + +# IPv6 addresses (brackets required) +[2001:db8::1]:9300 +[2001:db8::2] + +# Comments start with # and must be on separate lines +# This is a comment +``` +{% include copy.html %} + +The file should follow this format: + +- Each line contains a single host address. +- Specify a `host:port` or just `host` (uses default port). +- Lines starting with `#` are treated as comments. +- IPv6 addresses must be enclosed in brackets, with optional port after the brackets. +- Empty lines are ignored. + +OpenSearch automatically detects changes to the `unicast_hosts.txt` file and reloads the seed host list without requiring a node restart. This allows you to: + +- Add new seed hosts as your cluster grows. +- Remove decommissioned nodes from the seed list. +- Update IP addresses after infrastructure changes. + +Note that file-based discovery supplements (rather than replaces) any seed hosts configured in the `discovery.seed_hosts` setting. + +## Configuration examples + +The following examples demonstrate how to configure different discovery mechanisms. + +### Combining discovery providers + +You can use multiple seed host providers simultaneously: + +```yaml +discovery.seed_providers: [settings, file] +discovery.seed_hosts: + - 10.0.1.10:9300 # Always include this seed host +# Additional hosts loaded from unicast_hosts.txt +``` +{% include copy.html %} + +This configuration uses both static seed hosts and dynamically loaded hosts from a file. + +### Single-node development setup + +For development or testing environments: + +```yaml +discovery.type: single-node +``` +{% include copy.html %} + +When `discovery.type` is set to `single-node`, OpenSearch bypasses the normal discovery process and forms a single-node cluster immediately. + +## Related documentation + +- To troubleshoot discovery issues, use the monitoring commands detailed in the [Discovery and cluster formation]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/#monitoring-discovery-and-cluster-formation) overview. + +- For a complete list of discovery-related settings, see [Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/). 
diff --git a/_tuning-your-cluster/discovery-cluster-formation/index.md b/_tuning-your-cluster/discovery-cluster-formation/index.md new file mode 100644 index 00000000000..2076de16c40 --- /dev/null +++ b/_tuning-your-cluster/discovery-cluster-formation/index.md @@ -0,0 +1,97 @@ +--- +layout: default +title: Discovery and cluster formation +nav_order: 5 +has_children: true +permalink: /tuning-your-cluster/discovery-cluster-formation/ +--- + +# Discovery and cluster formation + +Discovery and cluster formation are fundamental processes in OpenSearch that enable nodes to locate each other, elect a cluster manager, establish a functioning cluster, and maintain coordination as the cluster state evolves. Understanding these mechanisms is essential for configuring reliable, well-performing OpenSearch clusters. + +When you start an OpenSearch cluster, several coordinated processes work together: + +- **Node discovery**: Nodes locate and identify other nodes in the network that should be part of the cluster. +- **Cluster manager election**: Eligible nodes participate in selecting a cluster manager node through a consensus-based voting process. +- **Cluster formation**: Once a cluster manager is elected, the cluster state is established and nodes join the cluster. +- **State management**: The cluster manager maintains and distributes the authoritative cluster state to all nodes. +- **Health monitoring**: Nodes continuously monitor each other's health and detect failures. + +All inter-node communication for these processes uses OpenSearch's [transport layer]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/), ensuring secure and efficient data exchange. + +## Discovery process + +Discovery is how nodes find other nodes when starting up or when connection to the cluster manager is lost. This process involves: + +1. **Seed hosts**: A configurable list of known node addresses that serve as entry points for discovery. +2. **Host providers**: Mechanisms for supplying seed host information, including static configuration and dynamic cloud-based discovery. +3. **Node identification**: Verification that discovered nodes are eligible to participate in the cluster. + +## Cluster manager election + +OpenSearch uses a sophisticated voting mechanism to ensure exactly one cluster manager exists at any time: + +- **Voting configuration**: The set of cluster-manager-eligible nodes that participate in elections. +- **Quorum requirements**: Elections require a majority of voting nodes to prevent split-brain scenarios. +- **Automatic reconfiguration**: The voting configuration adjusts as nodes join and leave the cluster. + +## Cluster state management + +The elected cluster manager is responsible for: + +- Maintaining the definitive cluster state (for example, node membership, index metadata, and shard allocation). +- Publishing state updates to all nodes in the cluster. +- Coordinating shard allocation and rebalancing. +- Managing cluster-wide settings and policies. + +## Core components + +The following topics provide detailed guidance on each stage of discovery and cluster formation: + +[Node discovery and seed hosts]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/discovery/): Learn how OpenSearch nodes discover each other through seed host providers and configure static or dynamic host discovery. 
+ +[Voting and quorum]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/voting-quorums/): Understand how OpenSearch uses quorum-based voting to elect cluster managers and prevent split-brain conditions. + +[Voting configuration management]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/voting-configuration/): Learn how OpenSearch automatically manages voting configurations and handles bootstrap requirements for new clusters. + +[Cluster bootstrapping]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/bootstrapping/): Configure initial cluster startup and learn the requirements for safely bringing up a new cluster. + +[Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/): Complete reference for all configuration options that control discovery and cluster formation behavior, including fault detection and cluster state publishing settings. + +## Monitoring discovery and cluster formation + +You can use these API commands to monitor cluster formation and health. + +### Check cluster health + +```json +GET /_cluster/health +``` +{% include copy-curl.html %} + +Returns the health status of the cluster, including the number of nodes and shard information. + +### View cluster nodes + +```json +GET /_cat/nodes +``` +{% include copy-curl.html %} + +Returns information about nodes in the cluster, including roles and which node is the cluster manager. + +### Check voting configuration + +```json +GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config +``` +{% include copy-curl.html %} + +Returns the current voting configuration, showing which nodes participate in cluster manager elections. + +## Next steps + +- Start with [Node discovery and seed hosts]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/discovery/) to understand the foundation of cluster formation +- Review [Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/) for configuration options +- See [Creating a cluster]({{site.url}}{{site.baseurl}}/tuning-your-cluster/) for hands-on cluster setup guidance \ No newline at end of file diff --git a/_tuning-your-cluster/discovery-cluster-formation/settings.md b/_tuning-your-cluster/discovery-cluster-formation/settings.md new file mode 100644 index 00000000000..19a59e3f7b5 --- /dev/null +++ b/_tuning-your-cluster/discovery-cluster-formation/settings.md @@ -0,0 +1,216 @@ +--- +layout: default +title: Discovery and cluster formation settings +parent: Discovery and cluster formation +nav_order: 100 +--- + +# Discovery and cluster formation settings + +This page provides a comprehensive reference for all settings that control discovery and cluster formation behavior in OpenSearch. These settings determine how nodes find each other, elect a cluster manager, and maintain cluster coordination. + +## Core discovery settings + +The following settings control the fundamental discovery process: + +- `discovery.seed_hosts` (Static, list): Provides a list of addresses for cluster-manager-eligible nodes in the cluster. This setting is essential for nodes to find each other during cluster formation. Each address can be specified as `host:port` or just `host`. 
The `host` can be a hostname (resolved via DNS - if multiple IPs are resolved, OpenSearch attempts to connect to all), IPv4 address, or IPv6 address (must be enclosed in square brackets). If no port is specified, OpenSearch determines the port by checking `transport.profiles.default.port`, then `transport.port`. If neither is configured, the default port `9300` is used. Default is `["127.0.0.1", "[::1]"]`. + +- `discovery.seed_providers` (Static, list): Specifies which seed hosts providers to use for obtaining seed node addresses during discovery. Available providers are `settings` (uses addresses from the `discovery.seed_hosts` setting) and `file` (reads addresses from the `unicast_hosts.txt` file). You can specify multiple providers to combine discovery methods. Default is `["settings"]`. + +- `discovery.type` (Static, string): Specifies whether OpenSearch should form a multiple-node cluster or operate as a single node. When set to `single-node`, OpenSearch forms a single-node cluster and suppresses certain timeouts. This setting is useful for development and testing environments. Valid values are `multi-node` (default) and `single-node`. + +- `cluster.initial_cluster_manager_nodes` (Static, list): Sets the initial cluster-manager-eligible nodes for bootstrapping a brand-new cluster. This setting is required when bootstrapping a cluster for the first time and should contain the node names (as defined by `node.name`) of the initial cluster-manager-eligible nodes. This list should be empty for nodes joining an existing cluster. Default is `[]` (empty). + +## Discovery process settings + +These settings control timing and behavior during the discovery process: + +- `discovery.find_peers_interval` (Static, time unit): Sets how long a node waits before attempting another discovery round when the initial attempt fails. Default is `1s`. + +- `discovery.cluster_formation_warning_timeout` (Static, time unit): Sets how long a node attempts to form a cluster before logging a warning message. The warning will start with "cluster manager not discovered" and describe the current discovery state. Default is `10s`. + +### DNS resolution settings + +The following settings control DNS lookup behavior for seed hosts: + +- `discovery.seed_resolver.max_concurrent_resolvers` (Static, integer): Specifies how many concurrent DNS lookups to perform when resolving seed node addresses. Default is `10`. + +- `discovery.seed_resolver.timeout` (Static, time unit): Specifies the timeout for each DNS lookup when resolving seed node addresses. Default is `5s`. + +### Connection settings + +The following settings control connection attempts during discovery: + +- `discovery.probe.connect_timeout` (Static, time unit): Sets the timeout when attempting to connect to each address during discovery. Default is `3s`. + +- `discovery.probe.handshake_timeout` (Static, time unit): Sets the timeout when attempting to identify a remote node via handshake during discovery. Default is `1s`. + +- `discovery.request_peers_timeout` (Static, time unit): Sets how long a node waits for peer information requests during discovery before considering the request failed. Default is `3s`. + +## Cluster manager election settings + +These settings control the cluster manager election process: + +- `cluster.election.back_off_time` (Static, time unit): Sets the incremental delay added after each election failure (linear backoff). Each failed election increases the wait time by this amount before the next attempt. Default is `100ms`. 
**Warning**: Changing this from the default may prevent cluster manager election. + +- `cluster.election.duration` (Static, time unit): Sets the maximum duration allowed for each election attempt before considering it failed and scheduling a retry. Default is `500ms`. **Warning**: Changing this from the default may prevent cluster manager election. + +- `cluster.election.initial_timeout` (Static, time unit): Sets the initial upper bound for how long a node waits before attempting its first election, either at startup or after the current cluster manager fails. Default is `100ms`. **Warning**: Changing this from the default may prevent cluster manager election. + +- `cluster.election.max_timeout` (Static, time unit): Sets the maximum upper bound for election delays to prevent excessively sparse elections during long network partitions. Default is `10s`. **Warning**: Changing this from the default may prevent cluster manager election. + +## Voting configuration settings + +The following settings control the voting mechanism for cluster manager elections: + +- `cluster.auto_shrink_voting_configuration` (Dynamic, boolean): Controls whether the voting configuration automatically removes departed nodes, provided at least 3 nodes remain. When set to `false`, you must manually remove departed nodes using the voting configuration exclusions API. Default is `true`. + +- `cluster.max_voting_config_exclusions` (Dynamic, integer): Sets the maximum number of voting configuration exclusions allowed simultaneously. This is used during cluster manager node maintenance operations. Default is `10`. + +## Fault detection settings + +OpenSearch continuously monitors cluster health through two types of health checks: + +- [Follower checks](#follower-checks) (sent by the cluster manager to non-cluster-manager nodes) +- [Leader checks](#leader-checks) (sent by non-cluster-manager nodes to the cluster manager) + +OpenSearch allows occasional check failures and uses the following guidelines for taking action: + +- Transient issues (single check failures) are ignored; multiple consecutive failures are required for action. +- Network disconnects trigger immediate response. +- All timeouts and retry counts are configurable. + +### Follower checks + +The elected cluster manager periodically checks each node in the cluster: + +1. Sends periodic health check requests to all nodes. +2. Waits for responses within the configured timeout. +3. Tracks consecutive check failures for each node. +4. Removes nodes that fail consecutive checks (based on retry count). + +If the cluster manager detects a node has disconnected (network-level disconnect), it bypasses the timeout and retry settings and immediately attempts to remove the node from the cluster. + +### Leader checks + +Each non-cluster-manager node periodically checks the health of the elected cluster manager: + +1. Send periodic health check requests to the cluster manager. +2. Wait for responses within the configured timeout. +3. Track consecutive check failures for the cluster manager. +4. Start new cluster manager election if consecutive checks fail. + +If a node detects the cluster manager has disconnected, it bypasses timeout and retry settings and immediately restarts its discovery phase to find or elect a new cluster manager. + +The following settings control health monitoring and failure detection. 
+ +### Follower check settings + +These settings control how the cluster manager monitors other nodes: + +- `cluster.fault_detection.follower_check.interval` (Static, time unit): Sets the interval between follower checks from the cluster manager to other nodes. Default is `1s`. **Warning**: Changing this may cause cluster instability. + +- `cluster.fault_detection.follower_check.timeout` (Static, time unit): Sets how long the cluster manager waits for a response to follower checks before considering the check failed. Default is `10s`. **Warning**: Changing this may cause cluster instability. + +- `cluster.fault_detection.follower_check.retry_count` (Static, integer): Sets how many consecutive follower check failures must occur before the cluster manager considers a node faulty and removes it from the cluster. Default is `3`. **Warning**: Changing this may cause cluster instability. + +### Leader check settings + +These settings control how non-cluster-manager nodes monitor the cluster manager: + +- `cluster.fault_detection.leader_check.interval` (Static, time unit): Sets the interval between leader checks from nodes to the cluster manager. Default is `1s`. **Warning**: Changing this may cause cluster instability. + +- `cluster.fault_detection.leader_check.timeout` (Static, time unit): Sets how long nodes wait for a response to leader checks before considering the cluster manager failed. Default is `10s`. **Warning**: Changing this may cause cluster instability. + +- `cluster.fault_detection.leader_check.retry_count` (Static, integer): Sets how many consecutive leader check failures must occur before a node considers the cluster manager faulty and attempts to find or elect a new cluster manager. Default is `3`. **Warning**: Changing this may cause cluster instability. + +## Cluster state publishing settings + +The following settings control how cluster state updates are distributed: + +- `cluster.publish.timeout` (Static, time unit): Sets how long the cluster manager waits for cluster state updates to be published to all nodes before timing out. This setting is ignored when `discovery.type` is set to `single-node`. Default is `30s`. + +- `cluster.publish.info_timeout` (Static, time unit): Sets how long the cluster manager waits before logging a message about slow-responding nodes during cluster state publishing. Default is `10s`. + +- `cluster.follower_lag.timeout` (Static, time unit): Sets how long the cluster manager waits for acknowledgments of cluster state updates from lagging nodes. Nodes that don't respond within this time are considered failed and removed from the cluster. Default is `90s`. + +## Cluster coordination settings + +The following settings control cluster joining and coordination: + +- `cluster.join.timeout` (Static, time unit): Sets how long a node waits after sending a join request before considering it failed and retrying. This setting is ignored when `discovery.type` is set to `single-node`. Default is `60s`. + +- `cluster.no_cluster_manager_block` (Dynamic, string): Specifies which operations are rejected when there is no active cluster manager. Valid values are `all` (all operations including read/write and cluster state APIs are rejected) and `write` (only write operations are rejected; read operations succeed based on the last known cluster state but may return stale data). This setting doesn't affect node-based APIs (cluster stats, node info, node stats). For full cluster functionality, an active cluster manager is required. Default is `write`. 
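+
+The dynamic `cluster.no_cluster_manager_block` setting can be changed on a running cluster through the cluster settings API. The following request is a minimal sketch that rejects all operations, not only writes, while no cluster manager is elected:
+
+```json
+PUT /_cluster/settings
+{
+  "persistent": {
+    "cluster.no_cluster_manager_block": "all"
+  }
+}
+```
+{% include copy-curl.html %}
+
+To return to the default behavior, set the value back to `write`, or set it to `null` to clear the override.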
+ +## Configuration examples + +The following are configuration examples for discovery. + +### Basic production cluster + +```yaml +# Cluster identification +cluster.name: production-cluster + +# Discovery configuration +discovery.seed_hosts: + - 10.0.1.10:9300 + - 10.0.1.11:9300 + - 10.0.1.12:9300 + +# Initial cluster manager nodes (only for bootstrapping) +cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + - cluster-manager-3 + +# Voting configuration +cluster.auto_shrink_voting_configuration: true +cluster.max_voting_config_exclusions: 10 +``` +{% include copy.html %} + +### Development single-node setup + +```yaml +# Single-node development cluster +cluster.name: dev-cluster +discovery.type: single-node + +# Optional: Disable cluster formation timeouts for faster startup +cluster.publish.timeout: 5s +cluster.join.timeout: 10s +``` +{% include copy.html %} + +### High-availability production cluster + +```yaml +# Production cluster with dedicated cluster manager nodes +cluster.name: production-cluster + +# Discovery configuration +discovery.seed_hosts: + - cluster-manager-1.example.com:9300 + - cluster-manager-2.example.com:9300 + - cluster-manager-3.example.com:9300 + +# Bootstrap configuration (only for initial setup) +cluster.initial_cluster_manager_nodes: + - cluster-manager-1 + - cluster-manager-2 + - cluster-manager-3 + +# Production-optimized settings +cluster.auto_shrink_voting_configuration: true +cluster.max_voting_config_exclusions: 3 +cluster.publish.timeout: 60s +cluster.join.timeout: 120s +``` +{% include copy.html %} + +## Related documentation + +- [Node discovery and seed hosts]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/discovery/): Learn about discovery mechanisms and seed host providers +- [Creating a cluster]({{site.url}}{{site.baseurl}}/tuning-your-cluster/): Step-by-step cluster setup guide +- [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/): General configuration guidance \ No newline at end of file diff --git a/_tuning-your-cluster/discovery-cluster-formation/voting-configuration.md b/_tuning-your-cluster/discovery-cluster-formation/voting-configuration.md new file mode 100644 index 00000000000..ecaf375f0a1 --- /dev/null +++ b/_tuning-your-cluster/discovery-cluster-formation/voting-configuration.md @@ -0,0 +1,79 @@ +--- +layout: default +title: Voting configuration management +parent: Discovery and cluster formation +nav_order: 30 +--- + +# Voting configuration management + +Every OpenSearch cluster maintains a voting configuration, which defines the set of cluster-manager-eligible nodes whose responses count when making critical cluster decisions. Understanding how OpenSearch manages voting configurations is essential for maintaining cluster stability and availability. + +The _voting configuration_ is the authoritative list of cluster-manager-eligible nodes that participate in: + +- **Cluster manager elections** - Choosing which node leads the cluster. +- **Cluster state updates** - Approving changes to cluster metadata and shard allocation. +- **Critical cluster decisions** - Any operation requiring cluster-wide consensus. + +Decisions are made only after a majority (more than half) of the nodes in the voting configuration respond positively. 
+ +## Voting configuration compared to cluster membership + +The voting configuration typically matches all cluster-manager-eligible nodes that are part of the cluster, but there are times when they differ. During normal operations in a stable cluster, all healthy cluster-manager-eligible nodes participate in voting. However, differences can occur during node transitions (for example, when a node is joining or leaving), in failure scenarios where some eligible nodes are unreachable, or when administrators manually exclude nodes for maintenance. + +## Automatic voting configuration management + +OpenSearch automatically adjusts the voting configuration to maintain resilience as the cluster changes. When a new cluster-manager-eligible node joins, the cluster manager evaluates the state, adds the new node to the voting configuration, and publishes the updated state to all nodes. Larger configurations provide better fault tolerance, so OpenSearch prefers to include all available eligible nodes. + +Node removal behavior depends on the `cluster.auto_shrink_voting_configuration` setting, which is enabled by default. When it's enabled, OpenSearch automatically removes departed nodes while ensuring at least three voting nodes remain. This enhances availability and allows the cluster to continue operating after node failures. When it's disabled (false), you must manually remove departed nodes using the voting exclusions API, giving administrators precise control over when and how the configuration changes. + +When possible, OpenSearch replaces departed voting nodes with other eligible nodes instead of reducing the configuration size. This replacement strategy maintains the same number of voting nodes and preserves fault tolerance without disruption. + +## Viewing the current voting configuration + +Use the cluster state API to inspect the current voting configuration: + +```bash +curl -X GET "localhost:9200/_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config" +``` +{% include copy.html %} + +Example response: + +```json +{ + "metadata": { + "cluster_coordination": { + "last_committed_config": [ + "KfEEGG7_SsKZVFqI4ko2FA" + ] + } + } +} +``` + +The `last_committed_config` array contains the node IDs of all nodes in the current voting configuration. + +## Even numbers of cluster-manager-eligible nodes + +OpenSearch handles even numbers of cluster-manager-eligible nodes intelligently to prevent split-brain scenarios. A split-brain scenario occurs in a distributed system when a network failure divides the cluster into two or more isolated groups of nodes that can't communicate with each other. With an even number of voting nodes, a network partition could split the cluster into two equal halves, leaving neither side with a majority to elect a cluster manager. For example, in a four-node cluster, decisions require three votes. If the network splits into two and two, neither side can reach three votes, and the cluster becomes unavailable. + +To prevent this, OpenSearch automatically excludes one node from the voting configuration, creating an odd number of voting nodes (for example, three out of four eligible). This adjustment ensures that one partition can maintain a majority and continue operating while the other cannot. As a result, OpenSearch improves resilience against split-brain conditions without reducing overall fault tolerance. 
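+
+For example, in a cluster with four cluster-manager-eligible nodes, the committed configuration typically lists only three of them. Rerunning the cluster state query shown above might return a response similar to the following (the node IDs are placeholders):
+
+```json
+{
+  "metadata": {
+    "cluster_coordination": {
+      "last_committed_config": [
+        "node-id-example-1",
+        "node-id-example-2",
+        "node-id-example-3"
+      ]
+    }
+  }
+}
+```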
+ +## Bootstrap configuration for new clusters + +When starting a brand-new cluster for the very first time, OpenSearch requires an initial bootstrap configuration to determine which nodes should participate in the first cluster manager election. This bootstrap process establishes the initial voting configuration that the cluster will use. + +The bootstrap configuration is only required for new clusters and is ignored once the cluster has been successfully started. After the initial bootstrap, OpenSearch automatically manages the voting configuration as described in the sections above. + +For complete details on configuring cluster bootstrapping, including setup procedures, requirements, troubleshooting, and examples, see [Cluster bootstrapping]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/bootstrapping/). + +## Monitoring voting configuration changes + +To track voting configuration changes and check cluster formation status, use the monitoring commands detailed in the [Discovery and cluster formation]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/#monitoring-discovery-and-cluster-formation) overview. + +## Related documentation + +- [Voting and quorums]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/voting-quorums/): Understanding quorum-based decision making +- [Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/): Configure voting behavior +- [Cluster bootstrapping]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/bootstrapping/): Initial cluster startup procedures \ No newline at end of file diff --git a/_tuning-your-cluster/discovery-cluster-formation/voting-quorums.md b/_tuning-your-cluster/discovery-cluster-formation/voting-quorums.md new file mode 100644 index 00000000000..ab59b3c287c --- /dev/null +++ b/_tuning-your-cluster/discovery-cluster-formation/voting-quorums.md @@ -0,0 +1,162 @@ +--- +layout: default +title: Voting and quorum +parent: Discovery and cluster formation +nav_order: 20 +--- + +# Voting and quorum + +OpenSearch uses a sophisticated quorum-based decision-making system to ensure cluster reliability and prevent split-brain scenarios. Understanding how voting and quorum work is essential for maintaining a stable, fault-tolerant OpenSearch cluster. + +Two fundamental tasks require coordination among cluster-manager-eligible nodes: + +1. **Electing a cluster manager** -- Choosing which node will coordinate the cluster. +2. **Updating cluster state** -- Making changes to cluster metadata, shard allocation, and configuration. + +OpenSearch achieves robust coordination by requiring a _quorum_ (majority) of cluster-manager-eligible nodes to agree on these decisions. This approach provides several key benefits: + +- **Fault tolerance**: Some nodes can fail without stopping cluster operations. +- **Split-brain prevention**: The cluster cannot make conflicting decisions when partitioned. +- **Consistency**: All decisions are made by a clear majority of nodes. + +A decision succeeds only when **more than half** of the nodes in the voting configuration respond positively. This ensures that even if the cluster becomes partitioned, only one partition can have a majority and continue making decisions. + +## Voting configuration + +The _voting configuration_ is the set of cluster-manager-eligible nodes whose responses are counted when making cluster decisions. 
OpenSearch automatically manages this configuration as nodes join and leave the cluster. + +OpenSearch implements dynamic voting configuration management: + +- As nodes join or leave, OpenSearch updates the voting configuration to maintain optimal fault tolerance. +- The voting configuration typically includes all cluster-manager-eligible nodes currently in the cluster. +- During node transitions, the voting configuration may temporarily differ from the current node set. + +The voting configuration follows these rules: + +- Decisions require more than half of voting nodes to respond. +- OpenSearch adds nodes to the voting configuration when they join. +- Nodes are removed from voting configuration when they leave gracefully. +- No two partitions can both have a voting majority. + +## Fault tolerance guidelines + +To maintain cluster availability, follow these critical guidelines. + +Never stop half or more of the nodes in the voting configuration at the same time. This is the most important rule for cluster availability. +{: .important} + +The number of cluster-manager-eligible nodes determines your fault tolerance: + +- 3 nodes: Can tolerate 1 node failure (2 nodes maintain majority). +- 4 nodes: Can tolerate 1 node failure (3 nodes maintain majority). +- 5 nodes: Can tolerate 2 node failures (3 nodes maintain majority). +- 6 nodes: Can tolerate 2 node failures (4 nodes maintain majority). +- 2 nodes: Can tolerate 0 node failures (both must remain available). +- 1 node: Can tolerate 0 node failures (single point of failure). + +## Cluster manager elections + +OpenSearch uses an election process to select the cluster manager node, both at startup and when the current cluster manager fails. + +The election process is as follows: + +1. Election is triggered by one of the following events: + - Cluster startup (no current cluster manager) + - Current cluster manager failure or disconnection + - Network partition resolution + +2. Any cluster-manager-eligible node can start an election. + +3. Elections are randomly scheduled on each node to reduce conflicts. + +4. A node becomes cluster manager only with majority support from the voting configuration. + +5. If elections fail (because of timing conflicts), nodes retry with exponential backoff. + +Election behavior is controlled by the `cluster.election.*` settings. For more information, see [Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/). + +## Cluster maintenance operations + +Understanding quorum requirements helps you perform maintenance safely. + +### Rolling restarts + +OpenSearch can remain available during rolling restarts: + +- Restart nodes one at a time: Restart nodes individually, waiting for each to rejoin. +- Maintain majority: Ensure majority of voting nodes remain available. +- Wait for stabilization: Allow voting configuration to update after each node rejoins. + +### Planned maintenance + +For maintenance requiring multiple nodes: + +1. Check voting configuration: Verify current voting nodes using the cluster API. +2. Plan shutdown order: Ensure majority remains available throughout maintenance. +3. Wait between changes: Allow time for voting configuration updates. +4. Monitor cluster health: Verify cluster remains green during maintenance. + +### Emergency procedures + +If you must stop multiple nodes simultaneously: + +- Use voting exclusions: Temporarily exclude nodes from voting before shutdown. 
+- Restore carefully: Bring nodes back online in the correct order. +- Clear exclusions: Remove voting exclusions once nodes are stable. + +## Monitoring voting configuration + +To monitor voting configuration, cluster health, and cluster manager elections, use the monitoring commands detailed in the [Discovery and cluster formation]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/#monitoring-discovery-and-cluster-formation) overview. + +## Cluster state publishing + +Once a cluster manager is elected, it becomes responsible for distributing cluster state updates to all nodes. Understanding how state publishing works helps you configure appropriate timeouts and monitor cluster coordination. + +The cluster manager is the only node that can make changes to the cluster state. It processes cluster state updates one batch at a time using a two-phase commit process: + +1. Phase 1: Broadcasting and acknowledgment + 1. Cluster manager computes changes and creates updated cluster state. + 2. Broadcasts updated state to all nodes in the cluster. + 3. Nodes acknowledge receipt but do not yet apply the new state. + 4. Cluster manager waits for majority of cluster-manager-eligible nodes to acknowledge. +2. Phase 2: Commitment and application + 1. Cluster manager declares state committed once majority acknowledges. + 2. Broadcasts commit message instructing nodes to apply the new state. + 3. Nodes apply the updated state and send second acknowledgment. + 4. Cluster manager waits for all nodes to confirm application. + +### Publishing timeouts and failure handling + +The cluster manager allows a limited time for each state update to complete, controlled by `cluster.publish.timeout` (default: `30s`), which is measured from when publication begins. If this timeout is reached before the change is committed, the cluster state update is rejected, the cluster manager steps down after considering itself failed, and a new cluster manager election begins. If the commitment succeeds before the timeout, the change is considered successful, and the cluster manager waits for any remaining acknowledgments or until the timeout expires before proceeding to the next update. + +After a successful commitment, some nodes might be slow to apply the update. These lagging nodes are given additional time, controlled by `cluster.follower_lag.timeout` (default: `90s`). If a node fails to apply the update within this time, it is considered failed, removed from the cluster, and the cluster continues operating without it. + +### State publishing optimizations + +OpenSearch optimizes cluster state publishing by typically sending **differential updates (diffs)** instead of full state copies. This approach reduces network bandwidth and publication time because only the changed portions are transmitted to nodes that already hold the current state. For example, when index mappings are updated, only the mapping changes are distributed rather than the entire state. + +In some cases, OpenSearch falls back to publishing the **full cluster state**. This happens when nodes need complete information, such as when a node rejoins the cluster, when a new node joins for the first time, or when a node’s state is outdated and must be synchronized with the current cluster view. 
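+
+The publication timeouts described above (`cluster.publish.timeout` and `cluster.follower_lag.timeout`) rarely need tuning. If you do adjust them, both are static settings configured in `opensearch.yml`; the following values are only an illustrative sketch for a network where state publication is routinely slow:
+
+```yaml
+# Illustrative values only; the defaults are 30s and 90s, respectively.
+cluster.publish.timeout: 60s
+cluster.follower_lag.timeout: 120s
+```
+{% include copy.html %}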
+
+### Monitoring state publishing
+
+To monitor cluster state publishing, use the monitoring commands detailed in the [Discovery and cluster formation]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/#monitoring-discovery-and-cluster-formation) overview.
+
+### OpenSearch as a peer-to-peer system
+
+Understanding OpenSearch's architecture helps explain why state publishing matters:
+
+- High-throughput APIs (index, delete, and search) communicate directly between nodes.
+- The cluster manager's role is limited to maintaining the global cluster state and coordinating shard allocation.
+- State changes (such as node joins, node departures, and shard reassignment) require cluster-wide coordination.
+- State publishing ensures that all nodes have a consistent view of the cluster topology.
+
+This design keeps the cluster manager from becoming a bottleneck for data operations while ensuring consistent cluster coordination.
+
+## Related documentation
+
+- [Discovery and cluster formation settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/settings/): Configure voting and election behavior
+- [Node discovery and seed hosts]({{site.url}}{{site.baseurl}}/tuning-your-cluster/discovery-cluster-formation/discovery/): How nodes find each other
+- [Creating a cluster]({{site.url}}{{site.baseurl}}/tuning-your-cluster/): Step-by-step cluster setup guide
\ No newline at end of file