Skip to content

Commit 8a4860b

Browse files
authored
TQ: Support proxying Nexus-related API requests (#9403)
When committing configurations, the trust quorum protocol relies on Nexus to be up and for Nexus to interact with each trust quorum node. This allows Nexus to continuously try to commit nodes in an RPW and record in the database which nodes have acked. This mirrors many of our existing designs where Nexus observes and records information about successful operations and retries via RPW. Specifically it matches how we do things in the `Reconfigurator` and the `TUF Repo Depot`. Unfortunately, Nexus cannot communicate with new sleds that are not yet running a sled-agent, but still stuck in the bootstrap-agent. This is because the bootstrap agents (and trust quorum protocol) only communicate over the bootstrap network, which Nexus does not have access to. Nodes must already be part of an existing configuration, running sled-agent, and on the underlay network to talk to Nexus. In this common case, Nexus sends trust quorum related messages to the sled-agent which then calls the api of its local trust quorum `NodeTask`. This is not possible for newly added sleds. While the trust quroum coordinator node will tell new nodes to `Prepare` a configuration over the bootstrap networtk, these new nodes do not have any mechanism to receive commits from Nexus. Therefore we must proxy these commit related operations to an existing member of the trust quorum when adding a new node. We also added the ability to proxy `NodeStatus` requests to aid in debugging. This PR therefore adds the ability to proxy certain requests from one node to another so that we can commit nodes to the latest trust quorum configuration, setup their encrypted storage, and boot their sled-agent. It's worth noting that this is not the only way we could have solved this problem. There are a few possibilities in the design space. 1. We could have had the coordinator always send commit operations and collect acknowledgements as during the `Prepare` phase. Unfortunately, if the coordinator dies before all nodes ack then Nexus would not be able to ensure commit at all nodes. To make this reliable, Nexus would still need to be able to reach out to uncommitted nodes and tell them to commit. Since we already have to do the latter there is no reason to do the former. 2. We could commit at the coordinator (or a few nodes), and then have them gossip around information about commit. This is actually a promising design, and is essentially what we do for the early network config. Nexus could then wait for the sled-agent to start for those nodes and ask them directly if they committed. This would still require talking to all nodes and it adds some extra complexity, but it still seems somewhat reasonable. The rationale for our current choice of proxying was largely one of fitting our existing patterns. It's also very useful for Nexus to be able directly ask a trust quorum node on another sled about its status to diagnose issues. So we went with the proxy mechanism as implemented here. Well, why did we introduce another level of messages at the `Task` layer instead of re-using the `CommitAdvance` functionality or adding new variants to the `PeerMsg` in the `trust_quorum_protocol` crate? The rationale here is largely that the trust quorum protocol as written in RFD 238 and specified in TLA+ doesn't include this behavior. It expects commits from the `Node` "API", meaning from `Nexus`. I didn't want to change that behavior unnecessarily due to urgency, and an existing solid design. It was also easier to build proxy operations this way since tracking operations in async code with oneshot channels is easier than trying to insert similar tracking into the `sans-io` code. In short, we left the `trust-quorum-protocol` crate alone, and added some async helpers to the `trust_quorum` crate. One additional change was made in this PR. While adding the `tq_proxy` test I noticed that we were unnecessarily using `wait_for_condition` on initial commits, after we knew about succesful prepares. These commits should always complete immediately and so I simplified this code in a few existing tests.
1 parent 18058fc commit 8a4860b

File tree

10 files changed

+1186
-195
lines changed

10 files changed

+1186
-195
lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

trust-quorum/Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ camino.workspace = true
1616
chacha20poly1305.workspace = true
1717
ciborium.workspace = true
1818
daft.workspace = true
19+
debug-ignore.workspace = true
1920
derive_more.workspace = true
2021
futures.workspace = true
2122
gfss.workspace = true

trust-quorum/protocol/src/node.rs

Lines changed: 32 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,9 @@ use crate::{
3232
use daft::{Diffable, Leaf};
3333
use gfss::shamir::Share;
3434
use omicron_uuid_kinds::RackUuid;
35+
use serde::{Deserialize, Serialize};
3536
use slog::{Logger, error, info, o, warn};
37+
use slog_error_chain::SlogInlineError;
3638

3739
/// An entity capable of participating in trust quorum
3840
///
@@ -379,6 +381,16 @@ impl Node {
379381
}
380382

381383
/// A peer node has disconnected from this one
384+
///
385+
/// Note: It is safe if a call to `on_disconnect` is missed due to a new
386+
/// connection replacing the existing connection and resulting in a call
387+
/// to `on_connect` while the node thinks it still maintains a connection.
388+
///
389+
/// All active behavior such as retries occur in `on_connect`. If the
390+
/// contents of this method change such that it is no longer safe to call
391+
/// `on_connect` without first calling `on_disconnect` for an already
392+
/// connected peer, then we can call `on_disconnect` first from directly
393+
/// within `on_connect`. For now that is unnecessary.
382394
pub fn on_disconnect(
383395
&mut self,
384396
ctx: &mut impl NodeHandlerCtx,
@@ -1063,7 +1075,16 @@ impl Node {
10631075
}
10641076
}
10651077

1066-
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)]
1078+
#[derive(
1079+
Debug,
1080+
Clone,
1081+
thiserror::Error,
1082+
PartialEq,
1083+
Eq,
1084+
SlogInlineError,
1085+
Serialize,
1086+
Deserialize,
1087+
)]
10671088
pub enum CommitError {
10681089
#[error("invalid rack id")]
10691090
InvalidRackId(
@@ -1077,7 +1098,16 @@ pub enum CommitError {
10771098
Expunged { epoch: Epoch, from: BaseboardId },
10781099
}
10791100

1080-
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)]
1101+
#[derive(
1102+
Debug,
1103+
Clone,
1104+
thiserror::Error,
1105+
PartialEq,
1106+
Eq,
1107+
SlogInlineError,
1108+
Serialize,
1109+
Deserialize,
1110+
)]
10811111
pub enum PrepareAndCommitError {
10821112
#[error("invalid rack id")]
10831113
InvalidRackId(

trust-quorum/protocol/src/validators.rs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ use crate::{
1212
};
1313
use daft::{BTreeSetDiff, Diffable, Leaf};
1414
use omicron_uuid_kinds::RackUuid;
15+
use serde::{Deserialize, Serialize};
1516
use slog::{Logger, error, info, warn};
1617
use std::collections::BTreeSet;
1718

@@ -57,7 +58,9 @@ pub struct SledExpungedError {
5758
last_prepared_epoch: Option<Epoch>,
5859
}
5960

60-
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)]
61+
#[derive(
62+
Debug, Clone, thiserror::Error, PartialEq, Eq, Serialize, Deserialize,
63+
)]
6164
#[error("mismatched rack id: expected {expected:?}, got {got:?}")]
6265
pub struct MismatchedRackIdError {
6366
pub expected: RackUuid,

0 commit comments

Comments
 (0)