Commit d563f8d
TQ: Support proxying Nexus-related API requests
When committing configurations, the trust quorum protocol relies on Nexus being up and interacting with each trust quorum node. This allows Nexus to continuously try to commit nodes in an RPW and record in the database which nodes have acked. This mirrors many of our existing designs where Nexus observes and records information about successful operations and retries via RPW; specifically, it matches how we do things in the `Reconfigurator` and the `TUF Repo Depot`.

Unfortunately, Nexus cannot communicate with new sleds that are not yet running a sled-agent and are still stuck in the bootstrap-agent. This is because the bootstrap agents (and the trust quorum protocol) only communicate over the bootstrap network, which Nexus does not have access to. Nodes must already be part of an existing configuration, running sled-agent, and on the underlay network to talk to Nexus. In that common case, Nexus sends trust quorum related messages to the sled-agent, which then calls the API of its local trust quorum `NodeTask`. This is not possible for newly added sleds. While the trust quorum coordinator node will tell new nodes to `Prepare` a configuration over the bootstrap network, these new nodes have no mechanism to receive commits from Nexus. Therefore we must proxy these commit related operations to an existing member of the trust quorum when adding a new node. We also added the ability to proxy `NodeStatus` requests to aid in debugging.

This PR therefore adds the ability to proxy certain requests from one node to another so that we can commit nodes to the latest trust quorum configuration, set up their encrypted storage, and boot their sled-agent.

It's worth noting that this is not the only way we could have solved this problem. There are a few possibilities in the design space.

1. We could have had the coordinator always send commit operations and collect acknowledgements, as during the `Prepare` phase. Unfortunately, if the coordinator dies before all nodes ack, then Nexus would not be able to ensure commit at all nodes. To make this reliable, Nexus would still need to be able to reach out to uncommitted nodes and tell them to commit. Since we already have to do the latter, there is no reason to do the former.

2. We could commit at the coordinator (or a few nodes), and then have them gossip around information about the commit. This is actually a promising design, and is essentially what we do for the early network config. Nexus could then wait for the sled-agent to start on those nodes and ask them directly if they committed. This would still require talking to all nodes and it adds some extra complexity, but it still seems somewhat reasonable.

The rationale for our current choice of proxying was largely one of fitting our existing patterns. It's also very useful for Nexus to be able to directly ask a trust quorum node on another sled about its status to diagnose issues. So we went with the proxy mechanism as implemented here.

Why did we introduce another level of messages at the `Task` layer instead of re-using the `CommitAdvance` functionality or adding new variants to `PeerMsg` in the `trust_quorum_protocol` crate? The rationale here is largely that the trust quorum protocol as written in RFD 238 and specified in TLA+ doesn't include this behavior. It expects commits to arrive via the `Node` "API", meaning from Nexus. Given the urgency and an existing solid design, I didn't want to change that behavior unnecessarily.

It was also easier to build proxy operations this way, since tracking operations in async code with oneshot channels is easier than trying to insert similar tracking into the sans-io code. In short, we left the `trust_quorum_protocol` crate alone and added some async helpers to the `trust_quorum` crate, as sketched below.

One additional change was made in this PR. While adding the `tq_proxy` test, I noticed that we were unnecessarily using `wait_for_condition` on initial commits after we knew about successful prepares. These commits should always complete immediately, so I simplified this code in a few existing tests.
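As a rough illustration of that oneshot-channel tracking, here is a minimal sketch of the pattern; the `Tracker` type, its methods, and the request-id scheme are hypothetical simplifications for illustration, not the actual types in the `trust_quorum` crate:

    use std::collections::HashMap;
    use tokio::sync::oneshot;

    // Hypothetical stand-in for the data carried by a proxy response.
    #[derive(Debug)]
    pub enum ProxyResponse {
        Commit(Result<(), String>),
        Status(String),
    }

    /// Tracks in-flight proxied requests by id. When a response arrives from
    /// the remote node, the waiting caller is woken via its oneshot channel.
    #[derive(Default)]
    pub struct Tracker {
        next_id: u64,
        in_flight: HashMap<u64, oneshot::Sender<ProxyResponse>>,
    }

    impl Tracker {
        /// Register a new outstanding request, returning its id and the
        /// receiver that the caller awaits on.
        pub fn register(&mut self) -> (u64, oneshot::Receiver<ProxyResponse>) {
            let id = self.next_id;
            self.next_id += 1;
            let (tx, rx) = oneshot::channel();
            self.in_flight.insert(id, tx);
            (id, rx)
        }

        /// Complete an outstanding request when its response comes back.
        pub fn complete(&mut self, id: u64, rsp: ProxyResponse) {
            if let Some(tx) = self.in_flight.remove(&id) {
                // The caller may have given up; ignore a dropped receiver.
                let _ = tx.send(rsp);
            }
        }

        /// Drop all outstanding requests (e.g., on disconnect). Dropping the
        /// senders wakes each receiver with an error, so callers can retry.
        pub fn fail_all(&mut self) {
            self.in_flight.clear();
        }
    }

Keeping this bookkeeping in the async task layer means the sans-io protocol code never has to know that a request is being tracked at all.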
1 parent 18058fc

File tree

10 files changed: +1076 -137 lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

trust-quorum/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ camino.workspace = true
 chacha20poly1305.workspace = true
 ciborium.workspace = true
 daft.workspace = true
+debug-ignore.workspace = true
 derive_more.workspace = true
 futures.workspace = true
 gfss.workspace = true

trust-quorum/protocol/src/node.rs

Lines changed: 22 additions & 2 deletions
@@ -32,7 +32,9 @@ use crate::{
 use daft::{Diffable, Leaf};
 use gfss::shamir::Share;
 use omicron_uuid_kinds::RackUuid;
+use serde::{Deserialize, Serialize};
 use slog::{Logger, error, info, o, warn};
+use slog_error_chain::SlogInlineError;

 /// An entity capable of participating in trust quorum
 ///
@@ -1063,7 +1065,16 @@ impl Node {
     }
 }

-#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)]
+#[derive(
+    Debug,
+    Clone,
+    thiserror::Error,
+    PartialEq,
+    Eq,
+    SlogInlineError,
+    Serialize,
+    Deserialize,
+)]
 pub enum CommitError {
     #[error("invalid rack id")]
     InvalidRackId(
@@ -1077,7 +1088,16 @@
     Expunged { epoch: Epoch, from: BaseboardId },
 }

-#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)]
+#[derive(
+    Debug,
+    Clone,
+    thiserror::Error,
+    PartialEq,
+    Eq,
+    SlogInlineError,
+    Serialize,
+    Deserialize,
+)]
 pub enum PrepareAndCommitError {
     #[error("invalid rack id")]
     InvalidRackId(
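These derive additions matter because a proxied commit can fail on the remote node, and the typed error now has to travel back across the wire to the requesting node. A minimal sketch of the round-trip this enables, encoding with `ciborium` (already a dependency of the crate); the `CommitError` below is a simplified stand-in, not the real enum:

    use serde::{Deserialize, Serialize};

    // Simplified stand-in: the point is that Serialize/Deserialize let a
    // proxy response carry the typed error back over the wire.
    #[derive(Debug, PartialEq, Serialize, Deserialize)]
    enum CommitError {
        InvalidRackId,
        Expunged { epoch: u64 },
    }

    fn main() {
        let err = CommitError::Expunged { epoch: 3 };

        // Encode as CBOR, as a node might when answering a proxied commit.
        let mut buf = Vec::new();
        ciborium::into_writer(&err, &mut buf).unwrap();

        // Decode on the requesting node and hand the typed error back.
        let decoded: CommitError =
            ciborium::from_reader(buf.as_slice()).unwrap();
        assert_eq!(err, decoded);
    }

The `SlogInlineError` derive similarly lets these errors be logged inline as `slog` values without manual plumbing.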

trust-quorum/protocol/src/validators.rs

Lines changed: 4 additions & 1 deletion
@@ -12,6 +12,7 @@ use crate::{
 };
 use daft::{BTreeSetDiff, Diffable, Leaf};
 use omicron_uuid_kinds::RackUuid;
+use serde::{Deserialize, Serialize};
 use slog::{Logger, error, info, warn};
 use std::collections::BTreeSet;

@@ -57,7 +58,9 @@ pub struct SledExpungedError {
     last_prepared_epoch: Option<Epoch>,
 }

-#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)]
+#[derive(
+    Debug, Clone, thiserror::Error, PartialEq, Eq, Serialize, Deserialize,
+)]
 #[error("mismatched rack id: expected {expected:?}, got {got:?}")]
 pub struct MismatchedRackIdError {
     pub expected: RackUuid,

trust-quorum/src/connection_manager.rs

Lines changed: 63 additions & 13 deletions
@@ -5,13 +5,15 @@
 //! A mechanism for maintaining a full mesh of trust quorum node connections

 use crate::established_conn::EstablishedConn;
+use crate::proxy;
 use trust_quorum_protocol::{BaseboardId, Envelope, PeerMsg};

 // TODO: Move to this crate
 // https://github.com/oxidecomputer/omicron/issues/9311
 use bootstore::schemes::v0::NetworkConfig;

 use camino::Utf8PathBuf;
+use derive_more::From;
 use iddqd::{
     BiHashItem, BiHashMap, TriHashItem, TriHashMap, bi_upcast, tri_upcast,
 };
@@ -60,7 +62,7 @@ pub enum MainToConnMsg {
 ///
 /// All `WireMsg`s sent between nodes is prefixed with a 4 byte size header used
 /// for framing.
-#[derive(Debug, Serialize, Deserialize)]
+#[derive(Debug, Serialize, Deserialize, From)]
 pub enum WireMsg {
     /// Used for connection keep alive
     Ping,
@@ -79,6 +81,12 @@
     /// of tiny information layered on top of trust quorum. You can still think
     /// of it as a bootstore, although, we no longer use that name.
     NetworkConfig(NetworkConfig),
+
+    /// Requests proxied to other nodes
+    ProxyRequest(proxy::WireRequest),
+
+    /// Responses to proxy requests
+    ProxyResponse(proxy::WireResponse),
 }

 /// Messages sent from connection managing tasks to the main peer task
@@ -99,6 +107,8 @@ pub enum ConnToMainMsgInner {
     Received { from: BaseboardId, msg: PeerMsg },
     ReceivedNetworkConfig { from: BaseboardId, config: NetworkConfig },
     Disconnected { peer_id: BaseboardId },
+    ProxyRequestReceived { from: BaseboardId, req: proxy::WireRequest },
+    ProxyResponseReceived { from: BaseboardId, rsp: proxy::WireResponse },
 }

 pub struct TaskHandle {
@@ -120,15 +130,11 @@
         self.abort_handle.abort()
     }

-    pub async fn send(&self, msg: PeerMsg) {
-        let _ = self.tx.send(MainToConnMsg::Msg(WireMsg::Tq(msg))).await;
-    }
-
-    pub async fn send_network_config(&self, config: NetworkConfig) {
-        let _ = self
-            .tx
-            .send(MainToConnMsg::Msg(WireMsg::NetworkConfig(config)))
-            .await;
+    pub async fn send<T>(&self, msg: T)
+    where
+        T: Into<WireMsg>,
+    {
+        let _ = self.tx.send(MainToConnMsg::Msg(msg.into())).await;
     }
 }

@@ -172,7 +178,10 @@ impl EstablishedTaskHandle {
         self.task_handle.abort();
     }

-    pub async fn send(&self, msg: PeerMsg) {
+    pub async fn send<T>(&self, msg: T)
+    where
+        T: Into<WireMsg>,
+    {
         let _ = self.task_handle.send(msg).await;
     }
 }
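The `From` derive on `WireMsg` is what makes the new generic `send` ergonomic: `derive_more` generates a `From` impl for each payload-carrying variant, so callers hand over the payload and let `Into<WireMsg>` do the wrapping. A standalone sketch of the pattern with simplified types:

    use derive_more::From;

    #[derive(Debug)]
    struct PeerMsg;
    #[derive(Debug)]
    struct NetworkConfig;

    // Each single-field variant gets a generated `impl From<...> for WireMsg`.
    #[derive(Debug, From)]
    enum WireMsg {
        Tq(PeerMsg),
        NetworkConfig(NetworkConfig),
    }

    // One generic method replaces the old per-variant send methods.
    fn send<T: Into<WireMsg>>(msg: T) {
        let wire: WireMsg = msg.into();
        println!("framing and writing {wire:?}");
    }

    fn main() {
        send(PeerMsg); // previously `send(msg: PeerMsg)`
        send(NetworkConfig); // previously `send_network_config(config)`
    }

This only works because every payload type appears in exactly one variant; two variants with the same field type would produce conflicting `From` impls.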
@@ -235,6 +244,12 @@ pub struct ConnMgrStatus {
     pub total_tasks_spawned: u64,
 }

+/// The state of a proxy connection
+pub enum ProxyConnState {
+    Connected,
+    Disconnected,
+}
+
 /// A structure to manage all sprockets connections to peer nodes
 ///
 /// Each sprockets connection runs in its own task which communicates with the
@@ -399,7 +414,7 @@ impl ConnMgr {
                "peer_id" => %h.baseboard_id,
                "generation" => network_config.generation
            );
-           h.task_handle.send_network_config(network_config.clone()).await;
+           h.send(network_config.clone()).await;
        }
    }

@@ -415,7 +430,42 @@
                "peer_id" => %h.baseboard_id,
                "generation" => network_config.generation
            );
-           h.task_handle.send_network_config(network_config.clone()).await;
+           h.send(network_config.clone()).await;
+       }
+   }
+
+   /// Forward an API request to another node
+   ///
+   /// Return the state of the connection at this point in time so that the
+   /// [`proxy::Tracker`] can manage the outstanding request on behalf of the
+   /// user.
+   pub async fn proxy_request(
+       &mut self,
+       destination: &BaseboardId,
+       req: proxy::WireRequest,
+   ) -> ProxyConnState {
+       if let Some(h) = self.established.get1(destination) {
+           info!(self.log, "Sending {req:?}"; "peer_id" => %destination);
+           h.send(req).await;
+           ProxyConnState::Connected
+       } else {
+           ProxyConnState::Disconnected
+       }
+   }
+
+   /// Return a response to a proxied request to another node
+   ///
+   /// There is no need to track whether this succeeds or fails. If the
+   /// connection goes away the client on the other side will notice it and
+   /// retry if needed.
+   pub async fn proxy_response(
+       &mut self,
+       destination: &BaseboardId,
+       rsp: proxy::WireResponse,
+   ) {
+       if let Some(h) = self.established.get1(destination) {
+           info!(self.log, "Sending {rsp:?}"; "peer_id" => %destination);
+           h.send(rsp).await;
        }
    }

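For context on how `ProxyConnState` is meant to be consumed: the caller registers the request with the tracker, forwards it, and on `Disconnected` fails fast so the requester can retry once the mesh reconnects. A self-contained sketch under those assumptions; this `ConnMgr` is a toy stand-in and the failure handling is hypothetical, not the actual `NodeTask` logic:

    use std::collections::HashSet;
    use tokio::sync::oneshot;

    #[derive(Clone, PartialEq, Eq, Hash, Debug)]
    struct BaseboardId(String);
    #[derive(Debug)]
    struct WireRequest;

    enum ProxyConnState {
        Connected,
        Disconnected,
    }

    struct ConnMgr {
        established: HashSet<BaseboardId>,
    }

    impl ConnMgr {
        // Mirrors the diff: report connection state rather than returning an
        // error, so the caller decides how to handle the outstanding request.
        async fn proxy_request(
            &mut self,
            dest: &BaseboardId,
            req: WireRequest,
        ) -> ProxyConnState {
            if self.established.contains(dest) {
                println!("forwarding {req:?} to {dest:?}");
                ProxyConnState::Connected
            } else {
                ProxyConnState::Disconnected
            }
        }
    }

    #[tokio::main]
    async fn main() {
        let mut mgr = ConnMgr { established: HashSet::new() };
        let (tx, rx) = oneshot::channel::<Result<(), &'static str>>();

        match mgr.proxy_request(&BaseboardId("sled-a".into()), WireRequest).await
        {
            // Connected: a real tracker would hold `tx` until the matching
            // ProxyResponse arrives from the connection task.
            ProxyConnState::Connected => drop(tx),
            // Disconnected: fail fast so the requester can retry later.
            ProxyConnState::Disconnected => {
                let _ = tx.send(Err("not connected to destination"));
            }
        }

        // The requester observes either the response or the fast failure.
        match rx.await {
            Ok(result) => println!("request completed: {result:?}"),
            Err(_) => println!("request dropped without a response"),
        }
    }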

trust-quorum/src/established_conn.rs

Lines changed: 30 additions & 0 deletions
@@ -233,6 +233,36 @@ impl EstablishedConn {
                    panic!("Connection to main task channnel full");
                }
            }
+           WireMsg::ProxyRequest(req) => {
+               if let Err(_) = self.main_tx.try_send(ConnToMainMsg {
+                   task_id: self.task_id,
+                   msg: ConnToMainMsgInner::ProxyRequestReceived {
+                       from: self.peer_id.clone(),
+                       req,
+                   },
+               }) {
+                   error!(
+                       self.log,
+                       "Failed to send received proxy msg to the main task"
+                   );
+                   panic!("Connection to main task channel full");
+               }
+           }
+           WireMsg::ProxyResponse(rsp) => {
+               if let Err(_) = self.main_tx.try_send(ConnToMainMsg {
+                   task_id: self.task_id,
+                   msg: ConnToMainMsgInner::ProxyResponseReceived {
+                       from: self.peer_id.clone(),
+                       rsp,
+                   },
+               }) {
+                   error!(
+                       self.log,
+                       "Failed to send received proxy msg to the main task"
+                   );
+                   panic!("Connection to main task channel full");
+               }
+           }
        }
    }
}

trust-quorum/src/ledgers.rs

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ use slog::{Logger, info};
 use trust_quorum_protocol::PersistentState;

 /// A wrapper type around [`PersistentState`] for use as a [`Ledger`]
-#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct PersistentStateLedger {
     pub generation: u64,
     pub state: PersistentState,

trust-quorum/src/lib.rs

Lines changed: 2 additions & 1 deletion
@@ -7,9 +7,10 @@
 mod connection_manager;
 pub(crate) mod established_conn;
 mod ledgers;
+mod proxy;
 mod task;

 pub(crate) use connection_manager::{
     ConnToMainMsg, ConnToMainMsgInner, MainToConnMsg, WireMsg,
 };
-pub use task::NodeTask;
+pub use task::{CommitStatus, Config, NodeApiError, NodeTask, NodeTaskHandle};
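The widened export list reflects the new public surface: a `NodeTask` is built from a `Config` and driven in its own task, while callers hold a `NodeTaskHandle` whose API methods (including the proxied commit and status operations) return results such as `CommitStatus` or `NodeApiError`. A rough sketch of that task/handle split; the message and method names below are entirely hypothetical, not the real API in `trust-quorum/src/task.rs`:

    use tokio::sync::{mpsc, oneshot};

    // Hypothetical API message: a call paired with a oneshot for the reply.
    enum ApiRequest {
        Commit { reply: oneshot::Sender<Result<(), &'static str>> },
    }

    // Handle side: cloneable, used by sled-agent code to call into the task.
    #[derive(Clone)]
    struct NodeTaskHandle {
        tx: mpsc::Sender<ApiRequest>,
    }

    impl NodeTaskHandle {
        async fn commit(&self) -> Result<(), &'static str> {
            let (reply, rx) = oneshot::channel();
            self.tx
                .send(ApiRequest::Commit { reply })
                .await
                .map_err(|_| "node task exited")?;
            rx.await.map_err(|_| "node task exited")?
        }
    }

    // Task side: owns the protocol state and serializes all API calls.
    async fn run_node_task(mut rx: mpsc::Receiver<ApiRequest>) {
        while let Some(req) = rx.recv().await {
            match req {
                ApiRequest::Commit { reply } => {
                    // A real task would drive the sans-io protocol here.
                    let _ = reply.send(Ok(()));
                }
            }
        }
    }

    #[tokio::main]
    async fn main() {
        let (tx, rx) = mpsc::channel(8);
        let handle = NodeTaskHandle { tx };
        tokio::spawn(run_node_task(rx));
        println!("commit result: {:?}", handle.commit().await);
    }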
