-
Notifications
You must be signed in to change notification settings - Fork 62
Add two features for local storage support #9386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Bump the sled-agent API to add two new features for local storage support: - Add an endpoint for Nexus to create and destroy local storage datasets. These will be allocated and deallocated as part of the higher level Disk lifecycle for the forthcoming local storage disk type. - Add the ability to delegate a specific zvol to a Propolis zone. This required accepting a new `DelegatedZvol` parameter during vmm registration.
| let mut devices = vec![ | ||
| zone::Device { name: "/dev/vmm/*".to_string() }, | ||
| zone::Device { name: "/dev/vmmctl".to_string() }, | ||
| zone::Device { name: "/dev/viona".to_string() }, | ||
| ]; | ||
|
|
||
| for delegated_zvol in &self.delegated_zvols { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, take it or leave it: this probably doesn't actually matter, but seems like a nice place for a
let mut devices = Vec::with_capacity(self.delegated_zvols.len() + 3);
// ... push the devicesto preallocate the necessary size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, going to leave this one :)
sled-agent/src/sim/sled_agent.rs
Outdated
| return Err(Error::internal_error(&format!( | ||
| "request block_size {} does not match FileStorageBackend \ | ||
| block_size {block_size}", | ||
| request.block_size, | ||
| ))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps this error ought to include the ID of the invalid backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally did this in f7c53ab but then removed it later!
sled-agent/src/sim/storage.rs
Outdated
| .expect("Failed to get the dataset we just inserted"); | ||
| .expect("Failed to get the dataset we just inserted") | ||
| else { | ||
| panic!("just inserted this variant!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obnoxious turbo nit: feels like a place for
| panic!("just inserted this variant!"); | |
| unreachable!("just inserted this variant!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common/src/api/internal/shared.rs
Outdated
| /// The fully qualified name of the parent dataset | ||
| pub parent_dataset: String, | ||
|
|
||
| /// The volume name | ||
| pub name: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have anything with a stronger type than String we could use for either of these fields? I know a bunch of the dataset methods in sled-agent and illumos-utils work directly on strings, but I think that's been a source of confusion at times.
If not, would it make sense to add some newtypes and start using them? I don't know if it's worth doing any kind of minimal validation (e.g., can a volume name be the empty string? can it contain arbitrary utf8?), but that'd be a place to hang that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find a "zfs dataset name" struct that was permissive enough - there's DatasetName in common/src/disk.rs but didn't seem generic enough to support the children of LocalStorage. We talked on a meet and I'm going to go with splitting this up to separate parent_dataset (aka the LocalStorage dataset), and a child dataset (that is the parent of the zvol itself).
For moving away from straight Strings, I did manage to find https://github.com/oxidecomputer/illumos-gate/blob/ebc7b84196d53327d76f6194783e79c0afcc86f2/usr/src/lib/libzfs/common/libzfs_dataset.c#L102-L110 which validates names for zfs datasets. We probably won't be able to use that function directly but we could copy the rules there to start that newtype you're talking about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1ffd876 changes the innards of DelegateZvol, let me know what you think
| Ok(_) => Ok(()), | ||
|
|
||
| Err(crate::ExecutionError::CommandFailure(info)) | ||
| if info.stderr.contains("dataset already exists") => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check it has the expected size and block size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8724549 does this - importantly we can't change these after the fact so calling with different parameters is an error.
| pub async fn ensure_dataset_volume( | ||
| name: String, | ||
| size: ByteCount, | ||
| block_size: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about newtypes - is any u32 valid for a block size, or should this be a BlockSize(u32) (or even a BlockSize::* enum if there are only a few choices that we allow)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the block size argument from the ensure request in 79c0bf3 - these should always only ever be 4096 byte blocks.
| // The FileStorageBackend path will be the full device path, so | ||
| // strip the beginning, including the first part of the external | ||
| // pool name. | ||
| let dataset = path.strip_prefix("/dev/zvol/rdsk/oxp_").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this string be a constant somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was moved into a function in the DelegatedZvol type in 1ffd876
sled-agent/src/sim/storage.rs
Outdated
| .datasets | ||
| .get(&id) | ||
| .expect("Failed to get the dataset we just inserted"); | ||
| .expect("Failed to get the dataset we just inserted") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Untested, but I think we could get rid of this expect by replacing the insert() / get() with entry() / insert_entry():
let DatasetContents::Crucible(crucible) = self
.datasets
.entry(id)
.insert_entry(DatasetContents::Crucible(/* ... blah blah ...*/))
.into_mut()
else {
unreachable!("...");
};(but +1 on @hawkw's suggestion for unreachable!() over panic!())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like it works, see 551a528
| match self.get_dataset(zpool_id, dataset_id) { | ||
| DatasetContents::Crucible(crucible) => crucible.data.clone(), | ||
| DatasetContents::LocalStorage(_) => { | ||
| panic!("asked for Crucible, got LocalStorage!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is fine since it's the simulator, but should get_crucible_dataset() return an Option<Arc<CrucibleData>> and let the caller unwrap should they so choose? (Same question about get_local_storage_dataset() below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured it was fine to panic in both spots in the simulated sled agent, cause it would clearly tell you what the problem was in the panic message, and which routine was calling which get_*_dataset.
sled-agent/src/sled_agent.rs
Outdated
| dataset_id: DatasetUuid, | ||
| request: sled_agent_api::LocalStorageDatasetEnsureRequest, | ||
| ) -> Result<(), HttpError> { | ||
| let zpool_name = ZpoolName::External(zpool_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this confirm that zpool_id is a disk with a local storage dataset our config says is okay to access before attempting to call Zfs::ensure_dataset()? The config reconciler exposes a available_datasets_rx() watch channel that several sled-agent subsystems consume; we could give AvailableDatasetReceiver a has_local_storage_dataset(&self, zpool_id) method or something like that?
This would catch two potential problems:
- Disks our config has said we can't use (it would be surprising to get a request from Nexus for such a disk, but it's possible if we're racing, perhaps?)
- Disks our config says we should use, but we haven't been able to ensure the local storage dataset exists (could be a disk problem or a zfs error or ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good idea, done in 634ee2a
sled-agent/src/sled_agent.rs
Outdated
| ) -> Result<(), HttpError> { | ||
| let zpool_name = ZpoolName::External(zpool_id); | ||
|
|
||
| let name = format!("{zpool_name}/crypt/local_storage/{dataset_id}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above, I think I'd try to attach this to AvailableDatasetsReceiver somehow to keep the "build up dataset strings" more confined than it is here in an HTTP handler. Might be able to squish this into the has_local_storage_dataset() method I suggested above; local_storage_dataset_name(&self, zpool_id, dataset_id) -> Result<_, _> or something like that maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, done in 634ee2a
- change DelegatedZvol to an enum - create a specific variant for local storage - use functions implemented on DelegatedZvol instead of hard coded strings (those functions still have hard coded strings but it's wrapped up in the type itself at least) - use the new DelegatedZvol type and functions when ensuring and deleting local storage datasets
jgallagher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like most of these changes a lot. Have a few more hopefully-minor suggestions.
illumos-utils/src/zfs.rs
Outdated
|
|
||
| #[derive(thiserror::Error, Debug)] | ||
| pub enum EnsureDatasetVolumeErrorInner { | ||
| #[error("{0}")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit - if there's no useful context for the wrapper, you should use
| #[error("{0}")] | |
| #[error(transparent)] |
If you combine "{0}" with #[from], you'll get one level of doublespeak when printing the error chain (because this error type and its first source will both display the ExecutionError); #[error(transparent)] doesn't have that problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changed in c6110fc. I really should re-consult your document about this!
sled-agent/src/instance.rs
Outdated
| opte_ports.push(port); | ||
| } | ||
|
|
||
| // Each delegated Zvol requires (optionally) delegating the parent dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this bit - this always builds an empty vec today, right? Is this a guard for future kinds of delegated zvols that might need this? If so, I think I'd prefer a couple other options:
- Replace this code block with just a comment and a
let datasets = []instead of an iter that always builds an empty vec. - Make this a method on
DelegatedZvolso this turns into something like
let datasets: Vec<_> = self
.delegated_zvols
.iter()
.filter(|zvol| zvol.needs_parent_dataset_delegated()) // or whatever name is reasonable
.collect();to match how we have the internal details hidden in delegated_zvol.zvol_device() in the next loop below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was a little wonky. Removed the block of code and left just a comment in bf71ddb
| .inner | ||
| .config_reconciler | ||
| .available_datasets_rx() | ||
| .all_mounted_local_storage_datasets() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding all_mounted_local_storage_datasets() is probably fine, but I don't think it's what we want to use here. It gives back a Vec<PathInPool>, which (a) unnecessarily allocates a Vec when all we want is to check for a single dataset we expect to exist and (b) only lets us check whether the zpool_id has a mounted local storage dataset, not that it's specifically dataset_id. (I'll grant getting that wrong seems unlikely, but we should probably catch it if it happens!)
What if we had something like this in LatestReconciliationResult:
fn validate_local_storage_dataset_mounted(
&self,
zpool_id: ExternalZpoolUuid,
dataset_id: DatasetUuid,
) -> Result<(), LocalStorageDatasetMountError> {
let Some(result) = self.datasets.get(&dataset_id) else {
return Err(/* error for no dataset */);
};
match result {
ConfigReconcilerInventoryResult::Ok => {
// Could we have a `result` but still fail this lookup if the
// dataset has been expunged (i.e., removed from the config) but
// we haven't yet succeeded in enacting that? Seems unlikely,
// but easy enough to return an error in case it's possible.
let Some(config) = self.sled_config.datasets.get(&dataset_id)
else {
return Err(/* error for no dataset */);
};
if config.name.kind() != DatasetKind::LocalStorage {
return Err(/* error for dataset isn't the right kind */);
};
if config.name.pool() != ZpoolName::External(zpool_id) {
return Err(/* error for dataset on wrong zpool */);
}
Ok(())
}
ConfigReconcilerInventoryResult::Err { message } => {
return Err(/* dataset failed to mount with {message} */);
}
}
}that then bubbled out through the couple levels of handles? We could call it here, get a concrete error back, and return either for_unavail or for_bad_request depending on the particular error variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In create_local_storage_dataset, the argument dataset_id unfortunately doesn't mean the local storage dataset's uuid, it means the child of the local storage dataset. So instead of being the id for
oxp_ZPOOL/crypt/local_storage
it is the id for the child dataset:
oxp_ZPOOL/crypt/local_storage/THIS_ID
which additionally has the zvol under it. The intent of the added check was to check if the zpool had a mounted local storage dataset only, aka is the parent oxp_ZPOOL/crypt/local_storage available before requesting oxp_ZPOOL/crypt/local_storage/THIS_ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh right. I don't want to hold up this PR over this, because it may be fine and is certainly in line with other stuff we have around zpools and datasets, but I'm vaguely worried there's something kinda unclear here. Just jotting down thoughts:
The OmicronSledConfig has a specific DatasetUuid for each oxp_ZPOOL/crypt/local_storage dataset (whose config also includes quota / reservation / the zpool ID it's on). The local_storage_* APIs added in this PR don't reference that DatasetUuid at all; instead, they specify the zpool ID and assume there's exactly one local storage dataset on that zpool. Should these new APIs reference the local storage DatasetUuid instead of the zpool ID? Today we give every zpool exactly one local storage dataset - as long as we maintain that it maybe doesn't matter, but if we ever want to change it we might prefer a more explicit link here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're describing is important if we want to do a v2 of the local storage dataset, yeah. I'm going to merge this as-is for now though.
| ) | ||
| } | ||
|
|
||
| pub fn file_backends( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is currently only called by the simulator, but the other VmmSpec methods are called by the real sled-agent. Is that going to change in a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the other method are called inside code that only seems to run if there's a migration request, and instances with file backends won't be eligible for migration.
| pub(crate) fn all_mounted_local_storage_datasets( | ||
| &self, | ||
| ) -> impl Iterator<Item = PathInPool> + '_ { | ||
| self.all_mounted_datasets_of_kind(DatasetKind::LocalStorage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ug, this is going to walk right into a landmine I left, after just talking about how I don't like to do this:
omicron/sled-agent/config-reconciler/src/reconciler_task.rs
Lines 242 to 248 in 8a4860b
| let mountpoint = match &kind { | |
| DatasetKind::Debug => U2_DEBUG_DATASET, | |
| DatasetKind::TransientZoneRoot => ZONE_DATASET, | |
| _ => unreachable!( | |
| "private function called with unexpected kind {kind:?}" | |
| ), | |
| }; |
Can we either remove this method in favor of something like the validate_local_storage_dataset_mounted() proposed in an earlier comment, or fix the match to list all the DatasetKind variants explicitly and not panic on DatasetKind::LocalStorage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the match to list all variants explicitly in 7e3d484 - I first tried moving DatasetName's full_name routine to DatasetKind (to replace the mountpoint variant) but I think this should happen in another PR.
Bump the sled-agent API to add two new features for local storage support:
Add an endpoint for Nexus to create and destroy local storage datasets. These will be allocated and deallocated as part of the higher level Disk lifecycle for the forthcoming local storage disk type.
Add the ability to delegate a specific zvol to a Propolis zone. This required accepting a new
DelegatedZvolparameter during vmm registration.