-
Notifications
You must be signed in to change notification settings - Fork 161
underhill_core: specify which CPUs have outstanding IO, not just if they have interrupts #2512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This PR modifies files containing For more on why we check whole files, instead of just diffs, check out the Rustonomicon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances the save/restore mechanism by tracking which CPUs have outstanding I/O operations, not just which CPUs have mapped device interrupts. This enables future optimizations where sidecar can start CPUs without outstanding I/O immediately, while deferring the startup of CPUs with active I/O to the kernel. For a 96-vCPU VM, this distinction is critical as all 96 CPUs may have interrupts mapped, but only a few may have actual outstanding I/O at save time.
Key changes:
- Added
cpus_with_outstanding_iofield to track CPUs with active I/O separately from those with mapped interrupts - Refactored
cpus_with_interruptsfunction into a new helper module that returns both sets of CPUs - Threaded the new parameter through the save/restore pipeline from underhill_core to openhcl_boot
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| vm/loader/loader_defs/src/shim.rs | Added cpus_with_outstanding_io field to SavedState struct with appropriate documentation |
| openhcl/underhill_core/src/nvme_manager/save_restore_helpers.rs | New helper module with cpus_with_interrupts function that analyzes NVMe state to determine both CPUs with mapped interrupts and those with outstanding I/O |
| openhcl/underhill_core/src/nvme_manager/save_restore.rs | Removed old cpus_with_interrupts function (moved to helpers module) |
| openhcl/underhill_core/src/nvme_manager/mod.rs | Exported the new save_restore_helpers module |
| openhcl/underhill_core/src/dispatch/mod.rs | Updated to call new helper function and pass both CPU lists to write_persisted_info |
| openhcl/underhill_core/src/loader/vtl2_config/mod.rs | Added cpus_with_outstanding_io parameter to write_persisted_info function |
| openhcl/openhcl_boot/src/host_params/dt/mod.rs | Added cpus_with_outstanding_io field to PersistedPartitionTopology struct and threaded through topology parsing |
openhcl/underhill_core/src/nvme_manager/save_restore_helpers.rs
Outdated
Show resolved
Hide resolved
openhcl/underhill_core/src/nvme_manager/save_restore_helpers.rs
Outdated
Show resolved
Hide resolved
| mod tests; | ||
|
|
||
| pub use self::driver::NvmeDriver; | ||
| pub use self::driver::save_restore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to pub use the next two lines if we're exposing the whole module?
| pub fn write_persisted_info( | ||
| parsed: &ParsedBootDtInfo, | ||
| cpus_with_mapped_interrupts: Vec<u32>, | ||
| interrupt_state: crate::nvme_manager::save_restore_helpers::VPInterruptState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider destructuring the interrupt state so that new fields won't get forgotten here?
|
|
||
| let (vps_with_outstanding_io, vps_with_mapped_interrupts_no_io): (Vec<_>, Vec<_>) = vp_state | ||
| .iter() | ||
| .map(|(&vp, &has_outstanding_io)| (vp, has_outstanding_io)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .map(|(&vp, &has_outstanding_io)| (vp, has_outstanding_io)) | |
| .copied() |
| let (vps_with_outstanding_io, vps_with_mapped_interrupts_no_io): (Vec<_>, Vec<_>) = vp_state | ||
| .iter() | ||
| .map(|(&vp, &has_outstanding_io)| (vp, has_outstanding_io)) | ||
| .partition(|&(_, has_outstanding_io)| has_outstanding_io); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .partition(|&(_, has_outstanding_io)| has_outstanding_io); | |
| .partition(|&(_, has_outstanding_io)| has_outstanding_io).map(|(vp, _)| vp); |
You can dedup the mappings done below here i think.
This PR supports upcoming changes to to have sidecar start CPUs without outstanding IO, but have the kernel start the CPUs with outstanding IO. A 96-vp VM will possibly have 96 CPUs with interrupts, but only have IO outstanding to a few of those at a time. This change allows the restore path to discern.
TEST:
Validated that this does not cause regression in local CI.