Skip to content

Conversation

@mattkur
Copy link
Contributor

@mattkur mattkur commented Dec 1, 2025

This PR supports upcoming changes to to have sidecar start CPUs without outstanding IO, but have the kernel start the CPUs with outstanding IO. A 96-vp VM will possibly have 96 CPUs with interrupts, but only have IO outstanding to a few of those at a time. This change allows the restore path to discern.

TEST:
Validated that this does not cause regression in local CI.

@mattkur mattkur requested a review from a team as a code owner December 1, 2025 23:21
Copilot AI review requested due to automatic review settings December 1, 2025 23:21
@mattkur mattkur requested a review from a team December 1, 2025 23:21
@github-actions github-actions bot added the unsafe Related to unsafe code label Dec 1, 2025
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copilot finished reviewing on behalf of mattkur December 1, 2025 23:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the save/restore mechanism by tracking which CPUs have outstanding I/O operations, not just which CPUs have mapped device interrupts. This enables future optimizations where sidecar can start CPUs without outstanding I/O immediately, while deferring the startup of CPUs with active I/O to the kernel. For a 96-vCPU VM, this distinction is critical as all 96 CPUs may have interrupts mapped, but only a few may have actual outstanding I/O at save time.

Key changes:

  • Added cpus_with_outstanding_io field to track CPUs with active I/O separately from those with mapped interrupts
  • Refactored cpus_with_interrupts function into a new helper module that returns both sets of CPUs
  • Threaded the new parameter through the save/restore pipeline from underhill_core to openhcl_boot

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
vm/loader/loader_defs/src/shim.rs Added cpus_with_outstanding_io field to SavedState struct with appropriate documentation
openhcl/underhill_core/src/nvme_manager/save_restore_helpers.rs New helper module with cpus_with_interrupts function that analyzes NVMe state to determine both CPUs with mapped interrupts and those with outstanding I/O
openhcl/underhill_core/src/nvme_manager/save_restore.rs Removed old cpus_with_interrupts function (moved to helpers module)
openhcl/underhill_core/src/nvme_manager/mod.rs Exported the new save_restore_helpers module
openhcl/underhill_core/src/dispatch/mod.rs Updated to call new helper function and pass both CPU lists to write_persisted_info
openhcl/underhill_core/src/loader/vtl2_config/mod.rs Added cpus_with_outstanding_io parameter to write_persisted_info function
openhcl/openhcl_boot/src/host_params/dt/mod.rs Added cpus_with_outstanding_io field to PersistedPartitionTopology struct and threaded through topology parsing

@mattkur mattkur requested review from a team as code owners December 2, 2025 17:59
mod tests;

pub use self::driver::NvmeDriver;
pub use self::driver::save_restore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to pub use the next two lines if we're exposing the whole module?

pub fn write_persisted_info(
parsed: &ParsedBootDtInfo,
cpus_with_mapped_interrupts: Vec<u32>,
interrupt_state: crate::nvme_manager::save_restore_helpers::VPInterruptState,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider destructuring the interrupt state so that new fields won't get forgotten here?


let (vps_with_outstanding_io, vps_with_mapped_interrupts_no_io): (Vec<_>, Vec<_>) = vp_state
.iter()
.map(|(&vp, &has_outstanding_io)| (vp, has_outstanding_io))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map(|(&vp, &has_outstanding_io)| (vp, has_outstanding_io))
.copied()

let (vps_with_outstanding_io, vps_with_mapped_interrupts_no_io): (Vec<_>, Vec<_>) = vp_state
.iter()
.map(|(&vp, &has_outstanding_io)| (vp, has_outstanding_io))
.partition(|&(_, has_outstanding_io)| has_outstanding_io);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.partition(|&(_, has_outstanding_io)| has_outstanding_io);
.partition(|&(_, has_outstanding_io)| has_outstanding_io).map(|(vp, _)| vp);

You can dedup the mappings done below here i think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport_1.7.2511 unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants