@JackThomson2
Contributor

Description

Adding support for virtio-balloon features: Free page hinting and reporting.

TODO

Update documentation on the update balloon features
Update release notes

...

Reason

...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@JackThomson2 changed the title from "virtio-balloon: Add free page reporting reporting" to "virtio-balloon: Add free page reporting hinting" on Oct 24, 2025
@JackThomson2 force-pushed the free_page_hinting_reporting branch 2 times, most recently from c96f7be to 8ef7916, on October 24, 2025 14:38
@codecov
codecov bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 84.34066% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.84%. Comparing base (18da1d9) to head (f42d9b2).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vmm/src/rpc_interface.rs | 4.54% | 21 Missing ⚠️ |
| src/vmm/src/lib.rs | 0.00% | 16 Missing ⚠️ |
| ...rc/vmm/src/devices/virtio/balloon/event_handler.rs | 60.86% | 9 Missing ⚠️ |
| src/vmm/src/devices/virtio/balloon/device.rs | 96.01% | 8 Missing ⚠️ |
| src/vmm/src/devices/virtio/transport/pci/device.rs | 0.00% | 2 Missing ⚠️ |
| src/vmm/src/devices/virtio/balloon/persist.rs | 94.73% | 1 Missing ⚠️ |

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #5491    +/-   ##
========================================
  Coverage   82.83%   82.84%            
========================================
  Files         269      269            
  Lines       27723    28048   +325     
========================================
+ Hits        22965    23236   +271     
- Misses       4758     4812    +54     

| Flag | Coverage Δ |
|---|---|
| 5.10-m5n.metal | 83.01% <84.34%> (-0.01%) ⬇️ |
| 5.10-m6a.metal | 82.29% <84.34%> (+0.01%) ⬆️ |
| 5.10-m6g.metal | 79.71% <84.34%> (+0.04%) ⬆️ |
| 5.10-m6i.metal | 83.01% <84.34%> (+<0.01%) ⬆️ |
| 5.10-m7a.metal-48xl | 82.28% <84.34%> (+0.01%) ⬆️ |
| 5.10-m7g.metal | 79.71% <84.34%> (+0.04%) ⬆️ |
| 5.10-m7i.metal-24xl | 82.98% <84.34%> (-0.01%) ⬇️ |
| 5.10-m7i.metal-48xl | 82.98% <84.34%> (+<0.01%) ⬆️ |
| 5.10-m8g.metal-24xl | 79.70% <84.34%> (+0.04%) ⬆️ |
| 5.10-m8g.metal-48xl | 79.71% <84.34%> (+0.04%) ⬆️ |
| 6.1-m5n.metal | 83.04% <84.34%> (+<0.01%) ⬆️ |
| 6.1-m6a.metal | 82.33% <84.34%> (+0.01%) ⬆️ |
| 6.1-m6g.metal | 79.70% <84.34%> (+0.04%) ⬆️ |
| 6.1-m6i.metal | 83.04% <84.34%> (+<0.01%) ⬆️ |
| 6.1-m7a.metal-48xl | 82.31% <84.34%> (+0.01%) ⬆️ |
| 6.1-m7g.metal | 79.70% <84.34%> (+0.04%) ⬆️ |
| 6.1-m7i.metal-24xl | 83.05% <84.34%> (+<0.01%) ⬆️ |
| 6.1-m7i.metal-48xl | 83.05% <84.34%> (+<0.01%) ⬆️ |
| 6.1-m8g.metal-24xl | 79.70% <84.34%> (+0.04%) ⬆️ |
| 6.1-m8g.metal-48xl | 79.70% <84.34%> (+0.04%) ⬆️ |

@JackThomson2 force-pushed the free_page_hinting_reporting branch 2 times, most recently from 5a1b473 to 8dcf4fd, on October 27, 2025 14:09
@JackThomson2 changed the title from "virtio-balloon: Add free page reporting hinting" to "[RFC] virtio-balloon: Add free page reporting hinting" on Oct 28, 2025
        with attempt:
            return int(self.jailer.pid_file.read_text(encoding="ascii"))

    @cached_property
Contributor

nit: can go to a separate commit as test refactoring

Contributor Author

This is pulled from the mem hot-plugging PR so will be able to drop it once that lands :)

Free page reporting is a mechanism by which the guest notifies the
host of pages that are not currently in use. This feature can only be
configured at boot; once enabled, the guest reports free pages
continuously.

With free page reporting, Firecracker calls `madvise` with
`MADV_DONTNEED` on the reported ranges. This allows the host to free up
memory and reduce the RSS of the VM. With UFFD, the removal is delivered
as a `UFFD_EVENT_REMOVE` event after the `MADV_DONTNEED` call.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
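
The effect of `MADV_DONTNEED` described above can be sketched in a few lines of Python. This is only an illustration, not Firecracker's code: the private anonymous mapping stands in for guest memory, and `discard_range` and `demo` are hypothetical names.

```python
import mmap

PAGE = mmap.PAGESIZE


def discard_range(mem: mmap.mmap, offset: int, length: int) -> None:
    """Tell the kernel that the pages in [offset, offset + length) are unused."""
    # offset must be page-aligned; MADV_DONTNEED is only exposed where the OS has it.
    mem.madvise(mmap.MADV_DONTNEED, offset, length)


def demo() -> bytes:
    # Assumption of this sketch: a private anonymous mapping plays the role of guest memory.
    mem = mmap.mmap(-1, 4 * PAGE, flags=mmap.MAP_PRIVATE)
    mem[:] = b"\xaa" * (4 * PAGE)
    # Pretend the guest reported pages 1 and 2 as free.
    discard_range(mem, 1 * PAGE, 2 * PAGE)
    # Read back the first byte of each page.
    sample = bytes(mem[i * PAGE] for i in range(4))
    mem.close()
    return sample
```

On Linux, a private anonymous mapping is zero-filled on the next access after `MADV_DONTNEED`, so `demo()` should return `b"\xaa\x00\x00\xaa"` there. Other platforms may behave differently.
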
Free page hinting is a mechanism that allows the guest driver to report
ranges of free pages to the host device. A "hinting" run is triggered by
the device issuing a new command id in the config space. After the id is
updated, the driver hints unused ranges to the host. Once the driver has
exhausted all free ranges, it notifies the device that the run has
completed. The device can then issue another command allowing the guest
to reclaim these pages.

To support hinting in the Firecracker balloon device, we offer three
operations to manage it: one to start a run, one to monitor its status,
and one to issue the command that allows the guest to reclaim pages.

Note that there is a potential case in the Linux driver in which a
range could be reclaimed in an OOM scenario before we remove the range.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
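
The run lifecycle described in this commit can be modelled as a small state machine. The sketch below is a toy model under assumptions, not Firecracker's implementation: `HintingModel` and its methods are hypothetical, and the two reserved command ids are the values I understand Linux's `virtio_balloon.h` to use.

```python
from dataclasses import dataclass, field

# Assumed reserved command ids (as in Linux's virtio_balloon.h).
CMD_ID_STOP = 0  # driver has no more free ranges to hint in this run
CMD_ID_DONE = 1  # device allows the guest to reuse the hinted pages


@dataclass
class HintingModel:
    """Toy model of one free page hinting run (hypothetical)."""

    host_cmd: int = CMD_ID_STOP  # command id exposed in the config space
    next_id: int = 2             # ids 0 and 1 are treated as reserved
    acked: bool = False          # driver has started hinting for host_cmd
    finished: bool = False       # driver has signalled the end of the run
    hinted: list = field(default_factory=list)

    def start(self) -> int:
        """Device side: publish a fresh command id to trigger a run."""
        self.host_cmd = self.next_id
        self.next_id += 1
        self.acked = False
        self.finished = False
        self.hinted.clear()
        return self.host_cmd

    def driver_hint(self, cmd_id: int, ranges) -> None:
        """Driver side: report free (address, length) ranges for a run."""
        if cmd_id != self.host_cmd:
            return  # hints tagged with a stale command id are ignored
        self.acked = True
        self.hinted.extend(ranges)

    def driver_finish(self) -> None:
        """Driver side: all free ranges for the current run were reported."""
        if self.acked:
            self.finished = True

    def stop(self) -> None:
        """Device side: let the guest reclaim the hinted pages."""
        self.host_cmd = CMD_ID_DONE
```

A run would go `start()`, any number of `driver_hint()` calls, `driver_finish()`, then `stop()`. Tagging hints with the command id is what lets the model drop ranges from an earlier run.
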
Add API endpoints to manage free page hinting. There are three
endpoints: Start, to begin a new free page hinting run; Status, to track
the state of the hinting run; and Stop, to end the hinting run and allow
the guest to reclaim the reported pages.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
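
A client would typically drive the three operations as start, poll, stop. The helper below is a minimal sketch of that flow. It takes the three operations as plain callables because the endpoint paths and the shape of the status response are not given here; `run_hinting_cycle`, `is_complete`, and the timeout parameters are all hypothetical.

```python
import time


def run_hinting_cycle(start, status, stop, is_complete, timeout_s=5.0, poll_s=0.05):
    """Start a hinting run, poll until it completes, then always stop it.

    start, status, and stop are caller-supplied callables wrapping the three
    endpoints; is_complete decides whether a status value means "run finished".
    """
    start()
    deadline = time.monotonic() + timeout_s
    try:
        while True:
            last = status()
            if is_complete(last):
                return last
            if time.monotonic() >= deadline:
                raise TimeoutError("hinting run did not complete in time")
            time.sleep(poll_s)
    finally:
        # Stop in every case so the guest can reclaim the reported pages.
        stop()
```

Calling `stop` in a `finally` block is a design choice of this sketch: it avoids leaving hinted pages unusable by the guest if polling times out.
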
Add metrics to track free page hinting and reporting. For both features,
track the number of ranges reported, the number of errors encountered
while freeing, and the total amount of memory freed.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
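
The three counters can be pictured as a small record updated once per range. This is a sketch with hypothetical names (`FreePageMetrics`, `record`), and it assumes that a failed range still counts as reported.

```python
from dataclasses import dataclass


@dataclass
class FreePageMetrics:
    """Hypothetical counters for one feature (hinting or reporting)."""

    ranges: int = 0       # ranges reported by the guest
    errors: int = 0       # ranges the host failed to free
    freed_bytes: int = 0  # total memory freed successfully

    def record(self, length: int, ok: bool) -> None:
        """Account for one reported range of `length` bytes."""
        self.ranges += 1  # assumption: failed ranges still count as reported
        if ok:
            self.freed_bytes += length
        else:
            self.errors += 1
```

Keeping one instance per feature matches the commit's intent of tracking hinting and reporting separately.
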
Add new resources to the HTTP API to enable testing of the hinting
functionality.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
Add an option for fast_page_fault_helper to run in a one-shot mode that
does not require the signal to be triggered before measuring the fault
latency.

This makes it possible to test fault latency on non-snapshotted VMs, as
only the first fault is measured.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
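
The measurement idea, timing the first touch of freshly mapped memory, can be sketched in Python. This is not the helper itself: `first_touch_latency_ns` is a hypothetical function, and interpreter overhead makes the numbers only indicative.

```python
import mmap
import time


def first_touch_latency_ns(num_pages: int = 256) -> int:
    """Time one write to each page of a fresh anonymous mapping."""
    mem = mmap.mmap(-1, num_pages * mmap.PAGESIZE)
    start = time.perf_counter_ns()
    for page in range(num_pages):
        # The first write to an untouched page makes the kernel fault it in.
        mem[page * mmap.PAGESIZE] = 1
    elapsed = time.perf_counter_ns() - start
    mem.close()
    return elapsed
```

Because the mapping is created inside the function, every call measures first-touch faults only, which is the property the one-shot mode relies on.
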
Add integration tests for free page hinting and reporting, both
functional and performance tests.

Add functional tests to ensure that hinting and reporting reduce the RSS
as expected in the guest. Update the reduce RSS test to touch memory to
reduce the chance of flakiness.

Add performance tests for the balloon device. The first tracks the CPU
overhead of hinting and reporting. The second measures the faulting
latency while reporting is running in the guest.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
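
An RSS check of this kind boils down to reading the resident set size before and after, then comparing. The sketch below parses the `VmRSS` line of a Linux `/proc/<pid>/status` file; `parse_vm_rss_kib` and `rss_reduced` are hypothetical helpers, not the functions used by the test suite.

```python
import re


def parse_vm_rss_kib(status_text: str) -> int:
    """Extract VmRSS (reported in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def rss_reduced(before_kib: int, after_kib: int, min_drop_kib: int) -> bool:
    """True if RSS fell by at least min_drop_kib between two samples."""
    return before_kib - after_kib >= min_drop_kib
```

Requiring a minimum drop rather than any drop at all keeps such a check from passing on noise.
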
Add integration tests for free page hinting and reporting, asserting
that the features are enabled correctly and that the config space
updates triggered by hinting are set as expected.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
While the traditional balloon device would not be able to reclaim memory
when backed by huge pages, it could still technically be used to
restrict memory usage in the guest.

Hinting and reporting report ranges in bigger sizes (4 MiB by default).
Because of this, it is possible for the host to reclaim huge pages
backing the guest.

Update the performance tests for the balloon when backed by huge pages,
and add variants to the size reduction tests to ensure hinting and
reporting can reduce the RSS of the guest.

Move the inflation test to performance to ensure it runs sequentially in
CI; otherwise the host can be exhausted of huge pages.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
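
The size argument can be made concrete with a little arithmetic: a huge page can only be freed when a reported range covers all of it. The sketch assumes 2 MiB huge pages, and `fully_covered_huge_pages` is a hypothetical helper.

```python
HUGE_PAGE = 2 * 1024 * 1024  # assumed huge page size (2 MiB)


def fully_covered_huge_pages(start: int, length: int, huge_page: int = HUGE_PAGE) -> range:
    """Indices of huge pages lying entirely inside [start, start + length)."""
    first = -(-start // huge_page)        # round the start up to a boundary
    last = (start + length) // huge_page  # round the end down (exclusive)
    return range(first, max(first, last))
```

Under that assumption, an aligned 4 MiB range covers two whole huge pages, while a single 4 KiB page covers none. That is why the traditional page-granular balloon cannot free huge-page-backed memory.
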
@JackThomson2 force-pushed the free_page_hinting_reporting branch from 8dcf4fd to f42d9b2 on November 11, 2025 11:06