Various tmt + testing improvements #1790
Conversation
Force-pushed a5d9732 to df71cea
Code Review
This pull request introduces two valuable cleanups. First, it refactors the tmt test execution to directly discover and run tests instead of plans, which simplifies the configuration and removes duplication. This is a great improvement for maintainability. Second, it adds a command to show the tmt report on failure, which will significantly aid in debugging failing tests. I've noted a few minor inconsistencies in the updated logging and error messages where 'plan' was not fully replaced with 'test'. Please see my specific comments for details.
The failure comes from …
Force-pushed 9b18985 to 03ff8a8
This tmt behavior also drives me crazy. Glad to see this change 👍
```shell
    --json pretty \
    --output "/boot/$kver.efi"
# To support multi arch
systemd_boot_file=$(ls /usr/lib/systemd/boot/efi/*.efi | xargs -n 1 basename)
```
Right, though we're going to need the same fix in Dockerfile.cfsuki. Which I think is again a driving use case for having a higher-level wrapper for this.
Strange, I don't understand why the soft reboot test is failing with testing-farm here but works in previous runs.
Already working on it. Haven't figured out the root cause in TF yet. Will debug it in pure TF to see what happens.
Wait a minute... does testing-farm+tmt run all tests from a single plan on the same provisioned environment? It sure looks like that's the case; see e.g. this log from the run here. It's quite impressive actually that it's just that test that fails, because some of these tests are going to leak state. Yeah, this seems to be the case. OK, so this PR needs a rework.
(Of course, we could optimize this if we relied on a factory reset; I'm sure we could do that for most tests)
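For background on why this bites: in tmt, one plan gets one provision step, and every test matched by the plan's discover step runs sequentially on that same guest. A minimal sketch of such a plan (fmf/YAML; the names here are illustrative, not this repo's actual plans):

```yaml
# Illustrative tmt plan: all discovered tests run sequentially on the
# SAME provisioned guest, so state mutated by an earlier test
# (e.g. a soft reboot) leaks into later ones.
summary: Booted integration tests
discover:
  how: fmf
  test:
    - /tests/booted/
provision:
  how: connect
execute:
  how: tmt
```

Isolating tests on separate guests therefore requires one plan per test, which is where the duplication pain comes from.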
I've come to the reluctant conclusion there's no sane option to fix the "triple test definition" problem we have other than going to generating the tmt stuff. Working on that.
Force-pushed 03ff8a8 to 9652e72
Force-pushed 9652e72 to 5aab4f5
One thing I will say is I am so glad that AI exists to learn things like fmf for me
Force-pushed 95c8140 to ec438ac
/gemini review
Code Review
This pull request introduces significant improvements to the testing infrastructure, primarily by refactoring the tmt test execution into a dedicated xtask module. The changes make the test setup more robust and extensible, for example by adding support for synthetic upgrade tests.
My review focuses on a few key areas:
- A critical issue in the newly added `tmt.rs` file where an outdated `rand` crate API is used, which will prevent compilation.
- A couple of medium-severity suggestions to improve maintainability and clarity in the new test scripts and `xtask` code. One is about an inconsistency in a Dockerfile, and the other suggests refactoring a large loop for better readability and error handling.
Overall, the changes are a great step forward for the project's testing capabilities. Addressing the identified issues will make the new testing code even better.
```rust
for plan in plans {
    let plan_name = sanitize_plan_name(plan);
    let vm_name = format!("bootc-tmt-{}-{}", random_suffix, plan_name);

    println!("\n========================================");
    println!("Running plan: {}", plan);
    println!("VM name: {}", vm_name);
    println!("========================================\n");

    // Launch VM with bcvk
    let launch_result = cmd!(
        sh,
        "bcvk libvirt run --name {vm_name} --detach {COMMON_INST_ARGS...} {image}"
    )
    .run()
    .context("Launching VM with bcvk");

    if let Err(e) = launch_result {
        eprintln!("Failed to launch VM for plan {}: {:#}", plan, e);
        all_passed = false;
        test_results.push((plan.to_string(), false, None));
        continue;
    }

    // Ensure VM cleanup happens even on error (unless --preserve-vm is set)
    let cleanup_vm = || {
        if preserve_vm {
            return;
        }
        if let Err(e) = cmd!(sh, "bcvk libvirt rm --stop --force {vm_name}")
            .ignore_stderr()
            .ignore_status()
            .run()
        {
            eprintln!("Warning: Failed to cleanup VM {}: {}", vm_name, e);
        }
    };

    // Wait for VM to be ready and get SSH info
    let vm_info = wait_for_vm_ready(sh, &vm_name);
    let (ssh_port, ssh_key) = match vm_info {
        Ok((port, key)) => (port, key),
        Err(e) => {
            eprintln!("Failed to get VM info for plan {}: {:#}", plan, e);
            cleanup_vm();
            all_passed = false;
            test_results.push((plan.to_string(), false, None));
            continue;
        }
    };

    println!("VM ready, SSH port: {}", ssh_port);

    // Save SSH private key to a temporary file
    let key_file = tempfile::NamedTempFile::new().context("Creating temporary SSH key file");

    let key_file = match key_file {
        Ok(f) => f,
        Err(e) => {
            eprintln!("Failed to create SSH key file for plan {}: {:#}", plan, e);
            cleanup_vm();
            all_passed = false;
            test_results.push((plan.to_string(), false, None));
            continue;
        }
    };

    let key_path = Utf8PathBuf::try_from(key_file.path().to_path_buf())
        .context("Converting key path to UTF-8");

    let key_path = match key_path {
        Ok(p) => p,
        Err(e) => {
            eprintln!("Failed to convert key path for plan {}: {:#}", plan, e);
            cleanup_vm();
            all_passed = false;
            test_results.push((plan.to_string(), false, None));
            continue;
        }
    };

    if let Err(e) = std::fs::write(&key_path, ssh_key) {
        eprintln!("Failed to write SSH key for plan {}: {:#}", plan, e);
        cleanup_vm();
        all_passed = false;
        test_results.push((plan.to_string(), false, None));
        continue;
    }

    // Set proper permissions on the key file (SSH requires 0600)
    {
        use std::os::unix::fs::PermissionsExt;
        let perms = std::fs::Permissions::from_mode(0o600);
        if let Err(e) = std::fs::set_permissions(&key_path, perms) {
            eprintln!("Failed to set key permissions for plan {}: {:#}", plan, e);
            cleanup_vm();
            all_passed = false;
            test_results.push((plan.to_string(), false, None));
            continue;
        }
    }

    // Verify SSH connectivity
    println!("Verifying SSH connectivity...");
    if let Err(e) = verify_ssh_connectivity(sh, ssh_port, &key_path) {
        eprintln!("SSH verification failed for plan {}: {:#}", plan, e);
        cleanup_vm();
        all_passed = false;
        test_results.push((plan.to_string(), false, None));
        continue;
    }

    println!("SSH connectivity verified");

    let ssh_port_str = ssh_port.to_string();

    // Run tmt for this specific plan using connect provisioner
    println!("Running tmt tests for plan {}...", plan);

    // Generate a unique run ID for this test
    // Use the VM name which already contains a random suffix for uniqueness
    let run_id = vm_name.clone();

    // Run tmt for this specific plan
    // Note: provision must come before plan for connect to work properly
    let context = context.clone();
    let how = ["--how=connect", "--guest=localhost", "--user=root"];
    let env = ["TMT_SCRIPTS_DIR=/var/lib/tmt/scripts", "BCVK_EXPORT=1"]
        .into_iter()
        .chain(args.env.iter().map(|v| v.as_str()))
        .flat_map(|v| ["--environment", v]);
    let test_result = cmd!(
        sh,
        "tmt {context...} run --id {run_id} --all {env...} provision {how...} --port {ssh_port_str} --key {key_path} plan --name {plan}"
    )
    .run();

    // Clean up VM regardless of test result (unless --preserve-vm is set)
    cleanup_vm();

    match test_result {
        Ok(_) => {
            println!("Plan {} completed successfully", plan);
            test_results.push((plan.to_string(), true, Some(run_id)));
        }
        Err(e) => {
            eprintln!("Plan {} failed: {:#}", plan, e);
            all_passed = false;
            test_results.push((plan.to_string(), false, Some(run_id)));
        }
    }

    // Print VM connection details if preserving
    if preserve_vm {
        // Copy SSH key to a persistent location
        let persistent_key_path = Utf8Path::new("target").join(format!("{}.ssh-key", vm_name));
        if let Err(e) = std::fs::copy(&key_path, &persistent_key_path) {
            eprintln!("Warning: Failed to save persistent SSH key: {}", e);
        } else {
            println!("\n========================================");
            println!("VM preserved for debugging:");
            println!("========================================");
            println!("VM name: {}", vm_name);
            println!("SSH port: {}", ssh_port_str);
            println!("SSH key: {}", persistent_key_path);
            println!("\nTo connect via SSH:");
            println!(
                "  ssh -i {} -p {} -o IdentitiesOnly=yes root@localhost",
                persistent_key_path, ssh_port_str
            );
            println!("\nTo cleanup:");
            println!("  bcvk libvirt rm --stop --force {}", vm_name);
            println!("========================================\n");
        }
    }
}
```
The logic inside this for loop is quite long and contains repetitive error-handling boilerplate. This makes it hard to follow the main flow of setting up and running a test plan.

Consider refactoring the body of this loop into a separate function, for example `run_plan_in_vm(sh: &Shell, plan: &str, ...) -> Result<String>`.

This would allow you to:
- Use the `?` operator for cleaner error propagation within the new function.
- Manage VM cleanup automatically using a guard struct that calls `cleanup_vm()` on `Drop`. This is a common pattern in Rust for resource management (RAII).
An example of a guard struct:
struct VmGuard<'a, 'b> {
sh: &'a Shell,
vm_name: &'b str,
preserve: bool,
}
impl<'a, 'b> Drop for VmGuard<'a, 'b> {
fn drop(&mut self) {
if self.preserve {
return;
}
if let Err(e) = cmd!(self.sh, "bcvk libvirt rm --stop --force {vm_name}", vm_name = self.vm_name)
.ignore_stderr()
.ignore_status()
.run()
{
eprintln!("Warning: Failed to cleanup VM {}: {}", self.vm_name, e);
}
}
}The main loop would then become much simpler, focusing on iterating through plans and reporting results.
```dockerfile
# Just creates a file as a new layer for a synthetic upgrade test
FROM localhost/bootc-integration
```
The Justfile's `_build-upgrade-image` rule overrides this `FROM` instruction with `localhost/bootc-integration-bin` using `podman build --from`. For clarity and to avoid confusion, this `FROM` instruction should match what is used in the Justfile, or use a generic placeholder like `scratch` to indicate it's always overridden.

```dockerfile
FROM localhost/bootc-integration-bin
```
Hmm, I saw that once locally but it went away and I didn't debug. Dang it... I think this is a complicated side effect of us exporting the host container storage via virtiofs; basically the host SELinux policy is tripping over the …
Haven't reproduced the bug locally yet, but it at least doesn't hurt.
```nu
bootc status
let st = bootc status --json | from json
let is_composefs = ($st.status.booted.composefs? != null)
if is_composefs {
```
One failure comes from `is_composefs`. It should be `$is_composefs`.
Another failure comes from `podman build` and `podman run` in

```diff
diff --git i/tmt/tests/booted/test-image-pushpull-upgrade.nu w/tmt/tests/booted/test-image-pushpull-upgrade.nu
index 5367dcc8..eca7e9dd 100644
--- i/tmt/tests/booted/test-image-pushpull-upgrade.nu
+++ w/tmt/tests/booted/test-image-pushpull-upgrade.nu
@@ -49,9 +49,9 @@ COPY usr/ /usr/
 RUN echo test content > /usr/share/blah.txt
 " | save Dockerfile
 # Build it
-podman build -t localhost/bootc-derived .
+podman build --security-opt label=disable -t localhost/bootc-derived .
 # Just sanity check it
-let v = podman run --rm localhost/bootc-derived cat /usr/share/blah.txt | str trim
+let v = podman run --rm --security-opt label=disable localhost/bootc-derived cat /usr/share/blah.txt | str trim
 assert equal $v "test content"
 let orig_root_mtime = ls -Dl /ostree/bootc | get modified
@@ -158,7 +158,7 @@ COPY usr/ /usr/
 RUN echo test content2 > /usr/share/blah.txt
 " | save Dockerfile
 # Build it
-podman build -t localhost/bootc-derived .
+podman build --security-opt label=disable -t localhost/bootc-derived .
 let booted_digest = $booted.imageDigest
 print $"booted_digest = ($booted_digest)"
 # We should already be fetching updates from container storage
```
Force-pushed 242a527 to 28e4d5a
Thanks for picking this up! I hope that bootc-dev/bcvk#159 fixes the SELinux issue. That said, I haven't yet reproduced the failure locally, and I don't understand why.
Force-pushed 28e4d5a to f28b314
The cs10 flaky issue appears again. So my workaround …
GitHub test failures: …

Testing Farm/Packit test failure: …
Hmmm... it might be that older podman doesn't handle additional image stores in the same way? I'll take a look. Nah, it's really simple. It's pretty tempting to try to "backfill" this...
I think this one is a flake... but if it keeps happening we'll have to debug.
Force-pushed f28b314 to a945294
I mean, the problem was simple to understand... the fix is definitely uglier and more involved, as it requires knowledge about the target environment's capabilities "up front" in our tests. Anyways, done now and tested both cases locally.
To fix SELinux issues.

Signed-off-by: Colin Walters <walters@verbum.org>
We need to run most of our tests in a separate provisioned machine, which means it needs an individual plan. And then we need a test for that plan. And then we need the *actual test code*. This "triplication" is a huge annoying pain. TMT is soooo complicated, yet as far as I can tell it doesn't offer us any tools to solve this. So we'll do it here; cut over to generating the TMT stuff from metadata defined in the test file. Hence adding a test is just:

- Write a new tests/booted/foo.nu
- `cargo xtask update-generated`

Signed-off-by: Colin Walters <walters@verbum.org>
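To make the intended flow concrete, here is a hypothetical sketch (key names and paths invented, not the actual generated output): the test file carries its own metadata, and `cargo xtask update-generated` expands that into the per-test tmt plan and test definition:

```yaml
# Hypothetical plan that xtask could generate for tests/booted/foo.nu;
# each test gets its own plan, hence its own provisioned guest.
summary: foo
discover:
  how: fmf
  test:
    - /tmt/tests/booted/foo
provision:
  how: connect
execute:
  how: tmt
```

The point is that the plan and test boilerplate become generated artifacts, leaving the `.nu` file as the single source of truth.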
Move TMT test runner code from `xtask.rs` to the `tmt` module:
- `run_tmt()` and `tmt_provision()` functions
- Helper functions for VM management and SSH connectivity
- Related constants
Also refactor `update_integration()` to use serde_yaml::Value for
building YAML structures instead of string concatenation.
Add detailed error reporting for failed TMT tests:
- Assign run IDs using `tmt run --id`
- Display verbose reports with `tmt run -i {id} report -vvv`
Assisted-by: Claude Code (Sonnet 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Otherwise we compile many dependencies twice unnecessarily.

Signed-off-by: Colin Walters <walters@verbum.org>
To make it easier to do upgrade tests.

Signed-off-by: Colin Walters <walters@verbum.org>
Co-authored-by: Xiaofeng Wang <henrywangxf@me.com>
Signed-off-by: Colin Walters <walters@verbum.org>
This ensures it all can work much more elegantly/naturally with sealed UKI builds; we don't want to do the build-on-target thing.

Signed-off-by: Colin Walters <walters@verbum.org>
Signed-off-by: Colin Walters <walters@verbum.org>
Force-pushed e1c7f45 to 27da29f
Force-pushed 4a890d6 to 9eaf95d
OK, this one should finally pass all tests!
CentOS 9 lacks `systemd.extra-unit.*` support, which is required for `--bind-storage-ro` to work with bcvk. This was causing test failures on centos-9 while working fine on Fedora.

Change the approach so tests express intent via `extra.try_bind_storage: true` metadata, and xtask handles the details:

- Detect distro by running the container image and parsing os-release
- Pass distro to tmt via `--context=distro=<id>-<version>`
- Only add `--bind-storage-ro` when the test wants it AND the distro supports it
- When bind storage is available, also set the `BOOTC_upgrade_image` env var
- Tests can detect a missing `$env.BOOTC_upgrade_image` and fall back to building the upgrade image locally

Add an `--upgrade-image` CLI option to specify the upgrade image path, replacing the old `--env=BOOTC_upgrade_image` approach.

Extract magic values to clear const declarations at the top of the file for better maintainability.

Assisted-by: Claude Code (Sonnet 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
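As a rough illustration of the distro-detection step above (a minimal sketch; the actual implementation lives in Rust in xtask, and `os_release_to_context` is an invented helper name):

```shell
#!/usr/bin/env bash
# Sketch: derive the tmt context string from an os-release file.
# In the real flow the file contents would come from running the container
# image, e.g.: podman run --rm "$IMAGE" cat /etc/os-release
os_release_to_context() {
    local file="$1"
    # Source in a subshell so ID/VERSION_ID don't leak into the caller.
    (
        . "$file"
        echo "distro=${ID}-${VERSION_ID}"
    )
}
```

With `ID=centos` and `VERSION_ID="9"` this yields `distro=centos-9`, matching the `--context=distro=<id>-<version>` form the commit describes.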
Force-pushed 9eaf95d to 676c259
OK, one more test fix for readonly+c9s.
See individual patches.