Commit 8cd76f7

Alex Williamson committed
vfio/pci: Check the device set open count on reset
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2155664

commit e806e22
Author: Anthony DeRossi <ajderossi@gmail.com>
Date: Wed Nov 9 17:40:27 2022 -0800

    vfio/pci: Check the device set open count on reset

    vfio_pci_dev_set_needs_reset() inspects the open_count of every device
    in the set to determine whether a reset is allowed. The current device
    always has open_count == 1 within vfio_pci_core_disable(), effectively
    disabling the reset logic. This field is also documented as private in
    vfio_device, so it should not be used to determine whether other devices
    in the set are open.

    Checking for vfio_device_set_open_count() > 1 on the device set fixes
    both issues.

    After commit 2cd8b14 ("vfio/pci: Move to the device set
    infrastructure"), failure to create a new file for a device would cause
    the reset to be skipped due to open_count being decremented after
    calling close_device() in the error path.

    After commit eadd86f ("vfio: Remove calls to
    vfio_group_add_container_user()"), releasing a device would always skip
    the reset due to an ordering change in vfio_device_fops_release().

    Failing to reset the device leaves it in an unknown state, potentially
    causing errors when it is accessed later or bound to a different driver.

    This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3)
    assigned to a Windows guest. After shutting down the guest, unbinding
    the device from vfio-pci, and binding the device to amdgpu:

    [ 548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
    [ 548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
    [ 548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
    [ 548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
    [ 548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init

    Fixes: 2cd8b14 ("vfio/pci: Move to the device set infrastructure")
    Fixes: eadd86f ("vfio: Remove calls to vfio_group_add_container_user()")
    Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Link: https://lore.kernel.org/r/20221110014027.28780-4-ajderossi@gmail.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
1 parent 7870f63 commit 8cd76f7

1 file changed

+5
-5
lines changed

drivers/vfio/pci/vfio_pci_core.c

Lines changed: 5 additions & 5 deletions
@@ -2208,12 +2208,12 @@ static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set)
 	struct vfio_pci_core_device *cur;
 	bool needs_reset = false;
 
-	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
-		/* No VFIO device in the set can have an open device FD */
-		if (cur->vdev.open_count)
-			return false;
+	/* No other VFIO device in the set can be open. */
+	if (vfio_device_set_open_count(dev_set) > 1)
+		return false;
+
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
 		needs_reset |= cur->needs_reset;
-	}
 	return needs_reset;
 }
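The behavioral change in this hunk can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_dev, struct model_set, and the model_* / needs_reset_* functions are hypothetical stand-ins introduced here, and the model assumes that vfio_device_set_open_count() returns the sum of the per-device open counts in the set. It shows why the old per-device check could never allow a reset while the closing device itself still had open_count == 1, and how comparing the set-wide count against 1 tolerates that one remaining open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in a set; NOT the kernel definition. */
struct model_dev {
	unsigned int open_count; /* models vfio_device.open_count */
	bool needs_reset;        /* models vfio_pci_core_device.needs_reset */
};

/* Hypothetical stand-in for struct vfio_device_set. */
struct model_set {
	struct model_dev *devs;
	size_t n;
};

/*
 * Models vfio_device_set_open_count(). ASSUMPTION: the helper sums the
 * open counts of all devices in the set.
 */
static unsigned int model_set_open_count(const struct model_set *set)
{
	unsigned int total = 0;

	assert(set != NULL);
	for (size_t i = 0; i < set->n; i++)
		total += set->devs[i].open_count;
	return total;
}

/* Pre-patch logic: any nonzero open_count, including our own, blocks reset. */
static bool needs_reset_old(const struct model_set *set)
{
	bool needs_reset = false;

	assert(set != NULL);
	for (size_t i = 0; i < set->n; i++) {
		if (set->devs[i].open_count)
			return false;
		needs_reset |= set->devs[i].needs_reset;
	}
	return needs_reset;
}

/* Post-patch logic: only an open beyond the closing device blocks reset. */
static bool needs_reset_new(const struct model_set *set)
{
	bool needs_reset = false;

	assert(set != NULL);
	if (model_set_open_count(set) > 1)
		return false;

	for (size_t i = 0; i < set->n; i++)
		needs_reset |= set->devs[i].needs_reset;
	return needs_reset;
}
```

With a two-device set in which only the device being closed is open and flagged needs_reset, needs_reset_old() returns false and needs_reset_new() returns true. Once a second device in the set is also open, both return false.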
