Merge: Backport huge pfnmap support for significantly faster large mmap'd PCI device BARs
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6638
Backport huge pfnmap support for significantly faster large mmap'd PCI device BARs
JIRA: https://issues.redhat.com/browse/RHEL-73613
Mapping and unmapping PCI BARs into VM va-space is a common VFIO function when assigning
a PCIe device to a VM.
Currently, the mapping is set up at base-page granularity, e.g., 4K on x86 or
up to 64K on ARM64. This is done using the kernel's PFNMAP support, since these
device pages are not backed by the struct page metadata that ordinary system
memory is mapped with.
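For reference, a minimal sketch of how userspace (e.g., QEMU) ends up with such
a mapping: query the BAR region and mmap() it through the VFIO device fd. This
is illustrative only; error handling and the steps that produce device_fd (via
the VFIO group or cdev interface) are omitted, and device_fd/map_bar0 are
placeholder names:

  #include <stddef.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/vfio.h>

  /* Map BAR0 of an already-opened VFIO device fd into our VA space. */
  static void *map_bar0(int device_fd)
  {
          struct vfio_region_info info = {
                  .argsz = sizeof(info),
                  .index = VFIO_PCI_BAR0_REGION_INDEX,
          };
          void *bar;

          if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
                  return NULL;
          if (!(info.flags & VFIO_REGION_INFO_FLAG_MMAP))
                  return NULL;    /* BAR can't be mmap'd; use read/write instead */

          /* Accesses to this mapping fault in through the vfio-pci fault
           * handler (vfio_pci_mmap_huge_fault() with this series applied). */
          bar = mmap(NULL, info.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     device_fd, info.offset);
          return bar == MAP_FAILED ? NULL : bar;
  }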
For relatively small to medium sized BARs, the mapping time isn't very
noticeable; but with vGPUs and BARs in the gigabyte range, the map time can
become a significant portion of VM startup time (minutes).
Upstream resolved this issue by adding huge PFNMAP support -- superpage
PFNMAPs. Mapping with 2MB, and even 1GB (on x86), pages instead of 4K pages
yields a speedup of three to ten orders of magnitude in the mapping operation,
which is quite noticeable to the user of a VM with assigned GPUs.
This series backports the core of what upstream did:
https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/
along with a couple of prior upstream commits that allow these to apply cleanly
to RHEL9, a couple of bug fixes that patchreview identified, and one related
commit that the kernel-mm team requested.
The code can be tested by doing the following (thanks to AlexW for providing this info):
# echo "func vfio_pci_mmap_huge_fault +p" > /proc/dynamic_debug/control
Then you'll see things in dmesg like:
vfio-pci 0000:5e:00.0: vfio_pci_mmap_huge_fault(,order = 9) BAR 0 page offset 0x0: 0x100
vfio-pci 0000:5e:00.0: vfio_pci_mmap_huge_fault(,order = 9) BAR 0 page offset 0x200: 0x100
vfio-pci 0000:5e:00.0: vfio_pci_mmap_huge_fault(,order = 9) BAR 0 page offset 0x400: 0x100
Here we know order 9 is a 2MB PMD mapping on x86. BAR0 on this device
is 16MB, so 2MB mappings are the best we can get. If you have a device
with at least a 1GB BAR, you should also see:
vfio-pci 0000:5e:00.0: vfio_pci_mmap_huge_fault(,order = 18) BAR 1 page offset 0x240000: 0x100
vfio-pci 0000:5e:00.0: vfio_pci_mmap_huge_fault(,order = 18) BAR 1 page offset 0x280000: 0x100
vfio-pci 0000:5e:00.0: vfio_pci_mmap_huge_fault(,order = 18) BAR 1 page offset 0x2c0000: 0x100
Again, here we know order 18 is a 1GB PUD mapping, and BAR1 of this device
is 32GB (NVIDIA A10).
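(The mapping size is PAGE_SIZE << order, so with 4K base pages: order 9 ->
4K << 9 = 2MB (PMD), order 18 -> 4K << 18 = 1GB (PUD), and order 0 is a plain
4K PTE mapping.)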
You'll need to be running at least QEMU 9.2 to get reliable alignment
for PUD mappings (which neither RHEL-9 nor RHEL-10 has at the moment; they ship
QEMU 9.1).
If you see order = 0 mappings for BARs that are at least 2MB, something is wrong.
Omitted-fix: 7223769 ("mm/memory.c: simplify pfnmap_lockdep_assert")
Signed-off-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Alex Williamson <alex.williamson@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Approved-by: David Hildenbrand <david@redhat.com>
Merged-by: Augusto Caringi <acaringi@redhat.com>