
Commit ced85df

KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
JIRA: https://issues.redhat.com/browse/RHEL-73607
Upstream: https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git

commit 0c67288
Author: Ankit Agrawal <ankita@nvidia.com>
Date:   Sat Jul 5 07:17:15 2025 +0000

    KVM: arm64: Allow cacheable stage 2 mapping using VMA flags

    KVM currently forces non-cacheable memory attributes (either Normal-NC
    or Device-nGnRE) for a region based on pfn_is_map_memory(), i.e.
    whether or not the kernel has a cacheable alias for it. This is
    necessary in situations where KVM needs to perform CMOs on the region
    but is unnecessarily restrictive when hardware obviates the need for
    CMOs. KVM doesn't need to perform any CMOs on hardware with FEAT_S2FWB
    and CTR_EL0.DIC.

    As luck would have it, there are implementations in the wild that need
    to map regions of a device with cacheable attributes to function
    properly. An example of this is Nvidia's Grace Hopper/Blackwell
    systems where GPU memory is interchangeable with DDR and retains
    properties such as cacheability, unaligned accesses, atomics and
    handling of executable faults. Of course, for this to work in a VM the
    GPU memory needs to have a cacheable mapping at stage-2.

    Allow cacheable stage-2 mappings to be created on supporting hardware
    when the VMA has cacheable memory attributes. Check these
    preconditions during memslot creation (in addition to fault handling)
    to potentially 'fail-fast' as a courtesy to userspace.

CC: Oliver Upton <oliver.upton@linux.dev>
CC: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-6-ankita@nvidia.com
[ Oliver: refine changelog, squash kvm_supports_cacheable_pfnmap() patch ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Donald Dutile <ddutile@redhat.com>
1 parent 66668d1 commit ced85df

2 files changed (+55, -22 lines)

arch/arm64/include/asm/kvm_mmu.h

Lines changed: 18 additions & 0 deletions
@@ -371,6 +371,24 @@ static inline void kvm_fault_unlock(struct kvm *kvm)
 	read_unlock(&kvm->mmu_lock);
 }
 
+/*
+ * ARM64 KVM relies on a simple conversion from physaddr to a kernel
+ * virtual address (KVA) when it does cache maintenance as the CMO
+ * instructions work on virtual addresses. This is incompatible with
+ * VM_PFNMAP VMAs which may not have a kernel direct mapping to a
+ * virtual address.
+ *
+ * With S2FWB and CACHE DIC features, KVM need not do cache flushing
+ * and CMOs are NOP'd. This has the effect of no longer requiring a
+ * KVA for addresses mapped into the S2. The presence of these features
+ * are thus necessary to support cacheable S2 mapping of VM_PFNMAP.
+ */
+static inline bool kvm_supports_cacheable_pfnmap(void)
+{
+	return cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) &&
+	       cpus_have_final_cap(ARM64_HAS_CACHE_DIC);
+}
+
 #ifdef CONFIG_PTDUMP_STAGE2_DEBUGFS
 void kvm_s2_ptdump_create_debugfs(struct kvm *kvm);
 #else

arch/arm64/kvm/mmu.c

Lines changed: 37 additions & 22 deletions
@@ -1650,18 +1650,39 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
 
+	/*
+	 * Check if this is non-struct page memory PFN, and cannot support
+	 * CMOs. It could potentially be unsafe to access as cachable.
+	 */
 	if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(pfn)) {
-		/*
-		 * If the page was identified as device early by looking at
-		 * the VMA flags, vma_pagesize is already representing the
-		 * largest quantity we can map. If instead it was mapped
-		 * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
-		 * and must not be upgraded.
-		 *
-		 * In both cases, we don't let transparent_hugepage_adjust()
-		 * change things at the last minute.
-		 */
-		s2_force_noncacheable = true;
+		if (is_vma_cacheable) {
+			/*
+			 * Whilst the VMA owner expects cacheable mapping to this
+			 * PFN, hardware also has to support the FWB and CACHE DIC
+			 * features.
+			 *
+			 * ARM64 KVM relies on kernel VA mapping to the PFN to
+			 * perform cache maintenance as the CMO instructions work on
+			 * virtual addresses. VM_PFNMAP region are not necessarily
+			 * mapped to a KVA and hence the presence of hardware features
+			 * S2FWB and CACHE DIC are mandatory to avoid the need for
+			 * cache maintenance.
+			 */
+			if (!kvm_supports_cacheable_pfnmap())
+				return -EFAULT;
+		} else {
+			/*
+			 * If the page was identified as device early by looking at
+			 * the VMA flags, vma_pagesize is already representing the
+			 * largest quantity we can map. If instead it was mapped
+			 * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
+			 * and must not be upgraded.
+			 *
+			 * In both cases, we don't let transparent_hugepage_adjust()
+			 * change things at the last minute.
+			 */
+			s2_force_noncacheable = true;
+		}
 	} else if (logging_active && !write_fault) {
 		/*
 		 * Only actually map the page as writable if this was a write
@@ -1670,15 +1691,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		writable = false;
 	}
 
-	/*
-	 * Prevent non-cacheable mappings in the stage-2 if a region of memory
-	 * is cacheable in the primary MMU and the kernel lacks a cacheable
-	 * alias. KVM cannot guarantee coherency between the guest/host aliases
-	 * without the ability to perform CMOs.
-	 */
-	if (is_vma_cacheable && s2_force_noncacheable)
-		return -EINVAL;
-
 	if (exec_fault && s2_force_noncacheable)
 		return -ENOEXEC;
 
@@ -2239,8 +2251,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 			break;
 		}
 
-		/* Cacheable PFNMAP is not allowed */
-		if (kvm_vma_is_cacheable(vma)) {
+		/*
+		 * Cacheable PFNMAP is allowed only if the hardware
+		 * supports it.
+		 */
+		if (kvm_vma_is_cacheable(vma) && !kvm_supports_cacheable_pfnmap()) {
 			ret = -EINVAL;
 			break;
 		}
