Commit 8f032ec

author
Maxim Levitsky
committed
KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_IGNORE_GUEST_PAT
JIRA: https://issues.redhat.com/browse/RHEL-47242

commit c9c1e20
Author: Yan Zhao <yan.y.zhao@intel.com>
Date:   Mon Feb 24 15:09:45 2025 +0800

    KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_IGNORE_GUEST_PAT

    Introduce an Intel specific quirk KVM_X86_QUIRK_IGNORE_GUEST_PAT to have KVM ignore guest PAT when this quirk is enabled.

    On AMD platforms, KVM always honors guest PAT. On Intel however there are two issues. First, KVM *cannot* honor guest PAT if CPU feature self-snoop is not supported. Second, UC access on certain Intel platforms can be very slow[1] and honoring guest PAT on those platforms may break some old guests that accidentally specify video RAM as UC. Those old guests may never expect the slowness since KVM always forces WB previously. See [2].

    So, introduce a quirk that KVM can enable by default on all Intel platforms to avoid breaking old unmodifiable guests. Newer userspace can disable this quirk if it wishes KVM to honor guest PAT; disabling the quirk will fail if self-snoop is not supported, i.e. if KVM cannot obey the wish.

    The quirk is a no-op on AMD and also if any assigned devices have non-coherent DMA. This is not an issue, as KVM_X86_QUIRK_CD_NW_CLEARED is another example of a quirk that is sometimes automatically disabled.

    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Cc: Kevin Tian <kevin.tian@intel.com>
    Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
    Link: https://lore.kernel.org/all/Ztl9NWCOupNfVaCA@yzhao56-desk.sh.intel.com # [1]
    Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com # [2]
    Message-ID: <20250224070946.31482-1-yan.y.zhao@intel.com>
    [Use supported_quirks/inapplicable_quirks to support both AMD and no-self-snoop cases, as well as to remove the shadow_memtype_mask check from kvm_mmu_may_ignore_guest_pat(). - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
1 parent cccbe3d commit 8f032ec

7 files changed: +73 -15 lines changed

Documentation/virt/kvm/api.rst

Lines changed: 22 additions & 0 deletions
@@ -8200,6 +8200,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_IGNORE_GUEST_PAT      By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT. The quirk is not available
+                                    on Intel platforms which are incapable of
+                                    safely honoring guest PAT (i.e., without CPU
+                                    self-snoop, KVM always ignores guest PAT and
+                                    forces effective memory type to WB). It is
+                                    also ignored on AMD platforms or, on Intel,
+                                    when a VM has non-coherent DMA devices
+                                    assigned; KVM always honors guest PAT in
+                                    such case. The quirk is needed to avoid
+                                    slowdowns on certain Intel Xeon platforms
+                                    (e.g. ICX, SPR) where self-snoop feature is
+                                    supported but UC is slow enough to cause
+                                    issues with some older guests that use
+                                    UC instead of WC to map the video RAM.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT if it knows that there is no such
+                                    guest software, for example if it does not
+                                    expose a bochs graphics device (which is
+                                    known to have had a buggy driver).
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
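
Reviewer sketch (not part of the commit): the documentation above says userspace can disable the quirk to have KVM honor guest PAT. A minimal way to do that from a VMM might look like the following, assuming the existing KVM_CAP_DISABLE_QUIRKS2 capability is the mechanism used and that KVM_CHECK_EXTENSION on the VM file descriptor reports the bitmask of quirks that can be disabled. The helper names `quirk_can_be_disabled()` and `vm_honor_guest_pat()` are hypothetical, and the `#ifndef` fallbacks only exist so the sketch builds against older installed headers.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallbacks for older installed uapi headers (assumption: values match uapi). */
#ifndef KVM_CAP_DISABLE_QUIRKS2
#define KVM_CAP_DISABLE_QUIRKS2 213
#endif
#ifndef KVM_X86_QUIRK_IGNORE_GUEST_PAT
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9) /* value added by this commit */
#endif

/* Returns 1 if the mask reported by KVM_CHECK_EXTENSION contains the quirk. */
static int quirk_can_be_disabled(int supported_mask, unsigned int quirk)
{
	return supported_mask > 0 && ((unsigned int)supported_mask & quirk) != 0;
}

/*
 * Hypothetical helper: ask KVM to honor guest PAT for the VM behind vm_fd.
 * Returns 0 on success, -1 if the quirk cannot be disabled on this host
 * (for example, no self-snoop) or if an ioctl fails.
 */
static int vm_honor_guest_pat(int vm_fd)
{
	struct kvm_enable_cap cap;
	int supported;

	supported = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DISABLE_QUIRKS2);
	if (!quirk_can_be_disabled(supported, KVM_X86_QUIRK_IGNORE_GUEST_PAT))
		return -1;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_DISABLE_QUIRKS2;
	cap.args[0] = KVM_X86_QUIRK_IGNORE_GUEST_PAT;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap) ? -1 : 0;
}
```

A VMM would presumably call this once after KVM_CREATE_VM, before creating vCPUs, and only when it knows the guest does not depend on WB being forced (per the text above, for example when no bochs graphics device is exposed).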

arch/x86/include/asm/kvm_host.h

Lines changed: 4 additions & 2 deletions
@@ -2429,10 +2429,12 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
 	 KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
 	 KVM_X86_QUIRK_SLOT_ZAP_ALL | \
-	 KVM_X86_QUIRK_STUFF_FEATURE_MSRS)
+	 KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
+	 KVM_X86_QUIRK_IGNORE_GUEST_PAT)
 
 #define KVM_X86_CONDITIONAL_QUIRKS \
-	KVM_X86_QUIRK_CD_NW_CLEARED
+	(KVM_X86_QUIRK_CD_NW_CLEARED | \
+	 KVM_X86_QUIRK_IGNORE_GUEST_PAT)
 
 /*
  * KVM previously used a u32 field in kvm_run to indicate the hypercall was

arch/x86/include/uapi/asm/kvm.h

Lines changed: 1 addition & 0 deletions
@@ -441,6 +441,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
 #define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8)
+#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)
 
 #define KVM_STATE_NESTED_FORMAT_VMX 0
 #define KVM_STATE_NESTED_FORMAT_SVM 1

arch/x86/kvm/mmu.h

Lines changed: 1 addition & 1 deletion
@@ -235,7 +235,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-bool kvm_mmu_may_ignore_guest_pat(void);
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);

arch/x86/kvm/mmu/mmu.c

Lines changed: 6 additions & 4 deletions
@@ -4836,17 +4836,19 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
-bool kvm_mmu_may_ignore_guest_pat(void)
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm)
 {
 	/*
 	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
 	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
 	 * honor the memtype from the guest's PAT so that guest accesses to
 	 * memory that is DMA'd aren't cached against the guest's wishes. As a
-	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
-	 * KVM _always_ ignores guest PAT (when EPT is enabled).
+	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA.
+	 * KVM _always_ ignores guest PAT, when EPT is enabled and when quirk
+	 * KVM_X86_QUIRK_IGNORE_GUEST_PAT is enabled or the CPU lacks the
+	 * ability to safely honor guest PAT.
 	 */
-	return shadow_memtype_mask;
+	return kvm_check_has_quirk(kvm, KVM_X86_QUIRK_IGNORE_GUEST_PAT);
 }
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

arch/x86/kvm/vmx/vmx.c

Lines changed: 34 additions & 7 deletions
@@ -7644,6 +7644,17 @@ int vmx_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static inline bool vmx_ignore_guest_pat(struct kvm *kvm)
+{
+	/*
+	 * Non-coherent DMA devices need the guest to flush CPU properly.
+	 * In that case it is not possible to map all guest RAM as WB, so
+	 * always trust guest PAT.
+	 */
+	return !kvm_arch_has_noncoherent_dma(kvm) &&
+	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_IGNORE_GUEST_PAT);
+}
+
 u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	/*
@@ -7653,13 +7664,8 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	if (is_mmio)
 		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
 
-	/*
-	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
-	 * device attached. Letting the guest control memory types on Intel
-	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
-	 * the guest to behave only as a last resort.
-	 */
-	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
+	/* Force WB if ignoring guest PAT */
+	if (vmx_ignore_guest_pat(vcpu->kvm))
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
 	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
@@ -8595,6 +8601,27 @@ __init int vmx_hardware_setup(void)
 
 	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
 
+	/*
+	 * On Intel CPUs that lack self-snoop feature, letting the guest control
+	 * memory types may result in unexpected behavior. So always ignore guest
+	 * PAT on those CPUs and map VM as writeback, not allowing userspace to
+	 * disable the quirk.
+	 *
+	 * On certain Intel CPUs (e.g. SPR, ICX), though self-snoop feature is
+	 * supported, UC is slow enough to cause issues with some older guests (e.g.
+	 * an old version of bochs driver uses ioremap() instead of ioremap_wc() to
+	 * map the video RAM, causing wayland desktop to fail to get started
+	 * correctly). To avoid breaking those older guests that rely on KVM to force
+	 * memory type to WB, provide KVM_X86_QUIRK_IGNORE_GUEST_PAT to preserve the
+	 * safer (for performance) default behavior.
+	 *
+	 * On top of this, non-coherent DMA devices need the guest to flush CPU
+	 * caches properly. This also requires honoring guest PAT, and is forced
+	 * independent of the quirk in vmx_ignore_guest_pat().
+	 */
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
 	return r;
 }
 
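
Reviewer sketch (not part of the commit): the memory-type decision made by vmx_get_mt_mask() together with the new vmx_ignore_guest_pat() can be modelled in a few lines of standalone C. The constants mirror the kernel's values for the MTRR types and the EPT memory-type fields; `struct vm_model` and the `model_*` functions are hypothetical stand-ins for the state the real code reads from `struct kvm`, so treat this as an illustration of the logic rather than the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the kernel's x86 headers. */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRBACK	6
#define VMX_EPT_MT_EPTE_SHIFT	3
#define VMX_EPT_IPAT_BIT	(1ULL << 6)

/* Hypothetical stand-in for the per-VM state consulted by the real code. */
struct vm_model {
	bool has_noncoherent_dma;	/* kvm_arch_has_noncoherent_dma() */
	bool quirk_ignore_guest_pat;	/* quirk still enabled for this VM */
};

/* Mirrors vmx_ignore_guest_pat(): non-coherent DMA overrides the quirk. */
static bool model_ignore_guest_pat(const struct vm_model *vm)
{
	return !vm->has_noncoherent_dma && vm->quirk_ignore_guest_pat;
}

/* Mirrors the visible part of vmx_get_mt_mask(). */
static uint64_t model_get_mt_mask(const struct vm_model *vm, bool is_mmio)
{
	if (is_mmio)
		return (uint64_t)MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	/* WB with IPAT set: the guest's PAT is ignored. */
	if (model_ignore_guest_pat(vm))
		return ((uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;

	/* WB without IPAT: the guest's PAT is combined with the EPT type. */
	return (uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
}
```

The only behavioral difference from the pre-patch code is the extra quirk term: before, the absence of non-coherent DMA alone was enough to set the IPAT bit.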

arch/x86/kvm/x86.c

Lines changed: 5 additions & 1 deletion
@@ -9838,6 +9838,10 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled)
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SW_PROTECTED_VM);
 
+	/* KVM always ignores guest PAT for shadow paging. */
+	if (!tdp_enabled)
+		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
@@ -13563,7 +13567,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
 	 * (or last) non-coherent device is (un)registered to so that new SPTEs
 	 * with the correct "ignore guest PAT" setting are created.
 	 */
-	if (kvm_mmu_may_ignore_guest_pat())
+	if (kvm_mmu_may_ignore_guest_pat(kvm))
 		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
 }
 
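
Reviewer sketch (not part of the commit): the interplay of `supported_quirks` and `inapplicable_quirks` across the vmx.c and x86.c hunks is easier to follow as a small model. The two mask updates below come straight from the diff. Two parts are assumptions, because the code that consumes the masks is outside this commit: that a new VM starts with `inapplicable & supported` already disabled, and that a request to disable an unsupported quirk is rejected (the commit message only says that disabling "will fail"). All `model_*` names and `struct caps_model` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUIRK_IGNORE_GUEST_PAT	(1ULL << 9)

/* Hypothetical stand-in for the two kvm_caps fields touched by this commit. */
struct caps_model {
	uint64_t supported_quirks;	/* quirks userspace may disable */
	uint64_t inapplicable_quirks;	/* quirks that do not apply on this host */
};

static struct caps_model model_setup(bool is_intel, bool tdp_enabled,
				     bool self_snoop)
{
	/* The quirk is listed as both valid and conditional in kvm_host.h. */
	struct caps_model caps = { QUIRK_IGNORE_GUEST_PAT, QUIRK_IGNORE_GUEST_PAT };

	if (is_intel) {
		/* vmx_hardware_setup(): no self-snoop means it cannot be disabled. */
		if (!self_snoop)
			caps.supported_quirks &= ~QUIRK_IGNORE_GUEST_PAT;
		caps.inapplicable_quirks &= ~QUIRK_IGNORE_GUEST_PAT;
	}

	/* kvm_x86_vendor_init(): shadow paging always ignores guest PAT. */
	if (!tdp_enabled)
		caps.supported_quirks &= ~QUIRK_IGNORE_GUEST_PAT;

	return caps;
}

/* ASSUMPTION: inapplicable-but-supported quirks start out disabled per VM. */
static uint64_t model_initial_disabled(const struct caps_model *caps)
{
	return caps->inapplicable_quirks & caps->supported_quirks;
}

/* ASSUMPTION: disabling any unsupported quirk fails without side effects. */
static int model_disable_quirks(const struct caps_model *caps,
				uint64_t *disabled, uint64_t mask)
{
	if (mask & ~caps->supported_quirks)
		return -1;
	*disabled |= mask;
	return 0;
}

/* A quirk is in effect unless it has been disabled for the VM. */
static bool model_has_quirk(uint64_t disabled, uint64_t quirk)
{
	return !(disabled & quirk);
}
```

Under these assumptions the model reproduces the three cases from the commit message: on Intel with self-snoop the quirk is on by default and can be turned off, on Intel without self-snoop it is on and cannot be turned off, and on AMD with TDP it starts out disabled so guest PAT is honored.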
