Commit 7c1282f

dvdgomez authored and PlaidCat committed
x86/speculation: Add RSB VM Exit protections
jira LE-958
cve CVE-2022-26373
commit 2b12993
upstream-diff Enums SPECTRE_V2_RETPOLINE_IBRS_USER and SPECTRE_V2_IBRS_ALWAYS
    have been added to the switch statement as they are still used by EL8 and
    are required to avoid compile errors.

tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced. eIBRS is an "always on" IBRS, in other words, just turn it
on once instead of writing the MSR on every privilege level change. When
eIBRS is enabled, more privileged modes should be protected from less
privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

    void run_kvm_guest(void)
    {
            // Prepare to run guest
            VMRESUME();
            // Clean up after guest runs
    }

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
  touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
  IBRS=1 shortly after VM exit, has a documented side effect of flushing
  the RSB except in this PBRSB situation where the software needs to
  stuff the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which support
eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline --
which ensures that speculation can never reach an unbalanced RET. Then,
ensure this CALL is retired before continuing execution with an LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

  [ bp: Massage, incorporate review comments from Andy Cooper. ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
(cherry picked from commit 2b12993)
Signed-off-by: David Gomez <dgomez@ciq.com>
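For orientation, below is a minimal, stand-alone C sketch of the single-CALL guard described in the commit message. It is not the kernel code itself (the patch implements the sequence as the ISSUE_UNBALANCED_RET_GUARD assembly macro further down); the function name is invented for illustration, and the sketch assumes x86-64 and, like kernel builds, no reliance on the red zone around the transient push.

#include <stdio.h>

/*
 * Sketch of the PBRSB guard: a CALL whose speculative return target
 * dead-ends at INT3, then an LFENCE so the CALL retires before any
 * later (unbalanced) RET executes. x86-64 only.
 */
static __attribute__((noinline)) void pbrsb_single_call_guard(void)
{
        asm volatile("call 1f\n\t"
                     "int3\n"               /* speculation trap for the pushed return address */
                     "1:\n\t"
                     "add $8, %%rsp\n\t"    /* discard the return address pushed by the CALL */
                     "lfence"               /* ensure the CALL retires before continuing */
                     : : : "memory");
}

int main(void)
{
        /* In the kernel this runs on the VM-exit path, before the first unbalanced RET. */
        pbrsb_single_call_guard();
        puts("guard sequence executed");
        return 0;
}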
1 parent 21c757b commit 7c1282f

9 files changed: +115, -29 lines


Documentation/admin-guide/hw-vuln/spectre.rst

Lines changed: 8 additions & 0 deletions
@@ -422,6 +422,14 @@ The possible values in this file are:
   'RSB filling' Protection of RSB on context switch enabled
   ============= ===========================================
 
+- EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status:
+
+  ===========================  =======================================================
+  'PBRSB-eIBRS: SW sequence'   CPU is affected and protection of RSB on VMEXIT enabled
+  'PBRSB-eIBRS: Vulnerable'    CPU is vulnerable
+  'PBRSB-eIBRS: Not affected'  CPU is not affected by PBRSB
+  ===========================  =======================================================
+
 Full mitigation might require a microcode update from the CPU
 vendor. When the necessary microcode is not available, the kernel will
 report vulnerability.
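To see the documented status string on a running system, a small C program can simply read the standard sysfs file /sys/devices/system/cpu/vulnerabilities/spectre_v2; the exact output depends on the CPU and the selected mitigation, so the string in the comment is only an example.

#include <stdio.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/spectre_v2", "r");

        if (!f) {
                perror("spectre_v2");
                return 1;
        }
        if (fgets(line, sizeof(line), f))
                fputs(line, stdout);    /* e.g. "..., RSB filling, PBRSB-eIBRS: SW sequence" */
        fclose(f);
        return 0;
}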

arch/x86/include/asm/cpufeatures.h

Lines changed: 2 additions & 0 deletions
@@ -298,6 +298,7 @@
 #define X86_FEATURE_RETPOLINE_LFENCE    (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
 #define X86_FEATURE_RETHUNK             (11*32+14) /* "" Use REturn THUNK */
 #define X86_FEATURE_UNRET               (11*32+15) /* "" AMD BTB untrain return */
+#define X86_FEATURE_RSB_VMEXIT_LITE     (11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI            (12*32+ 4) /* AVX VNNI instructions */
@@ -451,5 +452,6 @@
 #define X86_BUG_SRBDS                   X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
 #define X86_BUG_MMIO_STALE_DATA         X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */
 #define X86_BUG_RETBLEED                X86_BUG(26) /* CPU is affected by RETBleed */
+#define X86_BUG_EIBRS_PBRSB             X86_BUG(27) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
 
 #endif /* _ASM_X86_CPUFEATURES_H */

arch/x86/include/asm/msr-index.h

Lines changed: 4 additions & 0 deletions
@@ -148,6 +148,10 @@
                                                 * are restricted to targets in
                                                 * kernel.
                                                 */
+#define ARCH_CAP_PBRSB_NO               BIT(24) /*
+                                                 * Not susceptible to Post-Barrier
+                                                 * Return Stack Buffer Predictions.
+                                                 */
 
 #define MSR_IA32_FLUSH_CMD              0x0000010b
 #define L1D_FLUSH                       BIT(0)  /*
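As a hypothetical user-space probe of the new bit, the ARCH_CAP_PBRSB_NO flag (bit 24) can be read from MSR_IA32_ARCH_CAPABILITIES (0x10a) through the msr driver (requires root and the msr module loaded). This mirrors the ia32_cap check added in common.c below; it is only an illustration, not part of the patch.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        uint64_t cap = 0;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);      /* the MSR index is the file offset */

        if (fd < 0 || pread(fd, &cap, sizeof(cap), 0x10a) != sizeof(cap)) {
                perror("MSR_IA32_ARCH_CAPABILITIES");
                return 1;
        }
        printf("ARCH_CAP_PBRSB_NO (bit 24): %s\n", (cap & (1ULL << 24)) ? "set" : "clear");
        close(fd);
        return 0;
}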

arch/x86/include/asm/nospec-branch.h

Lines changed: 16 additions & 1 deletion
@@ -144,13 +144,28 @@
 #endif
 .endm
 
+.macro ISSUE_UNBALANCED_RET_GUARD
+        ANNOTATE_INTRA_FUNCTION_CALL
+        call .Lunbalanced_ret_guard_\@
+        int3
+.Lunbalanced_ret_guard_\@:
+        add $(BITS_PER_LONG/8), %_ASM_SP
+        lfence
+.endm
+
 /*
  * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
  * monstrosity above, manually.
  */
-.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
+.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2
+.ifb \ftr2
         ALTERNATIVE "jmp .Lskip_rsb_\@", "", \ftr
+.else
+        ALTERNATIVE_2 "jmp .Lskip_rsb_\@", "", \ftr, "jmp .Lunbalanced_\@", \ftr2
+.endif
         __FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)
+.Lunbalanced_\@:
+        ISSUE_UNBALANCED_RET_GUARD
 .Lskip_rsb_\@:
 .endm
 

arch/x86/kernel/cpu/bugs.c

Lines changed: 65 additions & 23 deletions
@@ -1318,6 +1318,55 @@ static void __init spec_ctrl_disable_kernel_rrsba(void)
         }
 }
 
+static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
+{
+        /*
+         * Similar to context switches, there are two types of RSB attacks
+         * after VM exit:
+         *
+         * 1) RSB underflow
+         *
+         * 2) Poisoned RSB entry
+         *
+         * When retpoline is enabled, both are mitigated by filling/clearing
+         * the RSB.
+         *
+         * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
+         * prediction isolation protections, RSB still needs to be cleared
+         * because of #2. Note that SMEP provides no protection here, unlike
+         * user-space-poisoned RSB entries.
+         *
+         * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
+         * bug is present then a LITE version of RSB protection is required,
+         * just a single call needs to retire before a RET is executed.
+         */
+        switch (mode) {
+        case SPECTRE_V2_NONE:
+                return;
+
+        case SPECTRE_V2_EIBRS_LFENCE:
+        case SPECTRE_V2_EIBRS:
+                if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
+                        setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
+                        pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+                }
+                return;
+
+        case SPECTRE_V2_EIBRS_RETPOLINE:
+        case SPECTRE_V2_RETPOLINE_IBRS_USER:
+        case SPECTRE_V2_RETPOLINE:
+        case SPECTRE_V2_LFENCE:
+        case SPECTRE_V2_IBRS_ALWAYS:
+        case SPECTRE_V2_IBRS:
+                setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
+                pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
+                return;
+        }
+
+        pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
+        dump_stack();
+}
+
 static void __init spectre_v2_select_mitigation(void)
 {
         enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -1485,28 +1534,7 @@ static void __init spectre_v2_select_mitigation(void)
         setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
         pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
 
-        /*
-         * Similar to context switches, there are two types of RSB attacks
-         * after vmexit:
-         *
-         * 1) RSB underflow
-         *
-         * 2) Poisoned RSB entry
-         *
-         * When retpoline is enabled, both are mitigated by filling/clearing
-         * the RSB.
-         *
-         * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
-         * prediction isolation protections, RSB still needs to be cleared
-         * because of #2. Note that SMEP provides no protection here, unlike
-         * user-space-poisoned RSB entries.
-         *
-         * eIBRS, on the other hand, has RSB-poisoning protections, so it
-         * doesn't need RSB clearing after vmexit.
-         */
-        if (boot_cpu_has(X86_FEATURE_RETPOLINE) ||
-            boot_cpu_has(X86_FEATURE_KERNEL_IBRS))
-                setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
+        spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
 
         /*
          * Retpoline protects the kernel, but doesn't protect firmware. IBRS
@@ -2249,6 +2277,19 @@ static char *ibpb_state(void)
         return "";
 }
 
+static char *pbrsb_eibrs_state(void)
+{
+        if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
+                if (boot_cpu_has(X86_FEATURE_RSB_VMEXIT_LITE) ||
+                    boot_cpu_has(X86_FEATURE_RSB_VMEXIT))
+                        return ", PBRSB-eIBRS: SW sequence";
+                else
+                        return ", PBRSB-eIBRS: Vulnerable";
+        } else {
+                return ", PBRSB-eIBRS: Not affected";
+        }
+}
+
 static ssize_t spectre_v2_show_state(char *buf)
 {
         if (spectre_v2_enabled == SPECTRE_V2_LFENCE)
@@ -2261,12 +2302,13 @@ static ssize_t spectre_v2_show_state(char *buf)
             spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE)
                 return sprintf(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
 
-        return sprintf(buf, "%s%s%s%s%s%s\n",
+        return sprintf(buf, "%s%s%s%s%s%s%s\n",
                        spectre_v2_strings[spectre_v2_enabled],
                        ibpb_state(),
                        boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
                        stibp_state(),
                        boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "",
+                       pbrsb_eibrs_state(),
                        spectre_v2_module_string());
 }
 
arch/x86/kernel/cpu/common.c

Lines changed: 10 additions & 2 deletions
@@ -952,6 +952,7 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 #define MSBDS_ONLY              BIT(5)
 #define NO_SWAPGS               BIT(6)
 #define NO_ITLB_MULTIHIT        BIT(7)
+#define NO_EIBRS_PBRSB          BIT(9)
 
 #define VULNWL(vendor, family, model, whitelist)        \
         X86_MATCH_VENDOR_FAM_MODEL(vendor, family, model, whitelist)
@@ -988,7 +989,7 @@ static const __initconst struct x86_cpu_id_v2 cpu_vuln_whitelist[] = {
 
         VULNWL_INTEL(ATOM_GOLDMONT,             NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
         VULNWL_INTEL(ATOM_GOLDMONT_D,           NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
-        VULNWL_INTEL(ATOM_GOLDMONT_PLUS,        NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+        VULNWL_INTEL(ATOM_GOLDMONT_PLUS,        NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
 
         /*
          * Technically, swapgs isn't serializing on AMD (despite it previously
@@ -998,7 +999,9 @@ static const __initconst struct x86_cpu_id_v2 cpu_vuln_whitelist[] = {
          * good enough for our purposes.
          */
 
-        VULNWL_INTEL(ATOM_TREMONT_D,            NO_ITLB_MULTIHIT),
+        VULNWL_INTEL(ATOM_TREMONT,              NO_EIBRS_PBRSB),
+        VULNWL_INTEL(ATOM_TREMONT_L,            NO_EIBRS_PBRSB),
+        VULNWL_INTEL(ATOM_TREMONT_D,            NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
 
         /* AMD Family 0xf - 0x12 */
         VULNWL_AMD(0x0f,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
@@ -1169,6 +1172,11 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
                 setup_force_cpu_bug(X86_BUG_RETBLEED);
         }
 
+        if (cpu_has(c, X86_FEATURE_IBRS_ENHANCED) &&
+            !cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
+            !(ia32_cap & ARCH_CAP_PBRSB_NO))
+                setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB);
+
         if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
                 return;
 
arch/x86/kvm/vmx/vmenter.S

Lines changed: 5 additions & 3 deletions
@@ -226,11 +226,13 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
         * entries and (in some cases) RSB underflow.
         *
         * eIBRS has its own protection against poisoned RSB, so it doesn't
-        * need the RSB filling sequence. But it does need to be enabled
-        * before the first unbalanced RET.
+        * need the RSB filling sequence. But it does need to be enabled, and a
+        * single call to retire, before the first unbalanced RET.
         */
 
-        FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
+        FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT,\
+                           X86_FEATURE_RSB_VMEXIT_LITE
+
 
         pop %_ASM_ARG2  /* @flags */
         pop %_ASM_ARG1  /* @vmx */

tools/arch/x86/include/asm/cpufeatures.h

Lines changed: 1 addition & 0 deletions
@@ -272,6 +272,7 @@
 
 /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */
 #define X86_FEATURE_CQM_LLC             (11*32+ 1) /* LLC QoS if 1 */
+#define X86_FEATURE_RSB_VMEXIT_LITE     (11*32+17) /* "" Fill RSB on VM-Exit when EIBRS is enabled */
 
 /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */
 #define X86_FEATURE_CQM_OCCUP_LLC       (12*32+ 0) /* LLC occupancy monitoring */

tools/arch/x86/include/asm/msr-index.h

Lines changed: 4 additions & 0 deletions
@@ -147,6 +147,10 @@
                                                 * are restricted to targets in
                                                 * kernel.
                                                 */
+#define ARCH_CAP_PBRSB_NO               BIT(24) /*
+                                                 * Not susceptible to Post-Barrier
+                                                 * Return Stack Buffer Predictions.
+                                                 */
 
 #define MSR_IA32_FLUSH_CMD              0x0000010b
 #define L1D_FLUSH                       BIT(0)  /*
