x86/mce/therm_throt: Optimize notifications of thermal throttle

PlaidCat · PlaidCat · commit 598d507e5dd6 · 2025-06-06T13:30:24.000-04:00
jira LE-3201
Rebuild_History Non-Buildable kernel-rt-4.18.0-553.22.1.rt7.363.el8_10
commit-author Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
commit f665620
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-rt-4.18.0-553.22.1.rt7.363.el8_10/f6656208.failed

Some modern systems have very tight thermal tolerances. Because of this
they may cross thermal thresholds when running normal workloads (even
during boot). The CPU hardware will react by limiting power/frequency
and using duty cycles to bring the temperature back into normal range.

Thus users may see a "critical" message about the "temperature above
threshold" which is soon followed by "temperature/speed normal". These
messages are rate-limited, but still may repeat every few minutes.

This issue became worse starting with the Ivy Bridge generation of
CPUs because they include a TCC activation offset in the MSR
IA32_TEMPERATURE_TARGET. OEMs use this to provide alerts long before
critical temperatures are reached.

A test run on a laptop with Intel 8th Gen i5 core for two hours with a
workload resulted in 20K+ thermal interrupts per CPU for core level and
another 20K+ interrupts at package level. The kernel logs were full of
throttling messages.

The real value of these threshold interrupts, is to debug problems with
the external cooling solutions and performance issues due to excessive
throttling.

So the solution here is the following:

  - In the current thermal_throttle folder, show:
    - the maximum time for one throttling event and,
    - the total amount of time the system was in throttling state.

  - Do not log short excursions.

  - Log only when, in spite of thermal throttling, the temperature is rising.

  On the high threshold interrupt trigger a delayed workqueue that
  monitors the threshold violation log bit (THERM_STATUS_PROCHOT_LOG). When
  the log bit is set, this workqueue callback calculates three point moving
  average and logs a warning message when the temperature trend is rising.

  When this log bit is clear and temperature is below threshold
  temperature, then the workqueue callback logs a "Normal" message. Once a
  high threshold event is logged, the logging is rate-limited.

With this patch on the same test laptop, no warnings are printed in the logs
as the max time the processor could bring the temperature under control is
only 280 ms.

This implementation is done with the inputs from Alan Cox and Tony Luck.

 [ bp: Touchups. ]

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: bberg@redhat.com
Cc: ckellner@redhat.com
Cc: hdegoede@redhat.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191111214312.81365-1-srinivas.pandruvada@linux.intel.com
(cherry picked from commit f665620)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	drivers/thermal/intel/therm_throt.c
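
The three point moving average check described in the message above can be
sketched in plain C. This is a minimal userspace illustration under stated
assumptions, not the code from therm_throt.c: struct trend_state,
trend_init() and trend_push() are hypothetical names, samples are assumed to
be integer temperatures, and "rising" is taken to mean that the current
average is higher than the previous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AVG_WINDOW 3	/* three point moving average, per the description */

/* Hypothetical per-level state; the kernel's own layout may differ. */
struct trend_state {
	int samples[AVG_WINDOW];	/* ring buffer of the latest readings */
	size_t count;			/* total number of samples pushed */
	int prev_avg;			/* average computed on the previous push */
	bool have_prev;			/* true once prev_avg holds a valid value */
};

static void trend_init(struct trend_state *t)
{
	assert(t != NULL);
	t->count = 0;
	t->prev_avg = 0;
	t->have_prev = false;
	for (size_t i = 0; i < AVG_WINDOW; i++)
		t->samples[i] = 0;
}

/*
 * Record one temperature sample. Returns true only when a full window is
 * available and its average is higher than the previous window's average.
 */
static bool trend_push(struct trend_state *t, int temp)
{
	int sum = 0;
	int avg;
	bool rising;

	assert(t != NULL);

	t->samples[t->count % AVG_WINDOW] = temp;
	t->count++;

	/* Not enough data yet to form a three point average. */
	if (t->count < AVG_WINDOW)
		return false;

	for (size_t i = 0; i < AVG_WINDOW; i++)
		sum += t->samples[i];
	avg = sum / AVG_WINDOW;

	rising = t->have_prev && avg > t->prev_avg;
	t->prev_avg = avg;
	t->have_prev = true;

	return rising;
}
```

In this sketch a caller would invoke trend_push() once per polling interval
while the log bit stays set, and emit the warning only when it returns true,
so a short excursion that the hardware corrects on its own produces no message.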
diff --git a/ciq/ciq_backports/kernel-rt-4.18.0-553.22.1.rt7.363.el8_10/f6656208.failed b/ciq/ciq_backports/kernel-rt-4.18.0-553.22.1.rt7.363.el8_10/f6656208.failed
@@ -0,0 +1,123 @@
+x86/mce/therm_throt: Optimize notifications of thermal throttle
+
+jira LE-3201
+Rebuild_History Non-Buildable kernel-rt-4.18.0-553.22.1.rt7.363.el8_10
+commit-author Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+commit f6656208f04e5b3804054008eba4bf7170f4c841
+Empty-Commit: Cherry-Pick Conflicts during history rebuild.
+Will be included in final tarball splat. Ref for failed cherry-pick at:
+ciq/ciq_backports/kernel-rt-4.18.0-553.22.1.rt7.363.el8_10/f6656208.failed
+
+Some modern systems have very tight thermal tolerances. Because of this
+they may cross thermal thresholds when running normal workloads (even
+during boot). The CPU hardware will react by limiting power/frequency
+and using duty cycles to bring the temperature back into normal range.
+
+Thus users may see a "critical" message about the "temperature above
+threshold" which is soon followed by "temperature/speed normal". These
+messages are rate-limited, but still may repeat every few minutes.
+
+This issue became worse starting with the Ivy Bridge generation of
+CPUs because they include a TCC activation offset in the MSR
+IA32_TEMPERATURE_TARGET. OEMs use this to provide alerts long before
+critical temperatures are reached.
+
+A test run on a laptop with Intel 8th Gen i5 core for two hours with a
+workload resulted in 20K+ thermal interrupts per CPU for core level and
+another 20K+ interrupts at package level. The kernel logs were full of
+throttling messages.
+
+The real value of these threshold interrupts, is to debug problems with
+the external cooling solutions and performance issues due to excessive
+throttling.
+
+So the solution here is the following:
+
+  - In the current thermal_throttle folder, show:
+    - the maximum time for one throttling event and,
+    - the total amount of time the system was in throttling state.
+
+  - Do not log short excursions.
+
+  - Log only when, in spite of thermal throttling, the temperature is rising.
+  On the high threshold interrupt trigger a delayed workqueue that
+  monitors the threshold violation log bit (THERM_STATUS_PROCHOT_LOG). When
+  the log bit is set, this workqueue callback calculates three point moving
+  average and logs a warning message when the temperature trend is rising.
+
+  When this log bit is clear and temperature is below threshold
+  temperature, then the workqueue callback logs a "Normal" message. Once a
+  high threshold event is logged, the logging is rate-limited.
+
+With this patch on the same test laptop, no warnings are printed in the logs
+as the max time the processor could bring the temperature under control is
+only 280 ms.
+
+This implementation is done with the inputs from Alan Cox and Tony Luck.
+
+ [ bp: Touchups. ]
+
+	Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+	Signed-off-by: Borislav Petkov <bp@suse.de>
+	Cc: "H. Peter Anvin" <hpa@zytor.com>
+	Cc: bberg@redhat.com
+	Cc: ckellner@redhat.com
+	Cc: hdegoede@redhat.com
+	Cc: Ingo Molnar <mingo@redhat.com>
+	Cc: linux-edac <linux-edac@vger.kernel.org>
+	Cc: Thomas Gleixner <tglx@linutronix.de>
+	Cc: Tony Luck <tony.luck@intel.com>
+	Cc: x86-ml <x86@kernel.org>
+Link: https://lkml.kernel.org/r/20191111214312.81365-1-srinivas.pandruvada@linux.intel.com
+(cherry picked from commit f6656208f04e5b3804054008eba4bf7170f4c841)
+	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
+
+# Conflicts:
+#	drivers/thermal/intel/therm_throt.c
+diff --cc drivers/thermal/intel/therm_throt.c
+index dd55d96efeff,d01e0da0163a..000000000000
+--- a/drivers/thermal/intel/therm_throt.c
++++ b/drivers/thermal/intel/therm_throt.c
+@@@ -268,23 -458,28 +457,41 @@@ static void thermal_throttle_remove_dev
+  /* Get notified when a cpu comes on/off. Be hotplug friendly. */
+  static int thermal_throttle_online(unsigned int cpu)
+  {
++ 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
+  	struct device *dev = get_cpu_device(cpu);
+  
+++<<<<<<< HEAD:drivers/thermal/intel/therm_throt.c
+ +	/*
+ +	 * The first CPU coming online will enable the HFI. Usually this causes
+ +	 * hardware to issue an HFI thermal interrupt. Such interrupt will reach
+ +	 * the CPU once we enable the thermal vector in the local APIC.
+ +	 */
+ +	intel_hfi_online(cpu);
+++=======
++ 	state->package_throttle.level = PACKAGE_LEVEL;
++ 	state->core_throttle.level = CORE_LEVEL;
++ 
++ 	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
++ 	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
+++>>>>>>> f6656208f04e (x86/mce/therm_throt: Optimize notifications of thermal throttle):arch/x86/kernel/cpu/mce/therm_throt.c
+  
+  	return thermal_throttle_add_dev(dev, cpu);
+  }
+  
+  static int thermal_throttle_offline(unsigned int cpu)
+  {
++ 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
+  	struct device *dev = get_cpu_device(cpu);
+  
+++<<<<<<< HEAD:drivers/thermal/intel/therm_throt.c
+ +	intel_hfi_offline(cpu);
+++=======
++ 	cancel_delayed_work(&state->package_throttle.therm_work);
++ 	cancel_delayed_work(&state->core_throttle.therm_work);
++ 
++ 	state->package_throttle.rate_control_active = false;
++ 	state->core_throttle.rate_control_active = false;
+++>>>>>>> f6656208f04e (x86/mce/therm_throt: Optimize notifications of thermal throttle):arch/x86/kernel/cpu/mce/therm_throt.c
+  
+  	thermal_throttle_remove_dev(dev);
+  	return 0;
+* Unmerged path drivers/thermal/intel/therm_throt.c