Commit e6fd7af

Merge: update cpuidle to match upstream v6.15
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6766
Resolves: 87863
JIRA: https://issues.redhat.com/browse/RHEL-87863
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Jarod Wilson <jarod@redhat.com>
2 parents 18ad6e2 + 90f84f3 commit e6fd7af

14 files changed, 176 additions and 186 deletions

Documentation/admin-guide/pm/cpuidle.rst

Lines changed: 39 additions & 44 deletions
@@ -269,61 +269,56 @@ Namely, when invoked to select an idle state for a CPU (i.e. an idle state that
 the CPU will ask the processor hardware to enter), it attempts to predict the
 idle duration and uses the predicted value for idle state selection.
 
-It first obtains the time until the closest timer event with the assumption
-that the scheduler tick will be stopped. That time, referred to as the *sleep
-length* in what follows, is the upper bound on the time before the next CPU
-wakeup. It is used to determine the sleep length range, which in turn is needed
-to get the sleep length correction factor.
-
-The ``menu`` governor maintains two arrays of sleep length correction factors.
-One of them is used when tasks previously running on the given CPU are waiting
-for some I/O operations to complete and the other one is used when that is not
-the case. Each array contains several correction factor values that correspond
-to different sleep length ranges organized so that each range represented in the
-array is approximately 10 times wider than the previous one.
-
-The correction factor for the given sleep length range (determined before
-selecting the idle state for the CPU) is updated after the CPU has been woken
-up and the closer the sleep length is to the observed idle duration, the closer
-to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
-The sleep length is multiplied by the correction factor for the range that it
-falls into to obtain the first approximation of the predicted idle duration.
-
-Next, the governor uses a simple pattern recognition algorithm to refine its
+It first uses a simple pattern recognition algorithm to obtain a preliminary
 idle duration prediction. Namely, it saves the last 8 observed idle duration
 values and, when predicting the idle duration next time, it computes the average
 and variance of them. If the variance is small (smaller than 400 square
 milliseconds) or it is small relative to the average (the average is greater
 that 6 times the standard deviation), the average is regarded as the "typical
-interval" value. Otherwise, the longest of the saved observed idle duration
+interval" value. Otherwise, either the longest or the shortest (depending on
+which one is farther from the average) of the saved observed idle duration
 values is discarded and the computation is repeated for the remaining ones.
+
 Again, if the variance of them is small (in the above sense), the average is
 taken as the "typical interval" value and so on, until either the "typical
-interval" is determined or too many data points are disregarded, in which case
-the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value). The "typical interval" computed this way is compared with the
-sleep length multiplied by the correction factor and the minimum of the two is
-taken as the predicted idle duration.
-
-Then, the governor computes an extra latency limit to help "interactive"
-workloads. It uses the observation that if the exit latency of the selected
-idle state is comparable with the predicted idle duration, the total time spent
-in that state probably will be very short and the amount of energy to save by
-entering it will be relatively small, so likely it is better to avoid the
-overhead related to entering that state and exiting it. Thus selecting a
-shallower state is likely to be a better option then. The first approximation
-of the extra latency limit is the predicted idle duration itself which
-additionally is divided by a value depending on the number of tasks that
-previously ran on the given CPU and now they are waiting for I/O operations to
-complete. The result of that division is compared with the latency limit coming
-from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
-framework and the minimum of the two is taken as the limit for the idle states'
-exit latency.
+interval" is determined or too many data points are disregarded. In the latter
+case, if the size of the set of data points still under consideration is
+sufficiently large, the next idle duration is not likely to be above the largest
+idle duration value still in that set, so that value is taken as the predicted
+next idle duration. Finally, if the set of data points still under
+consideration is too small, no prediction is made.
+
+If the preliminary prediction of the next idle duration computed this way is
+long enough, the governor obtains the time until the closest timer event with
+the assumption that the scheduler tick will be stopped. That time, referred to
+as the *sleep length* in what follows, is the upper bound on the time before the
+next CPU wakeup. It is used to determine the sleep length range, which in turn
+is needed to get the sleep length correction factor.
+
+The ``menu`` governor maintains an array containing several correction factor
+values that correspond to different sleep length ranges organized so that each
+range represented in the array is approximately 10 times wider than the previous
+one.
+
+The correction factor for the given sleep length range (determined before
+selecting the idle state for the CPU) is updated after the CPU has been woken
+up and the closer the sleep length is to the observed idle duration, the closer
+to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
+The sleep length is multiplied by the correction factor for the range that it
+falls into to obtain an approximation of the predicted idle duration that is
+compared to the "typical interval" determined previously and the minimum of
+the two is taken as the final idle duration prediction.
+
+If the "typical interval" value is small, which means that the CPU is likely
+to be woken up soon enough, the sleep length computation is skipped as it may
+be costly and the idle duration is simply predicted to equal the "typical
+interval" value.
 
 Now, the governor is ready to walk the list of idle states and choose one of
 them. For this purpose, it compares the target residency of each state with
-the predicted idle duration and the exit latency of it with the computed latency
-limit. It selects the state with the target residency closest to the predicted
+the predicted idle duration and the exit latency of it with the with the latency
+limit coming from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
+framework. It selects the state with the target residency closest to the predicted
 idle duration, but still below it, and exit latency that does not exceed the
 limit.
 
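The updated documentation text above describes two steps that a short sketch may make easier to follow: the outlier-trimming "typical interval" computation and the final walk over the idle states. The C below is only an illustration written from that prose, not the kernel's implementation. The names `typical_interval`, `select_state`, `struct idle_state` and `NO_PREDICTION` are hypothetical, the cutoffs for "too many data points disregarded" (3/4 of the samples) and "sufficiently large" (half of the samples) are assumptions, and the variance bound is applied in squared sample units.

```c
#include <stddef.h>
#include <stdint.h>

#define INTERVALS      8           /* saved idle durations, per the text */
#define VARIANCE_LIMIT 400         /* "small" variance bound from the text */
#define NO_PREDICTION  UINT64_MAX  /* hypothetical "no prediction" marker */

/* Hypothetical, simplified description of one idle state. */
struct idle_state {
	uint64_t target_residency;
	uint64_t exit_latency;
};

/*
 * "Typical interval" of the n most recent idle durations (n <= INTERVALS).
 * Assumes every sample is below 2^28 so the 64-bit sums cannot overflow.
 */
static uint64_t typical_interval(const uint32_t *samples, size_t n)
{
	uint32_t kept[INTERVALS];
	size_t count = n < INTERVALS ? n : INTERVALS;
	size_t i;

	for (i = 0; i < count; i++)
		kept[i] = samples[i];

	while (count > 0) {
		uint64_t sum = 0, avg, variance = 0;
		size_t min_i = 0, max_i = 0, drop;

		for (i = 0; i < count; i++) {
			sum += kept[i];
			if (kept[i] < kept[min_i])
				min_i = i;
			if (kept[i] > kept[max_i])
				max_i = i;
		}
		avg = sum / count;

		for (i = 0; i < count; i++) {
			uint64_t diff = kept[i] > avg ? kept[i] - avg : avg - kept[i];

			variance += diff * diff;
		}
		variance /= count;

		/* Small variance, or avg > 6 * stddev (compared as squares). */
		if (variance < VARIANCE_LIMIT || avg * avg > 36 * variance)
			return avg;

		/* Assumed cutoff: stop once only 3/4 of the samples remain. */
		if (count * 4 <= INTERVALS * 3) {
			if (count >= INTERVALS / 2)
				return kept[max_i];	/* largest value still kept */
			return NO_PREDICTION;
		}

		/* Discard whichever extreme lies farther from the average. */
		drop = (kept[max_i] - avg > avg - kept[min_i]) ? max_i : min_i;
		kept[drop] = kept[--count];
	}
	return NO_PREDICTION;
}

/*
 * Index of the state whose target residency is closest to, but still below,
 * the predicted idle duration and whose exit latency does not exceed the
 * latency limit; -1 if no state qualifies.
 */
static int select_state(const struct idle_state *states, size_t n,
			uint64_t predicted, uint64_t latency_limit)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (states[i].target_residency >= predicted)
			continue;
		if (states[i].exit_latency > latency_limit)
			continue;
		if (best < 0 || states[i].target_residency >
				states[best].target_residency)
			best = (int)i;
	}
	return best;
}
```

In this sketch a caller would pass the smaller of the "typical interval" and the corrected sleep length as `predicted`, and the PM QoS limit as `latency_limit`. Discarding one sample per pass keeps the loop easy to trace by hand; other ways of dropping outliers would fit the prose equally well.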

arch/powerpc/include/asm/machdep.h

Lines changed: 5 additions & 1 deletion
@@ -5,8 +5,10 @@
 
 #include <linux/seq_file.h>
 #include <linux/init.h>
-#include <linux/dma-mapping.h>
 #include <linux/export.h>
+#include <linux/time64.h>
+
+#include <asm/page.h>
 
 #include <asm/setup.h>
 
@@ -17,10 +19,12 @@
 
 struct pt_regs;
 struct pci_bus;
+struct device;
 struct device_node;
 struct iommu_table;
 struct rtc_time;
 struct file;
+struct pci_dev;
 struct pci_controller;
 struct kimage;
 struct pci_host_bridge;

arch/powerpc/kernel/sysfs.c

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@
 #include <asm/hvcall.h>
 #include <asm/machdep.h>
 #include <asm/smp.h>
+#include <asm/time.h>
 #include <asm/pmc.h>
 #include <asm/firmware.h>
 #include <asm/idle.h>

arch/powerpc/platforms/pseries/svm.c

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 #include <linux/memblock.h>
 #include <linux/mem_encrypt.h>
 #include <linux/cc_platform.h>
+#include <linux/mem_encrypt.h>
 #include <asm/machdep.h>
 #include <asm/svm.h>
 #include <asm/swiotlb.h>

drivers/cpuidle/cpuidle-arm.c

Lines changed: 4 additions & 4 deletions
@@ -137,17 +137,17 @@ static int __init arm_idle_init_cpu(int cpu)
 /*
  * arm_idle_init - Initializes arm cpuidle driver
  *
- * Initializes arm cpuidle driver for all CPUs, if any CPU fails
- * to register cpuidle driver then rollback to cancel all CPUs
- * registeration.
+ * Initializes arm cpuidle driver for all present CPUs, if any
+ * CPU fails to register cpuidle driver then rollback to cancel
+ * all CPUs registration.
  */
 static int __init arm_idle_init(void)
 {
 	int cpu, ret;
 	struct cpuidle_driver *drv;
 	struct cpuidle_device *dev;
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		ret = arm_idle_init_cpu(cpu);
 		if (ret)
 			goto out_fail;

drivers/cpuidle/cpuidle-big_little.c

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ static int __init bl_idle_driver_init(struct cpuidle_driver *drv, int part_id)
 	if (!cpumask)
 		return -ENOMEM;
 
-	for_each_possible_cpu(cpu)
+	for_each_present_cpu(cpu)
 		if (smp_cpuid_part(cpu) == part_id)
 			cpumask_set_cpu(cpu, cpumask);
 

drivers/cpuidle/cpuidle-psci-domain.c

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ static int psci_pd_init(struct device_node *np, bool use_osi)
 	 */
 	if (use_osi) {
 		pd->power_off = psci_pd_power_off;
+		pd->flags |= GENPD_FLAG_ACTIVE_WAKEUP;
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			pd->flags |= GENPD_FLAG_RPM_ALWAYS_ON;
 	} else {

drivers/cpuidle/cpuidle-psci.c

Lines changed: 5 additions & 2 deletions
@@ -25,6 +25,7 @@
 #include <linux/syscore_ops.h>
 
 #include <asm/cpuidle.h>
+#include <trace/events/power.h>
 
 #include "cpuidle-psci.h"
 #include "dt_idle_states.h"
@@ -74,7 +75,9 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
 	if (!state)
 		state = states[idx];
 
+	trace_psci_domain_idle_enter(dev->cpu, state, s2idle);
 	ret = psci_cpu_suspend_enter(state) ? -1 : idx;
+	trace_psci_domain_idle_exit(dev->cpu, state, s2idle);
 
 	if (s2idle)
 		dev_pm_genpd_resume(pd_dev);
@@ -400,7 +403,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu)
 /*
  * psci_idle_probe - Initializes PSCI cpuidle driver
  *
- * Initializes PSCI cpuidle driver for all CPUs, if any CPU fails
+ * Initializes PSCI cpuidle driver for all present CPUs, if any CPU fails
  * to register cpuidle driver then rollback to cancel all CPUs
  * registration.
  */
@@ -410,7 +413,7 @@ static int psci_cpuidle_probe(struct platform_device *pdev)
 	struct cpuidle_driver *drv;
 	struct cpuidle_device *dev;
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		ret = psci_idle_init_cpu(&pdev->dev, cpu);
 		if (ret)
 			goto out_fail;

drivers/cpuidle/cpuidle-pseries.c

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
 #include <asm/idle.h>
 #include <asm/plpar_wrappers.h>
 #include <asm/rtas.h>
+#include <asm/time.h>
 
 static struct cpuidle_driver pseries_idle_driver = {
 	.name = "pseries_idle",

drivers/cpuidle/cpuidle-qcom-spm.c

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ static int qcom_cpu_spc(struct spm_driver_data *drv)
 	ret = cpu_suspend(0, qcom_pm_collapse);
 	/*
 	 * ARM common code executes WFI without calling into our driver and
-	 * if the SPM mode is not reset, then we may accidently power down the
+	 * if the SPM mode is not reset, then we may accidentally power down the
 	 * cpu when we intended only to gate the cpu clock.
 	 * Ensure the state is set to standby before returning.
 	 */
@@ -135,7 +135,7 @@ static int spm_cpuidle_drv_probe(struct platform_device *pdev)
 	if (ret)
 		return dev_err_probe(&pdev->dev, ret, "set warm boot addr failed");
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		ret = spm_cpuidle_register(&pdev->dev, cpu);
 		if (ret && ret != -ENODEV) {
 			dev_err(&pdev->dev,
