
Commit 3a729b9

Revert "sched/core: Reduce cost of sched_move_task when config autogroup"
JIRA: https://issues.redhat.com/browse/RHEL-96250

Conflicts: Context diff due to not having sched_ext code in RHEL9.

commit 76f970c
Author: Dietmar Eggemann <dietmar.eggemann@arm.com>
Date:   Fri Mar 14 16:13:45 2025 +0100

    Revert "sched/core: Reduce cost of sched_move_task when config autogroup"

    This reverts commit eff6c8c.

    Hazem reported a 30% drop in the UnixBench spawn test with commit
    eff6c8c ("sched/core: Reduce cost of sched_move_task when config
    autogroup") on an m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB
    RAM (aarch64) (single-level MC sched domain):

      https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com

    There is an early bail from sched_move_task() if p->sched_task_group
    is equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
    pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
    (Ubuntu 22.04.5 LTS).

    So in:

      do_exit()
        sched_autogroup_exit_task()
          sched_move_task()
            if sched_get_task_group(p) == p->sched_task_group
              return

            /* p is enqueued */
            dequeue_task()              \
            sched_change_group()        |
              task_change_group_fair()  |
                detach_task_cfs_rq()    |              (1)
                set_task_rq()           |
                attach_task_cfs_rq()    |
            enqueue_task()              /

    (1) isn't called for p anymore.

    Turns out that the regression is related to sgs->group_util in
    group_is_overloaded() and group_has_capacity(). If (1) isn't called
    for all the 'spawn' tasks, then sgs->group_util is ~900 while
    sgs->group_capacity = 1024 (single-CPU sched domain), and this leads
    to group_is_overloaded() returning true (2) and group_has_capacity()
    false (3) much more often compared to the case when (1) is called.

    I.e. there are many more cases of 'group_is_overloaded' and
    'group_fully_busy' in the WF_FORK wakeup path
    sched_balance_find_dst_cpu(), which then much more often returns a
    CPU != smp_processor_id() (5).

    This isn't good for these extremely short running tasks (FORK + EXIT)
    and also involves calling sched_balance_find_dst_group_cpu()
    unnecessarily (single-CPU sched domain).

    Instead, if (1) is called for 'p->flags & PF_EXITING', then the path
    (4),(6) is taken much more often.

      select_task_rq_fair(..., wake_flags = WF_FORK)

        cpu = smp_processor_id()

        new_cpu = sched_balance_find_dst_cpu(..., cpu, ...)

          group = sched_balance_find_dst_group(..., cpu)

            do {
              update_sg_wakeup_stats()
                sgs->group_type = group_classify()
                  if group_is_overloaded()             (2)
                    return group_overloaded
                  if !group_has_capacity()             (3)
                    return group_fully_busy
                  return group_has_spare               (4)
            } while group

            if local_sgs.group_type > idlest_sgs.group_type
              return idlest                            (5)

            case group_has_spare:
              if local_sgs.idle_cpus >= idlest_sgs.idle_cpus
                return NULL                            (6)

    Unixbench Tests './Run -c 4 spawn' on:

    (a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4')
        and Ubuntu 22.04.5 LTS (aarch64).
        Shell & test run in '/user.slice/user-1000.slice/session-1.scope'.

        w/o patch   w/ patch
        21005       27120

    (b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and
        Ubuntu 22.04.5 LTS (x86_64).
        Shell & test run in '/A'.

        w/o patch   w/ patch
        67675       88806

    CONFIG_SCHED_AUTOGROUP=y & /proc/sys/kernel/sched_autogroup_enabled
    equal 0 or 1.
    Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
    Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
    Tested-by: Hagar Hemdan <hagarhem@amazon.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20250314151345.275739-1-dietmar.eggemann@arm.com

Signed-off-by: Phil Auld <pauld@redhat.com>
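As a quick way to see why a stale sgs->group_util of ~900 flips the classification against a group_capacity of 1024, here is a minimal userspace sketch of that comparison. It only mirrors the shape of the capacity checks in kernel/sched/fair.c; the imbalance_pct value (117) and the "fresh" utilization figure are assumptions chosen for illustration, and the real group_is_overloaded()/group_has_capacity() also consult nr_running, group_weight and group_runnable.

    /* Standalone sketch of the capacity-vs-utilization classification. */
    #include <stdio.h>
    #include <stdbool.h>

    struct sg_stats {
            unsigned long group_util;       /* summed PELT utilization */
            unsigned long group_capacity;   /* 1024 == one full CPU */
    };

    /* Roughly: overloaded once util scaled by imbalance_pct exceeds capacity. */
    static bool group_is_overloaded(unsigned int imbalance_pct, const struct sg_stats *sgs)
    {
            return sgs->group_capacity * 100 < sgs->group_util * imbalance_pct;
    }

    /* Roughly: still has capacity while capacity exceeds scaled utilization. */
    static bool group_has_capacity(unsigned int imbalance_pct, const struct sg_stats *sgs)
    {
            return sgs->group_capacity * 100 > sgs->group_util * imbalance_pct;
    }

    int main(void)
    {
            const unsigned int imbalance_pct = 117;  /* assumed MC-domain value */

            /* Stale utilization: (1) was skipped, so group_util stays at ~900. */
            struct sg_stats stale = { .group_util = 900, .group_capacity = 1024 };
            /* Assumed utilization once the exiting tasks have been detached. */
            struct sg_stats fresh = { .group_util = 300, .group_capacity = 1024 };

            printf("stale: overloaded=%d has_capacity=%d\n",
                   group_is_overloaded(imbalance_pct, &stale),
                   group_has_capacity(imbalance_pct, &stale));
            printf("fresh: overloaded=%d has_capacity=%d\n",
                   group_is_overloaded(imbalance_pct, &fresh),
                   group_has_capacity(imbalance_pct, &fresh));
            return 0;
    }

Compiled and run, the "stale" case reports overloaded=1 has_capacity=0 (1024*100 = 102400 < 900*117 = 105300), while the "fresh" case reports the opposite, matching the (2)/(3) versus (4) branches in the call tree above.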
Commit 3a729b9, parent 895b8a3

1 file changed: +3, -18 lines


kernel/sched/core.c

Lines changed: 3 additions & 18 deletions

@@ -8790,7 +8790,7 @@ void sched_release_group(struct task_group *tg)
 	spin_unlock_irqrestore(&task_group_lock, flags);
 }

-static struct task_group *sched_get_task_group(struct task_struct *tsk)
+static void sched_change_group(struct task_struct *tsk)
 {
 	struct task_group *tg;

@@ -8802,13 +8802,7 @@ static struct task_group *sched_get_task_group(struct task_struct *tsk)
 	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
 			  struct task_group, css);
 	tg = autogroup_task_group(tsk, tg);
-
-	return tg;
-}
-
-static void sched_change_group(struct task_struct *tsk, struct task_group *group)
-{
-	tsk->sched_task_group = group;
+	tsk->sched_task_group = tg;

 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_change_group)
@@ -8829,20 +8823,11 @@ void sched_move_task(struct task_struct *tsk)
 {
 	int queued, running, queue_flags =
 		DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
-	struct task_group *group;
 	struct rq *rq;

 	CLASS(task_rq_lock, rq_guard)(tsk);
 	rq = rq_guard.rq;

-	/*
-	 * Esp. with SCHED_AUTOGROUP enabled it is possible to get superfluous
-	 * group changes.
-	 */
-	group = sched_get_task_group(tsk);
-	if (group == tsk->sched_task_group)
-		return;
-
 	update_rq_clock(rq);

 	running = task_current(rq, tsk);
@@ -8853,7 +8838,7 @@ void sched_move_task(struct task_struct *tsk)
 	if (running)
 		put_prev_task(rq, tsk);

-	sched_change_group(tsk, group);
+	sched_change_group(tsk);

 	if (queued)
 		enqueue_task(rq, tsk, queue_flags);
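For reference, the workload pattern that regressed is essentially a tight fork-and-exit loop. A minimal sketch along those lines (not the actual UnixBench spawn test; the iteration count is an arbitrary choice) looks like this:

    /* Minimal fork+exit loop in the spirit of the UnixBench 'spawn' test. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            for (int i = 0; i < 100000; i++) {
                    pid_t pid = fork();

                    if (pid < 0) {
                            perror("fork");
                            return 1;
                    }
                    if (pid == 0)
                            _exit(0);   /* child: extremely short-lived (FORK + EXIT) */
                    if (waitpid(pid, NULL, 0) < 0) {
                            perror("waitpid");
                            return 1;
                    }
            }
            return 0;
    }

Running such a loop inside a session scope with CONFIG_SCHED_AUTOGROUP=y exercises the sched_autogroup_exit_task() -> sched_move_task() path that this revert restores.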
