
Commit f4f8151

Merge: workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work from !WQ_MEM_RECLAIM worker
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6197

JIRA: https://issues.redhat.com/browse/RHEL-74107
CVE: CVE-2024-57888

commit de35994
Author: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Date: Thu, 19 Dec 2024 09:30:30 +0000

workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work from !WQ_MEM_RECLAIM worker

After commit 746ae46 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM")
amdgpu started seeing the following warning:

 [ ] workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
 ...
 [ ] Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
 ...
 [ ] Call Trace:
 [ ] <TASK>
 ...
 [ ] ? check_flush_dependency+0xf5/0x110
 ...
 [ ] cancel_delayed_work_sync+0x6e/0x80
 [ ] amdgpu_gfx_off_ctrl+0xab/0x140 [amdgpu]
 [ ] amdgpu_ring_alloc+0x40/0x50 [amdgpu]
 [ ] amdgpu_ib_schedule+0xf4/0x810 [amdgpu]
 [ ] ? drm_sched_run_job_work+0x22c/0x430 [gpu_sched]
 [ ] amdgpu_job_run+0xaa/0x1f0 [amdgpu]
 [ ] drm_sched_run_job_work+0x257/0x430 [gpu_sched]
 [ ] process_one_work+0x217/0x720
 ...
 [ ] </TASK>

The intent of the verification done in check_flush_dependency is to ensure
forward progress during memory reclaim, by flagging cases when either a
memory reclaim process, or a memory reclaim work item is flushed from a
context not marked as memory reclaim safe.

This is correct when flushing, but when called from the
cancel(_delayed)_work_sync() paths it is a false positive because work is
either already running, or will not be running at all. Therefore
cancelling it is safe and we can relax the warning criteria by letting the
helper know of the calling context.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: fca839c ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
References: 746ae46 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM")
Cc: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>

Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: Herton R. Krzesinski <herton@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
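
The reasoning in the commit message can be illustrated with a small userspace model of the warning condition before and after the change. This is only a sketch: the `model_*` names and `MODEL_*` flag values are hypothetical stand-ins, not the kernel's definitions, and the condition is simplified (for example, it ignores any additional workqueue flags the real check may consult).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_WQ_MEM_RECLAIM 0x1u	/* models WQ_MEM_RECLAIM */
#define MODEL_PF_MEMALLOC    0x2u	/* models PF_MEMALLOC */

struct model_caller {
	unsigned int task_flags;	/* models current->flags */
	bool is_wq_worker;		/* models current_wq_worker() != NULL */
	unsigned int own_wq_flags;	/* flags of the workqueue the caller runs on */
};

/* Simplified warning condition as it behaved before the patch. */
static bool model_warns_before(unsigned int target_wq_flags,
			       const struct model_caller *c)
{
	/* Flushing a reclaim-safe workqueue is always fine. */
	if (target_wq_flags & MODEL_WQ_MEM_RECLAIM)
		return false;
	/* A task that is reclaiming memory must not wait on it. */
	if (c->task_flags & MODEL_PF_MEMALLOC)
		return true;
	/* Neither must a worker running on a reclaim-safe workqueue. */
	return c->is_wq_worker &&
	       (c->own_wq_flags & MODEL_WQ_MEM_RECLAIM) != 0;
}

/* Same condition after the patch: the cancel path is exempt. */
static bool model_warns_after(unsigned int target_wq_flags,
			      const struct model_caller *c, bool from_cancel)
{
	if (from_cancel)
		return false;
	return model_warns_before(target_wq_flags, c);
}

/* The reported case: a worker on a WQ_MEM_RECLAIM workqueue (sdma0). */
static const struct model_caller model_sdma_worker = {
	.task_flags = 0,
	.is_wq_worker = true,
	.own_wq_flags = MODEL_WQ_MEM_RECLAIM,
};
```

In this model, `model_warns_before(0, &model_sdma_worker)` is true, which corresponds to the reported warning, while `model_warns_after(0, &model_sdma_worker, true)` is false. A real flush (`from_cancel` false) from the same context still warns.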
2 parents 7223a3a + ec40aa3 commit f4f8151

1 file changed, +13 -9 lines changed

kernel/workqueue.c

Lines changed: 13 additions & 9 deletions
@@ -3749,23 +3749,27 @@ void workqueue_softirq_dead(unsigned int cpu)
  * check_flush_dependency - check for flush dependency sanity
  * @target_wq: workqueue being flushed
  * @target_work: work item being flushed (NULL for workqueue flushes)
+ * @from_cancel: are we called from the work cancel path
  *
  * %current is trying to flush the whole @target_wq or @target_work on it.
- * If @target_wq doesn't have %WQ_MEM_RECLAIM, verify that %current is not
- * reclaiming memory or running on a workqueue which doesn't have
- * %WQ_MEM_RECLAIM as that can break forward-progress guarantee leading to
- * a deadlock.
+ * If this is not the cancel path (which implies work being flushed is either
+ * already running, or will not be at all), check if @target_wq doesn't have
+ * %WQ_MEM_RECLAIM and verify that %current is not reclaiming memory or running
+ * on a workqueue which doesn't have %WQ_MEM_RECLAIM as that can break forward-
+ * progress guarantee leading to a deadlock.
  */
 static void check_flush_dependency(struct workqueue_struct *target_wq,
-				   struct work_struct *target_work)
+				   struct work_struct *target_work,
+				   bool from_cancel)
 {
-	work_func_t target_func = target_work ? target_work->func : NULL;
+	work_func_t target_func;
 	struct worker *worker;

-	if (target_wq->flags & WQ_MEM_RECLAIM)
+	if (from_cancel || target_wq->flags & WQ_MEM_RECLAIM)
 		return;

 	worker = current_wq_worker();
+	target_func = target_work ? target_work->func : NULL;

 	WARN_ONCE(current->flags & PF_MEMALLOC,
 		  "workqueue: PF_MEMALLOC task %d(%s) is flushing !WQ_MEM_RECLAIM %s:%ps",
@@ -4032,7 +4036,7 @@ void __flush_workqueue(struct workqueue_struct *wq)
 		list_add_tail(&this_flusher.list, &wq->flusher_overflow);
 	}

-	check_flush_dependency(wq, NULL);
+	check_flush_dependency(wq, NULL, false);

 	mutex_unlock(&wq->mutex);

@@ -4209,7 +4213,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 	}

 	wq = pwq->wq;
-	check_flush_dependency(wq, work);
+	check_flush_dependency(wq, work, from_cancel);

 	insert_wq_barrier(pwq, barr, work, worker);
 	raw_spin_unlock_irq(&pool->lock);
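
How the new argument reaches the helper from each kind of caller can be sketched in userspace as below. The diff only shows that whole-workqueue flushes pass `false` and that `start_flush_work()` forwards a `from_cancel` value it already has. The rest of the chain here, in particular that the cancel path passes `true`, is an assumption drawn from the commit message. All `sketch_*` names are hypothetical, and the model deliberately omits the `WQ_MEM_RECLAIM` test on the target workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names echo the kernel's but are not its API. */
struct sketch_work {
	int id;
};

/* Counts how many times the dependency check was actually evaluated. */
static int sketch_checks_run;

static void sketch_check_flush_dependency(const struct sketch_work *target_work,
					   bool from_cancel)
{
	(void)target_work;
	/* Cancel path: the work is already running or will never run. */
	if (from_cancel)
		return;
	sketch_checks_run++;
}

/* Mirrors the diff: forward the caller's from_cancel to the helper. */
static void sketch_start_flush_work(const struct sketch_work *work,
				    bool from_cancel)
{
	sketch_check_flush_dependency(work, from_cancel);
}

/* Whole-workqueue flush: no specific work item, never a cancel. */
static void sketch_flush_workqueue(void)
{
	sketch_check_flush_dependency(NULL, false);
}

/* Plain flush of one work item. */
static void sketch_flush_work(const struct sketch_work *work)
{
	sketch_start_flush_work(work, false);
}

/* Assumption: the synchronous cancel path identifies itself with true. */
static void sketch_cancel_work_sync(const struct sketch_work *work)
{
	sketch_start_flush_work(work, true);
}
```

Calling `sketch_flush_workqueue()` and `sketch_flush_work()` each increments `sketch_checks_run`, while `sketch_cancel_work_sync()` leaves it unchanged. That is the relaxation the patch describes.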
