
Commit 04fa877

smp: Reduce NMI traffic from CSD waiters to CSD destination
JIRA: https://issues.redhat.com/browse/RHEL-16867

commit 0d3a00b
Author: Imran Khan <imran.f.khan@oracle.com>
Date: Tue, 9 May 2023 08:31:24 +1000

smp: Reduce NMI traffic from CSD waiters to CSD destination

On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang, then all of these waiting CPUs send an NMI to the destination CPU in order to dump its backtrace.

Given enough NMIs, the destination CPU will spend much of its time producing backtraces, thus further delaying that CPU's response to the original CSD IPI. In the worst case, by the time the destination CPU is done producing all of these backtrace NMIs, the CSD wait timeout will have elapsed so that the waiters resend their backtrace NMIs again, further delaying forward progress.

Therefore, to avoid these delays, issue the backtrace NMI only from the first waiter. The destination CPU's other waiters can make use of backtrace obtained from the first waiter's NMI.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
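The "first waiter only" rule described in the commit message can be pictured with a small user-space sketch. This is not the kernel code: it assumes C11 `<stdatomic.h>` in place of the kernel's `atomic_t` API, and the names `NR_CPUS_SKETCH`, `gate_init`, `waiter_should_dump`, and `destination_rearm` are hypothetical, chosen only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CPU count for the sketch; the kernel uses per-CPU variables. */
#define NR_CPUS_SKETCH 4

/* One flag per destination CPU: 1 means "a backtrace may be requested". */
static atomic_int trigger_backtrace_sketch[NR_CPUS_SKETCH];

/* Arm every flag, mirroring the ATOMIC_INIT(1) initial value in the patch. */
static void gate_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		atomic_init(&trigger_backtrace_sketch[cpu], 1);
}

/*
 * Called by a waiter that has detected a stall on 'cpu'.
 * Returns true only for the waiter that flips the flag from 1 to 0;
 * every later waiter sees 0 and skips the backtrace request.
 */
static bool waiter_should_dump(int cpu)
{
	int expected = 1;

	return atomic_compare_exchange_strong_explicit(
		&trigger_backtrace_sketch[cpu], &expected, 0,
		memory_order_acquire, memory_order_relaxed);
}

/* Called by the destination CPU once it starts processing its queue again. */
static void destination_rearm(int cpu)
{
	atomic_store_explicit(&trigger_backtrace_sketch[cpu], 1,
			      memory_order_release);
}
```

In this sketch, two back-to-back calls to `waiter_should_dump(2)` yield true and then false, and a call to `destination_rearm(2)` makes the next call return true again. Flags for other CPUs are unaffected.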
1 parent 65a0760 commit 04fa877

1 file changed: +9 -1 lines changed

kernel/smp.c

Lines changed: 9 additions & 1 deletion
@@ -46,6 +46,8 @@ static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
 
+static DEFINE_PER_CPU(atomic_t, trigger_backtrace) = ATOMIC_INIT(1);
+
 static void __flush_smp_call_function_queue(bool warn_cpu_offline);
 
 int smpcfd_prepare_cpu(unsigned int cpu)
@@ -248,7 +250,8 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
 	}
 	if (cpu >= 0) {
-		dump_cpu_task(cpu);
+		if (atomic_cmpxchg_acquire(&per_cpu(trigger_backtrace, cpu), 1, 0))
+			dump_cpu_task(cpu);
 		if (!cpu_cur_csd) {
 			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
 			arch_send_call_function_single_ipi(cpu);
@@ -429,9 +432,14 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 	struct llist_node *entry, *prev;
 	struct llist_head *head;
 	static bool warned;
+	atomic_t *tbt;
 
 	lockdep_assert_irqs_disabled();
 
+	/* Allow waiters to send backtrace NMI from here onwards */
+	tbt = this_cpu_ptr(&trigger_backtrace);
+	atomic_set_release(tbt, 1);
+
 	head = this_cpu_ptr(&call_single_queue);
 	entry = llist_del_all(head);
 	entry = llist_reverse_order(entry);
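The guard added in csd_lock_wait_toolong() depends on atomic_cmpxchg_acquire() returning the value that was in the variable before the operation, not a success flag. The `if` is therefore taken only when the old value was 1, which is the case for exactly one waiter until the destination CPU sets the flag back to 1. A minimal user-space sketch of that return convention follows, assuming C11 atomics; `cmpxchg_acquire_sketch` is a hypothetical stand-in and not a kernel function.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the kernel's atomic_cmpxchg_acquire():
 * if *v equals 'old', store 'new_val'; in every case return the value
 * that *v held before the call.
 */
static int cmpxchg_acquire_sketch(atomic_int *v, int old, int new_val)
{
	int expected = old;

	/*
	 * On success 'expected' is left equal to 'old'; on failure the
	 * C11 call overwrites it with the value actually observed.
	 */
	atomic_compare_exchange_strong_explicit(v, &expected, new_val,
						memory_order_acquire,
						memory_order_relaxed);
	return expected;
}
```

With a flag that starts at 1, the first `cmpxchg_acquire_sketch(&flag, 1, 0)` returns 1 (non-zero, so a backtrace would be requested) and leaves the flag at 0. A second identical call returns 0 and changes nothing.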
