From: Zheng Zengkai <zhengzengkai@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7MQJB
--------------------------------
Concurrent LTP stress testcases cause a hard lockup issue on Kunpeng 920:
------------[ cut here ]------------
[ 2301.316914] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2301.320386]  try_charge+0x2c8/0x600
[ 2301.325566] rcu: 69-...0: (23 ticks this GP) idle=fb2/1/0x4000000000000000 softirq=39368/39368 fqs=5591
[ 2301.335766]  (detected by 29, t=15006 jiffies, g=91585, q=3635521)
[ 2301.345458] Sending NMI from CPU 29 to CPUs 69:
------------[ cut here ]------------
[ 2379.033470] NMI watchdog: Watchdog detected hard LOCKUP on cpu 69
[ 2379.033523] CPU: 69 PID: 2143608 Comm: memcg_test_1 Kdump: loaded Tainted: G        W   5.10.0-sp2-lockdepdbg+ #45
[ 2379.033524] Hardware name: Huawei TaiShan 5280 V2/BC82AMDDA, BIOS 1.93 10/13/2022
[ 2379.033525] pstate: 00400089 (nzcv daIf +PAN -UAO -TCO BTYPE=--)
[ 2379.033525] pc : native_queued_spin_lock_slowpath+0x264/0x330
[ 2379.033526] lr : rcu_iw_handler+0xc4/0x130
[ 2379.033527] sp : ffff80001022be60
[ 2379.033528] x29: ffff80001022be60 x28: ffff2f6c7cc44d00
[ 2379.033529] x27: ffff2f6b81ee21a8 x26: ffff80001022c000
[ 2379.033530] x25: ffff800010228000 x24: ffffd913b32a6000
[ 2379.033532] x23: 0000000000000000 x22: ffffd913b3aa4000
[ 2379.033533] x21: ffffd913b32ba5f0 x20: ffff2f6b7ffa37c0
[ 2379.033534] x19: ffffd913b365c440 x18: 0000000000000060
[ 2379.033535] x17: 0000000000000000 x16: 0000000000000000
[ 2379.033537] x15: ffffffffffffffff x14: ffff8000bcb9b4f0
[ 2379.033538] x13: 00000000fffffffd x12: 0000000000000040
[ 2379.033539] x11: ffff2f63806bada0 x10: ffff2f63806bada2
[ 2379.033541] x9 : 0000000000000000 x8 : 0000000000000000
[ 2379.033542] x7 : ffff2f6b7ffa3740 x6 : ffffd913b2f67740
[ 2379.033543] x5 : ffff2f6b7ffa3740 x4 : 0000000001180101
[ 2379.033544] x3 : ffffd913b365c440 x2 : 0000000000000118
[ 2379.033545] x1 : 0000000001180000 x0 : 0000000000000000
[ 2379.033547] Call trace:
[ 2379.033547]  native_queued_spin_lock_slowpath+0x264/0x330
[ 2379.033548]  irq_work_single+0x38/0x9c
[ 2379.033548]  flush_smp_call_function_queue+0x144/0x26c
[ 2379.033549]  generic_smp_call_function_single_interrupt+0x1c/0x30
[ 2379.033550]  do_handle_IPI+0x84/0x2e4
[ 2379.033550]  ipi_handler+0x24/0x3c
[ 2379.033551]  handle_percpu_devid_fasteoi_ipi+0x84/0x14c
[ 2379.033552]  __handle_domain_irq+0x84/0xf0
[ 2379.033553]  gic_handle_irq+0x78/0x2c0
[ 2379.033553]  el1_irq+0xb8/0x140
[ 2379.033554]  dump_stack+0xe8/0x140
[ 2379.033554]  dump_header+0x50/0x19c
[ 2379.033555]  out_of_memory+0x338/0x380
[ 2379.033556]  mem_cgroup_out_of_memory+0x128/0x144
[ 2379.033557]  mem_cgroup_oom+0x188/0x250
[ 2379.033557]  try_charge+0x2c8/0x600
[ 2379.033558]  mem_cgroup_charge+0x128/0x424
[ 2379.033559]  wp_page_copy+0xc8/0xb40
[ 2379.033559]  do_wp_page+0x228/0x594
[ 2379.033560]  handle_pte_fault+0x1f8/0x21c
[ 2379.033561]  __handle_mm_fault+0x1b0/0x380
[ 2379.033561]  handle_mm_fault+0xf4/0x250
[ 2379.033562]  do_page_fault+0x188/0x454
[ 2379.033563]  do_mem_abort+0x48/0xb0
[ 2379.033563]  el0_da+0x44/0x80
[ 2379.033564]  el0_sync_handler+0x88/0xb4
[ 2379.033564]  el0_sync+0x160/0x180
cpu29                                        cpu69

rcu_dump_cpu_stacks()
  grab rnp->lock
  nmi_trigger_cpumask_backtrace()
    arm64_send_ipi()                         do_handle_IPI
                                               flush_smp_call_function_queue
                                                 rcu_iw_handler
                                                   spin rnp->lock  <- deadlock
    nmi_cpu_backtrace
      wait for 10s or backtrace_mask clear
On arm64 platforms without NMI-triggered stack traces, the IPI backtrace feature is used instead. In rcu_dump_cpu_stacks(), raw_spin_lock_irqsave_rcu_node() grabs the rcu_node->lock to protect the rcu_node data used in the for_each_leaf_node_possible_cpu() loop. However, backtracing an RCU-stalled CPU may take much longer than expected, which creates a potential concurrency problem for anyone contending for the same rcu_node->lock in the meantime.
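For reference, a simplified sketch of the loop in question as it looks before this patch (based on the 5.10 tree_stall.h this diff applies to); the backtrace of every stalled CPU runs with rnp->lock held:

        rcu_for_each_leaf_node(rnp) {
                raw_spin_lock_irqsave_rcu_node(rnp, flags);     /* take rnp->lock, irqs off */
                for_each_leaf_node_possible_cpu(rnp, cpu)
                        if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
                                if (cpu_is_offline(cpu))
                                        pr_err("Offline CPU %d blocking current GP.\n", cpu);
                                else
                                        dump_cpu_task(cpu);     /* IPI backtrace: may block for up to 10s */
                        }
                raw_spin_unlock_irqrestore_rcu_node(rnp, flags); /* lock only released here */
        }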
As the call trace above shows, rcu_node->lock is not released until the backtraces of all stalled CPUs have finished in nmi_cpu_backtrace() or the 10s timeout in nmi_trigger_cpumask_backtrace() expires. If the SMP call_single_queue already holds pending IPI callbacks that contend for the same rcu_node->lock ahead of the ipi_cpu_backtrace callback, a deadlock is inevitable.
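For reference, rcu_iw_handler() (the irq_work callback seen spinning in the trace above) takes that same lock; in the 5.10 source it looks roughly like this, simplified:

        static void rcu_iw_handler(struct irq_work *iwp)
        {
                struct rcu_data *rdp;
                struct rcu_node *rnp;

                rdp = container_of(iwp, struct rcu_data, rcu_iw);
                rnp = rdp->mynode;
                raw_spin_lock_rcu_node(rnp);    /* spins here while cpu29 holds rnp->lock */
                if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
                        rdp->rcu_iw_gp_seq = rnp->gp_seq;
                        rdp->rcu_iw_pending = false;
                }
                raw_spin_unlock_rcu_node(rnp);
        }

Because it runs from IPI context on cpu69, spinning on the lock there keeps interrupts masked, which is what the hard lockup detector eventually reports.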
To avoid such problems, shorten the critical section protected by rcu_node->lock so that the lock is not held while waiting for the backtrace process to finish.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/rcu/tree_stall.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index e9d8b6dbde7c..35624190c6f2 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -334,10 +334,12 @@ static void rcu_dump_cpu_stacks(void)
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		for_each_leaf_node_possible_cpu(rnp, cpu)
 			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
+				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 				if (cpu_is_offline(cpu))
 					pr_err("Offline CPU %d blocking current GP.\n", cpu);
 				else
 					dump_cpu_task(cpu);
+				raw_spin_lock_irqsave_rcu_node(rnp, flags);
 			}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/7534 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/U...