From: Zheng Zengkai <zhengzengkai@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7MQJB
--------------------------------
Concurrent LTP stress testcases cause a hard lockup issue on Kunpeng 920:
------------[ cut here ]------------
[ 2301.316914] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2301.320386]  try_charge+0x2c8/0x600
[ 2301.325566] rcu: 69-...0: (23 ticks this GP) idle=fb2/1/0x4000000000000000 softirq=39368/39368 fqs=5591
[ 2301.335766]  (detected by 29, t=15006 jiffies, g=91585, q=3635521)
[ 2301.345458] Sending NMI from CPU 29 to CPUs 69:
------------[ cut here ]------------
[ 2379.033470] NMI watchdog: Watchdog detected hard LOCKUP on cpu 69
[ 2379.033523] CPU: 69 PID: 2143608 Comm: memcg_test_1 Kdump: loaded Tainted: G        W   5.10.0-sp2-lockdepdbg+ #45
[ 2379.033524] Hardware name: Huawei TaiShan 5280 V2/BC82AMDDA, BIOS 1.93 10/13/2022
[ 2379.033525] pstate: 00400089 (nzcv daIf +PAN -UAO -TCO BTYPE=--)
[ 2379.033525] pc : native_queued_spin_lock_slowpath+0x264/0x330
[ 2379.033526] lr : rcu_iw_handler+0xc4/0x130
[ 2379.033527] sp : ffff80001022be60
[ 2379.033528] x29: ffff80001022be60 x28: ffff2f6c7cc44d00
[ 2379.033529] x27: ffff2f6b81ee21a8 x26: ffff80001022c000
[ 2379.033530] x25: ffff800010228000 x24: ffffd913b32a6000
[ 2379.033532] x23: 0000000000000000 x22: ffffd913b3aa4000
[ 2379.033533] x21: ffffd913b32ba5f0 x20: ffff2f6b7ffa37c0
[ 2379.033534] x19: ffffd913b365c440 x18: 0000000000000060
[ 2379.033535] x17: 0000000000000000 x16: 0000000000000000
[ 2379.033537] x15: ffffffffffffffff x14: ffff8000bcb9b4f0
[ 2379.033538] x13: 00000000fffffffd x12: 0000000000000040
[ 2379.033539] x11: ffff2f63806bada0 x10: ffff2f63806bada2
[ 2379.033541] x9 : 0000000000000000 x8 : 0000000000000000
[ 2379.033542] x7 : ffff2f6b7ffa3740 x6 : ffffd913b2f67740
[ 2379.033543] x5 : ffff2f6b7ffa3740 x4 : 0000000001180101
[ 2379.033544] x3 : ffffd913b365c440 x2 : 0000000000000118
[ 2379.033545] x1 : 0000000001180000 x0 : 0000000000000000
[ 2379.033547] Call trace:
[ 2379.033547]  native_queued_spin_lock_slowpath+0x264/0x330
[ 2379.033548]  irq_work_single+0x38/0x9c
[ 2379.033548]  flush_smp_call_function_queue+0x144/0x26c
[ 2379.033549]  generic_smp_call_function_single_interrupt+0x1c/0x30
[ 2379.033550]  do_handle_IPI+0x84/0x2e4
[ 2379.033550]  ipi_handler+0x24/0x3c
[ 2379.033551]  handle_percpu_devid_fasteoi_ipi+0x84/0x14c
[ 2379.033552]  __handle_domain_irq+0x84/0xf0
[ 2379.033553]  gic_handle_irq+0x78/0x2c0
[ 2379.033553]  el1_irq+0xb8/0x140
[ 2379.033554]  dump_stack+0xe8/0x140
[ 2379.033554]  dump_header+0x50/0x19c
[ 2379.033555]  out_of_memory+0x338/0x380
[ 2379.033556]  mem_cgroup_out_of_memory+0x128/0x144
[ 2379.033557]  mem_cgroup_oom+0x188/0x250
[ 2379.033557]  try_charge+0x2c8/0x600
[ 2379.033558]  mem_cgroup_charge+0x128/0x424
[ 2379.033559]  wp_page_copy+0xc8/0xb40
[ 2379.033559]  do_wp_page+0x228/0x594
[ 2379.033560]  handle_pte_fault+0x1f8/0x21c
[ 2379.033561]  __handle_mm_fault+0x1b0/0x380
[ 2379.033561]  handle_mm_fault+0xf4/0x250
[ 2379.033562]  do_page_fault+0x188/0x454
[ 2379.033563]  do_mem_abort+0x48/0xb0
[ 2379.033563]  el0_da+0x44/0x80
[ 2379.033564]  el0_sync_handler+0x88/0xb4
[ 2379.033564]  el0_sync+0x160/0x180
cpu29                                        cpu69

rcu_dump_cpu_stacks()
  grab rnp->lock
  nmi_trigger_cpumask_backtrace()
    arm64_send_ipi()                         do_handle_IPI
                                               flush_smp_call_function_queue
                                                 rcu_iw_handler
                                                   spin rnp->lock  <- deadlock
    nmi_cpu_backtrace
      wait for 10s or backtrace_mask clear
On arm64 platforms without NMI-triggered stack traces, the IPI backtrace feature is used instead. In rcu_dump_cpu_stacks(), raw_spin_lock_irqsave_rcu_node() grabs the rcu_node->lock to protect the rcu_node data used in the for_each_leaf_node_possible_cpu() loop. However, backtracing an RCU-stalled CPU may take much longer than expected, which creates a potential concurrency problem for anyone contending for the same rcu_node->lock in the meantime.
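For reference, a simplified sketch of the loop in question as it looks before this patch (based on the 5.10 tree_stall.h this diff applies to); the backtrace of every stalled CPU runs with rnp->lock held:

        rcu_for_each_leaf_node(rnp) {
                raw_spin_lock_irqsave_rcu_node(rnp, flags);     /* take rnp->lock, irqs off */
                for_each_leaf_node_possible_cpu(rnp, cpu)
                        if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
                                if (cpu_is_offline(cpu))
                                        pr_err("Offline CPU %d blocking current GP.\n", cpu);
                                else
                                        dump_cpu_task(cpu);     /* IPI backtrace: may block for up to 10s */
                        }
                raw_spin_unlock_irqrestore_rcu_node(rnp, flags); /* lock only released here */
        }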
As the call trace above shows, rcu_node->lock is not released until the backtraces of all stalled CPUs have finished in nmi_cpu_backtrace() or the 10s timeout in nmi_trigger_cpumask_backtrace() expires. If the SMP call_single_queue already holds pending IPI callbacks that contend for the same rcu_node->lock ahead of the ipi_cpu_backtrace callback, a deadlock is inevitable.
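For reference, rcu_iw_handler() (the irq_work callback seen spinning in the trace above) takes that same lock; in the 5.10 source it looks roughly like this, simplified:

        static void rcu_iw_handler(struct irq_work *iwp)
        {
                struct rcu_data *rdp;
                struct rcu_node *rnp;

                rdp = container_of(iwp, struct rcu_data, rcu_iw);
                rnp = rdp->mynode;
                raw_spin_lock_rcu_node(rnp);    /* spins here while cpu29 holds rnp->lock */
                if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
                        rdp->rcu_iw_gp_seq = rnp->gp_seq;
                        rdp->rcu_iw_pending = false;
                }
                raw_spin_unlock_rcu_node(rnp);
        }

Because it runs from IPI context on cpu69, spinning on the lock there keeps interrupts masked, which is what the hard lockup detector eventually reports.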
To avoid such problems, shorten the critical section protected by rcu_node->lock so that the lock is not held while waiting for the backtrace process to finish.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/rcu/tree_stall.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index e9d8b6dbde7c..35624190c6f2 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -334,10 +334,12 @@ static void rcu_dump_cpu_stacks(void)
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		for_each_leaf_node_possible_cpu(rnp, cpu)
 			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
+				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 				if (cpu_is_offline(cpu))
 					pr_err("Offline CPU %d blocking current GP.\n", cpu);
 				else
 					dump_cpu_task(cpu);
+				raw_spin_lock_irqsave_rcu_node(rnp, flags);
 			}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/7534 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/U...