From: Xiongfeng Wang wangxiongfeng2@huawei.com
hulk inclusion category: bugfix bugzilla: 30212 CVE: NA
---------------------------
When we print system information by echo 't' into 'sysrq-trigger' on several cores at the same time, we got the following calltrace.
[ 1352.854632] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6 [ 1352.854633] Modules linked in: nf_log_arp nf_log_ipv6 nf_log_ipv4 nf_log_common binfmt_misc salsa20_generic camellia_generic cast6_generic cast_common rfkill serpent_generic twofish_generic twofish_common xts lrw tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 sha512_generic loop jprob(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat hns_roce_hw_v2 hns_roce ib_core aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce ipmi_ssif ofpart sha256_arm64 sha1_ce cmdlinepart [ 1352.854649] hi_sfc ses enclosure mtd sg sbsa_gwdt ipmi_si ipmi_devintf ipmi_msghandler spi_dw_mmio sch_fq_codel ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod realtek hclge hisi_sas_v3_hw hisi_sas_main ahci libsas libahci hns3 hinic libata usb_storage hnae3 megaraid_sas scsi_transport_sas i2c_designware_platform i2c_designware_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_vs] [ 1352.854658] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1 [ 1352.854659] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019 [ 1352.854659] pstate: 80400089 (Nzcv daIf +PAN -UAO) [ 1352.854660] pc : queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854660] lr : print_cpu+0x414/0x690 [ 1352.854660] sp : ffff0001743afb80 [ 1352.854661] x29: ffff0001743afb80 x28: ffff805fcef6e880 [ 1352.854662] x27: 0000000000000000 x26: 0000000000000000 [ 1352.854662] x25: ffff000008cab000 x24: ffff000008cab000 [ 1352.854663] x23: 0000000000000000 x22: 0000000000000000 [ 1352.854664] x21: ffff000009478000 x20: 0000000000900001 [ 1352.854664] x19: ffff000009478d20 x18: ffffffffffffffff [ 1352.854665] x17: 0000000000000000 x16: 0000000000000000 [ 1352.854666] x15: ffff000009273708 x14: ffff00000947af60 [ 1352.854667] x13: ffff00000947abab x12: ffff00000929d000 [ 1352.854668] x11: 0000000000006fc8 x10: ffff00000947a1c0 [ 1352.854668] x9 : 0000000000000001 x8 : 0000000000000000 [ 1352.854669] x7 : ffff0000092737c8 x6 : ffff803fffc9e1c0 [ 1352.854670] x5 : 0000000000000000 x4 : ffff803fffc9e1c0 [ 1352.854671] x3 : ffff000008f5e000 x2 : 00000000001c0000 [ 1352.854671] x1 : 0000000000000000 x0 : ffff803fffc9e1c8 [ 1352.854672] Call trace: [ 1352.854673] queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854673] print_cpu+0x414/0x690 [ 1352.854673] sysrq_sched_debug_show+0x50/0x80 [ 1352.854674] show_state_filter+0xc0/0xd0 [ 1352.854674] sysrq_handle_showstate+0x18/0x28 [ 1352.854674] __handle_sysrq+0xa0/0x190 [ 1352.854675] write_sysrq_trigger+0x70/0x88 [ 1352.854675] proc_reg_write+0x80/0xd8 [ 1352.854675] __vfs_write+0x60/0x190 [ 1352.854676] vfs_write+0xac/0x1c0 [ 1352.854676] ksys_write+0x74/0xf0 [ 1352.854676] __arm64_sys_write+0x24/0x30 [ 1352.854677] el0_svc_common+0x78/0x130 [ 1352.854677] el0_svc_handler+0x38/0x78 [ 1352.854677] el0_svc+0x8/0xc [ 1352.854678] Kernel panic - not syncing: Hard LOCKUP [ 1352.854679] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1 [ 1352.854679] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019 [ 1352.854679] Call trace: [ 1352.854680] dump_backtrace+0x0/0x198 [ 1352.854680] show_stack+0x24/0x30 [ 1352.854681] dump_stack+0xa4/0xc4 [ 1352.854681] panic+0x130/0x304 [ 1352.854681] __stack_chk_fail+0x0/0x28 [ 1352.854682] watchdog_hardlockup_check+0x138/0x140 [ 1352.854682] sdei_watchdog_callback+0x20/0x30 [ 1352.854682] sdei_event_handler+0x50/0xf0 [ 1352.854683] __sdei_handler+0xd8/0x228 [ 1352.854683] __sdei_asm_handler+0xbc/0x134 [ 1352.854683] queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854684] print_cpu+0x414/0x690 [ 1352.854684] sysrq_sched_debug_show+0x50/0x80 [ 1352.854684] show_state_filter+0xc0/0xd0 [ 1352.854685] sysrq_handle_showstate+0x18/0x28 [ 1352.854685] __handle_sysrq+0xa0/0x190 [ 1352.854685] write_sysrq_trigger+0x70/0x88 [ 1352.854686] proc_reg_write+0x80/0xd8 [ 1352.854686] __vfs_write+0x60/0x190 [ 1352.854686] vfs_write+0xac/0x1c0 [ 1352.854687] ksys_write+0x74/0xf0 [ 1352.854687] __arm64_sys_write+0x24/0x30 [ 1352.854687] el0_svc_common+0x78/0x130 [ 1352.854688] el0_svc_handler+0x38/0x78 [ 1352.854688] el0_svc+0x8/0xc
It is because there are many processes in the system. 'print_cpu()' aquires 'sched_debug_lock', print some information, and releases 'sched_debug_lock'. This procedure takes about 4 seconds in our testcase. When four cores concurrently print system info by sysrq, it will takes the last core 12 seconds to get the spinlock. This will cause a hardlockup.
Signed-off-by: Kai Shen shenkai8@huawei.com Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-By: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/tty/sysrq.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c index aa2e394..72a8c70 100644 --- a/drivers/tty/sysrq.c +++ b/drivers/tty/sysrq.c @@ -1091,6 +1091,9 @@ int unregister_sysrq_key(int key, struct sysrq_key_op *op_p) EXPORT_SYMBOL(unregister_sysrq_key);
#ifdef CONFIG_PROC_FS + +static DEFINE_MUTEX(sysrq_mutex); + /* * writing 'C' to /proc/sysrq-trigger is like sysrq-C */ @@ -1102,7 +1105,10 @@ static ssize_t write_sysrq_trigger(struct file *file, const char __user *buf,
if (get_user(c, buf)) return -EFAULT; + + mutex_lock(&sysrq_mutex); __handle_sysrq(c, false); + mutex_unlock(&sysrq_mutex); }
return count;