Xiongfeng Wang (12):
  firmware: arm_sdei: add interrupt binding api
  firmware: arm_sdei: make 'sdei_api_event_disable/enable' public
  lockup_detector: init lockup detector after all the init_calls
  watchdog: add nmi_watchdog support for arm64 based on SDEI
  sdei_watchdog: clear EOI of the secure timer before kdump
  sdei_watchdog: set secure timer period base on 'watchdog_thresh'
  sdei_watchdog: avoid possible false hardlockup
  init: only move down lockup_detector_init() when sdei_watchdog is enabled
  kprobes/arm64: Blacklist sdei watchdog callback functions
  openeuler_defconfig: Enable SDEI Watchdog
  stop_machine: mask sdei before running the callback
  arm64: kexec: only clear EOI for SDEI in NMI context
 arch/arm64/configs/openeuler_defconfig |   3 +-
 arch/arm64/kernel/Makefile             |   1 +
 arch/arm64/kernel/machine_kexec.c      |  11 ++
 arch/arm64/kernel/watchdog_sdei.c      | 151 +++++++++++++++++++++++++
 drivers/firmware/arm_sdei.c            |  26 ++++-
 include/linux/arm_sdei.h               |   5 +
 include/linux/nmi.h                    |   8 ++
 include/uapi/linux/arm_sdei.h          |   2 +
 init/main.c                            |   7 +-
 kernel/stop_machine.c                  |  10 ++
 kernel/watchdog.c                      |   3 +
 lib/Kconfig.debug                      |   9 ++
 12 files changed, 232 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_sdei.c
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

This patch adds an interrupt binding API function which returns the bound event number.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 drivers/firmware/arm_sdei.c | 10 ++++++++++
 include/linux/arm_sdei.h    |  1 +
 2 files changed, 11 insertions(+)
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c index 285fe7ad490d..9171a9d05140 100644 --- a/drivers/firmware/arm_sdei.c +++ b/drivers/firmware/arm_sdei.c @@ -188,6 +188,16 @@ int sdei_api_event_context(u32 query, u64 *result) } NOKPROBE_SYMBOL(sdei_api_event_context);
+int sdei_api_event_interrupt_bind(int hwirq) +{ + u64 event_number; + + invoke_sdei_fn(SDEI_1_0_FN_SDEI_INTERRUPT_BIND, hwirq, 0, 0, 0, 0, + &event_number); + + return (int)event_number; +} + static int sdei_api_event_get_info(u32 event, u32 info, u64 *result) { return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_GET_INFO, event, info, 0, diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h index 255701e1251b..bf92dc48fbea 100644 --- a/include/linux/arm_sdei.h +++ b/include/linux/arm_sdei.h @@ -36,6 +36,7 @@ int sdei_event_unregister(u32 event_num);
int sdei_event_enable(u32 event_num); int sdei_event_disable(u32 event_num); +int sdei_api_event_interrupt_bind(int hwirq);
/* GHES register/unregister helpers */ int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *normal_cb,
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

The NMI watchdog needs to enable the event on each core individually, but the existing public API 'sdei_event_enable' enables the event on all cores when the event type is private. Make 'sdei_api_event_enable' and 'sdei_api_event_disable' public so that the watchdog can call them per core.
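The difference between the two enable paths can be illustrated with a small user-space model. This is only a sketch and not kernel code: the `model_*` names, the `NR_CPUS_MODEL` constant and the boolean array standing in for per-core event state are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

/* Hypothetical stand-in for the per-CPU enable state of a private event. */
static bool model_enabled[NR_CPUS_MODEL];

/* Models sdei_event_enable() on a private event: every CPU is enabled. */
static void model_event_enable_all(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		model_enabled[cpu] = true;
}

/* Models sdei_api_event_enable() called on one CPU: only that CPU changes. */
static void model_api_event_enable(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	model_enabled[cpu] = true;
}

/* Resets the model so that no CPU has the event enabled. */
static void model_reset(void)
{
	memset(model_enabled, 0, sizeof(model_enabled));
}

/* Counts the CPUs on which the event is currently enabled. */
static int model_enabled_count(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		n += model_enabled[cpu];
	return n;
}
```

In this model, calling `model_api_event_enable(1)` after `model_reset()` leaves exactly one CPU enabled, which is the per-core behaviour the watchdog enable/disable hooks rely on.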
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com
---
 drivers/firmware/arm_sdei.c | 4 ++--
 include/linux/arm_sdei.h    | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c index 9171a9d05140..36600024736d 100644 --- a/drivers/firmware/arm_sdei.c +++ b/drivers/firmware/arm_sdei.c @@ -389,7 +389,7 @@ static int sdei_platform_reset(void) return err; }
-static int sdei_api_event_enable(u32 event_num) +int sdei_api_event_enable(u32 event_num) { return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_ENABLE, event_num, 0, 0, 0, 0, NULL); @@ -436,7 +436,7 @@ int sdei_event_enable(u32 event_num) return err; }
-static int sdei_api_event_disable(u32 event_num) +int sdei_api_event_disable(u32 event_num) { return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_DISABLE, event_num, 0, 0, 0, 0, NULL); diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h index bf92dc48fbea..f5f6ba7a1d50 100644 --- a/include/linux/arm_sdei.h +++ b/include/linux/arm_sdei.h @@ -37,6 +37,8 @@ int sdei_event_unregister(u32 event_num); int sdei_event_enable(u32 event_num); int sdei_event_disable(u32 event_num); int sdei_api_event_interrupt_bind(int hwirq); +int sdei_api_event_disable(u32 event_num); +int sdei_api_event_enable(u32 event_num);
/* GHES register/unregister helpers */ int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *normal_cb,
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

We call 'sdei_init' as a 'subsys_initcall_sync', and the lockup detector needs to be initialised after sdei_init. The side effect of this patch is that hard lockups occurring during the init_calls can no longer be detected.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com
---
 init/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/init/main.c b/init/main.c index 436d73261810..db7800605428 100644 --- a/init/main.c +++ b/init/main.c @@ -1535,7 +1535,6 @@ static noinline void __init kernel_init_freeable(void)
rcu_init_tasks_generic(); do_pre_smp_initcalls(); - lockup_detector_init();
smp_init(); sched_init_smp(); @@ -1546,6 +1545,8 @@ static noinline void __init kernel_init_freeable(void)
do_basic_setup();
+ lockup_detector_init(); + kunit_run_all_tests();
wait_for_initramfs();
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

Add nmi_watchdog support for arm64 based on SDEI.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com

Conflicts:
	lib/Kconfig.debug
	arch/arm64/kernel/watchdog_sdei.c
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
---
 arch/arm64/kernel/Makefile        |   1 +
 arch/arm64/kernel/watchdog_sdei.c | 112 ++++++++++++++++++++++++++++++
 lib/Kconfig.debug                 |   9 +++
 3 files changed, 122 insertions(+)
 create mode 100644 arch/arm64/kernel/watchdog_sdei.c
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index d95b3d6b471a..d48aa807dcce 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -68,6 +68,7 @@ arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_CRASH_CORE) += crash_core.o obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o +obj-$(CONFIG_SDEI_WATCHDOG) += watchdog_sdei.o obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o obj-$(CONFIG_ARM64_MTE) += mte.o obj-y += vdso-wrap.o diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c new file mode 100644 index 000000000000..8f9eb838b969 --- /dev/null +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -0,0 +1,112 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Detect hard lockups on a system + * + * Note: Most of this code is borrowed heavily from the perf hardlockup + * detector, so thanks to Don for the initial implementation. + */ + +#define pr_fmt(fmt) "SDEI NMI watchdog: " fmt + +#include <asm/irq_regs.h> +#include <asm/kvm_hyp.h> +#include <asm/smp_plat.h> +#include <asm/sdei.h> +#include <asm/virt.h> +#include <linux/arm_sdei.h> +#include <linux/nmi.h> + +/* We use the secure physical timer as SDEI NMI watchdog timer */ +#define SDEI_NMI_WATCHDOG_HWIRQ 29 + +static int sdei_watchdog_event_num; +static bool disable_sdei_nmi_watchdog; +static bool sdei_watchdog_registered; + +void watchdog_hardlockup_enable(unsigned int cpu) +{ + int ret; + + if (!sdei_watchdog_registered) + return; + + /* Skip the first hardlockup check incase BIOS didn't init the + * secure timer correctly */ + watchdog_hardlockup_touch_cpu(cpu); + ret = sdei_api_event_enable(sdei_watchdog_event_num); + if (ret) { + pr_err("Enable NMI Watchdog failed on cpu%d\n", + smp_processor_id()); + } +} + +void watchdog_hardlockup_disable(unsigned int cpu) +{ + int ret; + + if (!sdei_watchdog_registered) + return; + + ret = sdei_api_event_disable(sdei_watchdog_event_num); + if (ret) + 
pr_err("Disable NMI Watchdog failed on cpu%d\n", + smp_processor_id()); +} + +static int sdei_watchdog_callback(u32 event, + struct pt_regs *regs, void *arg) +{ + watchdog_hardlockup_check(smp_processor_id(), regs); + + return 0; +} + +static void sdei_nmi_watchdog_bind(void *data) +{ + int ret; + + ret = sdei_api_event_interrupt_bind(SDEI_NMI_WATCHDOG_HWIRQ); + if (ret < 0) + pr_err("SDEI bind failed on cpu%d, return %d\n", + smp_processor_id(), ret); +} + +static int __init disable_sdei_nmi_watchdog_setup(char *str) +{ + disable_sdei_nmi_watchdog = true; + return 1; +} +__setup("disable_sdei_nmi_watchdog", disable_sdei_nmi_watchdog_setup); + +int __init watchdog_hardlockup_probe(void) +{ + int ret; + + if (disable_sdei_nmi_watchdog) + return -EINVAL; + + if (!is_hyp_mode_available()) { + pr_err("Disable SDEI NMI Watchdog in VM\n"); + return -EINVAL; + } + + sdei_watchdog_event_num = sdei_api_event_interrupt_bind(SDEI_NMI_WATCHDOG_HWIRQ); + if (sdei_watchdog_event_num < 0) { + pr_err("Bind interrupt failed. Firmware may not support SDEI !\n"); + return sdei_watchdog_event_num; + } + + on_each_cpu(sdei_nmi_watchdog_bind, NULL, true); + + ret = sdei_event_register(sdei_watchdog_event_num, + sdei_watchdog_callback, NULL); + if (ret) { + pr_err("SDEI Watchdog register callback failed\n"); + return ret; + } + + sdei_watchdog_registered = true; + pr_info("SDEI Watchdog registered successfully\n"); + + return 0; +} diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index fa307f93fa2e..cee4d3f75820 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1045,6 +1045,12 @@ config HAVE_HARDLOCKUP_DETECTOR_BUDDY depends on SMP default y
+config SDEI_WATCHDOG + bool "SDEI NMI Watchdog support" + depends on ARM_SDE_INTERFACE + depends on HARDLOCKUP_DETECTOR + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + # # Global switch whether to build a hardlockup detector at all. It is available # only when the architecture supports at least one implementation. There are @@ -1061,6 +1067,7 @@ config HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH imply HARDLOCKUP_DETECTOR_PERF imply HARDLOCKUP_DETECTOR_BUDDY + imply SDEI_WATCHDOG imply HARDLOCKUP_DETECTOR_ARCH select LOCKUP_DETECTOR
@@ -1097,6 +1104,7 @@ config HARDLOCKUP_DETECTOR_PERF depends on HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH + depends on !SDEI_WATCHDOG select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
config HARDLOCKUP_DETECTOR_BUDDY @@ -1105,6 +1113,7 @@ config HARDLOCKUP_DETECTOR_BUDDY depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH + depends on !SDEI_WATCHDOG select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
config HARDLOCKUP_DETECTOR_ARCH
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

When we panic in hardlockup, the secure timer interrupt remains active because the firmware clears EOI only after the dispatch is completed. This causes the arm_arch_timer interrupt to fail to trigger in the second kernel.

This patch adds a new SMC helper to clear the EOI of a given interrupt, and clears the EOI of the secure timer before booting the second kernel.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com

Conflicts:
	include/linux/nmi.h
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
---
 arch/arm64/kernel/machine_kexec.c | 10 ++++++++++
 arch/arm64/kernel/watchdog_sdei.c |  6 ++++++
 drivers/firmware/arm_sdei.c       |  6 ++++++
 include/linux/arm_sdei.h          |  1 +
 include/linux/nmi.h               |  6 ++++++
 include/uapi/linux/arm_sdei.h     |  1 +
 6 files changed, 30 insertions(+)
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 078910db77a4..cfa6b0dafc88 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -10,6 +10,7 @@ #include <linux/irq.h> #include <linux/kernel.h> #include <linux/kexec.h> +#include <linux/nmi.h> #include <linux/page-flags.h> #include <linux/reboot.h> #include <linux/set_memory.h> @@ -262,6 +263,15 @@ void machine_crash_shutdown(struct pt_regs *regs) /* shutdown non-crashing cpus */ crash_smp_send_stop();
+ /* + * when we panic in hardlockup detected by sdei_watchdog, the secure + * timer interrupt remains activate here because firmware clear eoi + * after dispatch is completed. This will cause arm_arch_timer + * interrupt failed to trigger in the second kernel. So we clear eoi + * of the secure timer before booting the second kernel. + */ + sdei_watchdog_clear_eoi(); + /* for crashing cpu */ crash_save_cpu(regs, smp_processor_id()); machine_kexec_mask_interrupts(); diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index 8f9eb838b969..7ebf6b5ab237 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -78,6 +78,12 @@ static int __init disable_sdei_nmi_watchdog_setup(char *str) } __setup("disable_sdei_nmi_watchdog", disable_sdei_nmi_watchdog_setup);
+void sdei_watchdog_clear_eoi(void) +{ + if (sdei_watchdog_registered) + sdei_api_clear_eoi(SDEI_NMI_WATCHDOG_HWIRQ); +} + int __init watchdog_hardlockup_probe(void) { int ret; diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c index 36600024736d..5e229d3eb552 100644 --- a/drivers/firmware/arm_sdei.c +++ b/drivers/firmware/arm_sdei.c @@ -198,6 +198,12 @@ int sdei_api_event_interrupt_bind(int hwirq) return (int)event_number; }
+int sdei_api_clear_eoi(int hwirq) +{ + return invoke_sdei_fn(SDEI_1_0_FN_SDEI_CLEAR_EOI, hwirq, 0, 0, 0, 0, + NULL); +} + static int sdei_api_event_get_info(u32 event, u32 info, u64 *result) { return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_GET_INFO, event, info, 0, diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h index f5f6ba7a1d50..6381537e7015 100644 --- a/include/linux/arm_sdei.h +++ b/include/linux/arm_sdei.h @@ -39,6 +39,7 @@ int sdei_event_disable(u32 event_num); int sdei_api_event_interrupt_bind(int hwirq); int sdei_api_event_disable(u32 event_num); int sdei_api_event_enable(u32 event_num); +int sdei_api_clear_eoi(int hwirq);
/* GHES register/unregister helpers */ int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *normal_cb, diff --git a/include/linux/nmi.h b/include/linux/nmi.h index e92e378df000..404c78e04a05 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -235,4 +235,10 @@ static inline void nmi_backtrace_stall_snap(const struct cpumask *btp) {} static inline void nmi_backtrace_stall_check(const struct cpumask *btp) {} #endif
+#ifdef CONFIG_SDEI_WATCHDOG +void sdei_watchdog_clear_eoi(void); +#else +static inline void sdei_watchdog_clear_eoi(void) { } +#endif + #endif diff --git a/include/uapi/linux/arm_sdei.h b/include/uapi/linux/arm_sdei.h index af0630ba5437..1187b1b49c87 100644 --- a/include/uapi/linux/arm_sdei.h +++ b/include/uapi/linux/arm_sdei.h @@ -24,6 +24,7 @@ #define SDEI_1_0_FN_SDEI_INTERRUPT_RELEASE SDEI_1_0_FN(0x0E) #define SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI_1_0_FN(0x11) #define SDEI_1_0_FN_SDEI_SHARED_RESET SDEI_1_0_FN(0x12) +#define SDEI_1_0_FN_SDEI_CLEAR_EOI SDEI_1_0_FN(0x18)
#define SDEI_VERSION_MAJOR_SHIFT 48 #define SDEI_VERSION_MAJOR_MASK 0x7fff
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

The period of the secure timer is set to 3s by the BIOS, which means the secure timer interrupt triggers every 3 seconds. To further reduce the NMI watchdog's impact on performance, this patch sets the period of the secure timer based on 'watchdog_thresh'. This variable is initialized to 10s. The period can also be changed at runtime by writing to '/proc/sys/kernel/watchdog_thresh'.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com

Conflicts:
	arch/arm64/kernel/watchdog_sdei.c (context conflict)
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
---
 arch/arm64/kernel/watchdog_sdei.c | 13 +++++++++++++
 drivers/firmware/arm_sdei.c       |  6 ++++++
 include/linux/arm_sdei.h          |  1 +
 include/uapi/linux/arm_sdei.h     |  1 +
 4 files changed, 21 insertions(+)
diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index 7ebf6b5ab237..758e20eadc31 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -33,6 +33,8 @@ void watchdog_hardlockup_enable(unsigned int cpu) /* Skip the first hardlockup check incase BIOS didn't init the * secure timer correctly */ watchdog_hardlockup_touch_cpu(cpu); + sdei_api_set_secure_timer_period(watchdog_thresh); + ret = sdei_api_event_enable(sdei_watchdog_event_num); if (ret) { pr_err("Enable NMI Watchdog failed on cpu%d\n", @@ -102,6 +104,17 @@ int __init watchdog_hardlockup_probe(void) return sdei_watchdog_event_num; }
+ /* + * After we introduced 'sdei_api_set_secure_timer_period', we disselect + * 'CONFIG_HARDLOCKUP_CHECK_TIMESTAMP'. So we need to make sure that + * firmware can set the period of the secure timer and the timer + * interrupt doesn't trigger too soon. + */ + if (sdei_api_set_secure_timer_period(watchdog_thresh)) { + pr_err("Firmware doesn't support setting the secure timer period, please update your BIOS !\n"); + return -EINVAL; + } + on_each_cpu(sdei_nmi_watchdog_bind, NULL, true);
ret = sdei_event_register(sdei_watchdog_event_num, diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c index 5e229d3eb552..0f7ef69071c0 100644 --- a/drivers/firmware/arm_sdei.c +++ b/drivers/firmware/arm_sdei.c @@ -204,6 +204,12 @@ int sdei_api_clear_eoi(int hwirq) NULL); }
+int sdei_api_set_secure_timer_period(int sec) +{ + return invoke_sdei_fn(SDEI_1_0_FN_SET_SECURE_TIMER_PERIOD, sec, 0, 0, 0, + 0, NULL); +} + static int sdei_api_event_get_info(u32 event, u32 info, u64 *result) { return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_GET_INFO, event, info, 0, diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h index 6381537e7015..28e247dd5773 100644 --- a/include/linux/arm_sdei.h +++ b/include/linux/arm_sdei.h @@ -40,6 +40,7 @@ int sdei_api_event_interrupt_bind(int hwirq); int sdei_api_event_disable(u32 event_num); int sdei_api_event_enable(u32 event_num); int sdei_api_clear_eoi(int hwirq); +int sdei_api_set_secure_timer_period(int sec);
/* GHES register/unregister helpers */ int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *normal_cb, diff --git a/include/uapi/linux/arm_sdei.h b/include/uapi/linux/arm_sdei.h index 1187b1b49c87..a5375679dd50 100644 --- a/include/uapi/linux/arm_sdei.h +++ b/include/uapi/linux/arm_sdei.h @@ -25,6 +25,7 @@ #define SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI_1_0_FN(0x11) #define SDEI_1_0_FN_SDEI_SHARED_RESET SDEI_1_0_FN(0x12) #define SDEI_1_0_FN_SDEI_CLEAR_EOI SDEI_1_0_FN(0x18) +#define SDEI_1_0_FN_SET_SECURE_TIMER_PERIOD SDEI_1_0_FN(0x19)
#define SDEI_VERSION_MAJOR_SHIFT 48 #define SDEI_VERSION_MAJOR_MASK 0x7fff
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

The firmware may not trigger the SDEI event at the required frequency. If the SDEI event is triggered too soon, it causes a false hardlockup report in the kernel. Check the timestamp in sdei_watchdog_callback() and skip the hardlockup check if the callback is invoked too soon.
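The "too soon" test boils down to comparing the time since the previous event against 4/5 of the watchdog threshold. The following user-space sketch reproduces only that arithmetic; `sdei_event_too_soon()` and `MODEL_NSEC_PER_SEC` are hypothetical names used for illustration, and `thresh_sec` plays the role of 'watchdog_thresh'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL	/* stand-in for NSEC_PER_SEC */

/*
 * Returns true when the event arrived less than 4/5 of the watchdog
 * threshold after the previous one, i.e. when the hardlockup check
 * should be skipped.
 */
static bool sdei_event_too_soon(uint64_t delta_ns, int thresh_sec)
{
	assert(thresh_sec > 0);
	return delta_ns < (uint64_t)thresh_sec * MODEL_NSEC_PER_SEC * 4 / 5;
}
```

With a threshold of 10s the cut-off is 8s, so an event arriving 3s after the previous one is skipped, while one arriving after 8s or more is checked.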
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com

Conflicts:
	arch/arm64/kernel/watchdog_sdei.c (context conflict)
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
---
 arch/arm64/kernel/watchdog_sdei.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index 758e20eadc31..4a143a598eef 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -22,6 +22,7 @@ static int sdei_watchdog_event_num; static bool disable_sdei_nmi_watchdog; static bool sdei_watchdog_registered; +static DEFINE_PER_CPU(ktime_t, last_check_time);
void watchdog_hardlockup_enable(unsigned int cpu) { @@ -34,6 +35,7 @@ void watchdog_hardlockup_enable(unsigned int cpu) * secure timer correctly */ watchdog_hardlockup_touch_cpu(cpu); sdei_api_set_secure_timer_period(watchdog_thresh); + __this_cpu_write(last_check_time, ktime_get_mono_fast_ns());
ret = sdei_api_event_enable(sdei_watchdog_event_num); if (ret) { @@ -58,6 +60,22 @@ void watchdog_hardlockup_disable(unsigned int cpu) static int sdei_watchdog_callback(u32 event, struct pt_regs *regs, void *arg) { + ktime_t delta, now = ktime_get_mono_fast_ns(); + + delta = now - __this_cpu_read(last_check_time); + __this_cpu_write(last_check_time, now); + + /* + * Set delta to 4/5 of the actual watchdog threshold period so the + * hrtimer is guaranteed to fire at least once within the real + * watchdog threshold. + */ + if (delta < watchdog_thresh * (u64)NSEC_PER_SEC * 4 / 5) { + pr_err(FW_BUG "SDEI Watchdog event triggered too soon, " + "time to last check:%lld ns\n", delta); + return 0; + } + watchdog_hardlockup_check(smp_processor_id(), regs);
return 0;
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

With CONFIG_DEBUG_PREEMPT and CONFIG_PREEMPT enabled on x86, I got the following call trace:

[    3.341853] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[    3.344392] caller is debug_smp_processor_id+0x17/0x20
[    3.344395] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #398
[    3.344397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[    3.344399] Call Trace:
[    3.344410]  dump_stack+0x60/0x76
[    3.344412]  check_preemption_disabled+0xba/0xc0
[    3.344415]  debug_smp_processor_id+0x17/0x20
[    3.344422]  hardlockup_detector_event_create+0xf/0x60
[    3.344427]  hardlockup_detector_perf_init+0xf/0x41
[    3.344430]  watchdog_nmi_probe+0xe/0x10
[    3.344432]  lockup_detector_init+0x22/0x5b
[    3.344437]  kernel_init_freeable+0x20c/0x245
[    3.344439]  ? rest_init+0xd0/0xd0
[    3.344441]  kernel_init+0xe/0x110
[    3.344446]  ret_from_fork+0x22/0x30

This is because sched_init_smp() sets 'current->nr_cpus_allowed' to the number of possible CPUs, so check_preemption_disabled() fails. The issue was introduced by commit a79050434b45, which moved lockup_detector_init() down after do_basic_setup(). Fix it by moving lockup_detector_init() back to its original place when sdei_watchdog is disabled. There is no problem when sdei_watchdog is enabled, because watchdog_nmi_probe() is overridden in 'arch/arm64/kernel/watchdog_sdei.c' in that case.
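The resulting ordering decision can be sketched as a user-space model. This is an illustration only: `struct model_boot` and `model_kernel_init()` are hypothetical, and the `disable` argument corresponds to 'disable_sdei_nmi_watchdog', which the patch defines as 1 when CONFIG_SDEI_WATCHDOG is not set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of where lockup_detector_init() was called. */
struct model_boot {
	int early_calls;	/* original place, before smp_init() */
	int late_calls;		/* after do_basic_setup(), once sdei_init has run */
};

/* Models the two conditional call sites in kernel_init_freeable(). */
static struct model_boot model_kernel_init(bool disable)
{
	struct model_boot b = { 0, 0 };

	if (disable)
		b.early_calls++;

	/* smp_init(), sched_init_smp() and do_basic_setup() would run here. */

	if (!disable)
		b.late_calls++;

	/* Whichever path is taken, the detector is initialised exactly once. */
	assert(b.early_calls + b.late_calls == 1);
	return b;
}
```

The two conditions are complementary, so the detector is never initialised twice and never skipped, regardless of how the flag is set.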
Fixes: a79050434b45 ("lockup_detector: init lockup detector after all the init_calls")
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Wei Li liwei391@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com
---
 arch/arm64/kernel/watchdog_sdei.c | 2 +-
 include/linux/nmi.h               | 2 ++
 init/main.c                       | 6 +++++-
 3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index 4a143a598eef..155f36e24699 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -20,7 +20,7 @@ #define SDEI_NMI_WATCHDOG_HWIRQ 29
static int sdei_watchdog_event_num; -static bool disable_sdei_nmi_watchdog; +bool disable_sdei_nmi_watchdog; static bool sdei_watchdog_registered; static DEFINE_PER_CPU(ktime_t, last_check_time);
diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 404c78e04a05..7bd446acad24 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -237,8 +237,10 @@ static inline void nmi_backtrace_stall_check(const struct cpumask *btp) {}
#ifdef CONFIG_SDEI_WATCHDOG void sdei_watchdog_clear_eoi(void); +extern bool disable_sdei_nmi_watchdog; #else static inline void sdei_watchdog_clear_eoi(void) { } +#define disable_sdei_nmi_watchdog 1 #endif
#endif diff --git a/init/main.c b/init/main.c index db7800605428..6f16041b53a2 100644 --- a/init/main.c +++ b/init/main.c @@ -1535,6 +1535,8 @@ static noinline void __init kernel_init_freeable(void)
rcu_init_tasks_generic(); do_pre_smp_initcalls(); + if (disable_sdei_nmi_watchdog) + lockup_detector_init();
smp_init(); sched_init_smp(); @@ -1545,7 +1547,9 @@ static noinline void __init kernel_init_freeable(void)
do_basic_setup();
- lockup_detector_init(); + /* sdei_watchdog needs to be initialized after sdei_init */ + if (!disable_sdei_nmi_watchdog) + lockup_detector_init();
kunit_run_all_tests();
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

Functions called in sdei_handler are not allowed to be kprobed, so mark them as NOKPROBE_SYMBOL. 'watchdog_check_timestamp()' calls a lot of functions. Luckily, we don't need 'CONFIG_HARDLOCKUP_CHECK_TIMESTAMP' now, so just make CONFIG_SDEI_WATCHDOG depend on !CONFIG_HARDLOCKUP_CHECK_TIMESTAMP in case someone adds 'CONFIG_HARDLOCKUP_CHECK_TIMESTAMP' in the future.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Yang Yingliang yangyingliang@huawei.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com

Conflicts:
	kernel/watchdog.c
	kernel/watchdog_hld.c
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
---
 arch/arm64/kernel/watchdog_sdei.c | 2 ++
 kernel/watchdog.c                 | 3 +++
 2 files changed, 5 insertions(+)
diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index 155f36e24699..6f43496de56e 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -14,6 +14,7 @@ #include <asm/sdei.h> #include <asm/virt.h> #include <linux/arm_sdei.h> +#include <linux/kprobes.h> #include <linux/nmi.h>
/* We use the secure physical timer as SDEI NMI watchdog timer */ @@ -80,6 +81,7 @@ static int sdei_watchdog_callback(u32 event,
return 0; } +NOKPROBE_SYMBOL(sdei_watchdog_callback);
static void sdei_nmi_watchdog_bind(void *data) { diff --git a/kernel/watchdog.c b/kernel/watchdog.c index d145305d95fe..1795d767e620 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -16,6 +16,7 @@ #include <linux/cpu.h> #include <linux/nmi.h> #include <linux/init.h> +#include <linux/kprobes.h> #include <linux/module.h> #include <linux/sysctl.h> #include <linux/tick.h> @@ -127,6 +128,7 @@ static bool is_hardlockup(unsigned int cpu)
return false; } +NOKPROBE_SYMBOL(is_hardlockup);
static void watchdog_hardlockup_kick(void) { @@ -184,6 +186,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) per_cpu(watchdog_hardlockup_warned, cpu) = false; } } +NOKPROBE_SYMBOL(watchdog_hardlockup_check);
#else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

Enable SDEI Watchdog for ARM64.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
---
 arch/arm64/configs/openeuler_defconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 8f1a4db8d49b..28d54725af5a 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -7747,11 +7747,12 @@ CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HARDLOCKUP_DETECTOR=y # CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY is not set -CONFIG_HARDLOCKUP_DETECTOR_PERF=y +# CONFIG_HARDLOCKUP_DETECTOR_PERF is not set # CONFIG_HARDLOCKUP_DETECTOR_BUDDY is not set # CONFIG_HARDLOCKUP_DETECTOR_ARCH is not set CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER=y CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y +CONFIG_SDEI_WATCHDOG=y CONFIG_DETECT_HUNG_TASK=y CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120 # CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC

----------------------------------------

Kprobes uses 'stop_machine' to modify code that could be running in the sdei_handler at the same time. This patch masks SDEI before running the stop_machine callback to avoid this race condition.

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Yang Yingliang yangyingliang@huawei.com
Signed-off-by: Wei Li liwei391@huawei.com
Reviewed-by: Cheng Jian cj.chengjian@huawei.com
---
 kernel/stop_machine.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index cedb17ba158a..9466d61d21c9 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -23,6 +23,10 @@ #include <linux/nmi.h> #include <linux/sched/wake_q.h>
+#ifdef CONFIG_ARM64 +#include <linux/arm_sdei.h> +#endif + /* * Structure to determine completion condition and record errors. May * be shared by works on different cpus. @@ -234,6 +238,9 @@ static int multi_cpu_stop(void *data) case MULTI_STOP_DISABLE_IRQ: local_irq_disable(); hard_irq_disable(); +#ifdef CONFIG_ARM64 + sdei_mask_local_cpu(); +#endif break; case MULTI_STOP_RUN: if (is_active) @@ -254,6 +261,9 @@ static int multi_cpu_stop(void *data) rcu_momentary_dyntick_idle(); } while (curstate != MULTI_STOP_EXIT);
+#ifdef CONFIG_ARM64 + sdei_unmask_local_cpu(); +#endif local_irq_restore(flags); return err; }
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

----------------------------------------

We need to clear EOI for the secure timer only when we panic from sdei_handler. Clearing EOI for the secure timer in the normal panic routine has no bad effect on Hi1620, but it may cause undefined behavior on Hi1616. So add a check for NMI context before clearing EOI for the secure timer.
Fixes: dd397d5febc4 ("sdei_watchdog: clear EOI of the secure timer before kdump")

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Reviewed-by: Wei Li liwei391@huawei.com
Reviewed-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/arm64/kernel/machine_kexec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index cfa6b0dafc88..40607a4fe3a5 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -270,7 +270,8 @@ void machine_crash_shutdown(struct pt_regs *regs) * interrupt failed to trigger in the second kernel. So we clear eoi * of the secure timer before booting the second kernel. */ - sdei_watchdog_clear_eoi(); + if (in_nmi()) + sdei_watchdog_clear_eoi();
/* for crashing cpu */ crash_save_cpu(regs, smp_processor_id());
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/3211 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/A...