[PATCH v2 00/26] Support NMI Watchdog

Douglas Anderson (2):
  watchdog/perf: add a weak function for an arch to detect if perf can use NMIs
  arm64: enable perf events based hard lockup detector

Ionela Voinescu (1):
  cpufreq: add function to get the hardware max frequency

Jinjie Ruan (2):
  irqchip/gic-v3: Fix hard LOCKUP caused by NMI being masked
  irqchip/gic-v3: Fix a system stall when using pseudo NMI with CONFIG_ARM64_NMI closed

Lecopzer Chen (2):
  watchdog/perf: adapt the watchdog_perf interface for async model
  arm64: add hw_nmi_get_sample_period for preparation of lockup detector

Lorenzo Pieralisi (1):
  irqchip/gic-v3: Implement FEAT_GICv3_NMI support

Mark Brown (11):
  arm64/booting: Document boot requirements for FEAT_NMI
  arm64/sysreg: Add definitions for immediate versions of MSR ALLINT
  arm64/asm: Introduce assembly macros for managing ALLINT
  arm64/hyp-stub: Enable access to ALLINT
  arm64/cpufeature: Detect PE support for FEAT_NMI
  KVM: arm64: Hide FEAT_NMI from guests
  arm64/nmi: Manage masking for superpriority interrupts along with DAIF
  arm64/entry: Don't call preempt_schedule_irq() with NMIs masked
  arm64/irq: Document handling of FEAT_NMI in irqflags.h
  arm64/nmi: Add handling of superpriority interrupts as NMIs
  arm64/nmi: Add Kconfig for NMI

Mark Rutland (3):
  irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
  irqchip/gic-v3: Refactor ISB + EOIR at ack time
  irqchip/gic-v3: Fix priority mask handling

Xiongfeng Wang (1):
  init: only move down lockup_detector_init() when sdei_watchdog is enabled

Yicong Yang (3):
  watchdog: Support watchdog_sdei coexist with existing watchdogs
  watchdog: Fix call trace when failed to initialize sdei
  irqchip/gic-v3: Fix one race condition due to NMI withdraw

 Documentation/arm64/booting.rst     |   6 +
 arch/arm/include/asm/arch_gicv3.h   |   7 +-
 arch/arm64/Kconfig                  |  16 ++
 arch/arm64/include/asm/assembler.h  |  34 +++
 arch/arm64/include/asm/cpucaps.h    |   2 +
 arch/arm64/include/asm/cpufeature.h |   8 +
 arch/arm64/include/asm/daifflags.h  |  20 ++
 arch/arm64/include/asm/irqflags.h   |  10 +
 arch/arm64/include/asm/ptrace.h     |   2 +-
 arch/arm64/include/asm/sysreg.h     |  26 +++
 arch/arm64/kernel/Makefile          |   1 +
 arch/arm64/kernel/cpufeature.c      |  64 +++++-
 arch/arm64/kernel/head.S            |  13 ++
 arch/arm64/kernel/perf_event.c      |  12 +-
 arch/arm64/kernel/process.c         |   9 +
 arch/arm64/kernel/watchdog_hld.c    |  36 ++++
 arch/arm64/kernel/watchdog_sdei.c   |   8 +-
 arch/arm64/kvm/hyp/switch.c         |   7 +
 arch/arm64/kvm/sys_regs.c           |   2 +
 drivers/cpufreq/cpufreq.c           |  20 ++
 drivers/irqchip/irq-gic-v3.c        | 314 +++++++++++++++++++++++-----
 drivers/perf/arm_pmu.c              |   5 +
 include/linux/cpufreq.h             |   5 +
 include/linux/irqchip/arm-gic-v3.h  |   4 +
 include/linux/nmi.h                 |  11 +
 include/linux/perf/arm_pmu.h        |   2 +
 init/main.c                         |   6 +-
 kernel/watchdog.c                   |  83 +++++++-
 kernel/watchdog_hld.c               |  12 +-
 29 files changed, 674 insertions(+), 71 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

-- 
2.25.1

From: Mark Rutland <mark.rutland@arm.com>

[Upstream commit adf14453d2c037ab529040c1186ea32e277e783a]

There are cases where a context synchronization event is necessary
between an IRQ being raised and being handled, and there are races such
that we cannot rely upon the exception entry being subsequent to the
interrupt being raised. We identified and fixed this for regular IRQs
in commit:

  39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq")

Unfortunately, we forgot to do the same for pseudo-NMIs when support
for those was added in commit:

  f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")

... which means that when pseudo-NMIs are used for PMU support, we'll
hit the same problem.

Apply the same fix as for regular IRQs. Note that when EOI mode 1 is in
use, the call to gic_write_eoir() will provide an ISB.

Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-2-mark.rutland@arm.com
---
 drivers/irqchip/irq-gic-v3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index f875e168c873..720cc39b0a2d 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -581,6 +581,9 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 
     if (static_branch_likely(&supports_deactivate_key))
         gic_write_eoir(irqnr);
+    else
+        isb();
+
     /*
      * Leave the PSR.I bit set to prevent other NMIs to be
      * received while handling this one.
-- 
2.25.1
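What the fix changes is easiest to see by enumerating the acknowledge
sequences for the two EOI modes before and after. A host-side sketch with
the system register accesses reduced to op-recording stubs (the names mirror
the driver, but nothing here is kernel code):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    static char seq[64];
    static void op(const char *s) { strcat(seq, s); }

    /* The gic_write_eoir() helper performs its own ISB after the EOI write. */
    static void gic_write_eoir_stub(void) { op("EOIR;ISB;"); }
    static void isb_stub(void)            { op("ISB;"); }

    static void ack_before_fix(bool eoimode1)
    {
        if (eoimode1)
            gic_write_eoir_stub();
        /* EOI mode 0: no barrier at all between the IAR read and the handler */
    }

    static void ack_after_fix(bool eoimode1)
    {
        if (eoimode1)
            gic_write_eoir_stub();
        else
            isb_stub();    /* the context synchronization event this patch adds */
    }

    int main(void)
    {
        for (int m = 0; m <= 1; m++) {
            seq[0] = '\0'; ack_before_fix(m);
            printf("eoimode1=%d before: [%s]\n", m, seq);
            seq[0] = '\0'; ack_after_fix(m);
            printf("eoimode1=%d after:  [%s]\n", m, seq);
        }
        return 0;
    }

Either way, after the fix both modes end with a context synchronization
event before the NMI handler reads any PMU system registers.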

From: Mark Rutland <mark.rutland@arm.com>

[Upstream commit 6efb50923771f392122f5ce69dfc43b08f16e449]

There are cases where a context synchronization event is necessary
between an IRQ being raised and being handled, and there are races such
that we cannot rely upon the exception entry being subsequent to the
interrupt being raised. To fix this, we place an ISB between a read of
IAR and the subsequent invocation of an IRQ handler.

When EOI mode 1 is in use, we need to EOI an interrupt prior to
invoking its handler, and we have a write to EOIR for this. As this
write to EOIR requires an ISB, and this is provided by the
gic_write_eoir() helper, we omit the usual ISB in this case, with the
logic being:

|	if (static_branch_likely(&supports_deactivate_key))
|		gic_write_eoir(irqnr);
|	else
|		isb();

This is somewhat opaque, and it would be a little clearer if there were
an unconditional ISB, with only the write to EOIR being conditional,
e.g.

|	if (static_branch_likely(&supports_deactivate_key))
|		write_gicreg(irqnr, ICC_EOIR1_EL1);
|
|	isb();

This patch rewrites the code that way, with this logic factored into a
new helper function with comments explaining what the ISB is for, as
were originally laid out in commit:

  39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq")

Note that since then, we removed the IAR polling in commit:

  342677d70ab92142 ("irqchip/gic-v3: Remove acknowledge loop")

... which removed one of the two race conditions.

For consistency, other portions of the driver are made to manipulate
EOIR using write_gicreg() and explicit ISBs, and the gic_write_eoir()
helper function is removed.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-3-mark.rutland@arm.com
---
 arch/arm/include/asm/arch_gicv3.h |  7 +----
 drivers/irqchip/irq-gic-v3.c      | 43 ++++++++++++++++++++++++-------
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/arch/arm/include/asm/arch_gicv3.h b/arch/arm/include/asm/arch_gicv3.h
index c815477b4303..29797945f299 100644
--- a/arch/arm/include/asm/arch_gicv3.h
+++ b/arch/arm/include/asm/arch_gicv3.h
@@ -128,6 +128,7 @@ static inline u64 read_ ## a64(void)	\
 	return val; 			\
 }
 
+CPUIF_MAP(ICC_EOIR1, ICC_EOIR1_EL1)
 CPUIF_MAP(ICC_PMR, ICC_PMR_EL1)
 CPUIF_MAP(ICC_AP0R0, ICC_AP0R0_EL1)
 CPUIF_MAP(ICC_AP0R1, ICC_AP0R1_EL1)
@@ -177,12 +178,6 @@ CPUIF_MAP_LO_HI(ICH_LR0, ICH_LRC0, ICH_LR0_EL2)
 
 /* Low-level accessors */
 
-static inline void gic_write_eoir(u32 irq)
-{
-	write_sysreg(irq, ICC_EOIR1);
-	isb();
-}
-
 static inline void gic_write_dir(u32 val)
 {
 	write_sysreg(val, ICC_DIR);
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 720cc39b0a2d..3f440dcb3523 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -486,7 +486,8 @@ static void gic_irq_nmi_teardown(struct irq_data *d)
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-	gic_write_eoir(gic_irq(d));
+	write_gicreg(gic_irq(d), ICC_EOIR1_EL1);
+	isb();
 }
 
 static void gic_eoimode1_eoi_irq(struct irq_data *d)
@@ -567,10 +568,38 @@ static void gic_deactivate_unhandled(u32 irqnr)
 		if (irqnr < 8192)
 			gic_write_dir(irqnr);
 	} else {
-		gic_write_eoir(irqnr);
+		write_gicreg(irqnr, ICC_EOIR1_EL1);
+		isb();
 	}
 }
 
+/*
+ * Follow a read of the IAR with any HW maintenance that needs to happen prior
+ * to invoking the relevant IRQ handler. We must do two things:
+ *
+ * (1) Ensure instruction ordering between a read of IAR and subsequent
+ *     instructions in the IRQ handler using an ISB.
+ *
+ *     It is possible for the IAR to report an IRQ which was signalled *after*
+ *     the CPU took an IRQ exception as multiple interrupts can race to be
+ *     recognized by the GIC, earlier interrupts could be withdrawn, and/or
+ *     later interrupts could be prioritized by the GIC.
+ *
+ *     For devices which are tightly coupled to the CPU, such as PMUs, a
+ *     context synchronization event is necessary to ensure that system
+ *     register state is not stale, as these may have been indirectly written
+ *     *after* exception entry.
+ *
+ * (2) Deactivate the interrupt when EOI mode 1 is in use.
+ */
+static inline void gic_complete_ack(u32 irqnr)
+{
+	if (static_branch_likely(&supports_deactivate_key))
+		write_gicreg(irqnr, ICC_EOIR1_EL1);
+
+	isb();
+}
+
 static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 {
 	bool irqs_enabled = interrupts_enabled(regs);
@@ -579,10 +608,7 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 	if (irqs_enabled)
 		nmi_enter();
 
-	if (static_branch_likely(&supports_deactivate_key))
-		gic_write_eoir(irqnr);
-	else
-		isb();
+	gic_complete_ack(irqnr);
 
 	/*
 	 * Leave the PSR.I bit set to prevent other NMIs to be
@@ -623,10 +649,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 	if (likely(irqnr > 15)) {
 		int err;
 
-		if (static_branch_likely(&supports_deactivate_key))
-			gic_write_eoir(irqnr);
-		else
-			isb();
+		gic_complete_ack(irqnr);
 
 		err = handle_domain_irq(gic_data.domain, irqnr, regs);
 		if (err) {
-- 
2.25.1
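Since the commit message claims no functional change, the equivalence of
the old open-coded if/else and the new gic_complete_ack() shape can be
checked mechanically. A host-side sketch with op-recording stubs in place
of the real system register accesses (illustration only, not kernel code):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    static char ops[64];
    static void op(const char *s) { strcat(ops, s); }

    static void write_eoir(void) { op("EOIR;"); }
    static void isb(void)        { op("ISB;"); }

    /* Old shape: gic_write_eoir() was EOIR followed by an ISB. */
    static void old_ack(bool eoimode1)
    {
        if (eoimode1) { write_eoir(); isb(); }
        else          isb();
    }

    /* New shape: conditional EOIR, unconditional ISB (gic_complete_ack()). */
    static void new_ack(bool eoimode1)
    {
        if (eoimode1)
            write_eoir();
        isb();
    }

    int main(void)
    {
        for (int mode = 0; mode <= 1; mode++) {
            char old[64];
            ops[0] = '\0'; old_ack(mode); strcpy(old, ops);
            ops[0] = '\0'; new_ack(mode);
            assert(!strcmp(old, ops));    /* identical op sequences */
            printf("eoimode1=%d: %s\n", mode, ops);
        }
        return 0;
    }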

From: Mark Rutland <mark.rutland@arm.com>

[Upstream commit 614ab80c96474682157cabb14f8c8602b3422e90]

When a kernel is built with CONFIG_ARM64_PSEUDO_NMI=y and pseudo-NMIs
are enabled at runtime, GICv3's gic_handle_irq() can leave DAIF and
ICC_PMR_EL1 in an unexpected state in some cases, breaking subsequent
usage of local_irq_enable() and resulting in softirqs being run with
IRQs erroneously masked (possibly resulting in deadlocks).

This can happen when an IRQ exception is taken from a context where
regular IRQs were unmasked, and either:

(1) ICC_IAR1_EL1 indicates a special INTID (e.g. as a result of an IRQ
    being withdrawn since the IRQ exception was taken).

(2) ICC_IAR1_EL1 and ICC_RPR_EL1 indicate an NMI was acknowledged.

When an NMI is taken from a context where regular IRQs were masked,
there is no problem.

When CONFIG_ARM64_DEBUG_PRIORITY_MASKING=y, this can be detected with
perf, e.g.

| # ./perf record -a -g -e cycles:k ls -alR / > /dev/null 2>&1
| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 14 at arch/arm64/include/asm/irqflags.h:32 arch_local_irq_enable+0x4c/0x6c
| Modules linked in:
| CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 5.18.0-rc5-00004-g876c38e3d20b #12
| Hardware name: linux,dummy-virt (DT)
| pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : arch_local_irq_enable+0x4c/0x6c
| lr : __do_softirq+0x110/0x5d8
| sp : ffff8000080bbbc0
| pmr_save: 000000f0
| x29: ffff8000080bbbc0 x28: ffff316ac3a6ca40 x27: 0000000000000000
| x26: 0000000000000000 x25: ffffa04611c06008 x24: ffffa04611c06008
| x23: 0000000040400005 x22: 0000000000000200 x21: ffff8000080bbe20
| x20: ffffa0460fe10320 x19: 0000000000000009 x18: 0000000000000000
| x17: ffff91252dfa9000 x16: ffff800008004000 x15: 0000000000004000
| x14: 0000000000000028 x13: ffffa0460fe17578 x12: ffffa0460fed4294
| x11: ffffa0460fedc168 x10: ffffffffffffff80 x9 : ffffa0460fe10a70
| x8 : ffffa0460fedc168 x7 : 000000000000b762 x6 : 00000000057c3bdf
| x5 : ffff8000080bbb18 x4 : 0000000000000000 x3 : 0000000000000001
| x2 : ffff91252dfa9000 x1 : 0000000000000060 x0 : 00000000000000f0
| Call trace:
|  arch_local_irq_enable+0x4c/0x6c
|  __irq_exit_rcu+0x180/0x1ac
|  irq_exit_rcu+0x1c/0x44
|  el1_interrupt+0x4c/0xe4
|  el1h_64_irq_handler+0x18/0x24
|  el1h_64_irq+0x74/0x78
|  smpboot_thread_fn+0x68/0x2c0
|  kthread+0x124/0x130
|  ret_from_fork+0x10/0x20
| irq event stamp: 193241
| hardirqs last enabled at (193240): [<ffffa0460fe10a9c>] __do_softirq+0x10c/0x5d8
| hardirqs last disabled at (193241): [<ffffa0461102ffe4>] el1_dbg+0x24/0x90
| softirqs last enabled at (193234): [<ffffa0460fe10e00>] __do_softirq+0x470/0x5d8
| softirqs last disabled at (193239): [<ffffa0460fea9944>] __irq_exit_rcu+0x180/0x1ac
| ---[ end trace 0000000000000000 ]---

The necessary manipulation of DAIF and ICC_PMR_EL1 depends on the
interrupted context, but the structure of gic_handle_irq() makes this
also depend on whether the GIC reports an IRQ, NMI, or special INTID:

* When the interrupted context had regular IRQs masked (and hence the
  interrupt must be an NMI), the entry code performs the NMI
  entry/exit and gic_handle_irq() should return with DAIF and
  ICC_PMR_EL1 unchanged.

  This is handled correctly today.

* When the interrupted context had regular IRQs unmasked, the entry
  code performs IRQ entry/exit, but expects gic_handle_irq() to always
  update ICC_PMR_EL1 and DAIF.IF to unmask NMIs (but not regular IRQs)
  prior to returning (which it must do prior to invoking any regular
  IRQ handler).

  This unbalanced calling convention is necessary because we don't
  know whether an NMI has been taken until acknowledged by a read from
  ICC_IAR1_EL1, and so we need to perform the read with NMIs masked in
  case an NMI has been taken (and needs to be handled with NMIs
  masked).

  Unfortunately, this is not handled consistently:

  - When ICC_IAR1_EL1 reports a special INTID, gic_handle_irq()
    returns immediately without manipulating ICC_PMR_EL1 and DAIF.

  - When ICC_RPR_EL1 indicates an NMI, gic_handle_irq() calls
    gic_handle_nmi() to invoke the NMI handler, then returns without
    manipulating ICC_PMR_EL1 and DAIF.

  - For regular IRQs, gic_handle_irq() manipulates ICC_PMR_EL1 and
    DAIF prior to invoking the IRQ handler.

There were related problems with special INTID handling in the past,
where if an exception was taken from a context with regular IRQs
masked and ICC_IAR1_EL1 reported a special INTID, gic_handle_irq()
would erroneously unmask NMIs in NMI context, permitting an unexpected
nested NMI. That case specifically was fixed by commit:

  a97709f563a078e2 ("irqchip/gic-v3: Do not enable irqs when handling spurious interrups")

... but unfortunately that commit added an inverse problem, where if
an exception was taken from a context with regular IRQs *unmasked* and
ICC_IAR1_EL1 reported a special INTID, gic_handle_irq() would
erroneously fail to unmask NMIs (and consequently regular IRQs could
not be unmasked during softirq processing).

Before and after that commit, if an NMI was taken from a context with
regular IRQs unmasked gic_handle_irq() would not unmask NMIs prior to
returning, leading to the same problem with softirq handling.

This patch fixes this by restructuring gic_handle_irq(), splitting it
into separate irqson/irqsoff helper functions which consistently
perform the DAIF + ICC_PMR_EL1 manipulation based upon the interrupted
context, regardless of the event indicated by ICC_IAR1_EL1. The
special INTID handling is moved into the low-level IRQ/NMI handler
invocation helper functions, so that early returns don't prevent the
required manipulation of DAIF + ICC_PMR_EL1.

Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-4-mark.rutland@arm.com
---
 drivers/irqchip/irq-gic-v3.c | 138 +++++++++++++++++++++++++----------
 1 file changed, 101 insertions(+), 37 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 3f440dcb3523..5b57493f02b2 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -600,50 +600,23 @@ static inline void gic_complete_ack(u32 irqnr)
 	isb();
 }
 
-static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
+static bool gic_rpr_is_nmi_prio(void)
 {
-	bool irqs_enabled = interrupts_enabled(regs);
-	int err;
-
-	if (irqs_enabled)
-		nmi_enter();
-
-	gic_complete_ack(irqnr);
-
-	/*
-	 * Leave the PSR.I bit set to prevent other NMIs to be
-	 * received while handling this one.
-	 * PSR.I will be restored when we ERET to the
-	 * interrupted context.
-	 */
-	err = handle_domain_nmi(gic_data.domain, irqnr, regs);
-	if (err)
-		gic_deactivate_unhandled(irqnr);
+	if (!gic_supports_nmi())
+		return false;
 
-	if (irqs_enabled)
-		nmi_exit();
+	return unlikely(gic_read_rpr() == GICD_INT_NMI_PRI);
 }
 
-static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
+static bool gic_irqnr_is_special(u32 irqnr)
 {
-	u32 irqnr;
-
-	irqnr = gic_read_iar();
-
-	/* Check for special IDs first */
-	if ((irqnr >= 1020 && irqnr <= 1023))
-		return;
+	return irqnr >= 1020 && irqnr <= 1023;
+}
 
-	if (gic_supports_nmi() &&
-	    unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
-		gic_handle_nmi(irqnr, regs);
+static void __gic_handle_irq(u32 irqnr, struct pt_regs *regs)
+{
+	if (gic_irqnr_is_special(irqnr))
 		return;
-	}
-
-	if (gic_prio_masking_enabled()) {
-		gic_pmr_mask_irqs();
-		gic_arch_enable_irqs();
-	}
 
 	/* Treat anything but SGIs in a uniform way */
 	if (likely(irqnr > 15)) {
@@ -675,6 +648,97 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 		WARN_ONCE(true, "Unexpected SGI received!\n");
 #endif
 	}
+
+}
+
+static void __gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
+{
+	if (gic_irqnr_is_special(irqnr))
+		return;
+
+	gic_complete_ack(irqnr);
+
+	if (handle_domain_nmi(gic_data.domain, irqnr, regs))
+		gic_deactivate_unhandled(irqnr);
+}
+
+/*
+ * An exception has been taken from a context with IRQs enabled, and this could
+ * be an IRQ or an NMI.
+ *
+ * The entry code called us with DAIF.IF set to keep NMIs masked. We must clear
+ * DAIF.IF (and update ICC_PMR_EL1 to mask regular IRQs) prior to returning,
+ * after handling any NMI but before handling any IRQ.
+ *
+ * The entry code has performed IRQ entry, and if an NMI is detected we must
+ * perform NMI entry/exit around invoking the handler.
+ */
+static void __gic_handle_irq_from_irqson(struct pt_regs *regs)
+{
+	bool is_nmi;
+	u32 irqnr;
+
+	irqnr = gic_read_iar();
+
+	is_nmi = gic_rpr_is_nmi_prio();
+
+	if (is_nmi) {
+		nmi_enter();
+		__gic_handle_nmi(irqnr, regs);
+		nmi_exit();
+	}
+
+	if (gic_prio_masking_enabled()) {
+		gic_pmr_mask_irqs();
+		gic_arch_enable_irqs();
+	}
+
+	if (!is_nmi)
+		__gic_handle_irq(irqnr, regs);
+}
+
+/*
+ * An exception has been taken from a context with IRQs disabled, which can only
+ * be an NMI.
+ *
+ * The entry code called us with DAIF.IF set to keep NMIs masked. We must leave
+ * DAIF.IF (and ICC_PMR_EL1) unchanged.
+ *
+ * The entry code has performed NMI entry.
+ */
+static void __gic_handle_irq_from_irqsoff(struct pt_regs *regs)
+{
+	u64 pmr;
+	u32 irqnr;
+
+	/*
+	 * We were in a context with IRQs disabled. However, the
+	 * entry code has set PMR to a value that allows any
+	 * interrupt to be acknowledged, and not just NMIs. This can
+	 * lead to surprising effects if the NMI has been retired in
+	 * the meantime, and that there is an IRQ pending. The IRQ
+	 * would then be taken in NMI context, something that nobody
+	 * wants to debug twice.
+	 *
+	 * Until we sort this, drop PMR again to a level that will
+	 * actually only allow NMIs before reading IAR, and then
+	 * restore it to what it was.
+	 */
+	pmr = gic_read_pmr();
+	gic_pmr_mask_irqs();
+	isb();
+	irqnr = gic_read_iar();
+	gic_write_pmr(pmr);
+
+	__gic_handle_nmi(irqnr, regs);
+}
+
+static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
+{
+	if (unlikely(gic_supports_nmi() && !interrupts_enabled(regs)))
+		__gic_handle_irq_from_irqsoff(regs);
+	else
+		__gic_handle_irq_from_irqson(regs);
 }
 
 static u32 gic_get_pribits(void)
-- 
2.25.1
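The invariant the restructure enforces can be modelled outside the kernel:
on the IRQs-on path the ICC_PMR_EL1/DAIF update now happens for every event
class, not only for regular IRQs. A sketch with the register writes reduced
to booleans (the gic_prio_masking_enabled() condition and the real handlers
are elided; this is an illustration, not the driver code):

    #include <stdbool.h>
    #include <stdio.h>

    enum event { EV_SPECIAL, EV_NMI, EV_IRQ };

    static bool pmr_masks_irqs;    /* ICC_PMR_EL1 dropped to mask regular IRQs */
    static bool daif_if_masked;    /* DAIF.IF set, i.e. NMIs masked */

    static void gic_pmr_mask_irqs_stub(void)    { pmr_masks_irqs = true; }
    static void gic_arch_enable_irqs_stub(void) { daif_if_masked = false; }

    static void handle_irq_from_irqson(enum event ev)
    {
        bool is_nmi = (ev == EV_NMI);

        /* NMI case: nmi_enter() / __gic_handle_nmi() / nmi_exit() run first. */

        /*
         * The point of the restructure: this runs unconditionally, so
         * special INTIDs and NMIs no longer return with NMIs masked.
         */
        gic_pmr_mask_irqs_stub();
        gic_arch_enable_irqs_stub();

        /* Non-NMI, non-special case: __gic_handle_irq() runs the handler. */
        (void)is_nmi;
    }

    int main(void)
    {
        static const char *name[] = { "special", "NMI", "IRQ" };

        for (int ev = EV_SPECIAL; ev <= EV_IRQ; ev++) {
            daif_if_masked = true;    /* entry code keeps NMIs masked */
            pmr_masks_irqs = false;
            handle_irq_from_irqson(ev);
            printf("%-7s -> on return: PMR masks IRQs=%d, NMIs masked=%d\n",
                   name[ev], pmr_masks_irqs, daif_if_masked);
        }
        return 0;
    }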

From: Mark Brown <broonie@kernel.org>

In order to use FEAT_NMI we must be able to use ALLINT; require that it
behaves as though not trapped when the feature is present.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 Documentation/arm64/booting.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index d3f3a60fbf25..eca3e4d1d18d 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -245,6 +245,12 @@ Before jumping into the kernel, the following conditions must be met:
     - HCR_EL2.APK (bit 40) must be initialised to 0b1
     - HCR_EL2.API (bit 41) must be initialised to 0b1
 
+  For CPUs with Non-maskable Interrupts (FEAT_NMI):
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - HCRX_EL2.TALLINT must be initialised to 0b0.
+
 The requirements described above for CPU mode, caches, MMUs, architected
 timers, coherency and system registers apply to all CPUs.  All CPUs must
 enter the kernel in the same exception level.
-- 
2.25.1

From: Mark Brown <broonie@kernel.org>

Encodings are provided for ALLINT which allow setting of ALLINT.ALLINT
using an immediate rather than requiring that a register be loaded with
the value to write. Since these don't currently fit within the scheme we
have for sysreg generation, add manual encodings like we currently do for
other similar registers such as SVCR.

Since it is required that these immediate versions be encoded with xzr
as the source register, provide asm wrappers which ensure this is the
case.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/daifflags.h | 9 +++++++++
 arch/arm64/include/asm/sysreg.h    | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 48bfbf70dbb0..b6b4dea197ab 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -15,6 +15,15 @@
 #define DAIF_ERRCTX	(PSR_I_BIT | PSR_A_BIT)
 #define DAIF_MASK	(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
 
+static __always_inline void _allint_clear(void)
+{
+	asm volatile(__msr_s(SYS_ALLINT_CLR, "xzr"));
+}
+
+static __always_inline void _allint_set(void)
+{
+	asm volatile(__msr_s(SYS_ALLINT_SET, "xzr"));
+}
 
 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index a769d9d11aaa..fb7e120b893a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -105,6 +105,8 @@
 #define SYS_DC_CSW		sys_insn(1, 0, 7, 10, 2)
 #define SYS_DC_CISW		sys_insn(1, 0, 7, 14, 2)
 
+#define SYS_ALLINT_CLR		sys_reg(0, 1, 4, 0, 0)
+#define SYS_ALLINT_SET		sys_reg(0, 1, 4, 1, 0)
 #define SYS_OSDTRRX_EL1		sys_reg(2, 0, 0, 0, 2)
 #define SYS_MDCCINT_EL1		sys_reg(2, 0, 0, 2, 0)
 #define SYS_MDSCR_EL1		sys_reg(2, 0, 0, 2, 2)
-- 
2.25.1
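For reference, these are MSR (immediate) instructions: the PSTATE field is
selected by op1/op2, the immediate value travels in the CRm field of the
sys_reg() encoding, and Rt is fixed at 0b11111 (xzr), which is why the
wrappers pass xzr. A small host-side encoder, a sketch for illustration
only, reproduces the instruction words:

    #include <stdint.h>
    #include <stdio.h>

    /* MSR (immediate): base word has CRn=4 and Rt=0b11111 baked in;
     * the PSTATE field is op1:op2 and the immediate goes in CRm. */
    static uint32_t msr_imm(unsigned op1, unsigned op2, unsigned imm)
    {
        return 0xd500401fu | (op1 << 16) | (imm << 8) | (op2 << 5);
    }

    int main(void)
    {
        /* ALLINT uses op1=1, op2=0; CRm=1 sets, CRm=0 clears. */
        printf("msr ALLINT, #1 -> 0x%08x\n", msr_imm(1, 0, 1)); /* _allint_set */
        printf("msr ALLINT, #0 -> 0x%08x\n", msr_imm(1, 0, 0)); /* _allint_clear */
        /* Cross-check against a known encoding: msr daifset, #2 is 0xd50342df. */
        printf("msr DAIFSet, #2 -> 0x%08x\n", msr_imm(3, 6, 2));
        return 0;
    }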

From: Mark Brown <broonie@kernel.org>

In order to allow assembly code to ensure that not even superpriority
interrupts can preempt it, provide macros for enabling and disabling
ALLINT.ALLINT.

This is not integrated into the existing DAIF macros since we do not
always wish to manage ALLINT along with DAIF, and the use of DAIF in the
naming of the existing macros might lead to surprises if ALLINT is also
managed.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/assembler.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 4e0c07c60f84..192d61b68d7d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -23,6 +23,22 @@
 #include <asm/ptrace.h>
 #include <asm/thread_info.h>
 
+	.macro disable_allint
+#ifdef CONFIG_ARM64_NMI
+alternative_if ARM64_HAS_NMI
+	msr_s	SYS_ALLINT_SET, xzr
+alternative_else_nop_endif
+#endif
+	.endm
+
+	.macro enable_allint
+#ifdef CONFIG_ARM64_NMI
+alternative_if ARM64_HAS_NMI
+	msr_s	SYS_ALLINT_CLR, xzr
+alternative_else_nop_endif
+#endif
+	.endm
+
 	.macro save_and_disable_daif, flags
 	mrs	\flags, daif
 	msr	daifset, #0xf
-- 
2.25.1

From: Mark Brown <broonie@kernel.org>

In order to use NMIs we need to ensure that accesses to ALLINT are not
trapped, so update HCRX_EL2 to clear TALLINT when we detect support for
NMIs.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/sysreg.h |  5 +++++
 arch/arm64/kernel/head.S        | 13 +++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index fb7e120b893a..fecc829377a1 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -223,6 +223,8 @@
 #define SYS_PAR_EL1_F		BIT(0)
 #define SYS_PAR_EL1_FST		GENMASK(6, 1)
 
+#define HCRX_EL2_TALLINT_MASK	GENMASK(6, 6)
+
 /*** Statistical Profiling Extension ***/
 /* ID registers */
 #define SYS_PMSIDR_EL1		sys_reg(3, 0, 9, 9, 7)
@@ -419,6 +421,7 @@
 #define SYS_PMCCFILTR_EL0	sys_reg(3, 3, 14, 15, 7)
 
 #define SYS_ZCR_EL2		sys_reg(3, 4, 1, 2, 0)
+#define SYS_HCRX_EL2		sys_reg(3, 4, 1, 2, 2)
 #define SYS_DACR32_EL2		sys_reg(3, 4, 3, 0, 0)
 #define SYS_SPSR_EL2		sys_reg(3, 4, 4, 0, 0)
 #define SYS_ELR_EL2		sys_reg(3, 4, 4, 0, 1)
@@ -635,6 +638,7 @@
 #define ID_AA64PFR0_EL0_32BIT_64BIT	0x2
 
 /* id_aa64pfr1 */
+#define ID_AA64PFR1_NMI_SHIFT		36
 #define ID_AA64PFR1_SSBS_SHIFT		4
 
 #define ID_AA64PFR1_SSBS_PSTATE_NI	0
@@ -686,6 +690,7 @@
 /* id_aa64mmfr1 */
 #define ID_AA64MMFR1_ECBHB_SHIFT	60
 #define ID_AA64MMFR1_TIDCP1_SHIFT	52
+#define ID_AA64MMFR1_HCX_SHIFT		40
 #define ID_AA64MMFR1_PAN_SHIFT		20
 #define ID_AA64MMFR1_LOR_SHIFT		16
 #define ID_AA64MMFR1_HPD_SHIFT		12
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 2f784d3b4b39..4558f6d94dab 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -497,11 +497,24 @@ ENTRY(el2_setup)
 	msr	sctlr_el2, x0
 
 #ifdef CONFIG_ARM64_VHE
+	mrs	x2, id_aa64pfr1_el1
+	ubfx	x2, x2, #ID_AA64PFR1_NMI_SHIFT, #4
+	cbz	x2, .Lskip_nmi
+.Linit_nmi:
+	mrs	x2, id_aa64mmfr1_el1
+	ubfx	x2, x2, #ID_AA64MMFR1_HCX_SHIFT, #4
+	cbz	x2, .Lskip_nmi
+
+	mrs_s	x2, SYS_HCRX_EL2
+	bic	x2, x2, #HCRX_EL2_TALLINT_MASK	// Don't trap ALLINT
+	msr_s	SYS_HCRX_EL2, x2
+
 	/*
 	 * Check for VHE being present. For the rest of the EL2 setup,
 	 * x2 being non-zero indicates that we do have VHE, and that the
 	 * kernel is intended to run at EL2.
 	 */
+.Lskip_nmi:
 	mrs	x2, id_aa64mmfr1_el1
 	ubfx	x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
 #else
-- 
2.25.1

From: Mark Brown <broonie@kernel.org>

Use of FEAT_NMI requires that all the PEs in the system and the GIC have
NMI support. This patch implements the PE part of that detection.

In order to avoid problematic interactions between real and pseudo NMIs
we disable the architected feature if the user has enabled pseudo NMIs
on the command line. If this is done on a system where support for the
architected feature is detected then a warning is printed during boot in
order to help users spot what is likely to be a misconfiguration.

In order to allow KVM to offer the feature to guests even if pseudo NMIs
are in use by the host we have a separate capability for the raw feature
which is used in KVM.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/cpucaps.h    |  2 +
 arch/arm64/include/asm/cpufeature.h |  8 ++++
 arch/arm64/include/asm/sysreg.h     |  5 +++
 arch/arm64/kernel/cpufeature.c      | 64 ++++++++++++++++++++++++++++-
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 9a2d5148f9cc..2246c379a68d 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -67,6 +67,8 @@
 #define ARM64_HAS_TIDCP1		57
 #define ARM64_HAS_FGT			58
 #define ARM64_HAFT			59
+#define ARM64_HAS_NMI			60
+#define ARM64_USES_NMI			61
 
 #define ARM64_NCAPS			80
 
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 528115f7e361..6e7ea1750010 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -284,6 +284,8 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
  * CPUs must match the state of the capability as detected by the boot CPU.
  */
 #define ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE ARM64_CPUCAP_SCOPE_BOOT_CPU
+#define ARM64_CPUCAP_BOOT_CPU_FEATURE \
+	(ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU)
 
 struct arm64_cpu_capabilities {
 	const char *desc;
@@ -658,6 +660,12 @@ static __always_inline bool system_uses_irq_prio_masking(void)
 	       cpus_have_const_cap(ARM64_HAS_IRQ_PRIO_MASKING);
 }
 
+static __always_inline bool system_uses_nmi(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_NMI) &&
+	       cpus_have_const_cap(ARM64_USES_NMI);
+}
+
 static inline bool system_has_prio_mask_debugging(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING) &&
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index fecc829377a1..64eda6356005 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -529,6 +529,8 @@
 
 /* SCTLR_EL1 specific flags. */
 #define SCTLR_EL1_TIDCP		(BIT(63))
+#define SCTLR_EL1_SPINTMASK	(BIT(62))
+#define SCTLR_EL1_NMI		(BIT(61))
 #define SCTLR_EL1_UCI		(BIT(26))
 #define SCTLR_EL1_E0E		(BIT(24))
 #define SCTLR_EL1_SPAN		(BIT(23))
@@ -645,6 +647,9 @@
 #define ID_AA64PFR1_SSBS_PSTATE_ONLY	1
 #define ID_AA64PFR1_SSBS_PSTATE_INSNS	2
 
+#define ID_AA64PFR1_NMI_IMP_DEF		0x1
+#define ID_AA64PFR1_NMI_IMP_NI		0x0
+
 /* id_aa64zfr0 */
 #define ID_AA64ZFR0_SM4_SHIFT		40
 #define ID_AA64ZFR0_SHA3_SHIFT		32
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 3ad5bc874e79..43fc63acb986 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -28,6 +28,7 @@
 #include <asm/traps.h>
 #include <asm/vectors.h>
 #include <asm/virt.h>
+#include <asm/daifflags.h>
 
 /* Kernel representation of AT_HWCAP and AT_HWCAP2 */
 static unsigned long elf_hwcap __read_mostly;
@@ -181,6 +182,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_NMI_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SSBS_SHIFT, 4, ID_AA64PFR1_SSBS_PSTATE_NI),
 	ARM64_FTR_END,
 };
@@ -1275,9 +1277,11 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap)
 }
 #endif /* CONFIG_ARM64_PTR_AUTH */
 
-#ifdef CONFIG_ARM64_PSEUDO_NMI
+#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) || IS_ENABLED(CONFIG_ARM64_NMI)
 static bool enable_pseudo_nmi;
+#endif
 
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 static int __init early_enable_pseudo_nmi(char *p)
 {
 	return strtobool(p, &enable_pseudo_nmi);
@@ -1291,6 +1295,41 @@ static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
 }
 #endif
 
+#ifdef CONFIG_ARM64_NMI
+static bool use_nmi(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	if (!has_cpuid_feature(entry, scope))
+		return false;
+
+	/*
+	 * Having both real and pseudo NMIs enabled simultaneously is
+	 * likely to cause confusion.  Since pseudo NMIs must be
+	 * enabled with an explicit command line option, if the user
+	 * has set that option on a system with real NMIs for some
+	 * reason assume they know what they're doing.
+	 */
+	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && enable_pseudo_nmi) {
+		pr_info("Pseudo NMI enabled, not using architected NMI\n");
+		return false;
+	}
+
+	return true;
+}
+
+static void nmi_enable(const struct arm64_cpu_capabilities *__unused)
+{
+	/*
+	 * Enable use of NMIs controlled by ALLINT, SPINTMASK should
+	 * be clear by default but make it explicit that we are using
+	 * this mode.  Ensure that ALLINT is clear first in order to
+	 * avoid leaving things masked.
+	 */
+	_allint_clear();
+	sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPINTMASK, SCTLR_EL1_NMI);
+	isb();
+}
+#endif
+
 static void elf_hwcap_fixup(void)
 {
 #ifdef CONFIG_ARM64_ERRATUM_1742098
@@ -1672,6 +1711,29 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.sign = FTR_UNSIGNED,
 		.min_field_value = 1,
 	},
+#ifdef CONFIG_ARM64_NMI
+	{
+		.desc = "Non-maskable Interrupts present",
+		.capability = ARM64_HAS_NMI,
+		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+		.sys_reg = SYS_ID_AA64PFR1_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64PFR1_NMI_SHIFT,
+		.min_field_value = ID_AA64PFR1_NMI_IMP_DEF,
+		.matches = has_cpuid_feature,
+	},
+	{
+		.desc = "Non-maskable Interrupts enabled",
+		.capability = ARM64_USES_NMI,
+		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+		.sys_reg = SYS_ID_AA64PFR1_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64PFR1_NMI_SHIFT,
+		.min_field_value = ID_AA64PFR1_NMI_IMP_DEF,
+		.matches = use_nmi,
+		.cpu_enable = nmi_enable,
+	},
+#endif
 	{},
 };
-- 
2.25.1
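The resulting policy is easiest to see as a truth table over the two
inputs. A host-side model (irqchip.gicv3_pseudo_nmi is the existing
command line option parsed by early_enable_pseudo_nmi(); the functions
below are illustrative only):

    #include <stdbool.h>
    #include <stdio.h>

    static bool has_nmi_cap(bool feat_nmi)
    {
        return feat_nmi;    /* raw feature: ARM64_HAS_NMI, consumed by KVM */
    }

    static bool uses_nmi_cap(bool feat_nmi, bool pseudo_nmi_cmdline)
    {
        /* use_nmi(): back off if the user explicitly asked for pseudo NMIs */
        return feat_nmi && !pseudo_nmi_cmdline;
    }

    int main(void)
    {
        for (int f = 0; f <= 1; f++)
            for (int p = 0; p <= 1; p++)
                printf("FEAT_NMI=%d irqchip.gicv3_pseudo_nmi=%d -> HAS_NMI=%d USES_NMI=%d\n",
                       f, p, has_nmi_cap(f), uses_nmi_cap(f, p));
        return 0;
    }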

From: Mark Brown <broonie@kernel.org>

FEAT_NMI is not yet useful to guests pending implementation of vGIC
support. Mask out the feature from the ID register and prevent guests
from creating state in ALLINT.ALLINT by enabling the write trap provided
by HCRX_EL2.TALLINT while they are running. There is no trap available
for reads from ALLINT.

We do not need to check for FEAT_HCRX since it has been mandatory since
v8.7 and FEAT_NMI is a v8.8 feature.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/sysreg.h | 9 +++++++++
 arch/arm64/kvm/hyp/switch.c     | 7 +++++++
 arch/arm64/kvm/sys_regs.c       | 2 ++
 3 files changed, 18 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 64eda6356005..3da501312196 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -223,6 +223,8 @@
 #define SYS_PAR_EL1_F		BIT(0)
 #define SYS_PAR_EL1_FST		GENMASK(6, 1)
 
+#define ID_AA64PFR1_NMI_MASK	GENMASK(39, 36)
+#define HCRX_EL2_TALLINT	BIT(6)
 #define HCRX_EL2_TALLINT_MASK	GENMASK(6, 6)
 
 /*** Statistical Profiling Extension ***/
@@ -908,6 +910,13 @@
 		write_sysreg(__scs_new, sysreg);		\
 } while (0)
 
+#define sysreg_clear_set_s(sysreg, clear, set) do {		\
+	u64 __scs_val = read_sysreg_s(sysreg);			\
+	u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set);	\
+	if (__scs_new != __scs_val)				\
+		write_sysreg_s(__scs_new, sysreg);		\
+} while (0)
+
 #endif
 
 #endif	/* __ASM_SYSREG_H */
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 768983bd2326..624e5c83a497 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -87,11 +87,18 @@ static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
 	 */
 	write_sysreg(0, pmselr_el0);
 	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+
+	if (cpus_have_final_cap(ARM64_HAS_NMI))
+		sysreg_clear_set_s(SYS_HCRX_EL2, 0, HCRX_EL2_TALLINT);
+
 	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 }
 
 static void __hyp_text __deactivate_traps_common(void)
 {
+	if (cpus_have_final_cap(ARM64_HAS_NMI))
+		sysreg_clear_set_s(SYS_HCRX_EL2, HCRX_EL2_TALLINT, 0);
+
 	write_sysreg(0, hstr_el2);
 	write_sysreg(0, pmuserenr_el0);
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 605222ed2943..4fad598eb3dd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1095,6 +1095,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
 			 (0xfUL << ID_AA64ISAR1_GPA_SHIFT) |
 			 (0xfUL << ID_AA64ISAR1_GPI_SHIFT));
+	} else if (id == SYS_ID_AA64PFR1_EL1) {
+		val &= ~ID_AA64PFR1_NMI_MASK;
 	}
 
 	return val;
-- 
2.25.1
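The new sysreg_clear_set_s() helper is a plain read-modify-write with
redundant writes elided. A host-side model of the TALLINT toggling across
guest entry/exit (the initial register value is invented for illustration):

    #include <stdint.h>
    #include <stdio.h>

    #define HCRX_EL2_TALLINT (1ULL << 6)

    /* Same clear-then-set rule as the sysreg_clear_set_s() macro above. */
    static uint64_t clear_set(uint64_t val, uint64_t clear, uint64_t set)
    {
        return (val & ~clear) | set;
    }

    int main(void)
    {
        uint64_t hcrx = 0;    /* host: ALLINT writes are not trapped */

        hcrx = clear_set(hcrx, 0, HCRX_EL2_TALLINT);    /* __activate_traps_common() */
        printf("guest running: TALLINT=%d\n", !!(hcrx & HCRX_EL2_TALLINT));

        hcrx = clear_set(hcrx, HCRX_EL2_TALLINT, 0);    /* __deactivate_traps_common() */
        printf("back on host:  TALLINT=%d\n", !!(hcrx & HCRX_EL2_TALLINT));
        return 0;
    }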

From: Mark Brown <broonie@kernel.org>

As we do for pseudo NMIs, add code to our DAIF management which keeps
superpriority interrupts unmasked when we have asynchronous exceptions
enabled. Since superpriority interrupts are not masked through DAIF like
pseudo NMIs are, we also need to modify the assembler macros for
managing DAIF to ensure that the masking is done in the assembly code.
At present users of the assembly macros always mask pseudo NMIs.

There is a difference in the actual handling between pseudo NMIs and
superpriority interrupts in the assembly save_and_disable_irq and
restore_irq macros: these cover both interrupts and FIQs using DAIF
without regard for the use of pseudo NMIs, so they also mask those, but
they are not updated here to mask superpriority interrupts. Given the
names it is not clear that the behaviour with pseudo NMIs is
particularly intentional, and in any case these macros are only used in
the implementation of alternatives for software PAN while hardware PAN
has been mandatory since v8.1, so it is not anticipated that practical
systems with support for FEAT_NMI will ever execute the affected code.

This should be a conservative set of masked regions; we may be able to
relax this in future, but this should represent a good starting point.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/assembler.h | 18 ++++++++++++++++++
 arch/arm64/include/asm/daifflags.h | 11 +++++++++++
 2 files changed, 29 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 192d61b68d7d..9c0c22540b1c 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -39,27 +39,45 @@ alternative_else_nop_endif
 #endif
 	.endm
 
+	.macro restore_allint, flags
+#ifdef CONFIG_ARM64_NMI
+alternative_if ARM64_HAS_NMI
+	tbnz	\flags, #8, .Lset_allint_\@	// PSR_A_BIT
+	msr_s	SYS_ALLINT_CLR, xzr
+	b	.Ldone_allint_\@
+.Lset_allint_\@:
+	msr_s	SYS_ALLINT_SET, xzr
+.Ldone_allint_\@:
+alternative_else_nop_endif
+#endif
+	.endm
+
 	.macro save_and_disable_daif, flags
+	disable_allint
 	mrs	\flags, daif
 	msr	daifset, #0xf
 	.endm
 
 	.macro disable_daif
+	disable_allint
 	msr	daifset, #0xf
 	.endm
 
 	.macro enable_daif
 	msr	daifclr, #0xf
+	enable_allint
 	.endm
 
 	.macro restore_daif, flags:req
 	msr	daif, \flags
+	restore_allint	\flags
 	.endm
 
 	/* Only on aarch64 pstate, PSR_D_BIT is different for aarch32 */
 	.macro	inherit_daif, pstate:req, tmp:req
 	and	\tmp, \pstate, #(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
 	msr	daif, \tmp
+	restore_allint	\pstate
 	.endm
 
 	/* IRQ is the lowest priority flag, unconditionally unmask the rest. */
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index b6b4dea197ab..29f88942a184 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -42,6 +42,9 @@ static inline void local_daif_mask(void)
 	if (system_uses_irq_prio_masking())
 		gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
 
+	if (system_uses_nmi())
+		_allint_set();
+
 	trace_hardirqs_off();
 }
 
@@ -123,6 +126,14 @@ static inline void local_daif_restore(unsigned long flags)
 
 	write_sysreg(flags, daif);
 
+	/* If we can take asynchronous errors we can take NMIs */
+	if (system_uses_nmi()) {
+		if (flags & PSR_A_BIT)
+			_allint_set();
+		else
+			_allint_clear();
+	}
+
 	if (irq_disabled)
 		trace_hardirqs_off();
 }
-- 
2.25.1
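As a quick cross-check of the rule in local_daif_restore() above, the
standard DAIF flag sets map onto ALLINT as follows. A host-side model
(the PSR_* values match arch/arm64/include/uapi/asm/ptrace.h; the flag-set
definitions mirror daifflags.h, the rest is illustration only):

    #include <stdbool.h>
    #include <stdio.h>

    #define PSR_F_BIT 0x00000040
    #define PSR_I_BIT 0x00000080
    #define PSR_A_BIT 0x00000100
    #define PSR_D_BIT 0x00000200

    /* local_daif_restore(): NMIs stay unmasked exactly when SError is unmasked. */
    static bool allint_masked(unsigned long flags)
    {
        return flags & PSR_A_BIT;
    }

    int main(void)
    {
        const struct { const char *name; unsigned long flags; } ctx[] = {
            { "DAIF_PROCCTX",       0 },
            { "DAIF_PROCCTX_NOIRQ", PSR_I_BIT },
            { "DAIF_ERRCTX",        PSR_I_BIT | PSR_A_BIT },
            { "DAIF_MASK",          PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT },
        };

        for (unsigned i = 0; i < sizeof(ctx) / sizeof(ctx[0]); i++)
            printf("%-18s -> ALLINT %s\n", ctx[i].name,
                   allint_masked(ctx[i].flags) ? "set (NMIs masked)"
                                               : "clear (NMIs unmasked)");
        return 0;
    }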

From: Mark Brown <broonie@kernel.org>

As we do for pseudo NMIs, don't call preempt_schedule_irq() when
architected NMIs are masked. If they are masked then we are calling from
a preempting context so skip preemption.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/sysreg.h | 3 +++
 arch/arm64/kernel/process.c     | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 3da501312196..521f2c6cce86 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -202,6 +202,8 @@
 #define SYS_SPSR_EL1		sys_reg(3, 0, 4, 0, 0)
 #define SYS_ELR_EL1		sys_reg(3, 0, 4, 0, 1)
 
+#define SYS_ALLINT		sys_reg(3, 0, 4, 3, 0)
+
 #define SYS_ICC_PMR_EL1		sys_reg(3, 0, 4, 6, 0)
 
 #define SYS_AFSR0_EL1		sys_reg(3, 0, 5, 1, 0)
@@ -226,6 +228,7 @@
 #define ID_AA64PFR1_NMI_MASK	GENMASK(39, 36)
 #define HCRX_EL2_TALLINT	BIT(6)
 #define HCRX_EL2_TALLINT_MASK	GENMASK(6, 6)
+#define ALLINT_ALLINT		BIT(13)
 
 /*** Statistical Profiling Extension ***/
 /* ID registers */
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 9bc43e35e87f..0b9193dff1e2 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -693,6 +693,15 @@ core_initcall(tagged_addr_init);
 
 asmlinkage void __sched arm64_preempt_schedule_irq(void)
 {
+	/*
+	 * Architected NMIs are unmasked prior to handling regular
+	 * IRQs and masked while handling FIQs. If ALLINT is set then
+	 * we are in an NMI or other preempting context so skip
+	 * preemption.
+	 */
+	if (system_uses_nmi() && (read_sysreg_s(SYS_ALLINT) & ALLINT_ALLINT))
+		return;
+
 	lockdep_assert_irqs_disabled();
 
 	/*
-- 
2.25.1

From: Mark Brown <broonie@kernel.org>

We have documentation at the top of irqflags.h which explains the DAIF
masking. Since the additional masking with NMIs is related, and also
covers the I and F bits in DAIF, extend the comment to note what's going
on with NMIs, though none of the code in irqflags.h is updated to handle
NMIs.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/irqflags.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 1a59f0ed1ae3..2b2cdd7bb6fb 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -18,6 +18,16 @@
  * side effects for other flags. Keeping to this order makes it easier for
  * entry.S to know which exceptions should be unmasked.
  *
+ * With the addition of the FEAT_NMI extension we gain an additional
+ * class of superpriority IRQ/FIQ which is separately masked with a
+ * choice of modes controlled by SCTLR_ELn.{SPINTMASK,NMI}. Linux
+ * sets SPINTMASK to 0 and NMI to 1 which results in ALLINT.ALLINT
+ * masking both superpriority interrupts and IRQ/FIQ regardless of the
+ * I and F settings. Since these superpriority interrupts are being
+ * used as NMIs we do not include them in the interrupt masking here,
+ * anything that requires that NMIs be masked needs to explicitly do
+ * so.
+ *
  * FIQ is never expected, but we mask it when we disable debug exceptions, and
  * unmask it at all other times.
  */
-- 
2.25.1

From: Mark Brown <broonie@kernel.org>

Our goal with superpriority interrupts is to use them as NMIs, taking
advantage of the much smaller regions where they are masked to allow
prompt handling of the most time critical interrupts.

When an interrupt is configured with superpriority we will enter EL1 as
normal for any interrupt; the presence of a superpriority interrupt is
indicated with a status bit in ISR_EL1. We use this to check for the
presence of a superpriority interrupt before we unmask anything in
elX_interrupt(), reporting it without unmasking any interrupts. If no
superpriority interrupt is present then we handle normal interrupts as
normal; superpriority interrupts will be unmasked while doing so as a
result of setting DAIF_PROCCTX.

Both IRQs and FIQs may be configured with superpriority so we handle
both, passing an additional root handler into the elX_interrupt()
function along with the mask for the bit in ISR_EL1 which indicates the
presence of the relevant kind of superpriority interrupt. These root
handlers can be configured by the interrupt controller, similarly to the
root handlers for normal interrupts, using the newly added
set_handle_nmi_irq() and set_handle_nmi_fiq() functions.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/sysreg.h |  2 ++
 drivers/irqchip/irq-gic-v3.c    | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 521f2c6cce86..761b26417a5d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -205,6 +205,7 @@
 #define SYS_ALLINT		sys_reg(3, 0, 4, 3, 0)
 
 #define SYS_ICC_PMR_EL1		sys_reg(3, 0, 4, 6, 0)
+#define SYS_ICC_NMIAR1_EL1	sys_reg(3, 0, 12, 9, 5)
 
 #define SYS_AFSR0_EL1		sys_reg(3, 0, 5, 1, 0)
 #define SYS_AFSR1_EL1		sys_reg(3, 0, 5, 1, 1)
@@ -229,6 +230,7 @@
 #define HCRX_EL2_TALLINT	BIT(6)
 #define HCRX_EL2_TALLINT_MASK	GENMASK(6, 6)
 #define ALLINT_ALLINT		BIT(13)
+#define ISR_EL1_IS		BIT(10)
 
 /*** Statistical Profiling Extension ***/
 /* ID registers */
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 5b57493f02b2..356ee8fd368a 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -733,8 +733,38 @@ static void __gic_handle_irq_from_irqsoff(struct pt_regs *regs)
 	__gic_handle_nmi(irqnr, regs);
 }
 
+#ifdef CONFIG_ARM64
+static inline u64 gic_read_nmiar(void)
+{
+	u64 irqstat;
+
+	irqstat = read_sysreg_s(SYS_ICC_NMIAR1_EL1);
+
+	dsb(sy);
+
+	return irqstat;
+}
+
+static __always_inline void __el1_nmi(struct pt_regs *regs)
+{
+	u32 irqnr = gic_read_nmiar();
+
+	nmi_enter();
+	__gic_handle_nmi(irqnr, regs);
+	nmi_exit();
+}
+#endif
+
 static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 {
+#ifdef CONFIG_ARM64
+	/* Is there an NMI to handle? */
+	if (system_uses_nmi() && (read_sysreg(isr_el1) & ISR_EL1_IS)) {
+		__el1_nmi(regs);
+		return;
+	}
+#endif
+
 	if (unlikely(gic_supports_nmi() && !interrupts_enabled(regs)))
 		__gic_handle_irq_from_irqsoff(regs);
 	else
-- 
2.25.1
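A host-side model of the new entry-point dispatch: ISR_EL1.IS acts as the
discriminant before anything is unmasked, and the superpriority path
acknowledges via ICC_NMIAR1_EL1 rather than ICC_IAR1_EL1. Strings stand in
for the real actions; nothing here is kernel code:

    #include <stdint.h>
    #include <stdio.h>

    #define ISR_EL1_IS (1u << 10)    /* a superpriority interrupt is pending */

    static const char *dispatch(uint32_t isr_el1, int uses_nmi)
    {
        if (uses_nmi && (isr_el1 & ISR_EL1_IS))
            return "ack ICC_NMIAR1_EL1, nmi_enter()/handler/nmi_exit(), nothing unmasked";
        return "fall through to the pseudo-NMI/IRQ paths via ICC_IAR1_EL1";
    }

    int main(void)
    {
        printf("FEAT_NMI in use, IS set:   %s\n", dispatch(ISR_EL1_IS, 1));
        printf("FEAT_NMI in use, IS clear: %s\n", dispatch(0, 1));
        printf("FEAT_NMI unused, IS set:   %s\n", dispatch(ISR_EL1_IS, 0));
        return 0;
    }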

From: Mark Brown <broonie@kernel.org>

Since NMI handling is in some fairly hot paths we provide a Kconfig
option which allows support to be compiled out when not needed.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 2b74d81fce9e..fc074fd5f040 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1651,6 +1651,19 @@ endmenu
 
 menu "ARMv8.8 architectural features"
 
+config ARM64_NMI
+	bool "Enable support for Non-maskable Interrupts (NMI)"
+	default y
+	help
+	  Non-maskable interrupts are an architecture and GIC feature
+	  which allow the system to configure some interrupts to have
+	  superpriority, allowing them to be handled before other
+	  interrupts and masked for shorter periods of time.
+
+	  The feature is detected at runtime, and will remain disabled
+	  if the cpu does not implement the feature. It will also be
+	  disabled if pseudo NMIs are enabled at runtime.
+
 config ARM64_HAFT
 	bool "Support for Hardware managed Access Flag for Table Descriptors"
 	depends on ARM64_HW_AFDBM
-- 
2.25.1

From: Lorenzo Pieralisi <lpieralisi@kernel.org>

The FEAT_GICv3_NMI GIC feature coupled with the CPU FEAT_NMI enables
handling NMI interrupts in HW on aarch64, by adding a superpriority
interrupt to the existing GIC priority scheme.

Implement GIC driver support for the FEAT_GICv3_NMI feature.

Rename the gic_supports_nmi() helper function to
gic_supports_pseudo_nmis() to make the pseudo NMIs code path clearer
and more explicit.

Check, through the ARM64 capability infrastructure, if support for
FEAT_NMI was detected on the core and the system has not overridden the
detection and forced pseudo-NMI enablement.

If FEAT_NMI is detected, it was not overridden (check embedded in the
system_uses_nmi() call) and the GIC supports the FEAT_GICv3_NMI
feature, install an NMI handler and initialize the NMI-related GIC HW
registers.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 drivers/irqchip/irq-gic-v3.c       | 91 ++++++++++++++++++++++++++----
 include/linux/irqchip/arm-gic-v3.h |  4 ++
 2 files changed, 83 insertions(+), 12 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 356ee8fd368a..d259dd0f28fa 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -51,6 +51,7 @@ struct gic_chip_data {
 	u32			nr_redist_regions;
 	u64			flags;
 	bool			has_rss;
+	bool			has_nmi;
 	unsigned int		ppi_nr;
 	struct partition_desc	**ppi_descs;
 };
@@ -101,6 +102,20 @@ static DEFINE_PER_CPU(bool, has_rss);
 /* Our default, arbitrary priority value. Linux only uses one anyway. */
 #define DEFAULT_PMR_VALUE	0xf0
 
+#ifdef CONFIG_ARM64
+#include <asm/cpufeature.h>
+
+static inline bool has_v3_3_nmi(void)
+{
+	return gic_data.has_nmi && system_uses_nmi();
+}
+#else
+static inline bool has_v3_3_nmi(void)
+{
+	return false;
+}
+#endif
+
 #ifdef CONFIG_VIRT_VTIMER_IRQ_BYPASS
 phys_addr_t get_gicr_paddr(int cpu)
 {
@@ -285,6 +300,42 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
 	return !!(readl_relaxed(base + offset + (index / 32) * 4) & mask);
 }
 
+static DEFINE_RAW_SPINLOCK(irq_controller_lock);
+
+static void gic_irq_configure_nmi(struct irq_data *d, bool enable)
+{
+	void __iomem *base, *addr;
+	u32 offset, index, mask, val;
+
+	offset = convert_offset_index(d, GICD_INMIR, &index);
+	mask = 1 << (index % 32);
+
+	if (gic_irq_in_rdist(d))
+		base = gic_data_rdist_sgi_base();
+	else
+		base = gic_data.dist_base;
+
+	addr = base + offset + (index / 32) * 4;
+
+	raw_spin_lock(&irq_controller_lock);
+
+	val = readl_relaxed(addr);
+	val = enable ? (val | mask) : (val & ~mask);
+	writel_relaxed(val, addr);
+
+	raw_spin_unlock(&irq_controller_lock);
+}
+
+static void gic_irq_enable_nmi(struct irq_data *d)
+{
+	gic_irq_configure_nmi(d, true);
+}
+
+static void gic_irq_disable_nmi(struct irq_data *d)
+{
+	gic_irq_configure_nmi(d, false);
+}
+
 static void gic_poke_irq(struct irq_data *d, u32 offset)
 {
 	void (*rwp_wait)(void);
@@ -331,7 +382,7 @@ static void gic_unmask_irq(struct irq_data *d)
 	gic_poke_irq(d, GICD_ISENABLER);
 }
 
-static inline bool gic_supports_nmi(void)
+static inline bool gic_supports_pseudo_nmis(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) &&
 	       static_branch_likely(&supports_pseudo_nmis);
@@ -418,7 +469,7 @@ static int gic_irq_nmi_setup(struct irq_data *d)
 {
 	struct irq_desc *desc = irq_to_desc(d->irq);
 
-	if (!gic_supports_nmi())
+	if (!gic_supports_pseudo_nmis() && !has_v3_3_nmi())
 		return -EINVAL;
 
 	if (gic_peek_irq(d, GICD_ISENABLER)) {
@@ -446,7 +497,10 @@ static int gic_irq_nmi_setup(struct irq_data *d)
 		desc->handle_irq = handle_fasteoi_nmi;
 	}
 
-	gic_irq_set_prio(d, GICD_INT_NMI_PRI);
+	if (has_v3_3_nmi())
+		gic_irq_enable_nmi(d);
+	else
+		gic_irq_set_prio(d, GICD_INT_NMI_PRI);
 
 	return 0;
 }
@@ -455,7 +509,7 @@ static void gic_irq_nmi_teardown(struct irq_data *d)
 {
 	struct irq_desc *desc = irq_to_desc(d->irq);
 
-	if (WARN_ON(!gic_supports_nmi()))
+	if (WARN_ON(!gic_supports_pseudo_nmis() && !has_v3_3_nmi()))
 		return;
 
 	if (gic_peek_irq(d, GICD_ISENABLER)) {
@@ -481,7 +535,10 @@ static void gic_irq_nmi_teardown(struct irq_data *d)
 		desc->handle_irq = handle_fasteoi_irq;
 	}
 
-	gic_irq_set_prio(d, GICD_INT_DEF_PRI);
+	if (has_v3_3_nmi())
+		gic_irq_disable_nmi(d);
+	else
+		gic_irq_set_prio(d, GICD_INT_DEF_PRI);
 }
 
 static void gic_eoi_irq(struct irq_data *d)
@@ -602,7 +659,7 @@ static inline void gic_complete_ack(u32 irqnr)
 
 static bool gic_rpr_is_nmi_prio(void)
 {
-	if (!gic_supports_nmi())
+	if (!gic_supports_pseudo_nmis())
 		return false;
 
 	return unlikely(gic_read_rpr() == GICD_INT_NMI_PRI);
@@ -765,7 +822,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 	}
 #endif
 
-	if (unlikely(gic_supports_nmi() && !interrupts_enabled(regs)))
+	if (unlikely(gic_supports_pseudo_nmis() && !interrupts_enabled(regs)))
 		__gic_handle_irq_from_irqsoff(regs);
 	else
 		__gic_handle_irq_from_irqson(regs);
@@ -1049,7 +1106,7 @@ static void gic_cpu_sys_reg_init(void)
 	 * to die as interrupt masking will not work properly on all
 	 * CPUs
 	 */
-	WARN_ON(gic_supports_nmi() && group0 &&
+	WARN_ON(gic_supports_pseudo_nmis() && group0 &&
 		!gic_dist_security_disabled());
 }
 
@@ -1627,14 +1684,19 @@ static const struct gic_quirk gic_quirks[] = {
 	}
 };
 
+static void gic_enable_pseudo_nmis(void)
+{
+	static_branch_enable(&supports_pseudo_nmis);
+}
+
 static void gic_enable_nmi_support(void)
 {
 	int i;
 
-	if (!gic_prio_masking_enabled())
+	if (!gic_prio_masking_enabled() && !has_v3_3_nmi())
 		return;
 
-	if (gic_has_group0() && !gic_dist_security_disabled()) {
+	if (!has_v3_3_nmi() && gic_has_group0() && !gic_dist_security_disabled()) {
 		pr_warn("SCR_EL3.FIQ is cleared, cannot enable use of pseudo-NMIs\n");
 		return;
 	}
@@ -1646,8 +1708,12 @@ static void gic_enable_nmi_support(void)
 	for (i = 0; i < gic_data.ppi_nr; i++)
 		refcount_set(&ppi_nmi_refs[i], 0);
 
-	static_branch_enable(&supports_pseudo_nmis);
-
+	/*
+	 * Initialize pseudo-NMIs only if GIC driver cannot take advantage
+	 * of core (FEAT_NMI) and GIC (FEAT_GICv3_NMI) in HW
+	 */
+	if (!has_v3_3_nmi())
+		gic_enable_pseudo_nmis();
 	if (static_branch_likely(&supports_deactivate_key))
 		gic_eoimode1_chip.flags |= IRQCHIP_SUPPORTS_NMI;
 	else
@@ -1705,6 +1771,7 @@ static int __init gic_init_bases(void __iomem *dist_base,
 	irq_domain_update_bus_token(gic_data.domain, DOMAIN_BUS_WIRED);
 
 	gic_data.has_rss = !!(typer & GICD_TYPER_RSS);
+	gic_data.has_nmi = !!(typer & GICD_TYPER_NMI);
 	pr_info("Distributor has %sRange Selector support\n",
 		gic_data.has_rss ? "" : "no ");
 
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index c56c484b196c..9bf8c0c8b5d5 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -31,6 +31,7 @@
 #define GICD_ICFGR			0x0C00
 #define GICD_IGRPMODR			0x0D00
 #define GICD_NSACR			0x0E00
+#define GICD_INMIR			0x0F80
 #define GICD_IGROUPRnE			0x1000
 #define GICD_ISENABLERnE		0x1200
 #define GICD_ICENABLERnE		0x1400
@@ -40,6 +41,7 @@
 #define GICD_ICACTIVERnE		0x1C00
 #define GICD_IPRIORITYRnE		0x2000
 #define GICD_ICFGRnE			0x3000
+#define GICD_INMIRnE			0x3B00
 #define GICD_IROUTER			0x6000
 #define GICD_IROUTERnE			0x8000
 #define GICD_IDREGS			0xFFD0
@@ -84,6 +86,7 @@
 #define GICD_TYPER_LPIS			(1U << 17)
 #define GICD_TYPER_MBIS			(1U << 16)
 #define GICD_TYPER_ESPI			(1U << 8)
+#define GICD_TYPER_NMI			(1U << 9)
 #define GICD_TYPER_ID_BITS(typer)	((((typer) >> 19) & 0x1f) + 1)
 #define GICD_TYPER_NUM_LPIS(typer)	((((typer) >> 11) & 0x1f) + 1)
 
@@ -240,6 +243,7 @@
 #define GICR_ICFGR0			GICD_ICFGR
 #define GICR_IGRPMODR0			GICD_IGRPMODR
 #define GICR_NSACR			GICD_NSACR
+#define GICR_INMIR0			GICD_INMIR
 
 #define GICR_TYPER_PLPIS		(1U << 0)
 #define GICR_TYPER_VLPIS		(1U << 1)
-- 
2.25.1
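For a given INTID, the GICD_INMIR programming reduces to simple offset/bit
arithmetic: one bit per interrupt, 32 interrupts per 32-bit register. A
host-side sketch for plain SPIs (the real driver goes through
convert_offset_index() to also handle ESPIs and redistributor-private
interrupts; this illustration skips that):

    #include <stdint.h>
    #include <stdio.h>

    #define GICD_INMIR 0x0F80    /* first NMI control register in the distributor */

    /* Which register and bit gic_irq_configure_nmi() would poke for an SPI. */
    static void inmir_locate(unsigned int intid, uint32_t *offset, uint32_t *mask)
    {
        *offset = GICD_INMIR + (intid / 32) * 4;
        *mask = 1u << (intid % 32);
    }

    int main(void)
    {
        const unsigned int spis[] = { 32, 63, 64, 100 };

        for (unsigned i = 0; i < sizeof(spis) / sizeof(spis[0]); i++) {
            uint32_t off, mask;

            inmir_locate(spis[i], &off, &mask);
            printf("INTID %3u -> GICD base + 0x%04x, bit mask 0x%08x\n",
                   spis[i], off, mask);
        }
        return 0;
    }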

From: Jinjie Ruan <ruanjinjie@huawei.com> When handling an exception, both daif and allint will be set by hardware. In __gic_handle_irq_from_irqson(), it only consider the Pseudo-NMI by clear daif.I and daif.F and set PMR to GIC_PRIO_IRQOFF to enable Pseudo-NMI and mask IRQ. If the hardwire NMI is enabled, it should also clear allint to enable hardware NMI and mask IRQ before handle a IRQ, otherwise the allint will be set in softirq context and local_irq_enable() can not enable IRQ, and watchdog NMI can not enter too which will cause below hard LOCKUP. And in gic_handle_irq(), it only consider the Pseudo-NMI when an exception has been taken from a context with IRQs disabled. So add a gic_supports_nmi() helper which consider both Pseudo-NMI and hardware NMI. And define PSR_ALLINT_BIT bit and update interrupts_enabled() as well as fast_interrupts_enabled() to consider the ALLINT bit. watchdog: Watchdog detected hard LOCKUP on cpu 1 Modules linked in: Sending NMI from CPU 0 to CPUs 1: Kernel panic - not syncing: Hard LOCKUP CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-gec40ec8c5e9f #295 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x98/0xf8 show_stack+0x20/0x38 dump_stack_lvl+0x48/0x60 dump_stack+0x18/0x28 panic+0x384/0x3e0 nmi_panic+0x94/0xa0 watchdog_hardlockup_check+0x1bc/0x1c8 watchdog_buddy_check_hardlockup+0x68/0x80 watchdog_timer_fn+0x88/0x2f8 __hrtimer_run_queues+0x17c/0x368 hrtimer_run_queues+0xd4/0x158 update_process_times+0x3c/0xc0 tick_periodic+0x44/0xc8 tick_handle_periodic+0x3c/0xb0 arch_timer_handler_virt+0x3c/0x58 handle_percpu_devid_irq+0x90/0x248 generic_handle_domain_irq+0x34/0x58 gic_handle_irq+0x58/0x110 call_on_irq_stack+0x24/0x58 do_interrupt_handler+0x88/0x98 el1_interrupt+0x40/0xc0 el1h_64_irq_handler+0x24/0x30 el1h_64_irq+0x64/0x68 default_idle_call+0x5c/0x160 do_idle+0x220/0x288 cpu_startup_entry+0x40/0x50 rest_init+0xf0/0xf8 arch_call_rest_init+0x18/0x20 start_kernel+0x520/0x668 __primary_switched+0xbc/0xd0 Fixes: dd8b74f04223 ("arm64/nmi: Manage masking for superpriority interrupts along with DAIF") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- arch/arm64/include/asm/ptrace.h | 5 +++-- arch/arm64/include/uapi/asm/ptrace.h | 1 + drivers/irqchip/irq-gic-v3.c | 5 +++++ 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index 92b2575b0191..a0ccbbbcc4d3 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -219,10 +219,11 @@ static inline void forget_syscall(struct pt_regs *regs) true) #define interrupts_enabled(regs) \ - (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs)) + (!((regs)->pstate & PSR_ALLINT_BIT) && !((regs)->pstate & PSR_I_BIT) && \ + irqs_priority_unmasked(regs)) #define fast_interrupts_enabled(regs) \ - (!((regs)->pstate & PSR_F_BIT)) + (!((regs)->pstate & PSR_ALLINT_BIT) && !(regs)->pstate & PSR_F_BIT) static inline unsigned long user_stack_pointer(struct pt_regs *regs) { diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index d1bb5b69f1ce..3c02f551bbe6 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -47,6 +47,7 @@ #define PSR_A_BIT 0x00000100 #define PSR_D_BIT 0x00000200 #define PSR_SSBS_BIT 0x00001000 +#define PSR_ALLINT_BIT 0x00002000 #define PSR_PAN_BIT 0x00400000 #define PSR_UAO_BIT 0x00800000 #define PSR_DIT_BIT 0x01000000 diff --git 
a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index d259dd0f28fa..c4503951b6b7 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -103,6 +103,7 @@ static DEFINE_PER_CPU(bool, has_rss); #define DEFAULT_PMR_VALUE 0xf0 #ifdef CONFIG_ARM64 +#include <asm/daifflags.h> #include <asm/cpufeature.h> static inline bool has_v3_3_nmi(void) @@ -748,6 +749,10 @@ static void __gic_handle_irq_from_irqson(struct pt_regs *regs) if (gic_prio_masking_enabled()) { gic_pmr_mask_irqs(); gic_arch_enable_irqs(); + } else if (has_v3_3_nmi()) { +#ifdef CONFIG_ARM64_NMI + _allint_clear(); +#endif } if (!is_nmi) -- 2.25.1
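For reference: _allint_clear() comes from the FEAT_NMI support added elsewhere in this series ("arm64/asm: Introduce assembly macros for managing ALLINT"); it is not defined in this patch. A minimal sketch of what such a helper boils down to, assuming an assembler new enough to accept the FEAT_NMI MSR-immediate form (older toolchains would need a raw .inst encoding; the helper names here are illustrative, not the series' exact implementation):

    /*
     * Illustrative sketch only: mask/unmask superpriority (NMI) interrupts
     * by writing PSTATE.ALLINT. Clearing ALLINT lets NMIs be delivered
     * while normal IRQs stay masked via PSTATE.I and the GIC PMR.
     */
    static inline void example_allint_clear(void)
    {
    	asm volatile("msr ALLINT, #0" ::: "memory");
    }

    static inline void example_allint_set(void)
    {
    	asm volatile("msr ALLINT, #1" ::: "memory");
    }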

From: Jinjie Ruan <ruanjinjie@huawei.com> A system stall occurs when using pseudo-NMI with CONFIG_ARM64_NMI disabled. If the hardware supports FEAT_NMI, the ALLINT bit in pstate may be set or cleared on an exception trap whether or not the software enables the feature, so it is not safe to use it in the interrupts_enabled() or fast_interrupts_enabled() checks when FEAT_NMI support is not enabled in the kernel; revert those checks. After applying this patch, the system stall no longer happens on hardware with the FEAT_NMI feature. Fixes: eefea6156921 ("irqchip/gic-v3: Fix hard LOCKUP caused by NMI being masked") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- arch/arm64/include/asm/ptrace.h | 5 ++--- arch/arm64/include/uapi/asm/ptrace.h | 1 - 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index a0ccbbbcc4d3..e235b7e94f9d 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -219,11 +219,10 @@ static inline void forget_syscall(struct pt_regs *regs) true) #define interrupts_enabled(regs) \ - (!((regs)->pstate & PSR_ALLINT_BIT) && !((regs)->pstate & PSR_I_BIT) && \ - irqs_priority_unmasked(regs)) + (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs)) #define fast_interrupts_enabled(regs) \ - (!((regs)->pstate & PSR_ALLINT_BIT) && !((regs)->pstate & PSR_F_BIT)) + (!((regs)->pstate & PSR_F_BIT)) static inline unsigned long user_stack_pointer(struct pt_regs *regs) { diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index 3c02f551bbe6..d1bb5b69f1ce 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -47,7 +47,6 @@ #define PSR_A_BIT 0x00000100 #define PSR_D_BIT 0x00000200 #define PSR_SSBS_BIT 0x00001000 -#define PSR_ALLINT_BIT 0x00002000 #define PSR_PAN_BIT 0x00400000 #define PSR_UAO_BIT 0x00800000 #define PSR_DIT_BIT 0x01000000 -- 2.25.1

From: Ionela Voinescu <ionela.voinescu@arm.com> [Upstream commit bbce8eaa603236bf958b0d24e6377b3f3b623991] Add weak function to return the hardware maximum frequency of a CPU, with the default implementation returning cpuinfo.max_freq, which is the best information we can generically get from the cpufreq framework. The default can be overwritten by a strong function in platforms that want to provide an alternative implementation, with more accurate information, obtained either from hardware or firmware. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- drivers/cpufreq/cpufreq.c | 20 ++++++++++++++++++++ include/linux/cpufreq.h | 5 +++++ 2 files changed, 25 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 57a820ad1206..2920ddc74f73 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1802,6 +1802,26 @@ unsigned int cpufreq_quick_get_max(unsigned int cpu) } EXPORT_SYMBOL(cpufreq_quick_get_max); +/** + * cpufreq_get_hw_max_freq - get the max hardware frequency of the CPU + * @cpu: CPU number + * + * The default return value is the max_freq field of cpuinfo. + */ +__weak unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) +{ + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); + unsigned int ret_freq = 0; + + if (policy) { + ret_freq = policy->cpuinfo.max_freq; + cpufreq_cpu_put(policy); + } + + return ret_freq; +} +EXPORT_SYMBOL(cpufreq_get_hw_max_freq); + static unsigned int __cpufreq_get(struct cpufreq_policy *policy) { if (unlikely(policy_is_inactive(policy))) diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index bf47a225b8f6..24bd8b3f65d8 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -211,6 +211,7 @@ extern struct kobject *cpufreq_global_kobject; unsigned int cpufreq_get(unsigned int cpu); unsigned int cpufreq_quick_get(unsigned int cpu); unsigned int cpufreq_quick_get_max(unsigned int cpu); +unsigned int cpufreq_get_hw_max_freq(unsigned int cpu); void disable_cpufreq(void); u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy); @@ -238,6 +239,10 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu) { return 0; } +static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) +{ + return 0; +} static inline void disable_cpufreq(void) { } #endif -- 2.25.1
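For illustration, an architecture-side strong definition that overrides this weak default might look like the sketch below; the firmware query is a made-up placeholder, not an API from this series, and only cpufreq_get_hw_max_freq() and cpufreq_quick_get_max() come from the cpufreq framework.

    #include <linux/cpufreq.h>

    /*
     * Hypothetical firmware query standing in for a real SCMI/ACPI lookup;
     * returns 0 (kHz) when no information is available.
     */
    static unsigned int example_fw_read_max_khz(unsigned int cpu)
    {
    	return 0;
    }

    /* Strong definition overriding the weak default in cpufreq.c (kHz units). */
    unsigned int cpufreq_get_hw_max_freq(unsigned int cpu)
    {
    	unsigned int khz = example_fw_read_max_khz(cpu);

    	/* Fall back to the generic cpufreq view when firmware has no data. */
    	return khz ? khz : cpufreq_quick_get_max(cpu);
    }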

From: Douglas Anderson <dianders@chromium.org> [Upstream commit b17aa959330e8058452297049a0056ba4b9c72e8] On arm64, NMI support needs to be detected at runtime. Add a weak function to the perf hardlockup detector so that an architecture can implement it to detect whether NMIs are available. Link: https://lkml.kernel.org/r/20230519101840.v5.15.Ic55cb6f90ef5967d8aaa2b503a4e... Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/nmi.h | 1 + kernel/watchdog_hld.c | 12 +++++++++++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 020768634b29..463095059659 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -203,6 +203,7 @@ static inline bool trigger_single_cpu_backtrace(int cpu) #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF u64 hw_nmi_get_sample_period(int watchdog_thresh); +bool arch_perf_nmi_is_available(void); #endif #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \ diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c index ce26950a8140..2e2d63ee0c4e 100644 --- a/kernel/watchdog_hld.c +++ b/kernel/watchdog_hld.c @@ -299,12 +299,22 @@ void __init hardlockup_detector_perf_restart(void) } } +bool __weak __init arch_perf_nmi_is_available(void) +{ + return true; +} + /** * hardlockup_detector_perf_init - Probe whether NMI event is available at all */ int __init hardlockup_detector_perf_init(void) { - int ret = hardlockup_detector_event_create(); + int ret; + + if (!arch_perf_nmi_is_available()) + return -ENODEV; + + ret = hardlockup_detector_event_create(); if (ret) { pr_info("Perf NMI watchdog permanently disabled\n"); -- 2.25.1

From: Lecopzer Chen <lecopzer.chen@mediatek.com> [Upstream commit 930d8f8dbab97cb05dba30e67a2dfa0c6dbf4bc7] When lockup_detector_init() calls watchdog_hardlockup_probe(), the PMU may not be ready yet. E.g. on arm64, the PMU is not ready until device_initcall(armv8_pmu_driver_init). And it is deeply integrated with the driver model and cpuhp. Hence it is hard to push this initialization before smp_init(). But it is easy to take the opposite approach and try to initialize the watchdog once again later. The delayed probe is called using workqueues. It needs to allocate memory and must proceed in a normal context. The delayed probe can be used when watchdog_hardlockup_probe() returns non-zero, which is the return code when the PMU is not ready yet. Provide an API, lockup_detector_retry_init(), for anyone who needs to retry the lockup detector init after failing at lockup_detector_init(). The original assumption is: nobody should use the delayed probe after lockup_detector_check(), which has the __init attribute. That is, anyone using this API must call it between lockup_detector_init() and lockup_detector_check(), and the caller must have the __init attribute. Link: https://lkml.kernel.org/r/20230519101840.v5.16.If4ad5dd5d09fb1309cebf8bcead4... Reviewed-by: Petr Mladek <pmladek@suse.com> Co-developed-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V.
Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/nmi.h | 2 ++ kernel/watchdog.c | 67 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 463095059659..4c570e364d1a 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -13,6 +13,7 @@ #ifdef CONFIG_LOCKUP_DETECTOR void lockup_detector_init(void); +void lockup_detector_retry_init(void); void lockup_detector_soft_poweroff(void); void lockup_detector_cleanup(void); bool is_hardlockup(void); @@ -35,6 +36,7 @@ extern int sysctl_hardlockup_all_cpu_backtrace; #else /* CONFIG_LOCKUP_DETECTOR */ static inline void lockup_detector_init(void) { } +static inline void lockup_detector_retry_init(void) { } static inline void lockup_detector_soft_poweroff(void) { } static inline void lockup_detector_cleanup(void) { } #endif /* !CONFIG_LOCKUP_DETECTOR */ diff --git a/kernel/watchdog.c b/kernel/watchdog.c index bc48dfdcf257..e4a4b91a5b47 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -114,7 +114,13 @@ void __weak watchdog_nmi_disable(unsigned int cpu) hardlockup_detector_perf_disable(); } -/* Return 0, if a NMI watchdog is available. Error code otherwise */ +/* + * Watchdog-detector specific API. + * + * Return 0 when NMI watchdog is available, negative value otherwise. + * Note that the negative value means that a delayed probe might + * succeed later. + */ int __weak __init watchdog_nmi_probe(void) { return hardlockup_detector_perf_init(); @@ -796,6 +802,62 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write, } #endif /* CONFIG_SYSCTL */ +static void __init lockup_detector_delay_init(struct work_struct *work); +static bool allow_lockup_detector_init_retry __initdata; + +static struct work_struct detector_work __initdata = + __WORK_INITIALIZER(detector_work, lockup_detector_delay_init); + +static void __init lockup_detector_delay_init(struct work_struct *work) +{ + int ret; + + ret = watchdog_nmi_probe(); + if (ret) { + pr_info("Delayed init of the lockup detector failed: %d\n", ret); + pr_info("Hard watchdog permanently disabled\n"); + return; + } + + allow_lockup_detector_init_retry = false; + + nmi_watchdog_available = true; + lockup_detector_setup(); +} + +/* + * lockup_detector_retry_init - retry init lockup detector if possible. + * + * Retry hardlockup detector init. It is useful when it requires some + * functionality that has to be initialized later on a particular + * platform. + */ +void __init lockup_detector_retry_init(void) +{ + /* Must be called before late init calls */ + if (!allow_lockup_detector_init_retry) + return; + + schedule_work(&detector_work); +} + +/* + * Ensure that optional delayed hardlockup init is proceed before + * the init code and memory is freed. + */ +static int __init lockup_detector_check(void) +{ + /* Prevent any later retry. */ + allow_lockup_detector_init_retry = false; + + /* Make sure no work is pending. 
*/ + flush_work(&detector_work); + + return 0; + +} +late_initcall_sync(lockup_detector_check); + void __init lockup_detector_init(void) { if (tick_nohz_full_enabled()) @@ -806,5 +868,8 @@ void __init lockup_detector_init(void) if (!watchdog_nmi_probe()) nmi_watchdog_available = true; + else + allow_lockup_detector_init_retry = true; + lockup_detector_setup(); } -- 2.25.1
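The intended calling pattern, which a later patch in this series ("arm64: enable perf events based hard lockup detector") follows from the arm64 PMU driver, is sketched below with placeholder driver names:

    #include <linux/init.h>
    #include <linux/nmi.h>
    #include <linux/platform_device.h>

    /* Placeholder driver: any late-probing provider of the NMI source works. */
    static struct platform_driver example_pmu_driver = {
    	.driver = {
    		.name = "example-pmu",
    	},
    };

    static int __init example_pmu_driver_init(void)
    {
    	int ret = platform_driver_register(&example_pmu_driver);

    	/*
    	 * The PMU is usable from here on, so retry the watchdog probe that
    	 * failed in lockup_detector_init(). A device_initcall() runs after
    	 * lockup_detector_init() and before the late_initcall_sync()
    	 * lockup_detector_check(), as the API requires.
    	 */
    	if (!ret)
    		lockup_detector_retry_init();

    	return ret;
    }
    device_initcall(example_pmu_driver_init);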

From: Lecopzer Chen <lecopzer.chen@mediatek.com> [Upstream commit 94946f9eaac116f2943ec79ec3df1ec2fc92ae07] Set a safe maximum CPU frequency of 5 GHz in case a particular platform doesn't implement a cpufreq driver. Although the architecture doesn't put any restriction on the maximum frequency, 5 GHz seems to be a safe maximum given the available Arm CPUs in the market, which are clocked much less than 5 GHz. On the other hand, we can't make it much higher as it would lead to a large hard-lockup detection timeout on parts which run slower (e.g. 1 GHz on Developerbox) and don't have a cpufreq driver. Link: https://lkml.kernel.org/r/20230519101840.v5.17.Ia9d02578e89c3f44d3cb12eec8b0... Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Co-developed-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/watchdog_hld.c | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+) create mode 100644 arch/arm64/kernel/watchdog_hld.c diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index d0813553ce2e..6aa52c5db00d 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -38,6 +38,7 @@ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_ARM64_MODULE_PLTS) += module-plts.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_CPU_PM) += sleep.o suspend.o obj-$(CONFIG_CPU_IDLE) += cpuidle.o diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c new file mode 100644 index 000000000000..2401eb1b7e55 --- /dev/null +++ b/arch/arm64/kernel/watchdog_hld.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/cpufreq.h> + +/* + * Safe maximum CPU frequency in case a particular platform doesn't implement + * a cpufreq driver. Although the architecture doesn't put any restriction on + * the maximum frequency, 5 GHz seems to be a safe maximum given the available + * Arm CPUs in the market, which are clocked much less than 5 GHz. On the + * other hand, we can't make it much higher as it would lead to a large + * hard-lockup detection timeout on parts which run slower (e.g. 1 GHz on + * Developerbox) and don't have a cpufreq driver.
+ */ +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz +u64 hw_nmi_get_sample_period(int watchdog_thresh) +{ + unsigned int cpu = smp_processor_id(); + unsigned long max_cpu_freq; + + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; + if (!max_cpu_freq) + max_cpu_freq = SAFE_MAX_CPU_FREQ; + + return (u64)max_cpu_freq * watchdog_thresh; +} -- 2.25.1
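As a quick sanity check of the arithmetic (ordinary userspace C, not kernel code): with no cpufreq driver the fallback frequency is used, so the kernel's default watchdog_thresh of 10 seconds yields a 50-billion-cycle sample period.

    #include <stdio.h>

    int main(void)
    {
    	const unsigned long safe_max_cpu_freq = 5000000000UL; /* 5 GHz, in Hz */
    	const int watchdog_thresh = 10; /* kernel default, in seconds */

    	/*
    	 * Mirrors hw_nmi_get_sample_period() when cpufreq reports nothing:
    	 * 5,000,000,000 cycles/s * 10 s = 50,000,000,000 cycles.
    	 */
    	printf("sample period: %llu cycles\n",
    	       (unsigned long long)safe_max_cpu_freq * watchdog_thresh);
    	return 0;
    }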

From: Douglas Anderson <dianders@chromium.org> [Upstream commit d7a0fe9ef6d6484fca4ba55c19091932337d4272] With the recent feature added to enable perf events to use pseudo NMIs as interrupts on platforms which support GICv3 or later, it's now possible to enable the hard lockup detector (or NMI watchdog) on arm64 platforms. So enable the corresponding support. One thing to note here is that normally the lockup detector is initialized just after the early initcalls, but the PMU on arm64 comes up much later as a device_initcall(). To cope with that, override arch_perf_nmi_is_available() to let the watchdog framework know the PMU is not ready, and inform the framework to re-initialize lockup detection once the PMU has been initialized. [dianders@chromium.org: only HAVE_HARDLOCKUP_DETECTOR_PERF if the PMU config is enabled] Link: https://lkml.kernel.org/r/20230523073952.1.I60217a63acc35621e13f10be16c0cd7c... Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b4... Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Co-developed-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V.
Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm64/Kconfig | 3 +++ arch/arm64/kernel/perf_event.c | 12 ++++++++++-- arch/arm64/kernel/watchdog_hld.c | 12 ++++++++++++ drivers/perf/arm_pmu.c | 5 +++++ include/linux/perf/arm_pmu.h | 2 ++ 5 files changed, 32 insertions(+), 2 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index fc074fd5f040..93e47a234ba9 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -159,12 +159,15 @@ config ARM64 select HAVE_FUNCTION_ERROR_INJECTION select HAVE_FUNCTION_GRAPH_TRACER select HAVE_GCC_PLUGINS + select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \ + HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI select HAVE_HW_BREAKPOINT if PERF_EVENTS select HAVE_IRQ_TIME_ACCOUNTING select HAVE_MEMBLOCK_NODE_MAP if NUMA select HAVE_NMI select HAVE_PATA_PLATFORM select HAVE_PERF_EVENTS + select HAVE_PERF_EVENTS_NMI if ARM64_PSEUDO_NMI select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_REGS_AND_STACK_ACCESS_API diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 43281f725f28..37d4f9e6ae5c 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -20,6 +20,7 @@ #include <linux/perf/arm_pmu.h> #include <linux/platform_device.h> #include <linux/smp.h> +#include <linux/nmi.h> /* ARMv8 Cortex-A53 specific event types. */ #define ARMV8_A53_PERFCTR_PREF_LINEFILL 0xC2 @@ -1221,10 +1222,17 @@ static struct platform_driver armv8_pmu_driver = { static int __init armv8_pmu_driver_init(void) { + int ret; + if (acpi_disabled) - return platform_driver_register(&armv8_pmu_driver); + ret = platform_driver_register(&armv8_pmu_driver); else - return arm_pmu_acpi_probe(armv8_pmuv3_init); + ret = arm_pmu_acpi_probe(armv8_pmuv3_init); + + if (!ret) + lockup_detector_retry_init(); + + return ret; } device_initcall(armv8_pmu_driver_init) diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c index 2401eb1b7e55..dcd25322127c 100644 --- a/arch/arm64/kernel/watchdog_hld.c +++ b/arch/arm64/kernel/watchdog_hld.c @@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 +#include <linux/nmi.h> #include <linux/cpufreq.h> +#include <linux/perf/arm_pmu.h> /* * Safe maximum CPU frequency in case a particular platform doesn't implement @@ -22,3 +24,13 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh) return (u64)max_cpu_freq * watchdog_thresh; } + +bool __init arch_perf_nmi_is_available(void) +{ + /* + * hardlockup_detector_perf_init() will success even if Pseudo-NMI turns off, + * however, the pmu interrupts will act like a normal interrupt instead of + * NMI and the hardlockup detector would be broken. + */ + return arm_pmu_irq_is_nmi(); +} diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 7fd11ef5cb8a..4a9fc2268f19 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -724,6 +724,11 @@ static int armpmu_get_cpu_irq(struct arm_pmu *pmu, int cpu) return per_cpu(hw_events->irq, cpu); } +bool arm_pmu_irq_is_nmi(void) +{ + return has_nmi; +} + /* * PMU hardware loses all context when a CPU goes offline. 
* When a CPU is hotplugged back in, since some hardware registers are diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 71f525a35ac2..8281ac531c42 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -159,6 +159,8 @@ int arm_pmu_acpi_probe(armpmu_init_fn init_fn); static inline int arm_pmu_acpi_probe(armpmu_init_fn init_fn) { return 0; } #endif +bool arm_pmu_irq_is_nmi(void); + /* Internal functions only for core arm_pmu code */ struct arm_pmu *armpmu_alloc(void); struct arm_pmu *armpmu_alloc_atomic(void); -- 2.25.1

From: Xiongfeng Wang <wangxiongfeng2@huawei.com> When I enabled CONFIG_DEBUG_PREEMPT and CONFIG_PREEMPT on X86, I got the following call trace: [ 3.341853] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 [ 3.344392] caller is debug_smp_processor_id+0x17/0x20 [ 3.344395] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #398 [ 3.344397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [ 3.344399] Call Trace: [ 3.344410] dump_stack+0x60/0x76 [ 3.344412] check_preemption_disabled+0xba/0xc0 [ 3.344415] debug_smp_processor_id+0x17/0x20 [ 3.344422] hardlockup_detector_event_create+0xf/0x60 [ 3.344427] hardlockup_detector_perf_init+0xf/0x41 [ 3.344430] watchdog_nmi_probe+0xe/0x10 [ 3.344432] lockup_detector_init+0x22/0x5b [ 3.344437] kernel_init_freeable+0x20c/0x245 [ 3.344439] ? rest_init+0xd0/0xd0 [ 3.344441] kernel_init+0xe/0x110 [ 3.344446] ret_from_fork+0x22/0x30 This is because sched_init_smp() sets 'current->nr_cpus_allowed' to the number of possible CPUs, so check_preemption_disabled() fails. The issue was introduced by commit a79050434b45, which moves lockup_detector_init() down after do_basic_setup(). Fix it by moving lockup_detector_init() back to its original place when sdei_watchdog is disabled. There is no problem when sdei_watchdog is enabled, because in that case watchdog_nmi_probe() is overridden in 'arch/arm64/kernel/watchdog_sdei.c'. Fixes: a79050434b45 ("lockup_detector: init lockup detector after all the init_calls") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> --- arch/arm64/kernel/watchdog_sdei.c | 2 +- include/linux/nmi.h | 2 ++ init/main.c | 6 +++++- 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index ad68ed383fce..414e869c5483 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -21,7 +21,7 @@ #define SDEI_NMI_WATCHDOG_HWIRQ 29 static int sdei_watchdog_event_num; -static bool disable_sdei_nmi_watchdog = true; +bool disable_sdei_nmi_watchdog = true; static bool sdei_watchdog_registered; static DEFINE_PER_CPU(ktime_t, last_check_time); diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 4c570e364d1a..15827822a493 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -234,8 +234,10 @@ extern int proc_watchdog_cpumask(struct ctl_table *, int, #ifdef CONFIG_SDEI_WATCHDOG void sdei_watchdog_clear_eoi(void); +extern bool disable_sdei_nmi_watchdog; #else static inline void sdei_watchdog_clear_eoi(void) { } +#define disable_sdei_nmi_watchdog 1 #endif #endif diff --git a/init/main.c b/init/main.c index 22d7de2866d0..adb23a1b67e3 100644 --- a/init/main.c +++ b/init/main.c @@ -1196,6 +1196,8 @@ static noinline void __init kernel_init_freeable(void) init_mm_internals(); do_pre_smp_initcalls(); + if (disable_sdei_nmi_watchdog) + lockup_detector_init(); smp_init(); sched_init_smp(); @@ -1206,7 +1208,9 @@ static noinline void __init kernel_init_freeable(void) do_basic_setup(); - lockup_detector_init(); + /* sdei_watchdog needs to be initialized after sdei_init */ + if (!disable_sdei_nmi_watchdog) + lockup_detector_init(); /* Open the /dev/console on the rootfs, this should never fail */ if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0) -- 2.25.1
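Note the "#define disable_sdei_nmi_watchdog 1" fallback in <linux/nmi.h>: with CONFIG_SDEI_WATCHDOG=n the flag is a compile-time constant, so the new branches in kernel_init_freeable() fold away and only the early call site survives. A standalone sketch of that constant-folding pattern (names shortened, illustration only):

    #include <stdio.h>

    /* Stand-in for the CONFIG_SDEI_WATCHDOG=n case: the "flag" is a constant. */
    #define disable_sdei_nmi_watchdog 1

    static void lockup_detector_init(void)
    {
    	puts("early lockup detector init");
    }

    int main(void)
    {
    	if (disable_sdei_nmi_watchdog)	/* always taken; folded at build time */
    		lockup_detector_init();

    	/* ... smp_init(), sched_init_smp(), do_basic_setup() ... */

    	if (!disable_sdei_nmi_watchdog)	/* dead code, eliminated */
    		puts("late init for sdei_watchdog");

    	return 0;
    }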

From: Yicong Yang <yangyicong@hisilicon.com> Currently we cannot use watchdog_{perf, buddy} if CONFIG_SDEI_WATCHDOG=y. Not all platforms have watchdog_sdei, so make watchdog_sdei coexist with the other watchdogs; only one watchdog will be active in the end. By default watchdog_sdei is used. When booting with "disable_sdei_nmi_watchdog", the other watchdogs are used if they probe successfully. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- arch/arm64/kernel/watchdog_sdei.c | 6 +++--- include/linux/nmi.h | 6 ++++++ kernel/watchdog.c | 16 ++++++++++++---- 3 files changed, 21 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index 414e869c5483..50d545813cbc 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -25,7 +25,7 @@ bool disable_sdei_nmi_watchdog = true; static bool sdei_watchdog_registered; static DEFINE_PER_CPU(ktime_t, last_check_time); -int watchdog_nmi_enable(unsigned int cpu) +int sdei_watchdog_nmi_enable(unsigned int cpu) { int ret; @@ -49,7 +49,7 @@ int watchdog_nmi_enable(unsigned int cpu) return 0; } -void watchdog_nmi_disable(unsigned int cpu) +void sdei_watchdog_nmi_disable(unsigned int cpu) { int ret; @@ -111,7 +111,7 @@ void sdei_watchdog_clear_eoi(void) sdei_api_clear_eoi(SDEI_NMI_WATCHDOG_HWIRQ); } -int __init watchdog_nmi_probe(void) +int __init sdei_watchdog_nmi_probe(void) { int ret; diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 15827822a493..e0bcc0356898 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -233,10 +233,16 @@ extern int proc_watchdog_cpumask(struct ctl_table *, int, #endif #ifdef CONFIG_SDEI_WATCHDOG +int sdei_watchdog_nmi_enable(unsigned int cpu); +void sdei_watchdog_nmi_disable(unsigned int cpu); void sdei_watchdog_clear_eoi(void); +int sdei_watchdog_nmi_probe(void); extern bool disable_sdei_nmi_watchdog; #else +static inline int sdei_watchdog_nmi_enable(unsigned int cpu) { return -ENODEV; } +static inline void sdei_watchdog_nmi_disable(unsigned int cpu) { } static inline void sdei_watchdog_clear_eoi(void) { } +static inline int sdei_watchdog_nmi_probe(void) { return -ENODEV; } #define disable_sdei_nmi_watchdog 1 #endif diff --git a/kernel/watchdog.c b/kernel/watchdog.c index e4a4b91a5b47..e733c6f4b5c7 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -514,8 +514,12 @@ static void watchdog_enable(unsigned int cpu) /* Initialize timestamp */ __touch_watchdog(); /* Enable the perf event */ - if (watchdog_enabled & NMI_WATCHDOG_ENABLED) - watchdog_nmi_enable(cpu); + if (watchdog_enabled & NMI_WATCHDOG_ENABLED) { + if (disable_sdei_nmi_watchdog) + watchdog_nmi_enable(cpu); + else + sdei_watchdog_nmi_enable(cpu); + } } static void watchdog_disable(unsigned int cpu) @@ -529,7 +533,10 @@ static void watchdog_disable(unsigned int cpu) * between disabling the timer and disabling the perf event causes * the perf NMI to detect a false positive.
*/ - watchdog_nmi_disable(cpu); + if (disable_sdei_nmi_watchdog) + watchdog_nmi_disable(cpu); + else + sdei_watchdog_nmi_disable(cpu); hrtimer_cancel(hrtimer); wait_for_completion(this_cpu_ptr(&softlockup_completion)); } @@ -866,7 +873,8 @@ void __init lockup_detector_init(void) cpumask_copy(&watchdog_cpumask, housekeeping_cpumask(HK_FLAG_TIMER)); - if (!watchdog_nmi_probe()) + if ((!disable_sdei_nmi_watchdog && !sdei_watchdog_nmi_probe()) || + !watchdog_nmi_probe()) nmi_watchdog_available = true; else allow_lockup_detector_init_retry = true; -- 2.25.1

From: Yicong Yang <yangyicong@hisilicon.com> Commit 1509d06c9c41 ("init: only move down lockup_detector_init() when sdei_watchdog is enabled") moves lockup_detector_init() down only for sdei_watchdog; nmi_watchdog does not need this. So when sdei_watchdog is in use but fails to initialize, nmi_watchdog should not be initialized in its place. [ 0.706631][ T1] SDEI NMI watchdog: Disable SDEI NMI Watchdog in VM [ 0.707405][ T1] ------------[ cut here ]------------ [ 0.708020][ T1] WARNING: CPU: 0 PID: 1 at kernel/watchdog_perf.c:117 hardlockup_detector_event_create+0x24/0x108 [ 0.709230][ T1] Modules linked in: [ 0.709665][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0 #1 [ 0.710700][ T1] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 0.711625][ T1] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.712547][ T1] pc : hardlockup_detector_event_create+0x24/0x108 [ 0.713316][ T1] lr : watchdog_hardlockup_probe+0x28/0xa8 [ 0.714010][ T1] sp : ffff8000831cbdc0 [ 0.714501][ T1] pmr_save: 000000e0 [ 0.714957][ T1] x29: ffff8000831cbdc0 x28: 0000000000000000 x27: 0000000000000000 [ 0.715899][ T1] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 0.716839][ T1] x23: 0000000000000000 x22: 0000000000000000 x21: ffff80008218fab0 [ 0.717775][ T1] x20: ffff8000821af000 x19: ffff0000c0261900 x18: 0000000000000020 [ 0.718713][ T1] x17: 00000000cb551c45 x16: ffff800082625e48 x15: ffffffffffffffff [ 0.719663][ T1] x14: 0000000000000000 x13: 205d315420202020 x12: 5b5d313336363037 [ 0.720607][ T1] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffff800081b5f630 [ 0.721590][ T1] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 000000000005fff4 [ 0.722528][ T1] x5 : 00000000002bffa8 x4 : 0000000000000000 x3 : 0000000000000000 [ 0.723482][ T1] x2 : 0000000000000000 x1 : 0000000000000140 x0 : ffff0000c02c0000 [ 0.724426][ T1] Call trace: [ 0.724808][ T1] hardlockup_detector_event_create+0x24/0x108 [ 0.725535][ T1] watchdog_hardlockup_probe+0x28/0xa8 [ 0.726174][ T1] lockup_detector_init+0x110/0x158 [ 0.726776][ T1] kernel_init_freeable+0x208/0x288 [ 0.727387][ T1] kernel_init+0x2c/0x200 [ 0.727902][ T1] ret_from_fork+0x10/0x20 [ 0.728420][ T1] ---[ end trace 0000000000000000 ]--- Fixes: f61b11535a0b ("watchdog: Support watchdog_sdei coexist with existing watchdogs") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- kernel/watchdog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index e733c6f4b5c7..a1aeebe9c53a 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -874,7 +874,7 @@ void __init lockup_detector_init(void) housekeeping_cpumask(HK_FLAG_TIMER)); if ((!disable_sdei_nmi_watchdog && !sdei_watchdog_nmi_probe()) || - !watchdog_nmi_probe()) + (disable_sdei_nmi_watchdog && !watchdog_nmi_probe())) nmi_watchdog_available = true; else allow_lockup_detector_init_retry = true; -- 2.25.1

From: Yicong Yang <yangyicong@hisilicon.com> The introduction of FEAT_NMI/FEAT_GICv3_NMI causes a race where we may handle a normal interrupt in an interrupts-disabled context because the NMI has been withdrawn. The flow is like below: [interrupt disabled] <- normal interrupt pending, for example timer interrupt <- NMI occurs, ISR_EL1.nmi = 1 do_el1_interrupt() <- NMI withdrawn, ISR_EL1.nmi = 0 ISR_EL1.nmi = 0, not an NMI interrupt gic_handle_irq() __gic_handle_irq_from_irqson() irqnr = gic_read_iar() <- Oops, ack and handle a normal interrupt in an interrupts-disabled context! Fix this by checking the interrupt status in __gic_handle_irq_from_irqson() and ignoring the interrupt if we're in an interrupts-disabled context. Fixes: 0408b5bc4300 ("irqchip/gic-v3: Implement FEAT_GICv3_NMI support") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> --- drivers/irqchip/irq-gic-v3.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index c4503951b6b7..e83b24e2fad8 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -736,6 +736,28 @@ static void __gic_handle_irq_from_irqson(struct pt_regs *regs) bool is_nmi; u32 irqnr; + /* + * We should enter here with interrupts enabled in the interrupted + * context; otherwise we have hit a race with FEAT_NMI/FEAT_GICv3_NMI: + * + * [interrupt disabled] + * <- normal interrupt pending, for example timer interrupt + * <- NMI occurs, ISR_EL1.nmi = 1 + * do_el1_interrupt() + * <- NMI withdrawn, ISR_EL1.nmi = 0 + * ISR_EL1.nmi = 0, not an NMI interrupt + * gic_handle_irq() + * __gic_handle_irq_from_irqson() + * irqnr = gic_read_iar() <- Oops, ack and handle a normal interrupt + * in an interrupts-disabled context! + * + * So if we hit this case, just return from the interrupt context. + * Since the interrupt is still pending, it will be handled once + * interrupts are re-enabled and will not be lost. + */ + if (!interrupts_enabled(regs)) + return; + irqnr = gic_read_iar(); is_nmi = gic_rpr_is_nmi_prio(); -- 2.25.1