[PATCH 00/24] Support NMI Watchdog

Douglas Anderson (1):
  arm64: enable perf events based hard lockup detector

Ionela Voinescu (1):
  cpufreq: add function to get the hardware max frequency

Jingyi Wang (1):
  arm64: watchdog: add switch to select sdei_watchdog/pmu_watchdog

Jinjie Ruan (2):
  irqchip/gic-v3: Fix hard LOCKUP caused by NMI being masked
  irqchip/gic-v3: Fix a system stall when using pseudo NMI with CONFIG_ARM64_NMI closed

Lecopzer Chen (1):
  arm64: add hw_nmi_get_sample_period for preparation of lockup detector

Lorenzo Pieralisi (1):
  irqchip/gic-v3: Implement FEAT_GICv3_NMI support

Mark Brown (11):
  arm64/booting: Document boot requirements for FEAT_NMI
  arm64/sysreg: Add definitions for immediate versions of MSR ALLINT
  arm64/asm: Introduce assembly macros for managing ALLINT
  arm64/hyp-stub: Enable access to ALLINT
  arm64/cpufeature: Detect PE support for FEAT_NMI
  KVM: arm64: Hide FEAT_NMI from guests
  arm64/nmi: Manage masking for superpriority interrupts along with DAIF
  arm64/entry: Don't call preempt_schedule_irq() with NMIs masked
  arm64/irq: Document handling of FEAT_NMI in irqflags.h
  arm64/nmi: Add handling of superpriority interrupts as NMIs
  arm64/nmi: Add Kconfig for NMI

Mark Rutland (3):
  irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
  irqchip/gic-v3: Refactor ISB + EOIR at ack time
  irqchip/gic-v3: Fix priority mask handling

Wei Li (1):
  arm64: add new config CONFIG_PMU_WATCHDOG

Yang Yingliang (1):
  config: enable CONFIG_PMU_WATCHDOG

Yicong Yang (1):
  irqchip/gic-v3: Fix one race condition due to NMI withdraw

 Documentation/arm64/booting.rst     |   6 +
 arch/arm/include/asm/arch_gicv3.h   |   7 +-
 arch/arm64/Kconfig                  |  14 ++
 arch/arm64/configs/tencent.config   |   1 +
 arch/arm64/include/asm/assembler.h  |  34 +++
 arch/arm64/include/asm/cpucaps.h    |   2 +
 arch/arm64/include/asm/cpufeature.h |   8 +
 arch/arm64/include/asm/daifflags.h  |  20 ++
 arch/arm64/include/asm/irqflags.h   |  10 +
 arch/arm64/include/asm/ptrace.h     |   2 +-
 arch/arm64/include/asm/sysreg.h     |  26 +++
 arch/arm64/kernel/Makefile          |   1 +
 arch/arm64/kernel/cpufeature.c      |  64 +++++-
 arch/arm64/kernel/head.S            |  13 ++
 arch/arm64/kernel/process.c         |   9 +
 arch/arm64/kernel/watchdog_hld.c    |  24 +++
 arch/arm64/kernel/watchdog_sdei.c   |  23 +-
 arch/arm64/kvm/hyp/switch.c         |   7 +
 arch/arm64/kvm/sys_regs.c           |   2 +
 drivers/cpufreq/cpufreq.c           |  20 ++
 drivers/irqchip/irq-gic-v3.c        | 314 +++++++++++++++++++++++-----
 include/linux/cpufreq.h             |   5 +
 include/linux/irqchip/arm-gic-v3.h  |   4 +
 include/linux/nmi.h                 |  11 +
 kernel/watchdog.c                   |  28 ++-
 lib/Kconfig.debug                   |  10 +
 26 files changed, 594 insertions(+), 71 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

--
2.25.1

From: Mark Rutland <mark.rutland@arm.com>

[Upstream commit adf14453d2c037ab529040c1186ea32e277e783a]

There are cases where a context synchronization event is necessary
between an IRQ being raised and being handled, and there are races such
that we cannot rely upon the exception entry being subsequent to the
interrupt being raised. We identified and fixed this for regular IRQs
in commit:

  39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq")

Unfortunately, we forgot to do the same for pseudo-NMIs when support
for those was added in commit:

  f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")

Which means that when pseudo-NMIs are used for PMU support, we'll hit
the same problem. Apply the same fix as for regular IRQs. Note that
when EOI mode 1 is in use, the call to gic_write_eoir() will provide an
ISB.

Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-2-mark.rutland@arm.com
---
 drivers/irqchip/irq-gic-v3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index f875e168c873..720cc39b0a2d 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -581,6 +581,9 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 
 	if (static_branch_likely(&supports_deactivate_key))
 		gic_write_eoir(irqnr);
+	else
+		isb();
+
 	/*
 	 * Leave the PSR.I bit set to prevent other NMIs to be
 	 * received while handling this one.
--
2.25.1

From: Mark Rutland <mark.rutland@arm.com> [Upstream commit 6efb50923771f392122f5ce69dfc43b08f16e449] There are cases where a context synchronization event is necessary between an IRQ being raised and being handled, and there are races such that we cannot rely upon the exception entry being subsequent to the interrupt being raised. To fix this, we place an ISB between a read of IAR and the subsequent invocation of an IRQ handler. When EOI mode 1 is in use, we need to EOI an interrupt prior to invoking its handler, and we have a write to EOIR for this. As this write to EOIR requires an ISB, and this is provided by the gic_write_eoir() helper, we omit the usual ISB in this case, with the logic being: | if (static_branch_likely(&supports_deactivate_key)) | gic_write_eoir(irqnr); | else | isb(); This is somewhat opaque, and it would be a little clearer if there were an unconditional ISB, with only the write to EOIR being conditional, e.g. | if (static_branch_likely(&supports_deactivate_key)) | write_gicreg(irqnr, ICC_EOIR1_EL1); | | isb(); This patch rewrites the code that way, with this logic factored into a new helper function with comments explaining what the ISB is for, as were originally laid out in commit: 39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq") Note that since then, we removed the IAR polling in commit: 342677d70ab92142 ("irqchip/gic-v3: Remove acknowledge loop") ... which removed one of the two race conditions. For consistency, other portions of the driver are made to manipulate EOIR using write_gicreg() and explcit ISBs, and the gic_write_eoir() helper function is removed. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513133038.226182-3-mark.rutland@arm.com --- arch/arm/include/asm/arch_gicv3.h | 7 +---- drivers/irqchip/irq-gic-v3.c | 43 ++++++++++++++++++++++++------- 2 files changed, 34 insertions(+), 16 deletions(-) diff --git a/arch/arm/include/asm/arch_gicv3.h b/arch/arm/include/asm/arch_gicv3.h index c815477b4303..29797945f299 100644 --- a/arch/arm/include/asm/arch_gicv3.h +++ b/arch/arm/include/asm/arch_gicv3.h @@ -128,6 +128,7 @@ static inline u64 read_ ## a64(void) \ return val; \ } +CPUIF_MAP(ICC_EOIR1, ICC_EOIR1_EL1) CPUIF_MAP(ICC_PMR, ICC_PMR_EL1) CPUIF_MAP(ICC_AP0R0, ICC_AP0R0_EL1) CPUIF_MAP(ICC_AP0R1, ICC_AP0R1_EL1) @@ -177,12 +178,6 @@ CPUIF_MAP_LO_HI(ICH_LR0, ICH_LRC0, ICH_LR0_EL2) /* Low-level accessors */ -static inline void gic_write_eoir(u32 irq) -{ - write_sysreg(irq, ICC_EOIR1); - isb(); -} - static inline void gic_write_dir(u32 val) { write_sysreg(val, ICC_DIR); diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 720cc39b0a2d..3f440dcb3523 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -486,7 +486,8 @@ static void gic_irq_nmi_teardown(struct irq_data *d) static void gic_eoi_irq(struct irq_data *d) { - gic_write_eoir(gic_irq(d)); + write_gicreg(gic_irq(d), ICC_EOIR1_EL1); + isb(); } static void gic_eoimode1_eoi_irq(struct irq_data *d) @@ -567,10 +568,38 @@ static void gic_deactivate_unhandled(u32 irqnr) if (irqnr < 8192) gic_write_dir(irqnr); } else { - gic_write_eoir(irqnr); + write_gicreg(irqnr, ICC_EOIR1_EL1); + isb(); } } +/* + * Follow a read of the IAR with any HW maintenance that needs to happen 
prior + * to invoking the relevant IRQ handler. We must do two things: + * + * (1) Ensure instruction ordering between a read of IAR and subsequent + * instructions in the IRQ handler using an ISB. + * + * It is possible for the IAR to report an IRQ which was signalled *after* + * the CPU took an IRQ exception as multiple interrupts can race to be + * recognized by the GIC, earlier interrupts could be withdrawn, and/or + * later interrupts could be prioritized by the GIC. + * + * For devices which are tightly coupled to the CPU, such as PMUs, a + * context synchronization event is necessary to ensure that system + * register state is not stale, as these may have been indirectly written + * *after* exception entry. + * + * (2) Deactivate the interrupt when EOI mode 1 is in use. + */ +static inline void gic_complete_ack(u32 irqnr) +{ + if (static_branch_likely(&supports_deactivate_key)) + write_gicreg(irqnr, ICC_EOIR1_EL1); + + isb(); +} + static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs) { bool irqs_enabled = interrupts_enabled(regs); @@ -579,10 +608,7 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs) if (irqs_enabled) nmi_enter(); - if (static_branch_likely(&supports_deactivate_key)) - gic_write_eoir(irqnr); - else - isb() + gic_complete_ack(irqnr); /* * Leave the PSR.I bit set to prevent other NMIs to be @@ -623,10 +649,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs if (likely(irqnr > 15)) { int err; - if (static_branch_likely(&supports_deactivate_key)) - gic_write_eoir(irqnr); - else - isb(); + gic_complete_ack(irqnr); err = handle_domain_irq(gic_data.domain, irqnr, regs); if (err) { -- 2.25.1

From: Mark Rutland <mark.rutland@arm.com> [Upstream commit 614ab80c96474682157cabb14f8c8602b3422e90] When a kernel is built with CONFIG_ARM64_PSEUDO_NMI=y and pseudo-NMIs are enabled at runtime, GICv3's gic_handle_irq() can leave DAIF and ICC_PMR_EL1 in an unexpected state in some cases, breaking subsequent usage of local_irq_enable() and resulting in softirqs being run with IRQs erroneously masked (possibly resulting in deadlocks). This can happen when an IRQ exception is taken from a context where regular IRQs were unmasked, and either: (1) ICC_IAR1_EL1 indicates a special INTID (e.g. as a result of an IRQ being withdrawn since the IRQ exception was taken). (2) ICC_IAR1_EL1 and ICC_RPR_EL1 indicate an NMI was acknowledged. When an NMI is taken from a context where regular IRQs were masked, there is no problem. When CONFIG_ARM64_DEBUG_PRIORITY_MASKING=y, this can be detected with perf, e.g. | # ./perf record -a -g -e cycles:k ls -alR / > /dev/null 2>&1 | ------------[ cut here ]------------ | WARNING: CPU: 0 PID: 14 at arch/arm64/include/asm/irqflags.h:32 arch_local_irq_enable+0x4c/0x6c | Modules linked in: | CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 5.18.0-rc5-00004-g876c38e3d20b #12 | Hardware name: linux,dummy-virt (DT) | pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : arch_local_irq_enable+0x4c/0x6c | lr : __do_softirq+0x110/0x5d8 | sp : ffff8000080bbbc0 | pmr_save: 000000f0 | x29: ffff8000080bbbc0 x28: ffff316ac3a6ca40 x27: 0000000000000000 | x26: 0000000000000000 x25: ffffa04611c06008 x24: ffffa04611c06008 | x23: 0000000040400005 x22: 0000000000000200 x21: ffff8000080bbe20 | x20: ffffa0460fe10320 x19: 0000000000000009 x18: 0000000000000000 | x17: ffff91252dfa9000 x16: ffff800008004000 x15: 0000000000004000 | x14: 0000000000000028 x13: ffffa0460fe17578 x12: ffffa0460fed4294 | x11: ffffa0460fedc168 x10: ffffffffffffff80 x9 : ffffa0460fe10a70 | x8 : ffffa0460fedc168 x7 : 000000000000b762 x6 : 00000000057c3bdf | x5 : ffff8000080bbb18 x4 : 0000000000000000 x3 : 0000000000000001 | x2 : ffff91252dfa9000 x1 : 0000000000000060 x0 : 00000000000000f0 | Call trace: | arch_local_irq_enable+0x4c/0x6c | __irq_exit_rcu+0x180/0x1ac | irq_exit_rcu+0x1c/0x44 | el1_interrupt+0x4c/0xe4 | el1h_64_irq_handler+0x18/0x24 | el1h_64_irq+0x74/0x78 | smpboot_thread_fn+0x68/0x2c0 | kthread+0x124/0x130 | ret_from_fork+0x10/0x20 | irq event stamp: 193241 | hardirqs last enabled at (193240): [<ffffa0460fe10a9c>] __do_softirq+0x10c/0x5d8 | hardirqs last disabled at (193241): [<ffffa0461102ffe4>] el1_dbg+0x24/0x90 | softirqs last enabled at (193234): [<ffffa0460fe10e00>] __do_softirq+0x470/0x5d8 | softirqs last disabled at (193239): [<ffffa0460fea9944>] __irq_exit_rcu+0x180/0x1ac | ---[ end trace 0000000000000000 ]--- The necessary manipulation of DAIF and ICC_PMR_EL1 depends on the interrupted context, but the structure of gic_handle_irq() makes this also depend on whether the GIC reports an IRQ, NMI, or special INTID: * When the interrupted context had regular IRQs masked (and hence the interrupt must be an NMI), the entry code performs the NMI entry/exit and gic_handle_irq() should return with DAIF and ICC_PMR_EL1 unchanged. This is handled correctly today. * When the interrupted context had regular IRQs unmasked, the entry code performs IRQ entry/exit, but expects gic_handle_irq() to always update ICC_PMR_EL1 and DAIF.IF to unmask NMIs (but not regular IRQs) prior to returning (which it must do prior to invoking any regular IRQ handler). 
This unbalanced calling convention is necessary because we don't know whether an NMI has been taken until acknowledged by a read from ICC_IAR1_EL1, and so we need to perform the read with NMI masked in case an NMI has been taken (and needs to be handled with NMIs masked). Unfortunately, this is not handled consistently: - When ICC_IAR1_EL1 reports a special INTID, gic_handle_irq() returns immediately without manipulating ICC_PMR_EL1 and DAIF. - When RPR_EL1 indicates an NMI, gic_handle_irq() calls gic_handle_nmi() to invoke the NMI handler, then returns without manipulating ICC_PMR_EL1 and DAIF. - For regular IRQs, gic_handle_irq() manipulates ICC_PMR_EL1 and DAIF prior to invoking the IRQ handler. There were related problems with special INTID handling in the past, where if an exception was taken from a context with regular IRQs masked and ICC_IAR_EL1 reported a special INTID, gic_handle_irq() would erroneously unmask NMIs in NMI context permitted an unexpected nested NMI. That case specifically was fixed by commit: a97709f563a078e2 ("irqchip/gic-v3: Do not enable irqs when handling spurious interrups") ... but unfortunately that commit added an inverse problem, where if an exception was taken from a context with regular IRQs *unmasked* and ICC_IAR_EL1 reported a special INTID, gic_handle_irq() would erroneously fail to unmask NMIs (and consequently regular IRQs could not be unmasked during softirq processing). Before and after that commit, if an NMI was taken from a context with regular IRQs unmasked gic_handle_irq() would not unmask NMIs prior to returning, leading to the same problem with softirq handling. This patch fixes this by restructuring gic_handle_irq(), splitting it into separate irqson/irqsoff helper functions which consistently perform the DAIF + ICC_PMR1_EL1 manipulation based upon the interrupted context, regardless of the event indicated by ICC_IAR1_EL1. The special INTID handling is moved into the low-level IRQ/NMI handler invocation helper functions, so that early returns don't prevent the required manipulation of DAIF + ICC_PMR_EL1. Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513133038.226182-4-mark.rutland@arm.com --- drivers/irqchip/irq-gic-v3.c | 138 +++++++++++++++++++++++++---------- 1 file changed, 101 insertions(+), 37 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 3f440dcb3523..5b57493f02b2 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -600,50 +600,23 @@ static inline void gic_complete_ack(u32 irqnr) isb(); } -static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs) +static bool gic_rpr_is_nmi_prio(void) { - bool irqs_enabled = interrupts_enabled(regs); - int err; - - if (irqs_enabled) - nmi_enter(); - - gic_complete_ack(irqnr); - - /* - * Leave the PSR.I bit set to prevent other NMIs to be - * received while handling this one. - * PSR.I will be restored when we ERET to the - * interrupted context. 
- */ - err = handle_domain_nmi(gic_data.domain, irqnr, regs); - if (err) - gic_deactivate_unhandled(irqnr); + if (!gic_supports_nmi()) + return false; - if (irqs_enabled) - nmi_exit(); + return unlikely(gic_read_rpr() == GICD_INT_NMI_PRI); } -static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) +static bool gic_irqnr_is_special(u32 irqnr) { - u32 irqnr; - - irqnr = gic_read_iar(); - - /* Check for special IDs first */ - if ((irqnr >= 1020 && irqnr <= 1023)) - return; + return irqnr >= 1020 && irqnr <= 1023; +} - if (gic_supports_nmi() && - unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) { - gic_handle_nmi(irqnr, regs); +static void __gic_handle_irq(u32 irqnr, struct pt_regs *regs) +{ + if (gic_irqnr_is_special(irqnr)) return; - } - - if (gic_prio_masking_enabled()) { - gic_pmr_mask_irqs(); - gic_arch_enable_irqs(); - } /* Treat anything but SGIs in a uniform way */ if (likely(irqnr > 15)) { @@ -675,6 +648,97 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs WARN_ONCE(true, "Unexpected SGI received!\n"); #endif } + +} + +static void __gic_handle_nmi(u32 irqnr, struct pt_regs *regs) +{ + if (gic_irqnr_is_special(irqnr)) + return; + + gic_complete_ack(irqnr); + + if (handle_domain_nmi(gic_data.domain, irqnr, regs)) + gic_deactivate_unhandled(irqnr); +} + +/* + * An exception has been taken from a context with IRQs enabled, and this could + * be an IRQ or an NMI. + * + * The entry code called us with DAIF.IF set to keep NMIs masked. We must clear + * DAIF.IF (and update ICC_PMR_EL1 to mask regular IRQs) prior to returning, + * after handling any NMI but before handling any IRQ. + * + * The entry code has performed IRQ entry, and if an NMI is detected we must + * perform NMI entry/exit around invoking the handler. + */ +static void __gic_handle_irq_from_irqson(struct pt_regs *regs) +{ + bool is_nmi; + u32 irqnr; + + irqnr = gic_read_iar(); + + is_nmi = gic_rpr_is_nmi_prio(); + + if (is_nmi) { + nmi_enter(); + __gic_handle_nmi(irqnr, regs); + nmi_exit(); + } + + if (gic_prio_masking_enabled()) { + gic_pmr_mask_irqs(); + gic_arch_enable_irqs(); + } + + if (!is_nmi) + __gic_handle_irq(irqnr, regs); +} + +/* + * An exception has been taken from a context with IRQs disabled, which can only + * be an NMI. + * + * The entry code called us with DAIF.IF set to keep NMIs masked. We must leave + * DAIF.IF (and ICC_PMR_EL1) unchanged. + * + * The entry code has performed NMI entry. + */ +static void __gic_handle_irq_from_irqsoff(struct pt_regs *regs) +{ + u64 pmr; + u32 irqnr; + + /* + * We were in a context with IRQs disabled. However, the + * entry code has set PMR to a value that allows any + * interrupt to be acknowledged, and not just NMIs. This can + * lead to surprising effects if the NMI has been retired in + * the meantime, and that there is an IRQ pending. The IRQ + * would then be taken in NMI context, something that nobody + * wants to debug twice. + * + * Until we sort this, drop PMR again to a level that will + * actually only allow NMIs before reading IAR, and then + * restore it to what it was. + */ + pmr = gic_read_pmr(); + gic_pmr_mask_irqs(); + isb(); + irqnr = gic_read_iar(); + gic_write_pmr(pmr); + + __gic_handle_nmi(irqnr, regs); +} + +static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) +{ + if (unlikely(gic_supports_nmi() && !interrupts_enabled(regs))) + __gic_handle_irq_from_irqsoff(regs); + else + __gic_handle_irq_from_irqson(regs); } static u32 gic_get_pribits(void) -- 2.25.1

From: Mark Brown <broonie@kernel.org>

In order to use FEAT_NMI we must be able to use ALLINT, so require that
accesses to it behave as though not trapped when the feature is present.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 Documentation/arm64/booting.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index d3f3a60fbf25..eca3e4d1d18d 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -245,6 +245,12 @@ Before jumping into the kernel, the following conditions must be met:
     - HCR_EL2.APK (bit 40) must be initialised to 0b1
     - HCR_EL2.API (bit 41) must be initialised to 0b1
 
+  For CPUs with Non-maskable Interrupts (FEAT_NMI):
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - HCRX_EL2.TALLINT must be initialised to 0b0.
+
 The requirements described above for CPU mode, caches, MMUs, architected
 timers, coherency and system registers apply to all CPUs.  All CPUs must
 enter the kernel in the same exception level.
--
2.25.1

From: Mark Brown <broonie@kernel.org>

Encodings are provided for ALLINT which allow setting of ALLINT.ALLINT
using an immediate rather than requiring that a register be loaded with
the value to write. Since these don't currently fit within the scheme we
have for sysreg generation, add manual encodings like we currently do
for other similar registers such as SVCR.

Since it is required that these immediate versions be encoded with xzr
as the source register, provide asm wrappers which ensure this is the
case.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/daifflags.h | 9 +++++++++
 arch/arm64/include/asm/sysreg.h    | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 48bfbf70dbb0..b6b4dea197ab 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -15,6 +15,15 @@
 #define DAIF_ERRCTX	(PSR_I_BIT | PSR_A_BIT)
 #define DAIF_MASK	(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
 
+static __always_inline void _allint_clear(void)
+{
+	asm volatile(__msr_s(SYS_ALLINT_CLR, "xzr"));
+}
+
+static __always_inline void _allint_set(void)
+{
+	asm volatile(__msr_s(SYS_ALLINT_SET, "xzr"));
+}
 
 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index a769d9d11aaa..fb7e120b893a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -105,6 +105,8 @@
 #define SYS_DC_CSW			sys_insn(1, 0, 7, 10, 2)
 #define SYS_DC_CISW			sys_insn(1, 0, 7, 14, 2)
 
+#define SYS_ALLINT_CLR			sys_reg(0, 1, 4, 0, 0)
+#define SYS_ALLINT_SET			sys_reg(0, 1, 4, 1, 0)
 #define SYS_OSDTRRX_EL1			sys_reg(2, 0, 0, 0, 2)
 #define SYS_MDCCINT_EL1			sys_reg(2, 0, 0, 2, 0)
 #define SYS_MDSCR_EL1			sys_reg(2, 0, 0, 2, 2)
--
2.25.1

From: Mark Brown <broonie@kernel.org>

In order to allow assembly code to ensure that not even superpriority
interrupts can preempt it, provide macros for enabling and disabling
ALLINT.ALLINT.

This is not integrated into the existing DAIF macros since we do not
always wish to manage ALLINT along with DAIF, and the use of DAIF in the
naming of the existing macros might lead to surprises if ALLINT is also
managed.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/assembler.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 4e0c07c60f84..192d61b68d7d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -23,6 +23,22 @@
 #include <asm/ptrace.h>
 #include <asm/thread_info.h>
 
+	.macro disable_allint
+#ifdef CONFIG_ARM64_NMI
+alternative_if ARM64_HAS_NMI
+	msr_s	SYS_ALLINT_SET, xzr
+alternative_else_nop_endif
+#endif
+	.endm
+
+	.macro enable_allint
+#ifdef CONFIG_ARM64_NMI
+alternative_if ARM64_HAS_NMI
+	msr_s	SYS_ALLINT_CLR, xzr
+alternative_else_nop_endif
+#endif
+	.endm
+
 	.macro	save_and_disable_daif, flags
 	mrs	\flags, daif
 	msr	daifset, #0xf
--
2.25.1

From: Mark Brown <broonie@kernel.org> In order to use NMIs we need to ensure that traps are disabled for it so update HCRX_EL2 to ensure that TALLINT is not set when we detect support for NMIs. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- arch/arm64/include/asm/sysreg.h | 5 +++++ arch/arm64/kernel/head.S | 13 +++++++++++++ 2 files changed, 18 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index fb7e120b893a..fecc829377a1 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -223,6 +223,8 @@ #define SYS_PAR_EL1_F BIT(0) #define SYS_PAR_EL1_FST GENMASK(6, 1) +#define HCRX_EL2_TALLINT_MASK GENMASK(6, 6) + /*** Statistical Profiling Extension ***/ /* ID registers */ #define SYS_PMSIDR_EL1 sys_reg(3, 0, 9, 9, 7) @@ -419,6 +421,7 @@ #define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7) #define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0) +#define SYS_HCRX_EL2 sys_reg(3, 4, 1, 2, 2) #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0) #define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0) #define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1) @@ -635,6 +638,7 @@ #define ID_AA64PFR0_EL0_32BIT_64BIT 0x2 /* id_aa64pfr1 */ +#define ID_AA64PFR1_NMI_SHIFT 36 #define ID_AA64PFR1_SSBS_SHIFT 4 #define ID_AA64PFR1_SSBS_PSTATE_NI 0 @@ -686,6 +690,7 @@ /* id_aa64mmfr1 */ #define ID_AA64MMFR1_ECBHB_SHIFT 60 #define ID_AA64MMFR1_TIDCP1_SHIFT 52 +#define ID_AA64MMFR1_HCX_SHIFT 40 #define ID_AA64MMFR1_PAN_SHIFT 20 #define ID_AA64MMFR1_LOR_SHIFT 16 #define ID_AA64MMFR1_HPD_SHIFT 12 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 2f784d3b4b39..4558f6d94dab 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -497,11 +497,24 @@ ENTRY(el2_setup) msr sctlr_el2, x0 #ifdef CONFIG_ARM64_VHE + mrs x2, id_aa64pfr1_el1 + ubfx x2, x2, #ID_AA64PFR1_NMI_SHIFT, #4 + cbz x2, .Lskip_nmi +.Linit_nmi: + mrs x2, id_aa64mmfr1_el1 + ubfx x2, x2, #ID_AA64MMFR1_HCX_SHIFT, #4 + cbz x2, .Lskip_nmi + + mrs_s x2, SYS_HCRX_EL2 + bic x2, x2, #HCRX_EL2_TALLINT_MASK // Don't trap ALLINT + msr_s SYS_HCRX_EL2, x2 + /* * Check for VHE being present. For the rest of the EL2 setup, * x2 being non-zero indicates that we do have VHE, and that the * kernel is intended to run at EL2. */ +.Lskip_nmi: mrs x2, id_aa64mmfr1_el1 ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4 #else -- 2.25.1

From: Mark Brown <broonie@kernel.org> Use of FEAT_NMI requires that all the PEs in the system and the GIC have NMI support. This patch implements the PE part of that detection. In order to avoid problematic interactions between real and pseudo NMIs we disable the architected feature if the user has enabled pseudo NMIs on the command line. If this is done on a system where support for the architected feature is detected then a warning is printed during boot in order to help users spot what is likely to be a misconfiguration. In order to allow KVM to offer the feature to guests even if pseudo NMIs are in use by the host we have a separate feature for the raw feature which is used in KVM. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- arch/arm64/include/asm/cpucaps.h | 2 + arch/arm64/include/asm/cpufeature.h | 8 ++++ arch/arm64/include/asm/sysreg.h | 5 +++ arch/arm64/kernel/cpufeature.c | 64 ++++++++++++++++++++++++++++- 4 files changed, 78 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index 9a2d5148f9cc..2246c379a68d 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -67,6 +67,8 @@ #define ARM64_HAS_TIDCP1 57 #define ARM64_HAS_FGT 58 #define ARM64_HAFT 59 +#define ARM64_HAS_NMI 60 +#define ARM64_USES_NMI 61 #define ARM64_NCAPS 80 diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 528115f7e361..6e7ea1750010 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -284,6 +284,8 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0; * CPUs must match the state of the capability as detected by the boot CPU. */ #define ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE ARM64_CPUCAP_SCOPE_BOOT_CPU +#define ARM64_CPUCAP_BOOT_CPU_FEATURE \ + (ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU) struct arm64_cpu_capabilities { const char *desc; @@ -658,6 +660,12 @@ static __always_inline bool system_uses_irq_prio_masking(void) cpus_have_const_cap(ARM64_HAS_IRQ_PRIO_MASKING); } +static __always_inline bool system_uses_nmi(void) +{ + return IS_ENABLED(CONFIG_ARM64_NMI) && + cpus_have_const_cap(ARM64_USES_NMI); +} + static inline bool system_has_prio_mask_debugging(void) { return IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING) && diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index fecc829377a1..64eda6356005 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -529,6 +529,8 @@ /* SCTLR_EL1 specific flags. 
*/ #define SCTLR_EL1_TIDCP (BIT(63)) +#define SCTLR_EL1_SPINTMASK (BIT(62)) +#define SCTLR_EL1_NMI (BIT(61)) #define SCTLR_EL1_UCI (BIT(26)) #define SCTLR_EL1_E0E (BIT(24)) #define SCTLR_EL1_SPAN (BIT(23)) @@ -645,6 +647,9 @@ #define ID_AA64PFR1_SSBS_PSTATE_ONLY 1 #define ID_AA64PFR1_SSBS_PSTATE_INSNS 2 +#define ID_AA64PFR1_NMI_IMP_DEF 0x1 +#define ID_AA64PFR1_NMI_IMP_NI 0x0 + /* id_aa64zfr0 */ #define ID_AA64ZFR0_SM4_SHIFT 40 #define ID_AA64ZFR0_SHA3_SHIFT 32 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 3ad5bc874e79..43fc63acb986 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -28,6 +28,7 @@ #include <asm/traps.h> #include <asm/vectors.h> #include <asm/virt.h> +#include <asm/daifflags.h> /* Kernel representation of AT_HWCAP and AT_HWCAP2 */ static unsigned long elf_hwcap __read_mostly; @@ -181,6 +182,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { }; static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = { + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_NMI_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SSBS_SHIFT, 4, ID_AA64PFR1_SSBS_PSTATE_NI), ARM64_FTR_END, }; @@ -1275,9 +1277,11 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap) } #endif /* CONFIG_ARM64_PTR_AUTH */ -#ifdef CONFIG_ARM64_PSEUDO_NMI +#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) || IS_ENABLED(CONFIG_ARM64_NMI) static bool enable_pseudo_nmi; +#endif +#ifdef CONFIG_ARM64_PSEUDO_NMI static int __init early_enable_pseudo_nmi(char *p) { return strtobool(p, &enable_pseudo_nmi); @@ -1291,6 +1295,41 @@ static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry, } #endif +#ifdef CONFIG_ARM64_NMI +static bool use_nmi(const struct arm64_cpu_capabilities *entry, int scope) +{ + if (!has_cpuid_feature(entry, scope)) + return false; + + /* + * Having both real and pseudo NMIs enabled simultaneously is + * likely to cause confusion. Since pseudo NMIs must be + * enabled with an explicit command line option, if the user + * has set that option on a system with real NMIs for some + * reason assume they know what they're doing. + */ + if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && enable_pseudo_nmi) { + pr_info("Pseudo NMI enabled, not using architected NMI\n"); + return false; + } + + return true; +} + +static void nmi_enable(const struct arm64_cpu_capabilities *__unused) +{ + /* + * Enable use of NMIs controlled by ALLINT, SPINTMASK should + * be clear by default but make it explicit that we are using + * this mode. Ensure that ALLINT is clear first in order to + * avoid leaving things masked. 
+ */ + _allint_clear(); + sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPINTMASK, SCTLR_EL1_NMI); + isb(); +} +#endif + static void elf_hwcap_fixup(void) { #ifdef CONFIG_ARM64_ERRATUM_1742098 @@ -1672,6 +1711,29 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sign = FTR_UNSIGNED, .min_field_value = 1, }, +#ifdef CONFIG_ARM64_NMI + { + .desc = "Non-maskable Interrupts present", + .capability = ARM64_HAS_NMI, + .type = ARM64_CPUCAP_BOOT_CPU_FEATURE, + .sys_reg = SYS_ID_AA64PFR1_EL1, + .sign = FTR_UNSIGNED, + .field_pos = ID_AA64PFR1_NMI_SHIFT, + .min_field_value = ID_AA64PFR1_NMI_IMP_DEF, + .matches = has_cpuid_feature, + }, + { + .desc = "Non-maskable Interrupts enabled", + .capability = ARM64_USES_NMI, + .type = ARM64_CPUCAP_BOOT_CPU_FEATURE, + .sys_reg = SYS_ID_AA64PFR1_EL1, + .sign = FTR_UNSIGNED, + .field_pos = ID_AA64PFR1_NMI_SHIFT, + .min_field_value = ID_AA64PFR1_NMI_IMP_DEF, + .matches = use_nmi, + .cpu_enable = nmi_enable, + }, +#endif {}, }; -- 2.25.1

From: Mark Brown <broonie@kernel.org> FEAT_NMI is not yet useful to guests pending implementation of vGIC support. Mask out the feature from the ID register and prevent guests creating state in ALLINT.ALLINT by activating the trap on write provided in HCRX_EL2.TALLINT when they are running. There is no trap available for reads from ALLINT. We do not need to check for FEAT_HCRX since it is mandatory since v8.7 and FEAT_NMI is a v8.8 feature. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- arch/arm64/include/asm/sysreg.h | 9 +++++++++ arch/arm64/kvm/hyp/switch.c | 7 +++++++ arch/arm64/kvm/sys_regs.c | 2 ++ 3 files changed, 18 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 64eda6356005..3da501312196 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -223,6 +223,8 @@ #define SYS_PAR_EL1_F BIT(0) #define SYS_PAR_EL1_FST GENMASK(6, 1) +#define ID_AA64PFR1_NMI_MASK GENMASK(39, 36) +#define HCRX_EL2_TALLINT BIT(6) #define HCRX_EL2_TALLINT_MASK GENMASK(6, 6) /*** Statistical Profiling Extension ***/ @@ -908,6 +910,13 @@ write_sysreg(__scs_new, sysreg); \ } while (0) +#define sysreg_clear_set_s(sysreg, clear, set) do { \ + u64 __scs_val = read_sysreg_s(sysreg); \ + u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set); \ + if (__scs_new != __scs_val) \ + write_sysreg_s(__scs_new, sysreg); \ +} while (0) + #endif #endif /* __ASM_SYSREG_H */ diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index 768983bd2326..624e5c83a497 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -87,11 +87,18 @@ static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu) */ write_sysreg(0, pmselr_el0); write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0); + + if (cpus_have_final_cap(ARM64_HAS_NMI)) + sysreg_clear_set_s(SYS_HCRX_EL2, 0, HCRX_EL2_TALLINT); + write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2); } static void __hyp_text __deactivate_traps_common(void) { + if (cpus_have_final_cap(ARM64_HAS_NMI)) + sysreg_clear_set_s(SYS_HCRX_EL2, HCRX_EL2_TALLINT, 0); + write_sysreg(0, hstr_el2); write_sysreg(0, pmuserenr_el0); } diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 605222ed2943..4fad598eb3dd 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1095,6 +1095,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, (0xfUL << ID_AA64ISAR1_API_SHIFT) | (0xfUL << ID_AA64ISAR1_GPA_SHIFT) | (0xfUL << ID_AA64ISAR1_GPI_SHIFT)); + } else if (id == SYS_ID_AA64PFR1_EL1) { + val &= ~ID_AA64PFR1_NMI_MASK; } return val; -- 2.25.1

From: Mark Brown <broonie@kernel.org>

As we do for pseudo NMIs, add code to our DAIF management which keeps
superpriority interrupts unmasked when we have asynchronous exceptions
enabled.

Since superpriority interrupts are not masked through DAIF like pseudo
NMIs are, we also need to modify the assembler macros for managing DAIF
to ensure that the masking is done in the assembly code. At present
users of the assembly macros always mask pseudo NMIs.

There is a difference in the actual handling between pseudo NMIs and
superpriority interrupts in the assembly save_and_disable_irq and
restore_irq macros: these cover both interrupts and FIQs using DAIF
without regard for the use of pseudo NMIs, so they also mask those, but
they are not updated here to mask superpriority interrupts. Given the
names it is not clear that the behaviour with pseudo NMIs is
particularly intentional, and in any case these macros are only used in
the implementation of alternatives for software PAN while hardware PAN
has been mandatory since v8.1, so it is not anticipated that practical
systems with support for FEAT_NMI will ever execute the affected code.

This should be a conservative set of masked regions; we may be able to
relax this in future, but this should represent a good starting point.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/assembler.h | 18 ++++++++++++++++
 arch/arm64/include/asm/daifflags.h | 11 +++++++++++
 2 files changed, 29 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 192d61b68d7d..9c0c22540b1c 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -39,27 +39,45 @@ alternative_else_nop_endif
 #endif
 	.endm
 
+	.macro restore_allint, flags
+#ifdef CONFIG_ARM64_NMI
+alternative_if ARM64_HAS_NMI
+	and	\flags, \flags, #PSR_A_BIT
+	.if	\flags == PSR_A_BIT
+	msr_s	SYS_ALLINT_SET, xzr
+	.else
+	msr_s	SYS_ALLINT_CLR, xzr
+	.endif
+alternative_else_nop_endif
+#endif
+	.endm
+
 	.macro	save_and_disable_daif, flags
+	disable_allint
 	mrs	\flags, daif
 	msr	daifset, #0xf
 	.endm
 
 	.macro	disable_daif
+	disable_allint
 	msr	daifset, #0xf
 	.endm
 
 	.macro	enable_daif
 	msr	daifclr, #0xf
+	enable_allint
 	.endm
 
 	.macro	restore_daif, flags:req
 	msr	daif, \flags
+	restore_allint	\flags
 	.endm
 
 	/* Only on aarch64 pstate, PSR_D_BIT is different for aarch32 */
 	.macro	inherit_daif, pstate:req, tmp:req
 	and	\tmp, \pstate, #(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
 	msr	daif, \tmp
+	restore_allint	\pstate
 	.endm
 
 	/* IRQ is the lowest priority flag, unconditionally unmask the rest. */
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index b6b4dea197ab..29f88942a184 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -42,6 +42,9 @@ static inline void local_daif_mask(void)
 	if (system_uses_irq_prio_masking())
 		gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
 
+	if (system_uses_nmi())
+		_allint_set();
+
 	trace_hardirqs_off();
 }
 
@@ -123,6 +126,14 @@ static inline void local_daif_restore(unsigned long flags)
 
 	write_sysreg(flags, daif);
 
+	/* If we can take asynchronous errors we can take NMIs */
+	if (system_uses_nmi()) {
+		if (flags & PSR_A_BIT)
+			_allint_set();
+		else
+			_allint_clear();
+	}
+
 	if (irq_disabled)
 		trace_hardirqs_off();
 }
--
2.25.1

From: Mark Brown <broonie@kernel.org>

As we do for pseudo NMIs, don't call preempt_schedule_irq() when
architected NMIs are masked. If they are masked then we are being called
from a preempting context, so skip preemption.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/sysreg.h | 3 +++
 arch/arm64/kernel/process.c     | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 3da501312196..521f2c6cce86 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -202,6 +202,8 @@
 #define SYS_SPSR_EL1			sys_reg(3, 0, 4, 0, 0)
 #define SYS_ELR_EL1			sys_reg(3, 0, 4, 0, 1)
 
+#define SYS_ALLINT			sys_reg(3, 0, 4, 3, 0)
+
 #define SYS_ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
 
 #define SYS_AFSR0_EL1			sys_reg(3, 0, 5, 1, 0)
@@ -226,6 +228,7 @@
 #define ID_AA64PFR1_NMI_MASK		GENMASK(39, 36)
 #define HCRX_EL2_TALLINT		BIT(6)
 #define HCRX_EL2_TALLINT_MASK		GENMASK(6, 6)
+#define ALLINT_ALLINT			BIT(13)
 
 /*** Statistical Profiling Extension ***/
 /* ID registers */
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 9bc43e35e87f..0b9193dff1e2 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -693,6 +693,15 @@ core_initcall(tagged_addr_init);
 
 asmlinkage void __sched arm64_preempt_schedule_irq(void)
 {
+	/*
+	 * Architected NMIs are unmasked prior to handling regular
+	 * IRQs and masked while handling FIQs. If ALLINT is set then
+	 * we are in an NMI or other preempting context so skip
+	 * preemption.
+	 */
+	if (system_uses_nmi() && (read_sysreg_s(SYS_ALLINT) & ALLINT_ALLINT))
+		return;
+
 	lockdep_assert_irqs_disabled();
 
 	/*
--
2.25.1

From: Mark Brown <broonie@kernel.org> We have documentation at the top of irqflags.h which explains the DAIF masking. Since the additional masking with NMIs is related and also covers the IF in DAIF extend the comment to note what's going on with NMIs though none of the code in irqflags.h is updated to handle NMIs. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- arch/arm64/include/asm/irqflags.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h index 1a59f0ed1ae3..2b2cdd7bb6fb 100644 --- a/arch/arm64/include/asm/irqflags.h +++ b/arch/arm64/include/asm/irqflags.h @@ -18,6 +18,16 @@ * side effects for other flags. Keeping to this order makes it easier for * entry.S to know which exceptions should be unmasked. * + * With the addition of the FEAT_NMI extension we gain an additional + * class of superpriority IRQ/FIQ which is separately masked with a + * choice of modes controlled by SCTLR_ELn.{SPINTMASK,NMI}. Linux + * sets SPINTMASK to 0 and NMI to 1 which results in ALLINT.ALLINT + * masking both superpriority interrupts and IRQ/FIQ regardless of the + * I and F settings. Since these superpriority interrupts are being + * used as NMIs we do not include them in the interrupt masking here, + * anything that requires that NMIs be masked needs to explicitly do + * so. + * * FIQ is never expected, but we mask it when we disable debug exceptions, and * unmask it at all other times. */ -- 2.25.1
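To make the last point concrete: code which genuinely needs superpriority interrupts masked must do so explicitly rather than relying on local_irq_save()/local_irq_disable(). A minimal sketch, assuming the local_daif_*() helpers as extended by the daifflags.h changes elsewhere in this series (which manage ALLINT alongside DAIF):

| 	unsigned long flags;
| 
| 	/*
| 	 * Sketch only: with FEAT_NMI, local_daif_save() also sets ALLINT
| 	 * (via local_daif_mask()), and local_daif_restore() puts it back
| 	 * based on the saved PSR_A_BIT, so not even an NMI handler can
| 	 * observe the section below half-updated.
| 	 */
| 	flags = local_daif_save();
| 	/* ... update state that an NMI handler must never see torn ... */
| 	local_daif_restore(flags);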

From: Mark Brown <broonie@kernel.org> Our goal with superpriority interrupts is to use them as NMIs, taking advantage of the much smaller regions where they are masked to allow prompt handling of the most time critical interrupts. When an interrupt configured with superpriority we will enter EL1 as normal for any interrupt, the presence of a superpriority interrupt is indicated with a status bit in ISR_EL1. We use this to check for the presence of a superpriority interrupt before we unmask anything in elX_interrupt(), reporting without unmasking any interrupts. If no superpriority interrupt is present then we handle normal interrupts as normal, superpriority interrupts will be unmasked while doing so as a result of setting DAIF_PROCCTX. Both IRQs and FIQs may be configured with superpriority so we handle both, passing an additional root handler into the elX_interrupt() function along with the mask for the bit in ISR_EL1 which indicates the presence of the relevant kind of superpriority interrupt. These root handlers can be configured by the interrupt controller similarly to the root handlers for normal interrupts using the newly added set_handle_nmi_irq() and set_handle_nmi_fiq() functions. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- arch/arm64/include/asm/sysreg.h | 2 ++ drivers/irqchip/irq-gic-v3.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 32 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 521f2c6cce86..761b26417a5d 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -205,6 +205,7 @@ #define SYS_ALLINT sys_reg(3, 0, 4, 3, 0) #define SYS_ICC_PMR_EL1 sys_reg(3, 0, 4, 6, 0) +#define SYS_ICC_NMIAR1_EL1 sys_reg(3, 0, 12, 9, 5) #define SYS_AFSR0_EL1 sys_reg(3, 0, 5, 1, 0) #define SYS_AFSR1_EL1 sys_reg(3, 0, 5, 1, 1) @@ -229,6 +230,7 @@ #define HCRX_EL2_TALLINT BIT(6) #define HCRX_EL2_TALLINT_MASK GENMASK(6, 6) #define ALLINT_ALLINT BIT(13) +#define ISR_EL1_IS BIT(10) /*** Statistical Profiling Extension ***/ /* ID registers */ diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 5b57493f02b2..356ee8fd368a 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -733,8 +733,38 @@ static void __gic_handle_irq_from_irqsoff(struct pt_regs *regs) __gic_handle_nmi(irqnr, regs); } +#ifdef CONFIG_ARM64 +static inline u64 gic_read_nmiar(void) +{ + u64 irqstat; + + irqstat = read_sysreg_s(SYS_ICC_NMIAR1_EL1); + + dsb(sy); + + return irqstat; +} + +static __always_inline void __el1_nmi(struct pt_regs *regs) +{ + u32 irqnr = gic_read_nmiar(); + + nmi_enter(); + __gic_handle_nmi(irqnr, regs); + nmi_exit(); +} +#endif + static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) { +#ifdef CONFIG_ARM64 + /* Is there a NMI to handle? */ + if (system_uses_nmi() && (read_sysreg(isr_el1) & ISR_EL1_IS)) { + __el1_nmi(regs); + return; + } +#endif + if (unlikely(gic_supports_nmi() && !interrupts_enabled(regs))) __gic_handle_irq_from_irqsoff(regs); else -- 2.25.1

From: Mark Brown <broonie@kernel.org>

Since NMI handling is in some fairly hot paths, we provide a Kconfig
option which allows support to be compiled out when not needed.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 2b74d81fce9e..fc074fd5f040 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1651,6 +1651,19 @@ endmenu
 
 menu "ARMv8.8 architectural features"
 
+config ARM64_NMI
+	bool "Enable support for Non-maskable Interrupts (NMI)"
+	default y
+	help
+	  Non-maskable interrupts are an architecture and GIC feature
+	  which allows some interrupts to be configured with
+	  superpriority, allowing them to be handled before other
+	  interrupts and masked for shorter periods of time.
+
+	  The feature is detected at runtime, and will remain disabled
+	  if the CPU does not implement the feature. It will also be
+	  disabled if pseudo NMIs are enabled at runtime.
+
 config ARM64_HAFT
 	bool "Support for Hardware managed Access Flag for Table Descriptors"
 	depends on ARM64_HW_AFDBM
--
2.25.1
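For reference, the kind of configuration this series targets combines this option with the watchdog pieces added elsewhere in the series. The fragment below is purely illustrative; the actual dependency chain for CONFIG_PMU_WATCHDOG is defined in lib/Kconfig.debug and may differ:

| # Illustrative .config fragment, not an authoritative dependency list
| CONFIG_ARM64_NMI=y            # this patch
| CONFIG_ARM64_PSEUDO_NMI=y     # optional fallback when FEAT_NMI is absent
| CONFIG_HARDLOCKUP_DETECTOR=y
| CONFIG_PMU_WATCHDOG=y         # PMU-based NMI watchdog added by this series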

From: Lorenzo Pieralisi <lpieralisi@kernel.org> The FEAT_GICv3_NMI GIC feature coupled with the CPU FEAT_NMI enables handling NMI interrupts in HW on aarch64, by adding a superpriority interrupt to the existing GIC priority scheme. Implement GIC driver support for the FEAT_GICv3_NMI feature. Rename gic_supports_nmi() helper function to gic_supports_pseudo_nmis() to make the pseudo NMIs code path clearer and more explicit. Check, through the ARM64 capabilitity infrastructure, if support for FEAT_NMI was detected on the core and the system has not overridden the detection and forced pseudo-NMIs enablement. If FEAT_NMI is detected, it was not overridden (check embedded in the system_uses_nmi() call) and the GIC supports the FEAT_GICv3_NMI feature, install an NMI handler and initialize NMIs related HW GIC registers. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jie Liu <liujie375@h-partners.com> --- drivers/irqchip/irq-gic-v3.c | 91 ++++++++++++++++++++++++++---- include/linux/irqchip/arm-gic-v3.h | 4 ++ 2 files changed, 83 insertions(+), 12 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 356ee8fd368a..d259dd0f28fa 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -51,6 +51,7 @@ struct gic_chip_data { u32 nr_redist_regions; u64 flags; bool has_rss; + bool has_nmi; unsigned int ppi_nr; struct partition_desc **ppi_descs; }; @@ -101,6 +102,20 @@ static DEFINE_PER_CPU(bool, has_rss); /* Our default, arbitrary priority value. Linux only uses one anyway. */ #define DEFAULT_PMR_VALUE 0xf0 +#ifdef CONFIG_ARM64 +#include <asm/cpufeature.h> + +static inline bool has_v3_3_nmi(void) +{ + return gic_data.has_nmi && system_uses_nmi(); +} +#else +static inline bool has_v3_3_nmi(void) +{ + return false; +} +#endif + #ifdef CONFIG_VIRT_VTIMER_IRQ_BYPASS phys_addr_t get_gicr_paddr(int cpu) { @@ -285,6 +300,42 @@ static int gic_peek_irq(struct irq_data *d, u32 offset) return !!(readl_relaxed(base + offset + (index / 32) * 4) & mask); } +static DEFINE_RAW_SPINLOCK(irq_controller_lock); + +static void gic_irq_configure_nmi(struct irq_data *d, bool enable) +{ + void __iomem *base, *addr; + u32 offset, index, mask, val; + + offset = convert_offset_index(d, GICD_INMIR, &index); + mask = 1 << (index % 32); + + if (gic_irq_in_rdist(d)) + base = gic_data_rdist_sgi_base(); + else + base = gic_data.dist_base; + + addr = base + offset + (index / 32) * 4; + + raw_spin_lock(&irq_controller_lock); + + val = readl_relaxed(addr); + val = enable ? 
(val | mask) : (val & ~mask); + writel_relaxed(val, addr); + + raw_spin_unlock(&irq_controller_lock); +} + +static void gic_irq_enable_nmi(struct irq_data *d) +{ + gic_irq_configure_nmi(d, true); +} + +static void gic_irq_disable_nmi(struct irq_data *d) +{ + gic_irq_configure_nmi(d, false); +} + static void gic_poke_irq(struct irq_data *d, u32 offset) { void (*rwp_wait)(void); @@ -331,7 +382,7 @@ static void gic_unmask_irq(struct irq_data *d) gic_poke_irq(d, GICD_ISENABLER); } -static inline bool gic_supports_nmi(void) +static inline bool gic_supports_pseudo_nmis(void) { return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && static_branch_likely(&supports_pseudo_nmis); @@ -418,7 +469,7 @@ static int gic_irq_nmi_setup(struct irq_data *d) { struct irq_desc *desc = irq_to_desc(d->irq); - if (!gic_supports_nmi()) + if (!gic_supports_pseudo_nmis() && !has_v3_3_nmi()) return -EINVAL; if (gic_peek_irq(d, GICD_ISENABLER)) { @@ -446,7 +497,10 @@ static int gic_irq_nmi_setup(struct irq_data *d) desc->handle_irq = handle_fasteoi_nmi; } - gic_irq_set_prio(d, GICD_INT_NMI_PRI); + if (has_v3_3_nmi()) + gic_irq_enable_nmi(d); + else + gic_irq_set_prio(d, GICD_INT_NMI_PRI); return 0; } @@ -455,7 +509,7 @@ static void gic_irq_nmi_teardown(struct irq_data *d) { struct irq_desc *desc = irq_to_desc(d->irq); - if (WARN_ON(!gic_supports_nmi())) + if (WARN_ON(!gic_supports_pseudo_nmis() && !has_v3_3_nmi())) return; if (gic_peek_irq(d, GICD_ISENABLER)) { @@ -481,7 +535,10 @@ static void gic_irq_nmi_teardown(struct irq_data *d) desc->handle_irq = handle_fasteoi_irq; } - gic_irq_set_prio(d, GICD_INT_DEF_PRI); + if (has_v3_3_nmi()) + gic_irq_disable_nmi(d); + else + gic_irq_set_prio(d, GICD_INT_DEF_PRI); } static void gic_eoi_irq(struct irq_data *d) @@ -602,7 +659,7 @@ static inline void gic_complete_ack(u32 irqnr) static bool gic_rpr_is_nmi_prio(void) { - if (!gic_supports_nmi()) + if (!gic_supports_pseudo_nmis()) return false; return unlikely(gic_read_rpr() == GICD_INT_NMI_PRI); @@ -765,7 +822,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs } #endif - if (unlikely(gic_supports_nmi() && !interrupts_enabled(regs))) + if (unlikely(gic_supports_pseudo_nmis() && !interrupts_enabled(regs))) __gic_handle_irq_from_irqsoff(regs); else __gic_handle_irq_from_irqson(regs); @@ -1049,7 +1106,7 @@ static void gic_cpu_sys_reg_init(void) * to die as interrupt masking will not work properly on all * CPUs */ - WARN_ON(gic_supports_nmi() && group0 && + WARN_ON(gic_supports_pseudo_nmis() && group0 && !gic_dist_security_disabled()); } @@ -1627,14 +1684,19 @@ static const struct gic_quirk gic_quirks[] = { } }; +static void gic_enable_pseudo_nmis(void) +{ + static_branch_enable(&supports_pseudo_nmis); +} + static void gic_enable_nmi_support(void) { int i; - if (!gic_prio_masking_enabled()) + if (!gic_prio_masking_enabled() && !has_v3_3_nmi()) return; - if (gic_has_group0() && !gic_dist_security_disabled()) { + if (!has_v3_3_nmi() && gic_has_group0() && !gic_dist_security_disabled()) { pr_warn("SCR_EL3.FIQ is cleared, cannot enable use of pseudo-NMIs\n"); return; } @@ -1646,8 +1708,12 @@ static void gic_enable_nmi_support(void) for (i = 0; i < gic_data.ppi_nr; i++) refcount_set(&ppi_nmi_refs[i], 0); - static_branch_enable(&supports_pseudo_nmis); - + /* + * Initialize pseudo-NMIs only if GIC driver cannot take advantage + * of core (FEAT_NMI) and GIC (FEAT_GICv3_NMI) in HW + */ + if (!has_v3_3_nmi()) + gic_enable_pseudo_nmis(); if (static_branch_likely(&supports_deactivate_key)) gic_eoimode1_chip.flags |= 
IRQCHIP_SUPPORTS_NMI; else @@ -1705,6 +1771,7 @@ static int __init gic_init_bases(void __iomem *dist_base, irq_domain_update_bus_token(gic_data.domain, DOMAIN_BUS_WIRED); gic_data.has_rss = !!(typer & GICD_TYPER_RSS); + gic_data.has_nmi = !!(typer & GICD_TYPER_NMI); pr_info("Distributor has %sRange Selector support\n", gic_data.has_rss ? "" : "no "); diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index c56c484b196c..9bf8c0c8b5d5 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -31,6 +31,7 @@ #define GICD_ICFGR 0x0C00 #define GICD_IGRPMODR 0x0D00 #define GICD_NSACR 0x0E00 +#define GICD_INMIR 0x0F80 #define GICD_IGROUPRnE 0x1000 #define GICD_ISENABLERnE 0x1200 #define GICD_ICENABLERnE 0x1400 @@ -40,6 +41,7 @@ #define GICD_ICACTIVERnE 0x1C00 #define GICD_IPRIORITYRnE 0x2000 #define GICD_ICFGRnE 0x3000 +#define GICD_INMIRnE 0x3B00 #define GICD_IROUTER 0x6000 #define GICD_IROUTERnE 0x8000 #define GICD_IDREGS 0xFFD0 @@ -84,6 +86,7 @@ #define GICD_TYPER_LPIS (1U << 17) #define GICD_TYPER_MBIS (1U << 16) #define GICD_TYPER_ESPI (1U << 8) +#define GICD_TYPER_NMI (1U << 9) #define GICD_TYPER_ID_BITS(typer) ((((typer) >> 19) & 0x1f) + 1) #define GICD_TYPER_NUM_LPIS(typer) ((((typer) >> 11) & 0x1f) + 1) @@ -240,6 +243,7 @@ #define GICR_ICFGR0 GICD_ICFGR #define GICR_IGRPMODR0 GICD_IGRPMODR #define GICR_NSACR GICD_NSACR +#define GICR_INMIR0 GICD_INMIR #define GICR_TYPER_PLPIS (1U << 0) #define GICR_TYPER_VLPIS (1U << 1) -- 2.25.1

From: Jinjie Ruan <ruanjinjie@huawei.com>

When handling an exception, both DAIF and ALLINT will be set by
hardware. __gic_handle_irq_from_irqson() only considers pseudo-NMIs: it
clears DAIF.I and DAIF.F and sets PMR to GIC_PRIO_IRQOFF to enable
pseudo-NMIs while masking IRQs. If the hardware NMI is enabled, it
should also clear ALLINT to enable hardware NMIs and mask IRQs before
handling an IRQ; otherwise ALLINT remains set in softirq context,
local_irq_enable() cannot enable IRQs, and the watchdog NMI cannot be
delivered either, which causes the hard LOCKUP below.

Likewise, gic_handle_irq() only considers pseudo-NMIs when an exception
has been taken from a context with IRQs disabled. So add a
gic_supports_nmi() helper which considers both pseudo-NMIs and hardware
NMIs, define the PSR_ALLINT_BIT bit, and update interrupts_enabled() as
well as fast_interrupts_enabled() to consider the ALLINT bit.

watchdog: Watchdog detected hard LOCKUP on cpu 1
Modules linked in:
Sending NMI from CPU 0 to CPUs 1:
Kernel panic - not syncing: Hard LOCKUP
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-gec40ec8c5e9f #295
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x98/0xf8
 show_stack+0x20/0x38
 dump_stack_lvl+0x48/0x60
 dump_stack+0x18/0x28
 panic+0x384/0x3e0
 nmi_panic+0x94/0xa0
 watchdog_hardlockup_check+0x1bc/0x1c8
 watchdog_buddy_check_hardlockup+0x68/0x80
 watchdog_timer_fn+0x88/0x2f8
 __hrtimer_run_queues+0x17c/0x368
 hrtimer_run_queues+0xd4/0x158
 update_process_times+0x3c/0xc0
 tick_periodic+0x44/0xc8
 tick_handle_periodic+0x3c/0xb0
 arch_timer_handler_virt+0x3c/0x58
 handle_percpu_devid_irq+0x90/0x248
 generic_handle_domain_irq+0x34/0x58
 gic_handle_irq+0x58/0x110
 call_on_irq_stack+0x24/0x58
 do_interrupt_handler+0x88/0x98
 el1_interrupt+0x40/0xc0
 el1h_64_irq_handler+0x24/0x30
 el1h_64_irq+0x64/0x68
 default_idle_call+0x5c/0x160
 do_idle+0x220/0x288
 cpu_startup_entry+0x40/0x50
 rest_init+0xf0/0xf8
 arch_call_rest_init+0x18/0x20
 start_kernel+0x520/0x668
 __primary_switched+0xbc/0xd0

Fixes: dd8b74f04223 ("arm64/nmi: Manage masking for superpriority interrupts along with DAIF")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Jie Liu <liujie375@h-partners.com>
---
 arch/arm64/include/asm/ptrace.h      | 5 +++--
 arch/arm64/include/uapi/asm/ptrace.h | 1 +
 drivers/irqchip/irq-gic-v3.c         | 5 +++++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 92b2575b0191..a0ccbbbcc4d3 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -219,10 +219,11 @@ static inline void forget_syscall(struct pt_regs *regs)
 	true)
 
 #define interrupts_enabled(regs)			\
-	(!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
+	(!((regs)->pstate & PSR_ALLINT_BIT) && !((regs)->pstate & PSR_I_BIT) && \
+	 irqs_priority_unmasked(regs))
 
 #define fast_interrupts_enabled(regs) \
-	(!((regs)->pstate & PSR_F_BIT))
+	(!((regs)->pstate & PSR_ALLINT_BIT) && !(regs)->pstate & PSR_F_BIT)
 
 static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 {
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index d1bb5b69f1ce..3c02f551bbe6 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -47,6 +47,7 @@
 #define PSR_A_BIT	0x00000100
 #define PSR_D_BIT	0x00000200
 #define PSR_SSBS_BIT	0x00001000
+#define PSR_ALLINT_BIT	0x00002000
 #define PSR_PAN_BIT	0x00400000
 #define PSR_UAO_BIT	0x00800000
 #define PSR_DIT_BIT	0x01000000
diff --git 
a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index d259dd0f28fa..c4503951b6b7 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -103,6 +103,7 @@ static DEFINE_PER_CPU(bool, has_rss); #define DEFAULT_PMR_VALUE 0xf0 #ifdef CONFIG_ARM64 +#include <asm/daifflags.h> #include <asm/cpufeature.h> static inline bool has_v3_3_nmi(void) @@ -748,6 +749,10 @@ static void __gic_handle_irq_from_irqson(struct pt_regs *regs) if (gic_prio_masking_enabled()) { gic_pmr_mask_irqs(); gic_arch_enable_irqs(); + } else if (has_v3_3_nmi()) { +#ifdef CONFIG_ARM64_NMI + _allint_clear(); +#endif } if (!is_nmi) -- 2.25.1
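For context, the _allint_clear() call added above comes from the ALLINT-management patches elsewhere in this series, not from this hunk. A minimal sketch of what such a helper looks like is below, assuming the SYS_ALLINT_CLR immediate alias defined by that separate patch; the exact names may differ in this tree.

/*
 * Sketch only: clear PSTATE.ALLINT with the MSR (immediate) form so that
 * superpriority (hardware NMI) interrupts can be taken again while normal
 * IRQs stay masked by PSTATE.I. SYS_ALLINT_CLR is assumed to be provided
 * by the ALLINT sysreg definitions in this series.
 */
static __always_inline void _allint_clear(void)
{
	asm volatile(__msr_s(SYS_ALLINT_CLR, "xzr"));
}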

From: Jinjie Ruan <ruanjinjie@huawei.com> A system stall occurs when using pseudo-NMI with CONFIG_ARM64_NMI disabled. If the hardware supports FEAT_NMI, the ALLINT bit in pstate may be set or cleared on an exception trap whether or not the software enables it, so it is not safe to use it in interrupts_enabled() or fast_interrupts_enabled() when FEAT_NMI is not enabled in the kernel; revert that change. After applying this patch, the system stall no longer happens on hardware with the FEAT_NMI feature. Fixes: eefea6156921 ("irqchip/gic-v3: Fix hard LOCKUP caused by NMI being masked") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- arch/arm64/include/asm/ptrace.h | 5 ++--- arch/arm64/include/uapi/asm/ptrace.h | 1 - 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index a0ccbbbcc4d3..e235b7e94f9d 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -219,11 +219,10 @@ static inline void forget_syscall(struct pt_regs *regs) true) #define interrupts_enabled(regs) \ - (!((regs)->pstate & PSR_ALLINT_BIT) && !((regs)->pstate & PSR_I_BIT) && \ - irqs_priority_unmasked(regs)) + (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs)) #define fast_interrupts_enabled(regs) \ - (!((regs)->pstate & PSR_ALLINT_BIT) && !((regs)->pstate & PSR_F_BIT)) + (!((regs)->pstate & PSR_F_BIT)) static inline unsigned long user_stack_pointer(struct pt_regs *regs) { diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index 3c02f551bbe6..d1bb5b69f1ce 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -47,7 +47,6 @@ #define PSR_A_BIT 0x00000100 #define PSR_D_BIT 0x00000200 #define PSR_SSBS_BIT 0x00001000 -#define PSR_ALLINT_BIT 0x00002000 #define PSR_PAN_BIT 0x00400000 #define PSR_UAO_BIT 0x00800000 #define PSR_DIT_BIT 0x01000000 -- 2.25.1

From: Ionela Voinescu <ionela.voinescu@arm.com> [Upstream commit bbce8eaa603236bf958b0d24e6377b3f3b623991] Add a weak function to return the hardware maximum frequency of a CPU, with the default implementation returning cpuinfo.max_freq, which is the best information we can generically get from the cpufreq framework. The default can be overridden by a strong function on platforms that want to provide an alternative implementation with more accurate information, obtained either from hardware or firmware. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- drivers/cpufreq/cpufreq.c | 20 ++++++++++++++++++++ include/linux/cpufreq.h | 5 +++++ 2 files changed, 25 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 57a820ad1206..2920ddc74f73 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1802,6 +1802,26 @@ unsigned int cpufreq_quick_get_max(unsigned int cpu) } EXPORT_SYMBOL(cpufreq_quick_get_max); +/** + * cpufreq_get_hw_max_freq - get the max hardware frequency of the CPU + * @cpu: CPU number + * + * The default return value is the max_freq field of cpuinfo. + */ +__weak unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) +{ + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); + unsigned int ret_freq = 0; + + if (policy) { + ret_freq = policy->cpuinfo.max_freq; + cpufreq_cpu_put(policy); + } + + return ret_freq; +} +EXPORT_SYMBOL(cpufreq_get_hw_max_freq); + static unsigned int __cpufreq_get(struct cpufreq_policy *policy) { if (unlikely(policy_is_inactive(policy))) diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index bf47a225b8f6..24bd8b3f65d8 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -211,6 +211,7 @@ extern struct kobject *cpufreq_global_kobject; unsigned int cpufreq_get(unsigned int cpu); unsigned int cpufreq_quick_get(unsigned int cpu); unsigned int cpufreq_quick_get_max(unsigned int cpu); +unsigned int cpufreq_get_hw_max_freq(unsigned int cpu); void disable_cpufreq(void); u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy); @@ -238,6 +239,10 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu) { return 0; } +static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) +{ + return 0; +} static inline void disable_cpufreq(void) { } #endif -- 2.25.1
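Because cpufreq_get_hw_max_freq() is defined __weak, a platform can override it with a strong definition backed by more accurate hardware or firmware data. The following is a purely hypothetical override: my_soc_firmware_max_khz() is a made-up helper used only to show the shape of such an implementation.

#include <linux/cpufreq.h>

/*
 * Hypothetical strong override of the weak default. The assumed firmware
 * query returns kHz (cpufreq's unit); fall back to cpuinfo.max_freq, i.e.
 * the same value the weak default would report, when it yields nothing.
 */
unsigned int cpufreq_get_hw_max_freq(unsigned int cpu)
{
	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
	unsigned int khz = my_soc_firmware_max_khz(cpu);	/* assumed helper */

	if (!khz && policy)
		khz = policy->cpuinfo.max_freq;
	if (policy)
		cpufreq_cpu_put(policy);

	return khz;
}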

From: Lecopzer Chen <lecopzer.chen@mediatek.com> [Upstream commit 94946f9eaac116f2943ec79ec3df1ec2fc92ae07] Set a safe maximum CPU frequency of 5 GHz in case a particular platform doesn't implement a cpufreq driver. Although the architecture doesn't put any restriction on the maximum frequency, 5 GHz seems to be a safe maximum given the available Arm CPUs in the market, which are clocked much less than 5 GHz. On the other hand, we can't make it much higher as it would lead to a large hard-lockup detection timeout on parts which are running slower (e.g. 1 GHz on Developerbox) and don't possess a cpufreq driver. Link: https://lkml.kernel.org/r/20230519101840.v5.17.Ia9d02578e89c3f44d3cb12eec8b0... Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Co-developed-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/watchdog_hld.c | 24 ++++++++++++++++++++++ 2 files changed, 25 insertions(+) create mode 100644 arch/arm64/kernel/watchdog_hld.c diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index d0813553ce2e..6aa52c5db00d 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -38,6 +38,7 @@ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_ARM64_MODULE_PLTS) += module-plts.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_CPU_PM) += sleep.o suspend.o obj-$(CONFIG_CPU_IDLE) += cpuidle.o diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c new file mode 100644 index 000000000000..2401eb1b7e55 --- /dev/null +++ b/arch/arm64/kernel/watchdog_hld.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/cpufreq.h> + +/* + * Safe maximum CPU frequency in case a particular platform doesn't implement + * a cpufreq driver. Although the architecture doesn't put any restriction on + * the maximum frequency, 5 GHz seems to be a safe maximum given the available + * Arm CPUs in the market, which are clocked much less than 5 GHz. On the other + * hand, we can't make it much higher as it would lead to a large hard-lockup + * detection timeout on parts which are running slower (e.g. 1 GHz on + * Developerbox) and don't possess a cpufreq driver. 
+ */ +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz +u64 hw_nmi_get_sample_period(int watchdog_thresh) +{ + unsigned int cpu = smp_processor_id(); + unsigned long max_cpu_freq; + + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; + if (!max_cpu_freq) + max_cpu_freq = SAFE_MAX_CPU_FREQ; + + return (u64)max_cpu_freq * watchdog_thresh; +} -- 2.25.1
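To sanity-check the arithmetic above: cpufreq reports frequencies in kHz, so the sample period is simply the maximum frequency in Hz multiplied by the watchdog threshold in seconds. The snippet below is a standalone userspace re-computation with example numbers, not kernel code.

#include <stdio.h>

/* Standalone illustration of hw_nmi_get_sample_period()'s arithmetic. */
static unsigned long long sample_period(unsigned long max_freq_khz, int thresh)
{
	unsigned long long hz = max_freq_khz ? max_freq_khz * 1000ULL
					     : 5000000000ULL;	/* 5 GHz fallback */
	return hz * thresh;
}

int main(void)
{
	printf("%llu\n", sample_period(2600000, 10));	/* 2.6 GHz, 10 s -> 26000000000 cycles */
	printf("%llu\n", sample_period(0, 10));		/* no cpufreq driver -> 50000000000 cycles */
	return 0;
}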

From: Douglas Anderson <dianders@chromium.org> [Upstream commit d7a0fe9ef6d6484fca4ba55c19091932337d4272] With the recent feature that allows perf events to use pseudo-NMIs as interrupts on platforms which support GICv3 or later, it is now possible to enable the hard lockup detector (or NMI watchdog) on arm64 platforms. So enable the corresponding support. One thing to note here is that the lockup detector is normally initialized just after the early initcalls, but the PMU on arm64 comes up much later as a device_initcall(). To cope with that, override arch_perf_nmi_is_available() to let the watchdog framework know the PMU is not ready yet, and inform the framework to re-initialize lockup detection once the PMU has been initialized. [dianders@chromium.org: only HAVE_HARDLOCKUP_DETECTOR_PERF if the PMU config is enabled] Link: https://lkml.kernel.org/r/20230523073952.1.I60217a63acc35621e13f10be16c0cd7c... Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b4... Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Co-developed-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> --- arch/arm64/Kconfig | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index fc074fd5f040..93e47a234ba9 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -159,12 +159,15 @@ config ARM64 select HAVE_FUNCTION_ERROR_INJECTION select HAVE_FUNCTION_GRAPH_TRACER select HAVE_GCC_PLUGINS + select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \ + HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI select HAVE_HW_BREAKPOINT if PERF_EVENTS select HAVE_IRQ_TIME_ACCOUNTING select HAVE_MEMBLOCK_NODE_MAP if NUMA select HAVE_NMI select HAVE_PATA_PLATFORM select HAVE_PERF_EVENTS + select HAVE_PERF_EVENTS_NMI if ARM64_PSEUDO_NMI select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_REGS_AND_STACK_ACCESS_API -- 2.25.1
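The arch_perf_nmi_is_available() override mentioned in the message is not part of this Kconfig-only hunk; in the full series it would live in arch/arm64/kernel/watchdog_hld.c. A rough sketch is below, assuming an arm_pmu_irq_is_nmi() helper in the PMU driver that reports whether the PMU interrupt was actually requested as an NMI.

/*
 * Sketch of the override described above. arm_pmu_irq_is_nmi() is an
 * assumed helper: if the PMU IRQ is not delivered as a (pseudo-)NMI, the
 * perf-based hard lockup detector cannot work and probing should fail.
 */
bool __init arch_perf_nmi_is_available(void)
{
	return arm_pmu_irq_is_nmi();
}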

From: Wei Li <liwei391@huawei.com> Add a new config option, CONFIG_PMU_WATCHDOG, for selecting the watchdog implementation method. Signed-off-by: Wei Li <liwei391@huawei.com> --- arch/arm64/Kconfig | 2 -- lib/Kconfig.debug | 13 +++++++++++++ 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 93e47a234ba9..ae4d2746923d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -159,8 +159,6 @@ config ARM64 select HAVE_FUNCTION_ERROR_INJECTION select HAVE_FUNCTION_GRAPH_TRACER select HAVE_GCC_PLUGINS - select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \ - HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI select HAVE_HW_BREAKPOINT if PERF_EVENTS select HAVE_IRQ_TIME_ACCOUNTING select HAVE_MEMBLOCK_NODE_MAP if NUMA diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 74a7a2077c51..943af5ac0e08 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -853,12 +853,25 @@ config HARDLOCKUP_DETECTOR_PERF bool select SOFTLOCKUP_DETECTOR +choice + prompt "aarch64 NMI watchdog method" + depends on ARM64 + help + Watchdog implementation method configuration. + config SDEI_WATCHDOG bool "SDEI NMI Watchdog support" depends on ARM_SDE_INTERFACE && !HARDLOCKUP_CHECK_TIMESTAMP select HAVE_HARDLOCKUP_DETECTOR_ARCH select HARDLOCKUP_DETECTOR +config PMU_WATCHDOG + bool "PMU NMI Watchdog support" + depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI + select HAVE_HARDLOCKUP_DETECTOR_PERF + +endchoice + # # Enables a timestamp based low pass filter to compensate for perf based # hard lockup detection which runs too fast due to turbo modes. -- 2.25.1

From: Jingyi Wang <wangjingyi11@huawei.com> On aarch64, we can build both the SDEI_WATCHDOG and PMU_WATCHDOG code instead of choosing one of them. SDEI_WATCHDOG is used by default, and if SDEI_WATCHDOG is disabled by the kernel parameter "disable_sdei_nmi_watchdog", PMU_WATCHDOG is used instead. Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com> --- arch/arm64/kernel/watchdog_sdei.c | 23 +++++++++++++++++------ include/linux/nmi.h | 11 +++++++++++ kernel/watchdog.c | 28 +++++++++++++++++++++------- lib/Kconfig.debug | 7 ++----- 4 files changed, 51 insertions(+), 18 deletions(-) diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c index ad68ed383fce..e7845f379ca2 100644 --- a/arch/arm64/kernel/watchdog_sdei.c +++ b/arch/arm64/kernel/watchdog_sdei.c @@ -25,7 +25,7 @@ static bool disable_sdei_nmi_watchdog = true; static bool sdei_watchdog_registered; static DEFINE_PER_CPU(ktime_t, last_check_time); -int watchdog_nmi_enable(unsigned int cpu) +int watchdog_sdei_enable(unsigned int cpu) { int ret; @@ -49,7 +49,7 @@ int watchdog_nmi_enable(unsigned int cpu) return 0; } -void watchdog_nmi_disable(unsigned int cpu) +void watchdog_sdei_disable(unsigned int cpu) { int ret; @@ -111,13 +111,10 @@ void sdei_watchdog_clear_eoi(void) sdei_api_clear_eoi(SDEI_NMI_WATCHDOG_HWIRQ); } -int __init watchdog_nmi_probe(void) +int __init watchdog_sdei_probe(void) { int ret; - if (disable_sdei_nmi_watchdog) - return -EINVAL; - if (!is_hyp_mode_available()) { pr_err("Disable SDEI NMI Watchdog in VM\n"); return -EINVAL; @@ -154,3 +151,17 @@ int __init watchdog_nmi_probe(void) return 0; } + +static struct watchdog_operations arch_watchdog_ops = { + .watchdog_nmi_stop = &watchdog_nmi_stop, + .watchdog_nmi_start = &watchdog_nmi_start, + .watchdog_nmi_probe = &watchdog_sdei_probe, + .watchdog_nmi_enable = &watchdog_sdei_enable, + .watchdog_nmi_disable = &watchdog_sdei_disable, +}; + +void watchdog_ops_init(void) +{ + if (!disable_sdei_nmi_watchdog) + nmi_watchdog_ops = arch_watchdog_ops; +} diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 020768634b29..a22dfa989a39 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -131,6 +131,17 @@ int watchdog_nmi_probe(void); int watchdog_nmi_enable(unsigned int cpu); void watchdog_nmi_disable(unsigned int cpu); +struct watchdog_operations { + void (*watchdog_nmi_stop)(void); + void (*watchdog_nmi_start)(void); + int (*watchdog_nmi_probe)(void); + int (*watchdog_nmi_enable)(unsigned int cpu); + void (*watchdog_nmi_disable)(unsigned int cpu); +}; + +extern struct watchdog_operations nmi_watchdog_ops; +void watchdog_ops_init(void); + void lockup_detector_reconfigure(void); /** diff --git a/kernel/watchdog.c b/kernel/watchdog.c index bc48dfdcf257..07e494879c4a 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -50,6 +50,14 @@ static struct cpumask watchdog_allowed_mask __read_mostly; struct cpumask watchdog_cpumask __read_mostly; unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); +struct watchdog_operations nmi_watchdog_ops = { + .watchdog_nmi_stop = &watchdog_nmi_stop, + .watchdog_nmi_start = &watchdog_nmi_start, + .watchdog_nmi_probe = &watchdog_nmi_probe, + .watchdog_nmi_enable = &watchdog_nmi_enable, + .watchdog_nmi_disable = &watchdog_nmi_disable, +}; + #ifdef CONFIG_HARDLOCKUP_DETECTOR /* * Should we panic when a soft-lockup or hard-lockup occurs: @@ -509,7 +517,7 @@ static void watchdog_enable(unsigned int cpu) __touch_watchdog(); /* Enable the perf event */ if (watchdog_enabled & NMI_WATCHDOG_ENABLED) - 
watchdog_nmi_enable(cpu); + nmi_watchdog_ops.watchdog_nmi_enable(cpu); } static void watchdog_disable(unsigned int cpu) @@ -523,7 +531,7 @@ static void watchdog_disable(unsigned int cpu) * between disabling the timer and disabling the perf event causes * the perf NMI to detect a false positive. */ - watchdog_nmi_disable(cpu); + nmi_watchdog_ops.watchdog_nmi_disable(cpu); hrtimer_cancel(hrtimer); wait_for_completion(this_cpu_ptr(&softlockup_completion)); } @@ -579,7 +587,7 @@ int lockup_detector_offline_cpu(unsigned int cpu) static void __lockup_detector_reconfigure(void) { cpus_read_lock(); - watchdog_nmi_stop(); + nmi_watchdog_ops.watchdog_nmi_stop(); softlockup_stop_all(); set_sample_period(); @@ -587,7 +595,7 @@ static void __lockup_detector_reconfigure(void) if (watchdog_enabled && watchdog_thresh) softlockup_start_all(); - watchdog_nmi_start(); + nmi_watchdog_ops.watchdog_nmi_start(); cpus_read_unlock(); /* * Must be called outside the cpus locked section to prevent @@ -632,9 +640,9 @@ static __init void lockup_detector_setup(void) static void __lockup_detector_reconfigure(void) { cpus_read_lock(); - watchdog_nmi_stop(); + nmi_watchdog_ops.watchdog_nmi_stop(); lockup_detector_update_enable(); - watchdog_nmi_start(); + nmi_watchdog_ops.watchdog_nmi_start(); cpus_read_unlock(); } void lockup_detector_reconfigure(void) @@ -796,15 +804,21 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write, } #endif /* CONFIG_SYSCTL */ +void __weak watchdog_ops_init(void) +{ +} + void __init lockup_detector_init(void) { + watchdog_ops_init(); + if (tick_nohz_full_enabled()) pr_info("Disabling watchdog on nohz_full cores by default\n"); cpumask_copy(&watchdog_cpumask, housekeeping_cpumask(HK_FLAG_TIMER)); - if (!watchdog_nmi_probe()) + if (!nmi_watchdog_ops.watchdog_nmi_probe()) nmi_watchdog_available = true; lockup_detector_setup(); } diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 943af5ac0e08..ea78dfe8ec3c 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -853,11 +853,8 @@ config HARDLOCKUP_DETECTOR_PERF bool select SOFTLOCKUP_DETECTOR -choice - prompt "aarch64 NMI watchdog method" +menu "ARM64 NMI watchdog configuration" depends on ARM64 - help - Watchdog implementation method configuration. config SDEI_WATCHDOG bool "SDEI NMI Watchdog support" @@ -870,7 +867,7 @@ config PMU_WATCHDOG depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI select HAVE_HARDLOCKUP_DETECTOR_PERF -endchoice +endmenu # "ARM64 NMI watchdog configuration" # # Enables a timestamp based low pass filter to compensate for perf based -- 2.25.1
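With nmi_watchdog_ops defaulting to the generic watchdog_nmi_* callbacks (the perf-based path when PMU_WATCHDOG is in use) and watchdog_ops_init() switching to the SDEI callbacks unless "disable_sdei_nmi_watchdog" is set, a further backend could be wired up the same way. The sketch below is hypothetical: none of the my_wd_* functions exist in this series; they only illustrate the expected signatures.

/*
 * Hypothetical example of plugging another NMI watchdog backend into the
 * watchdog_operations table introduced above.
 */
static struct watchdog_operations my_wd_ops = {
	.watchdog_nmi_stop	= &my_wd_stop,
	.watchdog_nmi_start	= &my_wd_start,
	.watchdog_nmi_probe	= &my_wd_probe,
	.watchdog_nmi_enable	= &my_wd_enable,
	.watchdog_nmi_disable	= &my_wd_disable,
};

void watchdog_ops_init(void)
{
	if (my_wd_hardware_present())	/* assumed detection hook */
		nmi_watchdog_ops = my_wd_ops;
}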

From: Yicong Yang <yangyicong@hisilicon.com> The introduction of FEAT_NMI/FEAT_GICv3_NMI can cause a race where we may handle a normal interrupt in an interrupts-disabled context due to the withdrawal of the NMI. The flow is like below: [interrupts disabled] <- normal interrupt pending, for example a timer interrupt <- NMI occurs, ISR_EL1.nmi = 1 do_el1_interrupt() <- NMI withdrawn, ISR_EL1.nmi = 0 ISR_EL1.nmi = 0, not an NMI gic_handle_irq() __gic_handle_irq_from_irqson() irqnr = gic_read_iar() <- Oops, ack and handle a normal interrupt in an interrupts-disabled context! Fix this by checking the interrupt status in __gic_handle_irq_from_irqson() and ignoring the interrupt if we're in an interrupts-disabled context. Fixes: 0408b5bc4300 ("irqchip/gic-v3: Implement FEAT_GICv3_NMI support") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> --- drivers/irqchip/irq-gic-v3.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index c4503951b6b7..e83b24e2fad8 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -736,6 +736,28 @@ static void __gic_handle_irq_from_irqson(struct pt_regs *regs) bool is_nmi; u32 irqnr; + /* + * We should not reach here from an interrupts-disabled context; if we + * do, we have hit a race with FEAT_NMI/FEAT_GICv3_NMI: + * + * [interrupts disabled] + * <- normal interrupt pending, for example a timer interrupt + * <- NMI occurs, ISR_EL1.nmi = 1 + * do_el1_interrupt() + * <- NMI withdrawn, ISR_EL1.nmi = 0 + * ISR_EL1.nmi = 0, not an NMI + * gic_handle_irq() + * __gic_handle_irq_from_irqson() + * irqnr = gic_read_iar() <- Oops, ack and handle a normal interrupt + * in an interrupts-disabled context! + * + * So if we hit this case here, just return from the interrupt context. + * Since the interrupt is still pending, it will be handled once + * interrupts are re-enabled and will not be lost. + */ + if (!interrupts_enabled(regs)) + return; + irqnr = gic_read_iar(); is_nmi = gic_rpr_is_nmi_prio(); -- 2.25.1
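For reference, after the CONFIG_ARM64_NMI revert earlier in this series, the check used here expands to the following (quoted from arch/arm64/include/asm/ptrace.h as restored by that patch):

#define interrupts_enabled(regs) \
	(!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))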

Enable CONFIG_PMU_WATCHDOG to support NMI watchdog. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> --- arch/arm64/configs/tencent.config | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/configs/tencent.config b/arch/arm64/configs/tencent.config index b294d3f7be8f..e76929708286 100644 --- a/arch/arm64/configs/tencent.config +++ b/arch/arm64/configs/tencent.config @@ -1475,6 +1475,7 @@ CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_MEMORY_INIT=y CONFIG_DEBUG_SHIRQ=y CONFIG_SDEI_WATCHDOG=y +CONFIG_PMU_WATCHDOG=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_KILL_BLOCK=y CONFIG_PANIC_ON_OOPS=y -- 2.25.1