
From: Yipeng Zou <zouyipeng@huawei.com>

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 190750
CVE: NA

------------------------

Recently, an issue was reported in which a CPU hangs in an x86 VM.

The CPU halted during kdump, likely due to IPI issues when one CPU was
rebooting and another was in kdump:

CPU0                          CPU2
========================      ======================
reboot                        Panic
machine shutdown              Kdump machine shutdown
stop other cpus               kdump_nmi_shootdown_cpus
...                           ...
local_irq_disable             local_irq_disable
send_IPIs(REBOOT)             [critical regions]
[critical regions]            1) send_IPIs(REBOOT)
                              wait timeout
                              2) send_IPIs(NMI);
Halt,NMI context
                              3) lapic_shutdown [IPI is pending]
                              ...
                              second kernel start
                              4) init_bsp_APIC [IPI is pending]
                              ...
                              local irq enable
                              Halt, IPI context

In simple terms, when kdump jumps to the second kernel, the IPI that was
pending in the first kernel remains pending and is handled by the second
kernel.

As the reboot IPI can only be sent after acquiring @stopping_cpu by
storing the CPU number, this case can be detected when @stopping_cpu
contains the bootup value -1. Just return and ignore it.

Fixes: 9c7af565c58e ("centos 8.1: import linux-4.18.0-147.5.1.el8_1")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Lin Yujun <linyujun809@h-partners.com>
---
 arch/x86/kernel/smp.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index b2b87b91f336..07ccda0ad531 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -176,6 +176,29 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 asmlinkage __visible void smp_reboot_interrupt(void)
 {
 	ipi_entering_ack_irq();
+
+	/*
+	 * Handle the case where a reboot IPI is stale in the IRR. This
+	 * happens when:
+	 *
+	 * a CPU crashes with interrupts disabled before handling the
+	 * reboot IPI and jumps into a crash kernel. The reboot IPI
+	 * vector is kept set in the APIC IRR across the APIC soft
+	 * disabled phase and as there is no way to clear a pending IRR
+	 * bit, it is delivered to the crash kernel immediately when
+	 * interrupts are enabled.
+	 *
+	 * As the reboot IPI can only be sent after acquiring @stopping_cpu
+	 * by storing the CPU number, this case can be detected when
+	 * @stopping_cpu contains the bootup value -1. Just return and
+	 * ignore it.
+	 */
+	if (atomic_read(&stopping_cpu) == -1) {
+		pr_info("Ignoring stale reboot IPI\n");
+		irq_exit();
+		return;
+	}
+
 	cpu_emergency_vmxoff();
 	stop_this_cpu(NULL);
 	irq_exit();
-- 
2.34.1