From: Mahesh Salgaonkar mahesh@linux.ibm.com
stable inclusion from stable-5.10.73 commit f73ca4961d51304a8f1418a6b6b9d9c77ea95041 bugzilla: 182983 https://gitee.com/openeuler/kernel/issues/I4I3M0
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit eb8257a12192f43ffd41bd90932c39dade958042 ]
On pseries LPAR when an empty slot is assigned to partition OR in single LPAR mode, kdump kernel crashes during issuing PHB reset.
In the kdump scenario, we traverse all PHBs and issue reset using the pe_config_addr of the first child device present under each PHB. However the code assumes that none of the PHB slots can be empty and uses list_first_entry() to get the first child device under the PHB. Since list_first_entry() expects the list to be non-empty, it returns an invalid pci_dn entry and ends up accessing NULL phb pointer under pci_dn->phb causing kdump kernel crash.
This patch fixes the below kdump kernel crash by skipping empty slots:
audit: initializing netlink subsys (disabled) thermal_sys: Registered thermal governor 'fair_share' thermal_sys: Registered thermal governor 'step_wise' cpuidle: using governor menu pstore: Registered nvram as persistent store backend Issue PHB reset ... audit: type=2000 audit(1631267818.000:1): state=initialized audit_enabled=0 res=1 BUG: Kernel NULL pointer dereference on read at 0x00000268 Faulting instruction address: 0xc000000008101fb0 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 7 PID: 1 Comm: swapper/7 Not tainted 5.14.0 #1 NIP: c000000008101fb0 LR: c000000009284ccc CTR: c000000008029d70 REGS: c00000001161b840 TRAP: 0300 Not tainted (5.14.0) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000224 XER: 20040002 CFAR: c000000008101f0c DAR: 0000000000000268 DSISR: 00080000 IRQMASK: 0 ... NIP pseries_eeh_get_pe_config_addr+0x100/0x1b0 LR __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350 Call Trace: 0xc00000001161bb80 (unreliable) __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350 do_one_initcall+0x60/0x2d0 kernel_init_freeable+0x350/0x3f8 kernel_init+0x3c/0x17c ret_from_kernel_thread+0x5c/0x64
Fixes: 5a090f7c363fd ("powerpc/pseries: PCIE PHB reset") Signed-off-by: Mahesh Salgaonkar mahesh@linux.ibm.com [mpe: Tweak wording and trim oops] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/163215558252.413351.8600189949820258982.stgit@jupi... Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Chen Jun chenjun102@huawei.com Acked-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/powerpc/platforms/pseries/eeh_pseries.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c index cf024fa37bda..7ed38ebd0c7b 100644 --- a/arch/powerpc/platforms/pseries/eeh_pseries.c +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c @@ -868,6 +868,10 @@ static int __init eeh_pseries_init(void) if (is_kdump_kernel() || reset_devices) { pr_info("Issue PHB reset ...\n"); list_for_each_entry(phb, &hose_list, list_node) { + // Skip if the slot is empty + if (list_empty(&PCI_DN(phb->dn)->child_list)) + continue; + pdn = list_first_entry(&PCI_DN(phb->dn)->child_list, struct pci_dn, list); config_addr = pseries_eeh_get_pe_config_addr(pdn);