Kernel - mailweb.openeuler.org

[PATCH OLK-6.6] PCI: Fix NVMe link speed downgrade to 2.5GT/s during hotplug
by Hongtao Zhang 03 Mar '26

From: Jinhui Guo <guojinhui.liam(a)bytedance.com>

maillist inclusion
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13837
Reference: https://lore.kernel.org/all/20241107143758.12643-1-guojinhui.liam@bytedance…

-------------------------------------------

During extreme hotplug stress tests (e.g., inserting an NVMe device 50 times), the link speed is permanently downgraded to 2.5GT/s.

When the NVMe device is plugged in, a slight physical tremor causes "Card present" to be reported twice via external interrupts. The first failed link activation causes the ASMedia workaround to mistakenly downgrade the root port's link speed to 2.5GT/s. If the link subsequently trains successfully at 2.5GT/s, the device stays trapped at Gen1 speed, because the recovery logic only applies to ASMedia chips.

The upstream community addressed this by relying on the newly introduced pci/bwctrl subsystem (commit de9a6c8d5dbf), which is not present in stable kernels such as 6.6. Backporting the mainline fix would require pulling in massive architectural changes and would risk system stability.

This patch adopts the RFC approach: it moves the whitelist check to the very beginning of pcie_failed_link_retrain(), so that only the specific ASMedia PCIe switches are allowed to perform the link retrain and no other port is wrongly clamped to 2.5GT/s.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Signed-off-by: Jinhui Guo <guojinhui.liam(a)bytedance.com>
Signed-off-by: Zhang Hongtao <zhanghongtao35(a)huawei.com>
---
 drivers/pci/quirks.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index a2a2bde69a026..fbdd54cc81c71 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -94,7 +94,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	int ret = -ENOTTY;
 
 	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
-	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
+	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting ||
+	    !pci_match_id(ids, dev))
 		return ret;
 
 	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
@@ -122,8 +123,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
 	}
 
 	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
-	    pci_match_id(ids, dev)) {
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
 		u32 lnkcap;
 
 		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
--
2.43.0

[PATCH v2 OLK-6.6 0/8] arm64: Add memory poison recovery support for various paths
by Wupeng Ma 02 Mar '26

This series enhances the arm64 kernel's ability to recover from memory poison (hardware memory errors) encountered during common I/O and data copy operations. The work extends the existing RAS error handling framework to gracefully handle poison in scenarios that previously could lead to kernel panics or silent data corruption. Memory errors are increasingly common in large-scale deployments; recovering from them where possible improves system reliability and availability. The patches implement recovery in key paths such as copy_{from/to}_user, iov_iter, __kernel_read, and shmem_file_read_iter, ensuring that when poison is consumed, the kernel can either complete the operation safely (by returning an error) or deliver the appropriate signal to the affected user process. Additionally, the series refines the SEA (Synchronous External Abort) handling path: it moves siaddr extraction earlier for accurate signal info, adds RAS logging for better diagnostics, and ensures a SIGBUS is sent to user tasks when APEI claims fail. The series has been tested on ARM64 platforms with RAS capabilities, using error injection to verify that poison is correctly handled and that appropriate errors or signals are delivered. Feedback and review are appreciated. 
Changelog since v1:
 - use correct issue

Linus Torvalds (1):
  iov_iter: get rid of 'copy_mc' flag

Wupeng Ma (7):
  arm64: mm: support recovery from copy_{from/to}_user_page()
  iov_iter: add mc support for copy_page_to_iter
  arm64: fs: support poison recovery from __kernel_read()
  arm64: mm: shmem: support poison recovery from shmem_file_read_iter()
  arm64: move getting valid siaddr to the top of do_sea
  arm64: add ras log during do_sea
  arm64: send sig fault for user task when apei_claim_sea fails

 arch/arm64/include/asm/cacheflush.h |  7 +++++
 arch/arm64/kernel/acpi.c            |  2 ++
 arch/arm64/mm/extable.c             |  5 +++-
 arch/arm64/mm/fault.c               | 46 ++++++++++++++++++++++------
 arch/arm64/mm/flush.c               | 11 +++++++
 fs/coredump.c                       | 45 ++++++++++++++++++++++++++--
 include/asm-generic/cacheflush.h    | 24 +++++++++++++++
 lib/iov_iter.c                      | 16 ++++++++--
 mm/filemap.c                        |  3 ++
 mm/memory.c                         | 15 +++++++---
 mm/shmem.c                          |  4 +++
 11 files changed, 158 insertions(+), 20 deletions(-)

--
2.43.0

[PATCH OLK-6.6 0/8] arm64: Add memory poison recovery support for various paths
by Wupeng Ma 02 Mar '26

This series enhances the arm64 kernel's ability to recover from memory poison (hardware memory errors) encountered during common I/O and data copy operations. The work extends the existing RAS error handling framework to gracefully handle poison in scenarios that previously could lead to kernel panics or silent data corruption. Memory errors are increasingly common in large-scale deployments; recovering from them where possible improves system reliability and availability. The patches implement recovery in key paths such as copy_{from/to}_user, iov_iter, __kernel_read, and shmem_file_read_iter, ensuring that when poison is consumed, the kernel can either complete the operation safely (by returning an error) or deliver the appropriate signal to the affected user process. Additionally, the series refines the SEA (Synchronous External Abort) handling path: it moves siaddr extraction earlier for accurate signal info, adds RAS logging for better diagnostics, and ensures a SIGBUS is sent to user tasks when APEI claims fail. The series has been tested on ARM64 platforms with RAS capabilities, using error injection to verify that poison is correctly handled and that appropriate errors or signals are delivered. Feedback and review are appreciated. 
Linus Torvalds (1):
  iov_iter: get rid of 'copy_mc' flag

Wupeng Ma (7):
  arm64: mm: support recovery from copy_{from/to}_user_page()
  iov_iter: add mc support for copy_page_to_iter
  arm64: fs: support poison recovery from __kernel_read()
  arm64: mm: shmem: support poison recovery from shmem_file_read_iter()
  arm64: move getting valid siaddr to the top of do_sea
  arm64: add ras log during do_sea
  arm64: send sig fault for user task when apei_claim_sea fails

 arch/arm64/include/asm/cacheflush.h |  7 +++++
 arch/arm64/kernel/acpi.c            |  2 ++
 arch/arm64/mm/extable.c             |  5 +++-
 arch/arm64/mm/fault.c               | 46 ++++++++++++++++++++++------
 arch/arm64/mm/flush.c               | 11 +++++++
 fs/coredump.c                       | 45 ++++++++++++++++++++++++++--
 include/asm-generic/cacheflush.h    | 24 +++++++++++++++
 lib/iov_iter.c                      | 16 ++++++++--
 mm/filemap.c                        |  3 ++
 mm/memory.c                         | 15 +++++++---
 mm/shmem.c                          |  4 +++
 11 files changed, 158 insertions(+), 20 deletions(-)

--
2.43.0

[PATCH openEuler-1.0-LTS] PCI: Fix pci_slot_trylock() error handling
by Ziming Du 02 Mar '26

From: Jinhui Guo <guojinhui.liam(a)bytedance.com>

mainline inclusion
from mainline-v7.0-rc1
commit 9368d1ee62829b08aa31836b3ca003803caf0b72
category: bugfix
bugzilla: 190823
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Commit a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()") delegates the bridge device's pci_dev_trylock() to pci_bus_trylock() in pci_slot_trylock(), but it forgets to remove the corresponding pci_dev_unlock() when pci_bus_trylock() fails.

Before a4e772898f8b, the code did:

  if (!pci_dev_trylock(dev))                  /* <- lock bridge device */
          goto unlock;
  if (dev->subordinate) {
          if (!pci_bus_trylock(dev->subordinate)) {
                  pci_dev_unlock(dev);        /* <- unlock bridge device */
                  goto unlock;
          }
  }

After a4e772898f8b the bridge-device lock is no longer taken, but the pci_dev_unlock(dev) on the failure path was left in place, leading to the bug.

This yields one of two errors:

  1. A warning that the lock is being unlocked when no one holds it.
  2. An incorrect unlock of a lock that belongs to another thread.

Fix it by removing the now-redundant pci_dev_unlock(dev) on the failure path.
[Same patch later posted by Keith at https://patch.msgid.link/20260116184150.3013258-1-kbusch@meta.com]

Fixes: a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()")
Signed-off-by: Jinhui Guo <guojinhui.liam(a)bytedance.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20251212145528.2555-1-guojinhui.liam@bytedance.com
Signed-off-by: Ziming Du <duziming2(a)huawei.com>
---
 drivers/pci/pci.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b93605616d4e4..fbef3c37f37fa 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5144,10 +5144,8 @@ static int pci_slot_trylock(struct pci_slot *slot)
 		if (!dev->slot || dev->slot != slot)
 			continue;
 		if (dev->subordinate) {
-			if (!pci_bus_trylock(dev->subordinate)) {
-				pci_dev_unlock(dev);
+			if (!pci_bus_trylock(dev->subordinate))
 				goto unlock;
-			}
 		} else if (!pci_dev_trylock(dev))
 			goto unlock;
 	}
--
2.43.0

[PATCH OLK-6.6 v2] KVM: arm64: Fix softirq masking in FPSIMD register saving sequence
by Yipeng Zou 02 Mar '26

From: Will Deacon <will(a)kernel.org>

stable inclusion
from stable-v6.6.111
commit 250b6e009ff97f69411aa109116ac720d54f45ec
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/8596
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

Stable commit 28b82be094e2 ("KVM: arm64: Fix kernel BUG() due to bad backport of FPSIMD/SVE/SME fix") fixed a kernel BUG() caused by a bad backport of upstream commit fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state") by ensuring that softirqs are disabled/enabled across the fpsimd register save operation.

Unfortunately, although this fixes the original issue, it can now lead to deadlock when re-enabling softirqs causes pending softirqs to be handled with locks already held:

  | BUG: spinlock recursion on CPU#7, CPU 3/KVM/57616
  |  lock: 0xffff3045ef850240, .magic: dead4ead, .owner: CPU 3/KVM/57616, .owner_cpu: 7
  | CPU: 7 PID: 57616 Comm: CPU 3/KVM Tainted: G O 6.1.152 #1
  | Hardware name: SoftIron SoftIron Platform Mainboard/SoftIron Platform Mainboard, BIOS 1.31 May 11 2023
  | Call trace:
  |  dump_backtrace+0xe4/0x110
  |  show_stack+0x20/0x30
  |  dump_stack_lvl+0x6c/0x88
  |  dump_stack+0x18/0x34
  |  spin_dump+0x98/0xac
  |  do_raw_spin_lock+0x70/0x128
  |  _raw_spin_lock+0x18/0x28
  |  raw_spin_rq_lock_nested+0x18/0x28
  |  update_blocked_averages+0x70/0x550
  |  run_rebalance_domains+0x50/0x70
  |  handle_softirqs+0x198/0x328
  |  __do_softirq+0x1c/0x28
  |  ____do_softirq+0x18/0x28
  |  call_on_irq_stack+0x30/0x48
  |  do_softirq_own_stack+0x24/0x30
  |  do_softirq+0x74/0x90
  |  __local_bh_enable_ip+0x64/0x80
  |  fpsimd_save_and_flush_cpu_state+0x5c/0x68
  |  kvm_arch_vcpu_put_fp+0x4c/0x88
  |  kvm_arch_vcpu_put+0x28/0x88
  |  kvm_sched_out+0x38/0x58
  |  __schedule+0x55c/0x6c8
  |  schedule+0x60/0xa8

Take a tiny step towards the upstream fix in 9b19700e623f ("arm64: fpsimd: Drop unneeded 'busy' flag") by additionally disabling hardirqs while saving the fpsimd registers.
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Lee Jones <lee(a)kernel.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org> # 6.6.y
Fixes: 208d4f745a52 ("KVM: arm64: Fix kernel BUG() due to bad backport of FPSIMD/SVE/SME fix")
Reported-by: Kenneth Van Alstyne <kvanals(a)kvanals.org>
Link: https://lore.kernel.org/r/010001999bae0958-4d80d25d-8dda-4006-a6b9-798f3e77…
Signed-off-by: Will Deacon <will(a)kernel.org>
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng(a)huawei.com>
---
 arch/arm64/kernel/fpsimd.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 75344016b14c..9da29ae8a045 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1870,13 +1870,17 @@ static void fpsimd_flush_cpu_state(void)
  */
 void fpsimd_save_and_flush_cpu_state(void)
 {
+	unsigned long flags;
+
 	if (!system_supports_fpsimd())
 		return;
 	WARN_ON(preemptible());
-	get_cpu_fpsimd_context();
+	local_irq_save(flags);
+	__get_cpu_fpsimd_context();
 	fpsimd_save();
 	fpsimd_flush_cpu_state();
-	put_cpu_fpsimd_context();
+	__put_cpu_fpsimd_context();
+	local_irq_restore(flags);
 }
 
 #ifdef CONFIG_KERNEL_MODE_NEON
--
2.34.1

[PATCH OLK-6.6] KVM: arm64: Fix softirq masking in FPSIMD register saving sequence
by Yipeng Zou 02 Mar '26

From: Will Deacon <will(a)kernel.org>

stable inclusion
from stable-v6.6.111
commit 250b6e009ff97f69411aa109116ac720d54f45ec
category: bugfix
bugzilla: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

Stable commit 28b82be094e2 ("KVM: arm64: Fix kernel BUG() due to bad backport of FPSIMD/SVE/SME fix") fixed a kernel BUG() caused by a bad backport of upstream commit fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state") by ensuring that softirqs are disabled/enabled across the fpsimd register save operation.

Unfortunately, although this fixes the original issue, it can now lead to deadlock when re-enabling softirqs causes pending softirqs to be handled with locks already held:

  | BUG: spinlock recursion on CPU#7, CPU 3/KVM/57616
  |  lock: 0xffff3045ef850240, .magic: dead4ead, .owner: CPU 3/KVM/57616, .owner_cpu: 7
  | CPU: 7 PID: 57616 Comm: CPU 3/KVM Tainted: G O 6.1.152 #1
  | Hardware name: SoftIron SoftIron Platform Mainboard/SoftIron Platform Mainboard, BIOS 1.31 May 11 2023
  | Call trace:
  |  dump_backtrace+0xe4/0x110
  |  show_stack+0x20/0x30
  |  dump_stack_lvl+0x6c/0x88
  |  dump_stack+0x18/0x34
  |  spin_dump+0x98/0xac
  |  do_raw_spin_lock+0x70/0x128
  |  _raw_spin_lock+0x18/0x28
  |  raw_spin_rq_lock_nested+0x18/0x28
  |  update_blocked_averages+0x70/0x550
  |  run_rebalance_domains+0x50/0x70
  |  handle_softirqs+0x198/0x328
  |  __do_softirq+0x1c/0x28
  |  ____do_softirq+0x18/0x28
  |  call_on_irq_stack+0x30/0x48
  |  do_softirq_own_stack+0x24/0x30
  |  do_softirq+0x74/0x90
  |  __local_bh_enable_ip+0x64/0x80
  |  fpsimd_save_and_flush_cpu_state+0x5c/0x68
  |  kvm_arch_vcpu_put_fp+0x4c/0x88
  |  kvm_arch_vcpu_put+0x28/0x88
  |  kvm_sched_out+0x38/0x58
  |  __schedule+0x55c/0x6c8
  |  schedule+0x60/0xa8

Take a tiny step towards the upstream fix in 9b19700e623f ("arm64: fpsimd: Drop unneeded 'busy' flag") by additionally disabling hardirqs while saving the fpsimd registers.
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Lee Jones <lee(a)kernel.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org> # 6.6.y
Fixes: 208d4f745a52 ("KVM: arm64: Fix kernel BUG() due to bad backport of FPSIMD/SVE/SME fix")
Reported-by: Kenneth Van Alstyne <kvanals(a)kvanals.org>
Link: https://lore.kernel.org/r/010001999bae0958-4d80d25d-8dda-4006-a6b9-798f3e77…
Signed-off-by: Will Deacon <will(a)kernel.org>
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng(a)huawei.com>
---
 arch/arm64/kernel/fpsimd.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 75344016b14c..9da29ae8a045 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1870,13 +1870,17 @@ static void fpsimd_flush_cpu_state(void)
  */
 void fpsimd_save_and_flush_cpu_state(void)
 {
+	unsigned long flags;
+
 	if (!system_supports_fpsimd())
 		return;
 	WARN_ON(preemptible());
-	get_cpu_fpsimd_context();
+	local_irq_save(flags);
+	__get_cpu_fpsimd_context();
 	fpsimd_save();
 	fpsimd_flush_cpu_state();
-	put_cpu_fpsimd_context();
+	__put_cpu_fpsimd_context();
+	local_irq_restore(flags);
 }
 
 #ifdef CONFIG_KERNEL_MODE_NEON
--
2.34.1

[PATCH OLK-6.6] scsi: qla2xxx: Free sp in error path to fix system crash
by Zheng Qixing 02 Mar '26

From: Anil Gurumurthy <agurumurthy(a)marvell.com>

stable inclusion
from stable-v6.6.125
commit aed16d37696f494288a291b4b477484ed0be774b
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13785
CVE: CVE-2025-71232
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

------------------

commit 7adbd2b7809066c75f0433e5e2a8e114b429f30f upstream.

System crash seen during load/unload test in a loop:

[61110.449331] qla2xxx [0000:27:00.0]-0042:0: Disabled MSI-X.
[61110.467494] =============================================================================
[61110.467498] BUG qla2xxx_srbs (Tainted: G OE -------- --- ): Objects remaining in qla2xxx_srbs on __kmem_cache_shutdown()
[61110.467501] -----------------------------------------------------------------------------
[61110.467502] Slab 0x000000000ffc8162 objects=51 used=1 fp=0x00000000e25d3d85 flags=0x57ffffc0010200(slab|head|node=1|zone=2|lastcpupid=0x1fffff)
[61110.467509] CPU: 53 PID: 455206 Comm: rmmod Kdump: loaded Tainted: G OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1
[61110.467513] Hardware name: HPE ProLiant DL385 Gen10 Plus v2/ProLiant DL385 Gen10 Plus v2, BIOS A42 08/17/2023
[61110.467515] Call Trace:
[61110.467516] <TASK>
[61110.467519] dump_stack_lvl+0x34/0x48
[61110.467526] slab_err.cold+0x53/0x67
[61110.467534] __kmem_cache_shutdown+0x16e/0x320
[61110.467540] kmem_cache_destroy+0x51/0x160
[61110.467544] qla2x00_module_exit+0x93/0x99 [qla2xxx]
[61110.467607] ? __do_sys_delete_module.constprop.0+0x178/0x280
[61110.467613] ? syscall_trace_enter.constprop.0+0x145/0x1d0
[61110.467616] ? do_syscall_64+0x5c/0x90
[61110.467619] ? exc_page_fault+0x62/0x150
[61110.467622] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[61110.467626] </TASK>
[61110.467627] Disabling lock debugging due to kernel taint
[61110.467635] Object 0x0000000026f7e6e6 @offset=16000
[61110.467639] ------------[ cut here ]------------
[61110.467639] kmem_cache_destroy qla2xxx_srbs: Slab cache still has objects when called from qla2x00_module_exit+0x93/0x99 [qla2xxx]
[61110.467659] WARNING: CPU: 53 PID: 455206 at mm/slab_common.c:520 kmem_cache_destroy+0x14d/0x160
[61110.467718] CPU: 53 PID: 455206 Comm: rmmod Kdump: loaded Tainted: G B OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1
[61110.467720] Hardware name: HPE ProLiant DL385 Gen10 Plus v2/ProLiant DL385 Gen10 Plus v2, BIOS A42 08/17/2023
[61110.467721] RIP: 0010:kmem_cache_destroy+0x14d/0x160
[61110.467724] Code: 99 7d 07 00 48 89 ef e8 e1 6a 07 00 eb b3 48 8b 55 60 48 8b 4c 24 20 48 c7 c6 70 fc 66 90 48 c7 c7 f8 ef a1 90 e8 e1 ed 7c 00 <0f> 0b eb 93 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
[61110.467725] RSP: 0018:ffffa304e489fe80 EFLAGS: 00010282
[61110.467727] RAX: 0000000000000000 RBX: ffffffffc0d9a860 RCX: 0000000000000027
[61110.467729] RDX: ffff8fd5ff9598a8 RSI: 0000000000000001 RDI: ffff8fd5ff9598a0
[61110.467730] RBP: ffff8fb6aaf78700 R08: 0000000000000000 R09: 0000000100d863b7
[61110.467731] R10: ffffa304e489fd20 R11: ffffffff913bef48 R12: 0000000040002000
[61110.467731] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[61110.467733] FS: 00007f64c89fb740(0000) GS:ffff8fd5ff940000(0000) knlGS:0000000000000000
[61110.467734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[61110.467735] CR2: 00007f0f02bfe000 CR3: 00000020ad6dc005 CR4: 0000000000770ee0
[61110.467736] PKRU: 55555554
[61110.467737] Call Trace:
[61110.467738] <TASK>
[61110.467739] qla2x00_module_exit+0x93/0x99 [qla2xxx]
[61110.467755] ? __do_sys_delete_module.constprop.0+0x178/0x280

Free sp in the error path to fix the crash.
Fixes: f352eeb75419 ("scsi: qla2xxx: Add ability to use GPNFT/GNNFT for RSCN handling")
Cc: stable(a)vger.kernel.org
Signed-off-by: Anil Gurumurthy <agurumurthy(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Reviewed-by: Himanshu Madhani <hmadhani2024(a)gmail.com>
Link: https://patch.msgid.link/20251210101604.431868-9-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Zheng Qixing <zhengqixing(a)huawei.com>
---
 drivers/scsi/qla2xxx/qla_gs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index c09229b7c516..821e22731fed 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3625,8 +3625,8 @@ int qla_fab_async_scan(scsi_qla_host_t *vha, srb_t *sp)
 	if (vha->scan.scan_flags & SF_SCANNING) {
 		spin_unlock_irqrestore(&vha->work_lock, flags);
 		ql_dbg(ql_dbg_disc + ql_dbg_verbose, vha, 0x2012,
-		    "%s: scan active\n", __func__);
-		return rval;
+		    "%s: scan active for sp:%p\n", __func__, sp);
+		goto done_free_sp;
 	}
 	vha->scan.scan_flags |= SF_SCANNING;
 	if (!sp)
--
2.39.2

[PATCH OLK-6.6] scsi: qla2xxx: Delay module unload while fabric scan in progress
by Zheng Qixing 02 Mar '26

From: Anil Gurumurthy <agurumurthy(a)marvell.com>

stable inclusion
from stable-v6.6.125
commit 528b2f1027edfb52af0171f0f4b227fb356dde05
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13782
CVE: CVE-2025-71235
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

------------------

commit 8890bf450e0b6b283f48ac619fca5ac2f14ddd62 upstream.

System crash seen during load/unload test in a loop:

[105954.384919] RBP: ffff914589838dc0 R08: 0000000000000000 R09: 0000000000000086
[105954.384920] R10: 000000000000000f R11: ffffa31240904be5 R12: ffff914605f868e0
[105954.384921] R13: ffff914605f86910 R14: 0000000000008010 R15: 00000000ddb7c000
[105954.384923] FS: 0000000000000000(0000) GS:ffff9163fec40000(0000) knlGS:0000000000000000
[105954.384925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[105954.384926] CR2: 000055d31ce1d6a0 CR3: 0000000119f5e001 CR4: 0000000000770ee0
[105954.384928] PKRU: 55555554
[105954.384929] Call Trace:
[105954.384931] <IRQ>
[105954.384934] qla24xx_sp_unmap+0x1f3/0x2a0 [qla2xxx]
[105954.384962] ? qla_async_scan_sp_done+0x114/0x1f0 [qla2xxx]
[105954.384980] ? qla24xx_els_ct_entry+0x4de/0x760 [qla2xxx]
[105954.384999] ? __wake_up_common+0x80/0x190
[105954.385004] ? qla24xx_process_response_queue+0xc2/0xaa0 [qla2xxx]
[105954.385023] ? qla24xx_msix_rsp_q+0x44/0xb0 [qla2xxx]
[105954.385040] ? __handle_irq_event_percpu+0x3d/0x190
[105954.385044] ? handle_irq_event+0x58/0xb0
[105954.385046] ? handle_edge_irq+0x93/0x240
[105954.385050] ? __common_interrupt+0x41/0xa0
[105954.385055] ? common_interrupt+0x3e/0xa0
[105954.385060] ? asm_common_interrupt+0x22/0x40

The root cause was a free (dma_free_attrs) in interrupt context. A device discovery/fabric scan was in progress when a module unload was issued, which set the UNLOADING flag. As part of the discovery, receiving an interrupt scheduled work on a workqueue, which required a work item to be queued. Because the UNLOADING flag was set, the work item was not allocated and the mapped memory had to be freed directly. That free occurred in interrupt context, leading to the system crash.

Delay the driver unload until the fabric scan is complete to avoid the crash.

Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/202512090414.07Waorz0-lkp@intel.com/
Fixes: 783e0dc4f66a ("qla2xxx: Check for device state before unloading the driver.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Anil Gurumurthy <agurumurthy(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Reviewed-by: Himanshu Madhani <hmadhani2024(a)gmail.com>
Link: https://patch.msgid.link/20251210101604.431868-8-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Zheng Qixing <zhengqixing(a)huawei.com>
---
 drivers/scsi/qla2xxx/qla_os.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 7fa7a658ba19..609acc97e887 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1194,7 +1194,8 @@ qla2x00_wait_for_hba_ready(scsi_qla_host_t *vha)
 	while ((qla2x00_reset_active(vha) || ha->dpc_active ||
 		ha->flags.mbox_busy) ||
 	       test_bit(FX00_RESET_RECOVERY, &vha->dpc_flags) ||
-	       test_bit(FX00_TARGET_SCAN, &vha->dpc_flags)) {
+	       test_bit(FX00_TARGET_SCAN, &vha->dpc_flags) ||
+	       (vha->scan.scan_flags & SF_SCANNING)) {
 		if (test_bit(UNLOADING, &base_vha->dpc_flags))
 			break;
 		msleep(1000);
--
2.39.2

[PATCH OLK-6.6] scsi: qla2xxx: Validate sp before freeing associated memory
by Zheng Qixing 02 Mar '26

From: Anil Gurumurthy <agurumurthy(a)marvell.com>

stable inclusion
from stable-v6.6.125
commit 949010291bb941d53733ed08a33454254d9afb1b
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13781
CVE: CVE-2025-71236
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

------------------

commit b6df15aec8c3441357d4da0eaf4339eb20f5999f upstream.

System crash with the following signature:

[154563.214890] nvme nvme2: NVME-FC{1}: controller connect complete
[154564.169363] qla2xxx [0000:b0:00.1]-3002:2: nvme: Sched: Set ZIO exchange threshold to 3.
[154564.169405] qla2xxx [0000:b0:00.1]-ffffff:2: SET ZIO Activity exchange threshold to 5.
[154565.539974] qla2xxx [0000:b0:00.1]-5013:2: RSCN database changed – 0078 0080 0000.
[154565.545744] qla2xxx [0000:b0:00.1]-5013:2: RSCN database changed – 0078 00a0 0000.
[154565.545857] qla2xxx [0000:b0:00.1]-11a2:2: FEC=enabled (data rate).
[154565.552760] qla2xxx [0000:b0:00.1]-11a2:2: FEC=enabled (data rate).
[154565.553079] BUG: kernel NULL pointer dereference, address: 00000000000000f8
[154565.553080] #PF: supervisor read access in kernel mode
[154565.553082] #PF: error_code(0x0000) - not-present page
[154565.553084] PGD 80000010488ab067 P4D 80000010488ab067 PUD 104978a067 PMD 0
[154565.553089] Oops: 0000 [#1] PREEMPT SMP PTI
[154565.553092] CPU: 10 PID: 858 Comm: qla2xxx_2_dpc Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.11.1.el9_5.x86_64 #1
[154565.553096] Hardware name: HPE Synergy 660 Gen10/Synergy 660 Gen10 Compute Module, BIOS I43 09/30/2024
[154565.553097] RIP: 0010:qla_fab_async_scan.part.0+0x40b/0x870 [qla2xxx]
[154565.553141] Code: 00 00 e8 58 a3 ec d4 49 89 e9 ba 12 20 00 00 4c 89 e6 49 c7 c0 00 ee a8 c0 48 c7 c1 66 c0 a9 c0 bf 00 80 00 10 e8 15 69 00 00 <4c> 8b 8d f8 00 00 00 4d 85 c9 74 35 49 8b 84 24 00 19 00 00 48 8b
[154565.553143] RSP: 0018:ffffb4dbc8aebdd0 EFLAGS: 00010286
[154565.553145] RAX: 0000000000000000 RBX: ffff8ec2cf0908d0 RCX: 0000000000000002
[154565.553147] RDX: 0000000000000000 RSI: ffffffffc0a9c896 RDI: ffffb4dbc8aebd47
[154565.553148] RBP: 0000000000000000 R08: ffffb4dbc8aebd45 R09: 0000000000ffff0a
[154565.553150] R10: 0000000000000000 R11: 000000000000000f R12: ffff8ec2cf0908d0
[154565.553151] R13: ffff8ec2cf090900 R14: 0000000000000102 R15: ffff8ec2cf084000
[154565.553152] FS: 0000000000000000(0000) GS:ffff8ed27f800000(0000) knlGS:0000000000000000
[154565.553154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[154565.553155] CR2: 00000000000000f8 CR3: 000000113ae0a005 CR4: 00000000007706f0
[154565.553157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[154565.553158] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[154565.553159] PKRU: 55555554
[154565.553160] Call Trace:
[154565.553162] <TASK>
[154565.553165] ? show_trace_log_lvl+0x1c4/0x2df
[154565.553172] ? show_trace_log_lvl+0x1c4/0x2df
[154565.553177] ? qla_fab_async_scan.part.0+0x40b/0x870 [qla2xxx]
[154565.553215] ? __die_body.cold+0x8/0xd
[154565.553218] ? page_fault_oops+0x134/0x170
[154565.553223] ? snprintf+0x49/0x70
[154565.553229] ? exc_page_fault+0x62/0x150
[154565.553238] ? asm_exc_page_fault+0x22/0x30

Check that sp is non-NULL before freeing any associated memory.

Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
Cc: stable(a)vger.kernel.org
Signed-off-by: Anil Gurumurthy <agurumurthy(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Reviewed-by: Himanshu Madhani <hmadhani2024(a)gmail.com>
Link: https://patch.msgid.link/20251210101604.431868-10-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Zheng Qixing <zhengqixing(a)huawei.com>
---
 drivers/scsi/qla2xxx/qla_gs.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index d2bddca7045a..c09229b7c516 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3791,23 +3791,25 @@ int qla_fab_async_scan(scsi_qla_host_t *vha, srb_t *sp)
 	return rval;
 
 done_free_sp:
-	if (sp->u.iocb_cmd.u.ctarg.req) {
-		dma_free_coherent(&vha->hw->pdev->dev,
-			sp->u.iocb_cmd.u.ctarg.req_allocated_size,
-			sp->u.iocb_cmd.u.ctarg.req,
-			sp->u.iocb_cmd.u.ctarg.req_dma);
-		sp->u.iocb_cmd.u.ctarg.req = NULL;
-	}
-	if (sp->u.iocb_cmd.u.ctarg.rsp) {
-		dma_free_coherent(&vha->hw->pdev->dev,
-			sp->u.iocb_cmd.u.ctarg.rsp_allocated_size,
-			sp->u.iocb_cmd.u.ctarg.rsp,
-			sp->u.iocb_cmd.u.ctarg.rsp_dma);
-		sp->u.iocb_cmd.u.ctarg.rsp = NULL;
-	}
+	if (sp) {
+		if (sp->u.iocb_cmd.u.ctarg.req) {
+			dma_free_coherent(&vha->hw->pdev->dev,
+				sp->u.iocb_cmd.u.ctarg.req_allocated_size,
+				sp->u.iocb_cmd.u.ctarg.req,
+				sp->u.iocb_cmd.u.ctarg.req_dma);
+			sp->u.iocb_cmd.u.ctarg.req = NULL;
+		}
+		if (sp->u.iocb_cmd.u.ctarg.rsp) {
+			dma_free_coherent(&vha->hw->pdev->dev,
+				sp->u.iocb_cmd.u.ctarg.rsp_allocated_size,
+				sp->u.iocb_cmd.u.ctarg.rsp,
+				sp->u.iocb_cmd.u.ctarg.rsp_dma);
+			sp->u.iocb_cmd.u.ctarg.rsp = NULL;
+		}
 
-	/* ref: INIT */
-	kref_put(&sp->cmd_kref, qla2x00_sp_release);
+		/* ref: INIT */
+		kref_put(&sp->cmd_kref, qla2x00_sp_release);
+	}
 
 	spin_lock_irqsave(&vha->work_lock, flags);
 	vha->scan.scan_flags &= ~SF_SCANNING;
--
2.39.2

[PATCH OLK-6.6] dcache: Limit the minimal number of bucket to two
by Zhihao Cheng 02 Mar '26

mailist inclusion
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13644
Reference: https://lore.kernel.org/linux-fsdevel/20260130034853.215819-1-chengzhihao1@…

--------------------------------

There is an OOB read problem on dentry_hashtable when the user sets 'dhash_entries=1':

BUG: unable to handle page fault for address: ffff888b30b774b0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
Oops: Oops: 0000 [#1] SMP PTI
RIP: 0010:__d_lookup+0x56/0x120
Call Trace:
 d_lookup.cold+0x16/0x5d
 lookup_dcache+0x27/0xf0
 lookup_one_qstr_excl+0x2a/0x180
 start_dirop+0x55/0xa0
 simple_start_creating+0x8d/0xa0
 debugfs_start_creating+0x8c/0x180
 debugfs_create_dir+0x1d/0x1c0
 pinctrl_init+0x6d/0x140
 do_one_initcall+0x6d/0x3d0
 kernel_init_freeable+0x39f/0x460
 kernel_init+0x2a/0x260

When dhash_entries is set to 1, dentry_hashtable has only one bucket, and dcache_init() computes d_hash_shift as 32. The following path then accesses buckets beyond the first one, whose memory was never allocated:

d_lookup
 b = d_hash(hash)
   dentry_hashtable + ((u32)hashlen >> d_hash_shift)
   // The C standard leaves a right shift by an amount greater than
   // or equal to the bit width of the operand undefined. Here the
   // result of '(u32)hashlen >> d_hash_shift' becomes 'hashlen',
   // so 'b' points to an unallocated memory region.
 hlist_bl_for_each_entry_rcu(b)
  hlist_bl_first_rcu(head)
   h->first  // read OOB!

Fix it by limiting the minimum number of dentry_hashtable buckets to two, so that 'd_hash_shift' always stays below the bit width of u32.
Fixes: ceb5bdc2d246 ("fs: dcache per-bucket dcache hash locking")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
---
 fs/dcache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index e0f476c567b0..ea0154f81497 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3394,7 +3394,7 @@ static void __init dcache_init_early(void)
 					HASH_EARLY | HASH_ZERO,
 					&d_hash_shift,
 					NULL,
-					0,
+					2,
 					0);
 	d_hash_shift = 32 - d_hash_shift;
 }
@@ -3422,7 +3422,7 @@ static void __init dcache_init(void)
 					HASH_ZERO,
 					&d_hash_shift,
 					NULL,
-					0,
+					2,
 					0);
 	d_hash_shift = 32 - d_hash_shift;
 }
--
2.52.0
