- Kernel - mailweb.openeuler.org

[PATCH openEuler-1.0-LTS] scsi: qla2xxx: Init 'nvme_ls_waitq' in qla_nvme_ls_req()
by Qi Xi 25 Sep '25

From: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICYAB4
CVE: CVE-2023-53280

--------------------------------

If we enter an error branch in qla_nvme_ls_req(), we will wake up
'sp->nvme_ls_waitq', but it is not initialized. This causes a system
crash. Fix it by initializing 'nvme_ls_waitq' in qla_nvme_ls_req().

This commit is based on the mainline commit
20fce500b232b970e40312a9c97e7f3b6d7a709c 'scsi: qla2xxx: Remove unused
nvme_ls_waitq wait queue'. But we still use the nvme_ls_waitq wait queue
because commit 219d27d7147e ("scsi: qla2xxx: Fix race conditions in the
code for aborting SCSI commands") is not merged.

Fixes: 5621b0dd7453 ("scsi: qla2xxx: Simpify unregistration of FC-NVMe local/remote ports")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
---
 drivers/scsi/qla2xxx/qla_nvme.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
index daa412667d6e..120bc13d8dcd 100644
--- a/drivers/scsi/qla2xxx/qla_nvme.c
+++ b/drivers/scsi/qla2xxx/qla_nvme.c
@@ -237,6 +237,7 @@ static int qla_nvme_ls_req(struct nvme_fc_local_port *lport,
 	sp->name = "nvme_ls";
 	sp->done = qla_nvme_sp_ls_done;
 	atomic_set(&sp->ref_count, 1);
+	init_waitqueue_head(&sp->nvme_ls_waitq);
 	nvme = &sp->u.iocb_cmd;
 	priv->sp = sp;
 	priv->fd = fd;
-- 
2.20.1
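The qla2xxx patch above turns on a general rule: a wait queue is a circular list, so it has to be initialised before any path, including an error path, tries to wake it. The following is a minimal user-space sketch of that rule, not the kernel implementation. The names `list_node`, `wait_queue`, `wait_queue_init`, `wait_queue_wake_all` and `request_with_error_path` are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's wait queue types. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

struct wait_queue {
	struct list_node head;	/* circular list of waiters */
};

/* An empty circular list points at itself, never at NULL. */
static void wait_queue_init(struct wait_queue *wq)
{
	assert(wq != NULL);
	wq->head.next = &wq->head;
	wq->head.prev = &wq->head;
}

/*
 * Walk the waiters and return how many were "woken". On a zero-filled
 * queue that was never initialised, head.next is NULL, so this loop
 * would dereference NULL on its first step.
 */
static int wait_queue_wake_all(struct wait_queue *wq)
{
	int woken = 0;

	for (struct list_node *n = wq->head.next; n != &wq->head; n = n->next)
		woken++;
	return woken;
}

/* Mirrors the shape of the fix: initialise before any path can wake. */
static int request_with_error_path(int fail)
{
	struct wait_queue wq = { { NULL, NULL } };

	wait_queue_init(&wq);	/* the one-line fix in this sketch */
	if (fail)
		return wait_queue_wake_all(&wq);	/* safe: no waiters */
	return 0;
}
```

Removing the `wait_queue_init()` call in `request_with_error_path()` makes the error path walk a list whose head is NULL, which is the user-space analogue of the crash the commit message describes.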

[PATCH openEuler-1.0-LTS 0/2] fix race between memory offline and UCE handling
by Jinjiang Tu 25 Sep '25

Jinjiang Tu (1):
  mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range()

Ma Wupeng (1):
  hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio

 mm/memory_hotplug.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

-- 
2.43.0

[PATCH openEuler-1.0-LTS] watchdog: Fix kmemleak in watchdog_cdev_register
by Jinjiang Tu 25 Sep '25

From: Chen Jun <chenjun102(a)huawei.com>

stable inclusion
from stable-v4.19.276
commit 8c1655600f4f2839fb844fe8c70b2b65fadc7a56
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICY4HK
CVE: CVE-2023-53234

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 13721a2ac66b246f5802ba1b75ad8637e53eeecc ]

kmemleak reports memory leaks in watchdog_dev_register, as follows:

unreferenced object 0xffff888116233000 (size 2048):
  comm "modprobe", pid 28147, jiffies 4353426116 (age 61.741s)
  hex dump (first 32 bytes):
    80 fa b9 05 81 88 ff ff 08 30 23 16 81 88 ff ff  .........0#.....
    08 30 23 16 81 88 ff ff 00 00 00 00 00 00 00 00  .0#.............
  backtrace:
    [<000000007f001ffd>] __kmem_cache_alloc_node+0x157/0x220
    [<000000006a389304>] kmalloc_trace+0x21/0x110
    [<000000008d640eea>] watchdog_dev_register+0x4e/0x780 [watchdog]
    [<0000000053c9f248>] __watchdog_register_device+0x4f0/0x680 [watchdog]
    [<00000000b2979824>] watchdog_register_device+0xd2/0x110 [watchdog]
    [<000000001f730178>] 0xffffffffc10880ae
    [<000000007a1a8bcc>] do_one_initcall+0xcb/0x4d0
    [<00000000b98be325>] do_init_module+0x1ca/0x5f0
    [<0000000046d08e7c>] load_module+0x6133/0x70f0
    ...

unreferenced object 0xffff888105b9fa80 (size 16):
  comm "modprobe", pid 28147, jiffies 4353426116 (age 61.741s)
  hex dump (first 16 bytes):
    77 61 74 63 68 64 6f 67 31 00 b9 05 81 88 ff ff  watchdog1.......
  backtrace:
    [<000000007f001ffd>] __kmem_cache_alloc_node+0x157/0x220
    [<00000000486ab89b>] __kmalloc_node_track_caller+0x44/0x1b0
    [<000000005a39aab0>] kvasprintf+0xb5/0x140
    [<0000000024806f85>] kvasprintf_const+0x55/0x180
    [<000000009276cb7f>] kobject_set_name_vargs+0x56/0x150
    [<00000000a92e820b>] dev_set_name+0xab/0xe0
    [<00000000cec812c6>] watchdog_dev_register+0x285/0x780 [watchdog]
    [<0000000053c9f248>] __watchdog_register_device+0x4f0/0x680 [watchdog]
    [<00000000b2979824>] watchdog_register_device+0xd2/0x110 [watchdog]
    [<000000001f730178>] 0xffffffffc10880ae
    [<000000007a1a8bcc>] do_one_initcall+0xcb/0x4d0
    [<00000000b98be325>] do_init_module+0x1ca/0x5f0
    [<0000000046d08e7c>] load_module+0x6133/0x70f0
    ...

The reason is that put_device is not called if cdev_device_add fails
and wdd->id != 0.

watchdog_cdev_register
  wd_data = kzalloc                             [1]
  err = dev_set_name                            [2]
  ..
  err = cdev_device_add
  if (err) {
    if (wdd->id == 0) {  // wdd->id != 0
      ..
    }
    return err;  // [1],[2] would be leaked

To fix it, call put_device in all wdd->id cases.

Fixes: 72139dfa2464 ("watchdog: Fix the race between the release of watchdog_core_data and cdev")
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/r/20221116012714.102066-1-chenjun102@huawei.com
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim(a)linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
---
 drivers/watchdog/watchdog_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 2b652df44593..e6d0247d349d 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -1052,8 +1052,8 @@ static int watchdog_cdev_register(struct watchdog_device *wdd)
 		if (wdd->id == 0) {
 			misc_deregister(&watchdog_miscdev);
 			old_wd_data = NULL;
-			put_device(&wd_data->dev);
 		}
+		put_device(&wd_data->dev);
 		return err;
 	}
 
-- 
2.43.0
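The leak described in the watchdog patch above comes from dropping the last reference only inside an id-specific branch of the error path. Here is a small user-space sketch of the corrected control flow, under the assumption that a plain counter can stand in for the device refcount. `core_data`, `core_data_put`, `register_sketch` and the `add_fails` flag are hypothetical names for illustration and are not part of the watchdog core.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for wd_data->dev. */
struct core_data {
	int refs;
	int *released;	/* set to 1 when the last reference is dropped */
};

static void core_data_put(struct core_data *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0) {
		*d->released = 1;
		free(d);
	}
}

/*
 * Sketch of the fixed control flow. 'add_fails' simulates
 * cdev_device_add() returning an error.
 */
static int register_sketch(int id, int add_fails, int *released)
{
	struct core_data *d = calloc(1, sizeof(*d));

	if (!d)
		return -1;
	d->refs = 1;
	d->released = released;
	*released = 0;

	if (add_fails) {
		if (id == 0) {
			/* id-0-only cleanup (the misc device) would go here */
		}
		core_data_put(d);	/* now runs for every id, not just id 0 */
		return -1;
	}

	/* Demo only: drop the reference so the sketch itself does not leak. */
	core_data_put(d);
	return 0;
}
```

With the `core_data_put()` call moved back inside the `id == 0` branch, a failing call with a non-zero id returns without ever setting `*released`, which corresponds to the two unreferenced objects in the kmemleak report.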

[PATCH OLK-6.6 1/9] [Backport] workqueue: Rename __cancel_work_timer() to __cancel_timer_sync()
by Qi Xi 25 Sep '25

From: Tejun Heo <tj(a)kernel.org>

mainline inclusion
from mainline-v6.9-rc1
commit c5140688d19a4579f7b01e6ca4b6e5f5d23d3d4d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEANP
CVE: CVE-2024-56591

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

__cancel_work_timer() is used to implement cancel_work_sync() and
cancel_delayed_work_sync(), similarly to how __cancel_work() is used to
implement cancel_work() and cancel_delayed_work(). ie. The _timer part of
the name is a complete misnomer. The difference from __cancel_work() is the
fact that it syncs against work item execution not whether it handles timers
or not.

Let's rename it to less confusing __cancel_work_sync(). No functional
change.

Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reviewed-by: Lai Jiangshan <jiangshanlai(a)gmail.com>
Signed-off-by: Qi Xi <xiqi2(a)huawei.com>
---
 kernel/workqueue.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc5e508bbe9b..e1d14bf8c54b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3466,7 +3466,7 @@ static int cwt_wakefn(wait_queue_entry_t *wait, unsigned mode, int sync, void *k
 	return autoremove_wake_function(wait, mode, sync, key);
 }
 
-static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
+static bool __cancel_work_sync(struct work_struct *work, bool is_dwork)
 {
 	static DECLARE_WAIT_QUEUE_HEAD(cancel_waitq);
 	unsigned long flags;
@@ -3550,7 +3550,7 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
  */
 bool cancel_work_sync(struct work_struct *work)
 {
-	return __cancel_work_timer(work, false);
+	return __cancel_work_sync(work, false);
 }
 EXPORT_SYMBOL_GPL(cancel_work_sync);
 
@@ -3655,7 +3655,7 @@ EXPORT_SYMBOL(cancel_delayed_work);
  */
 bool cancel_delayed_work_sync(struct delayed_work *dwork)
 {
-	return __cancel_work_timer(&dwork->work, true);
+	return __cancel_work_sync(&dwork->work, true);
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
-- 
2.33.0

[PATCH openEuler-1.0-LTS 0/2] fix CVE-2022-50384 and CVE-2022-50249
by Tong Tiangen 25 Sep '25

Gaosheng Cui (1):
  staging: vme_user: Fix possible UAF in tsi148_dma_list_add

Liang He (1):
  memory: of: Fix refcount leak bug in of_get_ddr_timings()

 drivers/memory/of_memory.c       | 1 +
 drivers/vme/bridges/vme_tsi148.c | 1 +
 2 files changed, 2 insertions(+)

-- 
2.25.1

[PATCH OLK-6.6] usb: core: config: Prevent OOB read in SS endpoint companion parsing
by Huang Xiaojia 25 Sep '25

From: Xinyu Liu <katieeliu(a)tencent.com>

stable inclusion
from stable-v6.6.103
commit 5badd56c711e2c8371d1670f9bd486697575423c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/ICXO12

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit cf16f408364efd8a68f39011a3b073c83a03612d upstream.

usb_parse_ss_endpoint_companion() checks descriptor type before length,
enabling a potentially odd read outside of the buffer size.

Fix this up by checking the size first before looking at any of the
fields in the descriptor.

Signed-off-by: Xinyu Liu <katieeliu(a)tencent.com>
Cc: stable <stable(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Huang Xiaojia <huangxiaojia2(a)huawei.com>
---
 drivers/usb/core/config.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
index 847dd32c0f5e..3180419424c0 100644
--- a/drivers/usb/core/config.c
+++ b/drivers/usb/core/config.c
@@ -81,8 +81,14 @@ static void usb_parse_ss_endpoint_companion(struct device *ddev, int cfgno,
 	 */
 	desc = (struct usb_ss_ep_comp_descriptor *) buffer;
 
-	if (desc->bDescriptorType != USB_DT_SS_ENDPOINT_COMP ||
-			size < USB_DT_SS_EP_COMP_SIZE) {
+	if (size < USB_DT_SS_EP_COMP_SIZE) {
+		dev_notice(ddev,
+			   "invalid SuperSpeed endpoint companion descriptor "
+			   "of length %d, skipping\n", size);
+		return;
+	}
+
+	if (desc->bDescriptorType != USB_DT_SS_ENDPOINT_COMP) {
 		dev_notice(ddev, "No SuperSpeed endpoint companion for config %d "
 				" interface %d altsetting %d ep %d: "
 				"using minimum values\n",
-- 
2.34.1
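The USB patch above reorders two checks so that the remaining length is validated before any descriptor field is read. A stand-alone sketch of that ordering follows. `parse_ss_ep_comp` and the `parse_result` enum are hypothetical names for illustration. The two constants mirror the values of the kernel's `USB_DT_SS_ENDPOINT_COMP` (0x30) and `USB_DT_SS_EP_COMP_SIZE` (6).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DT_SS_ENDPOINT_COMP	0x30	/* SuperSpeed endpoint companion type */
#define DT_SS_EP_COMP_SIZE	6	/* size of that descriptor in bytes */

enum parse_result {
	PARSE_OK,
	PARSE_TOO_SHORT,
	PARSE_WRONG_TYPE,
};

/*
 * Validate a candidate descriptor in 'buf' with 'size' bytes remaining.
 * The length check comes first, so buf[1] (bDescriptorType) is only
 * read once the buffer is known to be large enough to contain it.
 */
static enum parse_result parse_ss_ep_comp(const uint8_t *buf, size_t size)
{
	assert(buf != NULL);

	if (size < DT_SS_EP_COMP_SIZE)
		return PARSE_TOO_SHORT;

	if (buf[1] != DT_SS_ENDPOINT_COMP)
		return PARSE_WRONG_TYPE;

	return PARSE_OK;
}
```

With the checks in the original order, a one-byte remainder would have its second byte read before the length test could reject it. Splitting the checks also lets the caller report "too short" and "wrong type" separately, as the patch does with its two `dev_notice()` messages.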

[PATCH openEuler-1.0-LTS] cifs: prevent NULL pointer dereference in UTF16 conversion
by Long Li 25 Sep '25

From: Makar Semyonov <m.semenov(a)tssltd.ru>

mainline inclusion
from mainline-v6.17-rc4
commit 70bccd9855dae56942f2b18a08ba137bb54093a0
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICYXW3
CVE: CVE-2025-39838

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

There can be a NULL pointer dereference bug here. NULL is passed to
__cifs_sfu_make_node without checks, which passes it unchecked to
cifs_strndup_to_utf16, which in turn passes it to
cifs_local_to_utf16_bytes where '*from' is dereferenced, causing a crash.

This patch adds a check for NULL 'src' in cifs_strndup_to_utf16 and
returns NULL early to prevent dereferencing a NULL pointer.

Found by Linux Verification Center (linuxtesting.org) with SVACE

Signed-off-by: Makar Semyonov <m.semenov(a)tssltd.ru>
Cc: stable(a)vger.kernel.org
Signed-off-by: Steve French <stfrench(a)microsoft.com>

Conflicts:
	fs/cifs/cifs_unicode.c
	fs/smb/client/cifs_unicode.c
[Code moved to the fs/smb directory in mainline]
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
 fs/cifs/cifs_unicode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/cifs/cifs_unicode.c b/fs/cifs/cifs_unicode.c
index 7932e20555d2..47e38cf7ef89 100644
--- a/fs/cifs/cifs_unicode.c
+++ b/fs/cifs/cifs_unicode.c
@@ -633,6 +633,9 @@ cifs_strndup_to_utf16(const char *src, const int maxlen, int *utf16_len,
 	int len;
 	__le16 *dst;
 
+	if (!src)
+		return NULL;
+
 	len = cifs_local_to_utf16_bytes(src, maxlen, cp);
 	len += 2; /* NULL */
 	dst = kmalloc(len, GFP_KERNEL);
-- 
2.39.2
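The cifs patch above guards the top of a conversion helper so that a NULL source never reaches the code that dereferences it. The sketch below shows the same early-return shape in user space. `dup_ascii_to_utf16` is a hypothetical helper, not the cifs function: it widens ASCII bytes to 16-bit code units in host byte order and does no code page handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Duplicate at most 'maxlen' bytes of 'src' as 16-bit code units with
 * a terminating 0. Returns NULL for a NULL source or on allocation
 * failure; the caller frees the result.
 */
static uint16_t *dup_ascii_to_utf16(const char *src, size_t maxlen,
				    size_t *out_units)
{
	size_t n = 0;
	uint16_t *dst;

	if (!src)
		return NULL;	/* the early return the patch adds */

	while (n < maxlen && src[n] != '\0')
		n++;

	dst = malloc((n + 1) * sizeof(*dst));
	if (!dst)
		return NULL;

	for (size_t i = 0; i < n; i++)
		dst[i] = (unsigned char)src[i];
	dst[n] = 0;

	if (out_units)
		*out_units = n;
	return dst;
}
```

Callers already have to handle a NULL return for allocation failure, so returning NULL for a NULL source adds no new error path on their side.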

[PATCH OLK-5.10] NFSD: Protect against send buffer overflow in NFSv3 READ
by Li Lingfeng 25 Sep '25

From: Chuck Lever <chuck.lever(a)oracle.com>

mainline inclusion
from mainline-v6.1-rc1
commit fa6be9cc6e80ec79892ddf08a8c10cabab9baf38
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICYBVR
CVE: CVE-2022-50345

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a
correctly-formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>

Conflicts:
	fs/nfsd/nfs3proc.c
[Commit 507df40ebf31 ("NFSD: Hoist rq_vec preparation into nfsd_read()")
removed the definition of len in nfsd3_proc_read().]
Signed-off-by: Li Lingfeng <lilingfeng3(a)huawei.com>
---
 fs/nfsd/nfs3proc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 0b61080d6234..345b667aeadf 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -144,14 +144,14 @@ nfsd3_proc_read(struct svc_rqst *rqstp)
 {
 	struct nfsd3_readargs *argp = rqstp->rq_argp;
 	struct nfsd3_readres *resp = rqstp->rq_resp;
-	u32 max_blocksize = svc_max_payload(rqstp);
 
 	dprintk("nfsd: READ(3) %s %lu bytes at %Lu\n",
 				SVCFH_fmt(&argp->fh),
 				(unsigned long) argp->count,
 				(unsigned long long) argp->offset);
 
-	argp->count = min_t(u32, argp->count, max_blocksize);
+	argp->count = min_t(u32, argp->count, svc_max_payload(rqstp));
+	argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen);
 	if (argp->offset > (u64)OFFSET_MAX)
 		argp->offset = (u64)OFFSET_MAX;
 	if (argp->offset + argp->count > (u64)OFFSET_MAX)
-- 
2.46.1
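The NFSD patch above bounds the READ count by two limits: the transport's maximum payload and the space actually left in the reply buffer, which shrinks when the incoming call was large. The arithmetic can be sketched in a few lines. `clamp_read_count` and its parameter names are hypothetical; in the patch the two limits come from `svc_max_payload(rqstp)` and `rqstp->rq_res.buflen`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the number of bytes a READ may return: the requested count,
 * limited first by the maximum payload and then by the space that is
 * actually available in the reply buffer.
 */
static uint32_t clamp_read_count(uint32_t requested, uint32_t max_payload,
				 uint32_t reply_buflen)
{
	uint32_t count = requested;

	if (count > max_payload)
		count = max_payload;
	if (count > reply_buflen)
		count = reply_buflen;

	assert(count <= requested);
	return count;
}
```

For example, a 1 MiB request against a 1 MiB maximum payload is still cut to 4096 bytes if only 4096 bytes of reply buffer remain, which is the case the single `max_blocksize` clamp did not cover.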

[PATCH openEuler-1.0-LTS v4] x86/irq: Plug vector setup race
by Jinjie Ruan 25 Sep '25

From: Thomas Gleixner <tglx(a)linutronix.de>

mainline inclusion
from mainline-v6.17-rc1
commit ce0b5eedcb753697d43f61dd2e27d68eb5d3150f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICZARI

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit ce0b5eedcb753697d43f61dd2e27d68eb5d3150f upstream.

Hogan reported a vector setup race, which overwrites the interrupt
descriptor in the per CPU vector array resulting in a dysfunctional device.

CPU0                            CPU1
                                interrupt is raised in APIC IRR
                                but not handled
  free_irq()
    per_cpu(vector_irq, CPU1)[vector] = VECTOR_SHUTDOWN;

  request_irq()                 common_interrupt()
                                  d = this_cpu_read(vector_irq[vector]);

    per_cpu(vector_irq, CPU1)[vector] = desc;

                                  if (d == VECTOR_SHUTDOWN)
                                    this_cpu_write(vector_irq[vector], VECTOR_UNUSED);

free_irq() cannot observe the pending vector in the CPU1 APIC as there is
no way to query the remote CPU's APIC IRR.

This requires that request_irq() uses the same vector/CPU as the one which
was freed, but this also can be triggered by a spurious interrupt.

Interestingly enough this problem managed to be hidden for more than a
decade.

Prevent this by reevaluating vector_irq under the vector lock, which is
held by the interrupt activation code when vector_irq is updated.

To avoid ifdeffery or IS_ENABLED() nonsense, move the
[un]lock_vector_lock() declarations out under the
CONFIG_IRQ_DOMAIN_HIERARCHY guard as it's only provided when
CONFIG_X86_LOCAL_APIC=y.

The current CONFIG_IRQ_DOMAIN_HIERARCHY guard is selected by
CONFIG_X86_LOCAL_APIC, but can also be selected by other parts of the
Kconfig system, which makes 32-bit UP builds with CONFIG_X86_LOCAL_APIC=n
fail.

Can we just get rid of this !APIC nonsense once and forever?

Fixes: 9345005f4eed ("x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs")
Cc: stable(a)vger.kernel.org#6.6.x
Cc: gregkh(a)linuxfoundation.org
Reported-by: Hogan Wang <hogan.wang(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Hogan Wang <hogan.wang(a)huawei.com>
Link: https://lore.kernel.org/all/draft-87ikjhrhhh.ffs@tglx
[ Conflicts in arch/x86/kernel/irq.c because call_irq_handler() has been
  refactored to do apic_eoi() according to the return value.
  Conflicts in arch/x86/include/asm/hw_irq.h because (un)lock_vector_lock()
  are already controlled by CONFIG_X86_LOCAL_APIC. ]
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
---
 arch/x86/kernel/irq.c | 68 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index a975246074b5..55597dc96abe 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -223,6 +223,60 @@ u64 arch_irq_stat(void)
 	return sum;
 }
 
+static struct irq_desc *reevaluate_vector(int vector)
+{
+	struct irq_desc *desc = __this_cpu_read(vector_irq[vector]);
+
+	if (!IS_ERR_OR_NULL(desc))
+		return desc;
+
+	if (desc != VECTOR_RETRIGGERED && desc != VECTOR_SHUTDOWN)
+		pr_emerg_ratelimited("No irq handler for %d.%u\n", smp_processor_id(), vector);
+	else
+		__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+	return NULL;
+}
+
+static __always_inline bool call_irq_handler(int vector, struct pt_regs *regs)
+{
+	struct irq_desc *desc = __this_cpu_read(vector_irq[vector]);
+
+	if (likely(!IS_ERR_OR_NULL(desc))) {
+		handle_irq(desc, regs);
+		return true;
+	}
+
+	/*
+	 * Reevaluate with vector_lock held to prevent a race against
+	 * request_irq() setting up the vector:
+	 *
+	 * CPU0                         CPU1
+	 *                              interrupt is raised in APIC IRR
+	 *                              but not handled
+	 * free_irq()
+	 *   per_cpu(vector_irq, CPU1)[vector] = VECTOR_SHUTDOWN;
+	 *
+	 * request_irq()                common_interrupt()
+	 *                                d = this_cpu_read(vector_irq[vector]);
+	 *
+	 * per_cpu(vector_irq, CPU1)[vector] = desc;
+	 *
+	 *                                if (d == VECTOR_SHUTDOWN)
+	 *                                  this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+	 *
+	 * This requires that the same vector on the same target CPU is
+	 * handed out or that a spurious interrupt hits that CPU/vector.
+	 */
+	lock_vector_lock();
+	desc = reevaluate_vector(vector);
+	unlock_vector_lock();
+
+	if (!desc)
+		return false;
+
+	handle_irq(desc, regs);
+	return true;
+}
 
 /*
  * do_IRQ handles all normal device IRQ's (the special
@@ -232,7 +286,6 @@ u64 arch_irq_stat(void)
 __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	struct irq_desc * desc;
 	/* high bit used in ret_from_ code  */
 	unsigned vector = ~regs->orig_ax;
 
@@ -241,20 +294,9 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 	/* entering_irq() tells RCU that we're not quiescent.  Check it. */
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
 
-	desc = __this_cpu_read(vector_irq[vector]);
-
-	if (!handle_irq(desc, regs)) {
+	if (unlikely(!call_irq_handler(vector, regs)))
 		ack_APIC_irq();
 
-		if (desc != VECTOR_RETRIGGERED && desc != VECTOR_SHUTDOWN) {
-			pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
-					     __func__, smp_processor_id(),
-					     vector);
-		} else {
-			__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
-		}
-	}
-
 	exiting_irq();
 
 	set_irq_regs(old_regs);
-- 
2.34.1
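The race in the x86/irq patch above is a stale-read problem: the handler decides what to write from a value it read before the writer updated the slot. The fix keeps a lockless fast path for the common case and re-reads the slot under the same lock the writer holds before making any decision. The sketch below illustrates that pattern in user-space C11, assuming a single slot and a spinlock built on `atomic_flag`. Every name in it (`handler`, `slot`, `install_handler`, `shutdown_handler`, `lookup_handler`) is hypothetical, and it does not model per-CPU data or interrupt context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct handler {
	int irq;
};

/* Sentinel meaning "was shut down", analogous in spirit to VECTOR_SHUTDOWN. */
static struct handler shutdown_marker;
#define SLOT_SHUTDOWN (&shutdown_marker)

static _Atomic(struct handler *) slot;	/* NULL means unused */
static atomic_flag slot_lock = ATOMIC_FLAG_INIT;

static void lock_slot(void)
{
	while (atomic_flag_test_and_set(&slot_lock))
		;	/* spin until the flag is released */
}

static void unlock_slot(void)
{
	atomic_flag_clear(&slot_lock);
}

static int is_real_handler(struct handler *h)
{
	return h != NULL && h != SLOT_SHUTDOWN;
}

/* Writers always update the slot with the lock held. */
static void install_handler(struct handler *h)
{
	assert(is_real_handler(h));
	lock_slot();
	atomic_store(&slot, h);
	unlock_slot();
}

static void shutdown_handler(void)
{
	lock_slot();
	atomic_store(&slot, SLOT_SHUTDOWN);
	unlock_slot();
}

/*
 * Fast path: a lockless read that returns a real handler directly.
 * Slow path: re-read under the lock, so the decision to reset the
 * slot is based on the current value, never on a stale one.
 */
static struct handler *lookup_handler(void)
{
	struct handler *h = atomic_load(&slot);

	if (is_real_handler(h))
		return h;

	lock_slot();
	h = atomic_load(&slot);
	if (!is_real_handler(h)) {
		if (h == SLOT_SHUTDOWN)
			atomic_store(&slot, NULL);
		h = NULL;
	}
	unlock_slot();
	return h;
}
```

Because `install_handler()` and the slow path of `lookup_handler()` take the same lock, a handler installed between the first read and the re-read is returned rather than overwritten. The lock is only taken when the first read finds no real handler, so the common path stays lock-free.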

[PATCH openEuler-1.0-LTS v3] x86/irq: Plug vector setup race
by Jinjie Ruan 25 Sep '25

From: Thomas Gleixner <tglx(a)linutronix.de>

mainline inclusion
from mainline-v6.17-rc1
commit ce0b5eedcb753697d43f61dd2e27d68eb5d3150f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICZARI

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit ce0b5eedcb753697d43f61dd2e27d68eb5d3150f upstream.

Hogan reported a vector setup race, which overwrites the interrupt
descriptor in the per CPU vector array resulting in a dysfunctional device.

CPU0                            CPU1
                                interrupt is raised in APIC IRR
                                but not handled
  free_irq()
    per_cpu(vector_irq, CPU1)[vector] = VECTOR_SHUTDOWN;

  request_irq()                 common_interrupt()
                                  d = this_cpu_read(vector_irq[vector]);

    per_cpu(vector_irq, CPU1)[vector] = desc;

                                  if (d == VECTOR_SHUTDOWN)
                                    this_cpu_write(vector_irq[vector], VECTOR_UNUSED);

free_irq() cannot observe the pending vector in the CPU1 APIC as there is
no way to query the remote CPU's APIC IRR.

This requires that request_irq() uses the same vector/CPU as the one which
was freed, but this also can be triggered by a spurious interrupt.

Interestingly enough this problem managed to be hidden for more than a
decade.

Prevent this by reevaluating vector_irq under the vector lock, which is
held by the interrupt activation code when vector_irq is updated.

To avoid ifdeffery or IS_ENABLED() nonsense, move the
[un]lock_vector_lock() declarations out under the
CONFIG_IRQ_DOMAIN_HIERARCHY guard as it's only provided when
CONFIG_X86_LOCAL_APIC=y.

The current CONFIG_IRQ_DOMAIN_HIERARCHY guard is selected by
CONFIG_X86_LOCAL_APIC, but can also be selected by other parts of the
Kconfig system, which makes 32-bit UP builds with CONFIG_X86_LOCAL_APIC=n
fail.

Can we just get rid of this !APIC nonsense once and forever?

Fixes: 9345005f4eed ("x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs")
Cc: stable(a)vger.kernel.org#6.6.x
Cc: gregkh(a)linuxfoundation.org
Reported-by: Hogan Wang <hogan.wang(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Hogan Wang <hogan.wang(a)huawei.com>
Link: https://lore.kernel.org/all/draft-87ikjhrhhh.ffs@tglx
[ Conflicts in arch/x86/kernel/irq.c because call_irq_handler() has been
  refactored to do apic_eoi() according to the return value.
  Conflicts in arch/x86/include/asm/hw_irq.h because (un)lock_vector_lock()
  are already controlled by CONFIG_X86_LOCAL_APIC. ]
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
---
 arch/x86/kernel/irq.c | 68 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index a975246074b5..117a1bd9e9fa 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -223,6 +223,60 @@ u64 arch_irq_stat(void)
 	return sum;
 }
 
+static struct irq_desc *reevaluate_vector(int vector)
+{
+	struct irq_desc * desc = __this_cpu_read(vector_irq[vector]);
+
+	if (!IS_ERR_OR_NULL(desc))
+		return desc;
+
+	if (desc != VECTOR_RETRIGGERED && desc != VECTOR_SHUTDOWN)
+		pr_emerg_ratelimited("No irq handler for %d.%u\n", smp_processor_id(), vector);
+	else
+		__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+	return NULL;
+}
+
+static __always_inline bool call_irq_handler(int vector, struct pt_regs *regs)
+{
+	struct irq_desc * desc = __this_cpu_read(vector_irq[vector]);
+
+	if (likely(!IS_ERR_OR_NULL(desc))) {
+		handle_irq(desc, regs);
+		return true;
+	}
+
+	/*
+	 * Reevaluate with vector_lock held to prevent a race against
+	 * request_irq() setting up the vector:
+	 *
+	 * CPU0                         CPU1
+	 *                              interrupt is raised in APIC IRR
+	 *                              but not handled
+	 * free_irq()
+	 *   per_cpu(vector_irq, CPU1)[vector] = VECTOR_SHUTDOWN;
+	 *
+	 * request_irq()                common_interrupt()
+	 *                                d = this_cpu_read(vector_irq[vector]);
+	 *
+	 * per_cpu(vector_irq, CPU1)[vector] = desc;
+	 *
+	 *                                if (d == VECTOR_SHUTDOWN)
+	 *                                  this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+	 *
+	 * This requires that the same vector on the same target CPU is
+	 * handed out or that a spurious interrupt hits that CPU/vector.
+	 */
+	lock_vector_lock();
+	desc = reevaluate_vector(vector);
+	unlock_vector_lock();
+
+	if (!desc)
+		return false;
+
+	handle_irq(desc, regs);
+	return true;
+}
 
 /*
  * do_IRQ handles all normal device IRQ's (the special
@@ -232,7 +286,6 @@ u64 arch_irq_stat(void)
 __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	struct irq_desc * desc;
 	/* high bit used in ret_from_ code  */
 	unsigned vector = ~regs->orig_ax;
 
@@ -241,20 +294,9 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 	/* entering_irq() tells RCU that we're not quiescent.  Check it. */
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
 
-	desc = __this_cpu_read(vector_irq[vector]);
-
-	if (!handle_irq(desc, regs)) {
+	if (unlikely(!call_irq_handler(vector, regs)))
 		ack_APIC_irq();
 
-		if (desc != VECTOR_RETRIGGERED && desc != VECTOR_SHUTDOWN) {
-			pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
-					     __func__, smp_processor_id(),
-					     vector);
-		} else {
-			__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
-		}
-	}
-
 	exiting_irq();
 
 	set_irq_regs(old_regs);
-- 
2.34.1
