August 2024 - Kernel - mailweb.openeuler.org

[PATCH openEuler-22.03-LTS-SP1] RDMA/restrack: Fix potential invalid address access
by Zhengchao Shao 03 Aug '24

From: Wenchao Hao <haowenchao2(a)huawei.com>

mainline inclusion
from mainline-v6.10-rc1
commit ca537a34775c103f7b14d7bbd976403f1d1525d8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEOR
CVE: CVE-2024-42080

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

-------------------------------------------

struct rdma_restrack_entry's kern_name was set to KBUILD_MODNAME in
ib_create_cq(). If the module exited but forgot to delete this
rdma_restrack_entry, it would cause an invalid address access in
rdma_restrack_clean() when printing the owner of this
rdma_restrack_entry.

This code was used to help find a forgotten PD release in one of the
ULPs, but it is no longer needed, so delete it.

Signed-off-by: Wenchao Hao <haowenchao2(a)huawei.com>
Link: https://lore.kernel.org/r/20240318092320.1215235-1-haowenchao2@huawei.com
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>

Conflicts:
	drivers/infiniband/core/restrack.c
[The conflict occurs because the commit 48f8a70e899f ("RDMA/restrack: Add
support to get resource tracking for SRQ") is not merged]
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 drivers/infiniband/core/restrack.c | 50 +-----------------------------
 1 file changed, 1 insertion(+), 49 deletions(-)

diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index bbbbec5b1593..ea6ce7662456 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -37,21 +37,6 @@ int rdma_restrack_init(struct ib_device *dev)
 	return 0;
 }
 
-static const char *type2str(enum rdma_restrack_type type)
-{
-	static const char * const names[RDMA_RESTRACK_MAX] = {
-		[RDMA_RESTRACK_PD] = "PD",
-		[RDMA_RESTRACK_CQ] = "CQ",
-		[RDMA_RESTRACK_QP] = "QP",
-		[RDMA_RESTRACK_CM_ID] = "CM_ID",
-		[RDMA_RESTRACK_MR] = "MR",
-		[RDMA_RESTRACK_CTX] = "CTX",
-		[RDMA_RESTRACK_COUNTER] = "COUNTER",
-	};
-
-	return names[type];
-};
-
 /**
  * rdma_restrack_clean() - clean resource tracking
  * @dev: IB device
@@ -59,47 +44,14 @@ static const char *type2str(enum rdma_restrack_type type)
 void rdma_restrack_clean(struct ib_device *dev)
 {
 	struct rdma_restrack_root *rt = dev->res;
-	struct rdma_restrack_entry *e;
-	char buf[TASK_COMM_LEN];
-	bool found = false;
-	const char *owner;
 	int i;
 
 	for (i = 0 ; i < RDMA_RESTRACK_MAX; i++) {
 		struct xarray *xa = &dev->res[i].xa;
 
-		if (!xa_empty(xa)) {
-			unsigned long index;
-
-			if (!found) {
-				pr_err("restrack: %s", CUT_HERE);
-				dev_err(&dev->dev, "BUG: RESTRACK detected leak of resources\n");
-			}
-			xa_for_each(xa, index, e) {
-				if (rdma_is_kernel_res(e)) {
-					owner = e->kern_name;
-				} else {
-					/*
-					 * There is no need to call get_task_struct here,
-					 * because we can be here only if there are more
-					 * get_task_struct() call than put_task_struct().
-					 */
-					get_task_comm(buf, e->task);
-					owner = buf;
-				}
-
-				pr_err("restrack: %s %s object allocated by %s is not freed\n",
-				       rdma_is_kernel_res(e) ? "Kernel" :
-							       "User",
-				       type2str(e->type), owner);
-			}
-			found = true;
-		}
+		WARN_ON(!xa_empty(xa));
 		xa_destroy(xa);
 	}
-	if (found)
-		pr_err("restrack: %s", CUT_HERE);
-
 	kfree(rt);
 }
 
-- 
2.34.1
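
The patched rdma_restrack_clean() only asks whether each per-type table is empty and never reads an entry's kern_name, which for a kernel-owned entry may point into a module that has already been unloaded. A minimal user-space sketch of that check-without-dereference pattern, assuming a hypothetical fixed-size slot array in place of the kernel's xarray (`struct res_table`, `table_empty()` and `restrack_clean_sketch()` are illustrative names, not kernel APIs):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for RDMA_RESTRACK_MAX and the per-type xarray. */
enum { RES_TYPES = 4, RES_SLOTS = 8 };

struct res_entry {
	const char *kern_name; /* may dangle after module unload: never read below */
};

struct res_table {
	struct res_entry *slot[RES_SLOTS];
};

/* Like xa_empty(): looks only at the slots, never at entry contents. */
static int table_empty(const struct res_table *t)
{
	for (size_t i = 0; i < RES_SLOTS; i++)
		if (t->slot[i] != NULL)
			return 0;
	return 1;
}

/*
 * Same shape as the patched rdma_restrack_clean(): one warning per
 * non-empty type (cf. WARN_ON(!xa_empty(xa))), then the table is reset.
 * Returns the number of warnings so the behavior can be checked.
 */
static int restrack_clean_sketch(struct res_table tables[RES_TYPES])
{
	int warnings = 0;

	for (size_t type = 0; type < RES_TYPES; type++) {
		if (!table_empty(&tables[type])) {
			fprintf(stderr, "WARNING: leaked objects of type %zu\n", type);
			warnings++;
		}
		/* Stand-in for xa_destroy(): drops the slots, not the entries. */
		for (size_t i = 0; i < RES_SLOTS; i++)
			tables[type].slot[i] = NULL;
	}
	return warnings;
}
```

The sketch gives up the detailed per-object report in exchange for never touching memory the tracker does not own, which matches the commit's decision to drop type2str() and the owner lookup.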

[PATCH OLK-5.10] RDMA/restrack: Fix potential invalid address access
by Zhengchao Shao 03 Aug '24

From: Wenchao Hao <haowenchao2(a)huawei.com>

mainline inclusion
from mainline-v6.10-rc1
commit ca537a34775c103f7b14d7bbd976403f1d1525d8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEOR
CVE: CVE-2024-42080

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

-------------------------------------------

struct rdma_restrack_entry's kern_name was set to KBUILD_MODNAME in
ib_create_cq(). If the module exited but forgot to delete this
rdma_restrack_entry, it would cause an invalid address access in
rdma_restrack_clean() when printing the owner of this
rdma_restrack_entry.

This code was used to help find a forgotten PD release in one of the
ULPs, but it is no longer needed, so delete it.

Signed-off-by: Wenchao Hao <haowenchao2(a)huawei.com>
Link: https://lore.kernel.org/r/20240318092320.1215235-1-haowenchao2@huawei.com
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>

Conflicts:
	drivers/infiniband/core/restrack.c
[The conflict occurs because the commit 48f8a70e899f ("RDMA/restrack: Add
support to get resource tracking for SRQ") is not merged]
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 drivers/infiniband/core/restrack.c | 50 +-----------------------------
 1 file changed, 1 insertion(+), 49 deletions(-)

diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index bbbbec5b1593..ea6ce7662456 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -37,21 +37,6 @@ int rdma_restrack_init(struct ib_device *dev)
 	return 0;
 }
 
-static const char *type2str(enum rdma_restrack_type type)
-{
-	static const char * const names[RDMA_RESTRACK_MAX] = {
-		[RDMA_RESTRACK_PD] = "PD",
-		[RDMA_RESTRACK_CQ] = "CQ",
-		[RDMA_RESTRACK_QP] = "QP",
-		[RDMA_RESTRACK_CM_ID] = "CM_ID",
-		[RDMA_RESTRACK_MR] = "MR",
-		[RDMA_RESTRACK_CTX] = "CTX",
-		[RDMA_RESTRACK_COUNTER] = "COUNTER",
-	};
-
-	return names[type];
-};
-
 /**
  * rdma_restrack_clean() - clean resource tracking
  * @dev: IB device
@@ -59,47 +44,14 @@ static const char *type2str(enum rdma_restrack_type type)
 void rdma_restrack_clean(struct ib_device *dev)
 {
 	struct rdma_restrack_root *rt = dev->res;
-	struct rdma_restrack_entry *e;
-	char buf[TASK_COMM_LEN];
-	bool found = false;
-	const char *owner;
 	int i;
 
 	for (i = 0 ; i < RDMA_RESTRACK_MAX; i++) {
 		struct xarray *xa = &dev->res[i].xa;
 
-		if (!xa_empty(xa)) {
-			unsigned long index;
-
-			if (!found) {
-				pr_err("restrack: %s", CUT_HERE);
-				dev_err(&dev->dev, "BUG: RESTRACK detected leak of resources\n");
-			}
-			xa_for_each(xa, index, e) {
-				if (rdma_is_kernel_res(e)) {
-					owner = e->kern_name;
-				} else {
-					/*
-					 * There is no need to call get_task_struct here,
-					 * because we can be here only if there are more
-					 * get_task_struct() call than put_task_struct().
-					 */
-					get_task_comm(buf, e->task);
-					owner = buf;
-				}
-
-				pr_err("restrack: %s %s object allocated by %s is not freed\n",
-				       rdma_is_kernel_res(e) ? "Kernel" :
-							       "User",
-				       type2str(e->type), owner);
-			}
-			found = true;
-		}
+		WARN_ON(!xa_empty(xa));
 		xa_destroy(xa);
 	}
-	if (found)
-		pr_err("restrack: %s", CUT_HERE);
-
 	kfree(rt);
 }
 
-- 
2.34.1

[PATCH] s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()
by Huang Xiaojia 03 Aug '24

From: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>

mainline inclusion
from mainline-v6.11-rc1
commit df39038cd89525d465c2c8827eb64116873f141a
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEF2
CVE: CVE-2024-41021

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

There is no support for HWPOISON, MEMORY_FAILURE, or ARCH_HAS_COPY_MC on
s390. Therefore we do not expect to see VM_FAULT_HWPOISON in
do_exception().

However, since commit af19487f00f3 ("mm: make PTE_MARKER_SWAPIN_ERROR more
general"), it is possible to see VM_FAULT_HWPOISON in combination with
PTE_MARKER_POISONED, even on architectures that do not support HWPOISON
otherwise. In this case, we will end up on the BUG() in do_exception().

Fix this by treating VM_FAULT_HWPOISON the same as VM_FAULT_SIGBUS,
similar to x86 when MEMORY_FAILURE is not configured. Also print
unexpected fault flags, for easier debugging.

Note that VM_FAULT_HWPOISON_LARGE is not expected, because s390 cannot
support swap entries on other levels than PTE level.

Cc: stable(a)vger.kernel.org # 6.6+
Fixes: af19487f00f3 ("mm: make PTE_MARKER_SWAPIN_ERROR more general")
Reported-by: Yunseong Kim <yskelg(a)gmail.com>
Tested-by: Yunseong Kim <yskelg(a)gmail.com>
Acked-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Message-ID: <20240715180416.3632453-1-gerald.schaefer(a)linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor(a)linux.ibm.com>
Signed-off-by: Huang Xiaojia <huangxiaojia2(a)huawei.com>
---
 arch/s390/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index f5463535013a..0f3ce563056f 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -331,14 +331,16 @@ static noinline void do_fault_error(struct pt_regs *regs, vm_fault_t fault)
 				do_no_context(regs, fault);
 			else
 				do_sigsegv(regs, SEGV_MAPERR);
-		} else if (fault & VM_FAULT_SIGBUS) {
+		} else if (fault & (VM_FAULT_SIGBUS | VM_FAULT_HWPOISON)) {
 			/* Kernel mode? Handle exceptions or die */
 			if (!user_mode(regs))
 				do_no_context(regs, fault);
 			else
 				do_sigbus(regs);
-		} else
+		} else {
+			pr_emerg("Unexpected fault flags: %08x\n", fault);
 			BUG();
+		}
 		break;
 	}
 }
-- 
2.34.1
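
The fix keeps the existing branch order of do_fault_error() but lets VM_FAULT_HWPOISON fall into the SIGBUS branch, and prints the flags before BUG() on anything still unrecognized. A user-space sketch of that dispatch, assuming hypothetical flag values (`F_OOM`, `F_SIGSEGV`, `F_SIGBUS`, `F_HWPOISON` are illustrative, not the kernel's VM_FAULT_* constants) and returning an action code instead of calling the real handlers:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fault bits; the real VM_FAULT_* values live in the kernel. */
#define F_OOM      0x01u
#define F_SIGSEGV  0x02u
#define F_SIGBUS   0x04u
#define F_HWPOISON 0x08u

enum fault_action { ACT_OOM, ACT_SIGSEGV, ACT_SIGBUS, ACT_BUG };

/* Same precedence as the patched if/else-if chain in do_fault_error(). */
static enum fault_action classify_fault(unsigned int fault)
{
	if (fault & F_OOM)
		return ACT_OOM;
	if (fault & F_SIGSEGV)
		return ACT_SIGSEGV;
	/* HWPOISON now shares the SIGBUS path instead of reaching BUG(). */
	if (fault & (F_SIGBUS | F_HWPOISON))
		return ACT_SIGBUS;
	/* Report the unexpected bits first, mirroring the added pr_emerg(). */
	fprintf(stderr, "Unexpected fault flags: %08x\n", fault);
	return ACT_BUG;
}
```

With this ordering, classify_fault(F_HWPOISON) yields ACT_SIGBUS, while a flag combination that matches no branch is logged before the (simulated) BUG().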

[PATCH OLK-6.6] nvme: avoid double free special payload
by Li Lingfeng 03 Aug '24

From: Chunguang Xu <chunguang.xu(a)shopee.com>

stable inclusion
from stable-v6.6.42
commit ae84383c96d6662c24697ab6b44aae855ab670aa
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEML
CVE: CVE-2024-41073

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

---------------------------

[ Upstream commit e5d574ab37f5f2e7937405613d9b1a724811e5ad ]

If a discard request needs to be retried, and that retry may fail before
a new special payload is added, a double free will result. Clear the
RQF_SPECIAL_PAYLOAD flag when the request is cleaned.

Signed-off-by: Chunguang Xu <chunguang.xu(a)shopee.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Max Gurtovoy <mgurtovoy(a)nvidia.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Li Lingfeng <lilingfeng3(a)huawei.com>
---
 drivers/nvme/host/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 94a0916f9cb7..e969da0a681b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -938,6 +938,7 @@ void nvme_cleanup_cmd(struct request *req)
 			clear_bit_unlock(0, &ctrl->discard_page_busy);
 		else
 			kfree(bvec_virt(&req->special_vec));
+		req->rq_flags &= ~RQF_SPECIAL_PAYLOAD;
 	}
 }
 EXPORT_SYMBOL_GPL(nvme_cleanup_cmd);
-- 
2.31.1
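
The one-line fix follows a general rule: when cleanup frees a resource guarded by a flag, clear the flag in the same step, so a second cleanup of the same request (for example after a failed retry) becomes a no-op. A user-space sketch, assuming a hypothetical `struct fake_req` whose `SPECIAL_PAYLOAD` bit stands in for RQF_SPECIAL_PAYLOAD, with a free counter added so the behavior can be observed:

```c
#include <assert.h>
#include <stdlib.h>

#define SPECIAL_PAYLOAD 0x1u /* hypothetical stand-in for RQF_SPECIAL_PAYLOAD */

struct fake_req {
	unsigned int flags;
	void *payload;
};

static int frees; /* counts payload frees so a double free would be visible */

/* Shaped like nvme_cleanup_cmd(): free only while the flag says we own it. */
static void cleanup_cmd(struct fake_req *req)
{
	if (req->flags & SPECIAL_PAYLOAD) {
		free(req->payload);
		frees++;
		/* The fix: without this, a second cleanup frees payload again. */
		req->flags &= ~SPECIAL_PAYLOAD;
	}
}
```

Calling cleanup_cmd() twice on the same request frees the payload once; removing the flag-clearing line would make the second call free it again.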

[PATCH openEuler-1.0-LTS 1/1] usb: atm: cxacru: fix endpoint checking in cxacru_bind()
by Luo Gengkun 03 Aug '24

From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>

mainline inclusion
from mainline-v6.10-rc6
commit 2eabb655a968b862bc0c31629a09f0fbf3c80d51
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEOL
CVE: CVE-2024-41097

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Syzbot is still reporting quite an old issue [1] that occurs due to
incomplete checking of present USB endpoints. As such, wrong endpoint
types may be used at the URB submitting stage, which in turn triggers a
warning in usb_submit_urb().

Fix the issue by verifying that required endpoint types are present
for both in and out endpoints, taking into account the cmd endpoint type.

Unfortunately, this patch has not been tested on real hardware.

[1] Syzbot report:
usb 1-1: BOGUS urb xfer, pipe 1 != type 3
WARNING: CPU: 0 PID: 8667 at drivers/usb/core/urb.c:502 usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502
Modules linked in:
CPU: 0 PID: 8667 Comm: kworker/0:4 Not tainted 5.14.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
RIP: 0010:usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502
...
Call Trace:
 cxacru_cm+0x3c0/0x8e0 drivers/usb/atm/cxacru.c:649
 cxacru_card_status+0x22/0xd0 drivers/usb/atm/cxacru.c:760
 cxacru_bind+0x7ac/0x11a0 drivers/usb/atm/cxacru.c:1209
 usbatm_usb_probe+0x321/0x1ae0 drivers/usb/atm/usbatm.c:1055
 cxacru_usb_probe+0xdf/0x1e0 drivers/usb/atm/cxacru.c:1363
 usb_probe_interface+0x315/0x7f0 drivers/usb/core/driver.c:396
 call_driver_probe drivers/base/dd.c:517 [inline]
 really_probe+0x23c/0xcd0 drivers/base/dd.c:595
 __driver_probe_device+0x338/0x4d0 drivers/base/dd.c:747
 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:777
 __device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:894
 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
 __device_attach+0x228/0x4a0 drivers/base/dd.c:965
 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
 device_add+0xc2f/0x2180 drivers/base/core.c:3354
 usb_set_configuration+0x113a/0x1910 drivers/usb/core/message.c:2170
 usb_generic_driver_probe+0xba/0x100 drivers/usb/core/generic.c:238
 usb_probe_device+0xd9/0x2c0 drivers/usb/core/driver.c:293

Reported-and-tested-by: syzbot+00c18ee8497dd3be6ade(a)syzkaller.appspotmail.com
Fixes: 902ffc3c707c ("USB: cxacru: Use a bulk/int URB to access the command endpoint")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Link: https://lore.kernel.org/r/20240609131546.3932-1-n.zhandarovich@fintech.ru
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Luo Gengkun <luogengkun2(a)huawei.com>
---
 drivers/usb/atm/cxacru.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/usb/atm/cxacru.c b/drivers/usb/atm/cxacru.c
index e62a770a5d3b..1d2c736dbf6a 100644
--- a/drivers/usb/atm/cxacru.c
+++ b/drivers/usb/atm/cxacru.c
@@ -1134,6 +1134,7 @@ static int cxacru_bind(struct usbatm_data *usbatm_instance,
 	struct cxacru_data *instance;
 	struct usb_device *usb_dev = interface_to_usbdev(intf);
 	struct usb_host_endpoint *cmd_ep = usb_dev->ep_in[CXACRU_EP_CMD];
+	struct usb_endpoint_descriptor *in, *out;
 	int ret;
 
 	/* instance init */
@@ -1180,6 +1181,19 @@ static int cxacru_bind(struct usbatm_data *usbatm_instance,
 		goto fail;
 	}
 
+	if (usb_endpoint_xfer_int(&cmd_ep->desc))
+		ret = usb_find_common_endpoints(intf->cur_altsetting,
+						NULL, NULL, &in, &out);
+	else
+		ret = usb_find_common_endpoints(intf->cur_altsetting,
+						&in, &out, NULL, NULL);
+
+	if (ret) {
+		usb_err(usbatm_instance, "cxacru_bind: interface has incorrect endpoints\n");
+		ret = -ENODEV;
+		goto fail;
+	}
+
 	if ((cmd_ep->desc.bmAttributes & USB_ENDPOINT_XFERTYPE_MASK)
 			== USB_ENDPOINT_XFER_INT) {
 		usb_fill_int_urb(instance->rcv_urb,
-- 
2.34.1
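
The added check refuses to bind unless the interface offers an IN and an OUT endpoint of the same transfer type as the command endpoint, so the URBs filled later always match a real pipe type. A simplified user-space sketch of that validation, assuming a hypothetical `struct ep` descriptor array; `find_pair()` only imitates the idea behind usb_find_common_endpoints() (which takes bulk-in, bulk-out, int-in and int-out out-pointers) and is not that function:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum xfer { XFER_BULK, XFER_INT }; /* hypothetical subset of transfer types */

struct ep {
	enum xfer type;
	bool dir_in;
};

/* Finds the first IN and first OUT endpoint of the wanted type; -ENXIO here if either is missing. */
static int find_pair(const struct ep *eps, size_t n, enum xfer want,
		     const struct ep **in, const struct ep **out)
{
	*in = *out = NULL;
	for (size_t i = 0; i < n; i++) {
		if (eps[i].type != want)
			continue;
		if (eps[i].dir_in && !*in)
			*in = &eps[i];
		else if (!eps[i].dir_in && !*out)
			*out = &eps[i];
	}
	return (*in && *out) ? 0 : -ENXIO;
}

/* Like cxacru_bind(): the command endpoint's type decides what to require. */
static int check_bind(const struct ep *eps, size_t n, enum xfer cmd_type)
{
	const struct ep *in, *out;

	if (find_pair(eps, n, cmd_type, &in, &out))
		return -ENODEV; /* the patch also maps any lookup failure to -ENODEV */
	return 0;
}
```

A device that exposes only bulk endpoints passes when the command endpoint is bulk but is rejected when it is interrupt, which is the mismatch syzbot's malformed descriptors triggered.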

[PATCH OLK-6.6] s390/pkey: Wipe copies of clear-key structures on failure
by Li Huafei 03 Aug '24

From: Holger Dengler <dengler(a)linux.ibm.com>

mainline inclusion
from mainline-v6.10-rc1
commit d65d76a44ffe74c73298ada25b0f578680576073
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAH6LY
CVE: CVE-2024-42156

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Wipe all sensitive data from the stack for all IOCTLs that convert a
clear-key into a protected- or secure-key.

Reviewed-by: Harald Freudenberger <freude(a)linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki(a)linux.ibm.com>
Acked-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Holger Dengler <dengler(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>

Conflicts:
	drivers/s390/crypto/pkey_api.c
[ Resolved context conflict due to commit 6d749b4e0208 ("s390/pkey:
introduce dynamic debugging for pkey") not being backported. ]
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
---
 drivers/s390/crypto/pkey_api.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/crypto/pkey_api.c b/drivers/s390/crypto/pkey_api.c
index d2ffdf2491da..70fcb5c40cfe 100644
--- a/drivers/s390/crypto/pkey_api.c
+++ b/drivers/s390/crypto/pkey_api.c
@@ -1366,9 +1366,7 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 		rc = cca_clr2seckey(kcs.cardnr, kcs.domain, kcs.keytype,
 				    kcs.clrkey.clrkey, kcs.seckey.seckey);
 		DEBUG_DBG("%s cca_clr2seckey()=%d\n", __func__, rc);
-		if (rc)
-			break;
-		if (copy_to_user(ucs, &kcs, sizeof(kcs)))
+		if (!rc && copy_to_user(ucs, &kcs, sizeof(kcs)))
 			rc = -EFAULT;
 		memzero_explicit(&kcs, sizeof(kcs));
 		break;
@@ -1401,9 +1399,7 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 				      kcp.protkey.protkey,
 				      &kcp.protkey.len, &kcp.protkey.type);
 		DEBUG_DBG("%s pkey_clr2protkey()=%d\n", __func__, rc);
-		if (rc)
-			break;
-		if (copy_to_user(ucp, &kcp, sizeof(kcp)))
+		if (!rc && copy_to_user(ucp, &kcp, sizeof(kcp)))
 			rc = -EFAULT;
 		memzero_explicit(&kcp, sizeof(kcp));
 		break;
@@ -1555,11 +1551,14 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 		if (copy_from_user(&kcs, ucs, sizeof(kcs)))
 			return -EFAULT;
 		apqns = _copy_apqns_from_user(kcs.apqns, kcs.apqn_entries);
-		if (IS_ERR(apqns))
+		if (IS_ERR(apqns)) {
+			memzero_explicit(&kcs, sizeof(kcs));
 			return PTR_ERR(apqns);
+		}
 		kkey = kzalloc(klen, GFP_KERNEL);
 		if (!kkey) {
 			kfree(apqns);
+			memzero_explicit(&kcs, sizeof(kcs));
 			return -ENOMEM;
 		}
 		rc = pkey_clr2seckey2(apqns, kcs.apqn_entries,
@@ -1569,15 +1568,18 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 		kfree(apqns);
 		if (rc) {
 			kfree(kkey);
+			memzero_explicit(&kcs, sizeof(kcs));
 			break;
 		}
 		if (kcs.key) {
 			if (kcs.keylen < klen) {
 				kfree(kkey);
+				memzero_explicit(&kcs, sizeof(kcs));
 				return -EINVAL;
 			}
 			if (copy_to_user(kcs.key, kkey, klen)) {
 				kfree(kkey);
+				memzero_explicit(&kcs, sizeof(kcs));
 				return -EFAULT;
 			}
 		}
-- 
2.25.1
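
The patch makes every exit path of the clear-key IOCTLs wipe the on-stack copy (`kcs`/`kcp`) with memzero_explicit(), whose stores the compiler may not drop the way it can drop a memset() of a dead local. A user-space sketch of the same discipline, assuming a hypothetical `struct clr_req` and `convert()` step, a volatile-pointer loop as a portable stand-in for memzero_explicit(), and the local copy passed in as `scratch` only so the test can inspect it. Routing every exit through one label is a single-exit variant, not what the patch itself does (it adds a wipe to each return path):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct clr_req {             /* hypothetical stand-in for struct pkey_clr2seck */
	unsigned char clrkey[32];
	unsigned char seckey[64];
	size_t keylen;
};

/* Stores through a volatile pointer so the compiler cannot elide them. */
static void wipe(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

/* Hypothetical conversion step that fails for unsupported key lengths. */
static int convert(struct clr_req *r)
{
	if (r->keylen != 16 && r->keylen != 32)
		return -EINVAL;
	memcpy(r->seckey, r->clrkey, r->keylen); /* placeholder, not real crypto */
	return 0;
}

/* Every path, success or failure, reaches the wipe of the local copy. */
static int handle_ioctl(const struct clr_req *user, struct clr_req *out,
			struct clr_req *scratch)
{
	int rc;

	memcpy(scratch, user, sizeof(*scratch)); /* like copy_from_user() */
	rc = convert(scratch);
	if (rc)
		goto wipe_out;
	memcpy(out, scratch, sizeof(*out));      /* like copy_to_user() */
wipe_out:
	wipe(scratch, sizeof(*scratch));
	return rc;
}
```

The single label makes it harder to add a new early return that forgets the wipe, which is the class of omission this CVE fix closes path by path.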

[PATCH OLK-5.10] mm: thp: support to control numa migration
by Nanyong Sun 03 Aug '24

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAHJKC
CVE: NA

--------------------------------

Sometimes migrating THP is not beneficial. For example, when the 64K
page size is set on ARM64, a THP is 512M, and migrating it may result
in a performance regression. This feature adds an interface to control
THP migration during NUMA balancing:

/sys/kernel/mm/transparent_hugepage/numa_control

The default value is 0, which means do nothing. Write 1 to disable THP
migration while tasks still have a chance to migrate. Write 2 to
disable autonuma for THP entirely.

Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
---
 Documentation/admin-guide/mm/transhuge.rst |  9 ++++++
 arch/arm64/Kconfig                         |  1 +
 include/linux/huge_mm.h                    | 24 +++++++++++++++
 mm/Kconfig                                 | 10 ++++++
 mm/huge_memory.c                           | 36 ++++++++++++++++++++++
 mm/mem_sampling.c                          |  3 ++
 mm/migrate.c                               |  3 ++
 7 files changed, 86 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 2bfb380e8380..1038ff8e184d 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -160,6 +160,15 @@ library) may want to know the size (in bytes) of a transparent hugepage::
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+If CONFIG_THP_NUMA_CONTROL is enabled, the user can control THP migration
+during NUMA balancing. 0 is the default and keeps the default behavior,
+writing 1 disables THP migration while tasks still have a chance to
+migrate, and writing 2 skips THP entirely during NUMA balancing::
+
+	echo 0 > /sys/kernel/mm/transparent_hugepage/numa_control
+	echo 1 > /sys/kernel/mm/transparent_hugepage/numa_control
+	echo 2 > /sys/kernel/mm/transparent_hugepage/numa_control
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cae54a9bf65d..bde9ec4af773 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -216,6 +216,7 @@ config ARM64
 	select SYSCTL_EXCEPTION_TRACE
 	select THREAD_INFO_IN_TASK
 	select HAVE_LIVEPATCH_WO_FTRACE
+	select THP_NUMA_CONTROL if ARM64_64K_PAGES
 	help
 	  ARM 64-bit (AArch64) Linux support.
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index efb370e79ac3..c2bf15d2d969 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -498,6 +498,30 @@ static inline unsigned long thp_size(struct page *page)
 	return PAGE_SIZE << thp_order(page);
 }
 
+#ifdef CONFIG_THP_NUMA_CONTROL
+#define THP_DISABLE_NUMA_MIGRATE  1
+#define THP_DISABLE_AUTONUMA      2
+extern unsigned long thp_numa_control;
+static inline bool thp_numa_migrate_disabled(void)
+{
+	return thp_numa_control == THP_DISABLE_NUMA_MIGRATE;
+}
+
+static inline bool thp_autonuma_disabled(void)
+{
+	return thp_numa_control == THP_DISABLE_AUTONUMA;
+}
+#else
+static inline bool thp_numa_migrate_disabled(void)
+{
+	return false;
+}
+
+static inline bool thp_autonuma_disabled(void)
+{
+	return false;
+}
+#endif
 /*
  * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
  * limitations in the implementation like arm64 MTE can override this to
diff --git a/mm/Kconfig b/mm/Kconfig
index ccbad233f2b1..cc43f5124cb3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1038,6 +1038,16 @@ config NUMABALANCING_MEM_SAMPLING
 
 	  if unsure, say N to disable the NUMABALANCING_MEM_SAMPLING.
 
+config THP_NUMA_CONTROL
+	bool "Control THP migration during NUMA balancing"
+	depends on NUMA_BALANCING && TRANSPARENT_HUGEPAGE
+	default n
+	help
+	  Sometimes migrating THP is not beneficial. For example, when the 64K
+	  page size is set on ARM64, a THP is 512M and migration will be
+	  expensive. This feature adds a switch to control the behavior of THP
+	  migration during NUMA balancing.
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb293d17a104..332f712906e1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -316,6 +316,36 @@ static ssize_t hpage_pmd_size_show(struct kobject *kobj,
 static struct kobj_attribute hpage_pmd_size_attr =
 	__ATTR_RO(hpage_pmd_size);
 
+#ifdef CONFIG_THP_NUMA_CONTROL
+unsigned long thp_numa_control;
+
+static ssize_t numa_control_show(struct kobject *kobj,
+				 struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lu\n", thp_numa_control);
+}
+
+static ssize_t numa_control_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	unsigned long value;
+	int ret;
+
+	ret = kstrtoul(buf, 10, &value);
+	if (ret < 0)
+		return ret;
+	if (value > THP_DISABLE_AUTONUMA)
+		return -EINVAL;
+
+	thp_numa_control = value;
+
+	return count;
+}
+
+static struct kobj_attribute numa_control_attr =
+	__ATTR(numa_control, 0644, numa_control_show, numa_control_store);
+#endif
+
 static struct attribute *hugepage_attr[] = {
 	&enabled_attr.attr,
 	&defrag_attr.attr,
@@ -323,6 +353,9 @@ static struct attribute *hugepage_attr[] = {
 	&hpage_pmd_size_attr.attr,
 #ifdef CONFIG_SHMEM
 	&shmem_enabled_attr.attr,
+#endif
+#ifdef CONFIG_THP_NUMA_CONTROL
+	&numa_control_attr.attr,
 #endif
 	NULL,
 };
@@ -1743,6 +1776,9 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	if (prot_numa && !thp_migration_supported())
 		return 1;
 
+	if (prot_numa && thp_autonuma_disabled())
+		return 1;
+
 	ptl = __pmd_trans_huge_lock(pmd, vma);
 	if (!ptl)
 		return 0;
diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c
index 1d8a831be531..ffc0e4964cad 100644
--- a/mm/mem_sampling.c
+++ b/mm/mem_sampling.c
@@ -145,6 +145,9 @@ static inline void do_thp_numa_access(struct mm_struct *mm,
 	pmd_t *pmd, pmde;
 	spinlock_t *ptl;
 
+	if (thp_autonuma_disabled())
+		return;
+
 	pgd = pgd_offset(mm, vaddr);
 	if (!pgd_present(*pgd))
 		return;
diff --git a/mm/migrate.c b/mm/migrate.c
index 3f5b217d5af1..faaa7b790da0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2157,6 +2157,9 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	 */
 	compound = PageTransHuge(page);
 
+	if (compound && thp_numa_migrate_disabled())
+		return 0;
+
 	if (compound)
 		new = alloc_misplaced_dst_page_thp;
 	else
-- 
2.25.1
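
numa_control_store() accepts only 0, 1 or 2, rejects larger values with -EINVAL, and leaves the previous setting untouched on any error. A user-space sketch of that parsing rule plus the two predicates from huge_mm.h, assuming strtoul() with explicit checks as a stand-in for the kernel's kstrtoul(), which likewise tolerates one trailing newline; `parse_numa_control()` is a hypothetical name, and the sketch is slightly stricter than kstrtoul() (it also rejects a leading '+'):

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define THP_DISABLE_NUMA_MIGRATE 1
#define THP_DISABLE_AUTONUMA     2

static unsigned long thp_numa_control; /* 0: keep default NUMA balancing */

/* Returns 0 on success (the real store hook returns count) or a negative errno. */
static int parse_numa_control(const char *buf)
{
	char *end;
	unsigned long value;

	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;          /* rejects "", "-1", " 1" */
	errno = 0;
	value = strtoul(buf, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end == '\n')
		end++;                   /* allow the newline that echo appends */
	if (*end != '\0')
		return -EINVAL;
	if (value > THP_DISABLE_AUTONUMA)
		return -EINVAL;
	thp_numa_control = value;
	return 0;
}

/* Mode 1: keep NUMA hinting on huge PMDs but skip the THP migration itself. */
static bool thp_numa_migrate_disabled(void)
{
	return thp_numa_control == THP_DISABLE_NUMA_MIGRATE;
}

/* Mode 2: keep THP out of NUMA balancing altogether. */
static bool thp_autonuma_disabled(void)
{
	return thp_numa_control == THP_DISABLE_AUTONUMA;
}
```

Because the modes are distinct values rather than bits, at most one predicate is true at a time, matching how the patch checks them in migrate_misplaced_page() and in change_huge_pmd()/do_thp_numa_access() respectively.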

[PATCH openEuler-22.03-LTS-SP1] sched/deadline: Fix task_struct reference leak
by Zheng Zucheng 03 Aug '24

From: Wander Lairson Costa <wander(a)redhat.com>

mainline inclusion
from mainline-v6.10-rc5
commit b58652db66c910c2245f5bee7deca41c12d707b9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEN4
CVE: CVE-2024-41023

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

During the execution of the following stress test with linux-rt:

stress-ng --cyclic 30 --timeout 30 --minimize --quiet

kmemleak frequently reported a memory leak concerning the task_struct:

unreferenced object 0xffff8881305b8000 (size 16136):
  comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
  object hex dump (first 32 bytes):
    02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  debug hex dump (first 16 bytes):
    53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
  backtrace:
    [<00000000046b6790>] dup_task_struct+0x30/0x540
    [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
    [<00000000ced59777>] kernel_clone+0xb0/0x770
    [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
    [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
    [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.

This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.

Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")
Signed-off-by: Wander Lairson Costa <wander(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Juri Lelli <juri.lelli(a)redhat.com>
Link: https://lore.kernel.org/r/20240620125618.11419-1-wander@redhat.com

Conflicts:
	kernel/sched/deadline.c
[remove !dl_server(&p->dl) condition]
Signed-off-by: Zheng Zucheng <zhengzucheng(a)huawei.com>
---
 kernel/sched/deadline.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b55d9de2d9..cff58dba27ff 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1528,8 +1528,12 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 			 * The replenish timer needs to be canceled. No
 			 * problem if it fires concurrently: boosted threads
 			 * are ignored in dl_task_timer().
+			 *
+			 * If the timer callback was running (hrtimer_try_to_cancel == -1),
+			 * it will eventually call put_task_struct().
 			 */
-			hrtimer_try_to_cancel(&p->dl.dl_timer);
+			if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1)
+				put_task_struct(p);
 			p->dl.dl_throttled = 0;
 		}
 	} else if (!dl_prio(p->normal_prio)) {
-- 
2.34.1
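
The fix hinges on hrtimer_try_to_cancel()'s three results: 1 when an armed timer was cancelled (its callback, which would have dropped the reference taken in start_dl_timer(), will now never run), 0 when the timer was not active, and -1 when the callback is already running and will drop the reference itself. Only the first case moves the put_task_struct() to the canceller. A user-space simulation of that ownership handoff, assuming a hypothetical `struct sim_timer` state machine in place of a real hrtimer (all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>

enum timer_state { T_INACTIVE, T_QUEUED, T_RUNNING };

struct sim_task {
	int refcount;
};

struct sim_timer {
	enum timer_state state;
	struct sim_task *task;
};

static void get_task(struct sim_task *t) { t->refcount++; }

static void put_task(struct sim_task *t)
{
	assert(t->refcount > 0); /* an extra put would show up here */
	t->refcount--;
}

/* Like start_dl_timer(): pin the task for as long as the timer is armed. */
static void start_timer(struct sim_timer *tm, struct sim_task *t)
{
	get_task(t);
	tm->task = t;
	tm->state = T_QUEUED;
}

/* Same return convention as hrtimer_try_to_cancel(). */
static int try_to_cancel(struct sim_timer *tm)
{
	switch (tm->state) {
	case T_QUEUED:
		tm->state = T_INACTIVE;
		return 1;
	case T_RUNNING:
		return -1;
	default:
		return 0;
	}
}

/* Callback path: it owns the timer's reference and drops it when done. */
static void timer_fn_finish(struct sim_timer *tm)
{
	tm->state = T_INACTIVE;
	put_task(tm->task);
}
```

Putting the reference on a result of -1 as well would double-drop it once the running callback finishes, which is why the patch tests for exactly 1.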

[PATCH OLK-6.6] sched/deadline: Fix task_struct reference leak
by Zheng Zucheng 03 Aug '24

From: Wander Lairson Costa <wander(a)redhat.com>

mainline inclusion
from mainline-v6.10-rc5
commit b58652db66c910c2245f5bee7deca41c12d707b9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEN4
CVE: CVE-2024-41023

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

During the execution of the following stress test with linux-rt:

stress-ng --cyclic 30 --timeout 30 --minimize --quiet

kmemleak frequently reported a memory leak concerning the task_struct:

unreferenced object 0xffff8881305b8000 (size 16136):
  comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
  object hex dump (first 32 bytes):
    02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  debug hex dump (first 16 bytes):
    53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
  backtrace:
    [<00000000046b6790>] dup_task_struct+0x30/0x540
    [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
    [<00000000ced59777>] kernel_clone+0xb0/0x770
    [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
    [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
    [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.

This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.

Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")
Signed-off-by: Wander Lairson Costa <wander(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Juri Lelli <juri.lelli(a)redhat.com>
Link: https://lore.kernel.org/r/20240620125618.11419-1-wander@redhat.com

Conflicts:
	kernel/sched/deadline.c
[remove !dl_server(&p->dl) condition]
Signed-off-by: Zheng Zucheng <zhengzucheng(a)huawei.com>
---
 kernel/sched/deadline.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b55d9de2d9..cff58dba27ff 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1528,8 +1528,12 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 			 * The replenish timer needs to be canceled. No
 			 * problem if it fires concurrently: boosted threads
 			 * are ignored in dl_task_timer().
+			 *
+			 * If the timer callback was running (hrtimer_try_to_cancel == -1),
+			 * it will eventually call put_task_struct().
 			 */
-			hrtimer_try_to_cancel(&p->dl.dl_timer);
+			if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1)
+				put_task_struct(p);
 			p->dl.dl_throttled = 0;
 		}
 	} else if (!dl_prio(p->normal_prio)) {
-- 
2.34.1

[PATCH OLK-5.10] sched/deadline: Fix task_struct reference leak
by Zheng Zucheng 03 Aug '24

From: Wander Lairson Costa <wander(a)redhat.com>

mainline inclusion
from mainline-v6.10-rc5
commit b58652db66c910c2245f5bee7deca41c12d707b9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEN4
CVE: CVE-2024-41023

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

During the execution of the following stress test with linux-rt:

stress-ng --cyclic 30 --timeout 30 --minimize --quiet

kmemleak frequently reported a memory leak concerning the task_struct:

unreferenced object 0xffff8881305b8000 (size 16136):
  comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
  object hex dump (first 32 bytes):
    02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  debug hex dump (first 16 bytes):
    53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
  backtrace:
    [<00000000046b6790>] dup_task_struct+0x30/0x540
    [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
    [<00000000ced59777>] kernel_clone+0xb0/0x770
    [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
    [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
    [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.

This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.

Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")
Signed-off-by: Wander Lairson Costa <wander(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Juri Lelli <juri.lelli(a)redhat.com>
Link: https://lore.kernel.org/r/20240620125618.11419-1-wander@redhat.com

Conflicts:
	kernel/sched/deadline.c
[remove !dl_server(&p->dl) condition]
Signed-off-by: Zheng Zucheng <zhengzucheng(a)huawei.com>
---
 kernel/sched/deadline.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b55d9de2d9..cff58dba27ff 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1528,8 +1528,12 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 			 * The replenish timer needs to be canceled. No
 			 * problem if it fires concurrently: boosted threads
 			 * are ignored in dl_task_timer().
+			 *
+			 * If the timer callback was running (hrtimer_try_to_cancel == -1),
+			 * it will eventually call put_task_struct().
 			 */
-			hrtimer_try_to_cancel(&p->dl.dl_timer);
+			if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1)
+				put_task_struct(p);
 			p->dl.dl_throttled = 0;
 		}
 	} else if (!dl_prio(p->normal_prio)) {
-- 
2.34.1
