mailweb.openeuler.org
Kernel

kernel@openeuler.org

  • 62 participants
  • 18380 discussions
[PATCH OLK-6.6] iio: adc: ad7923: Fix buffer overflow for tx_buf and ring_xfer
by Yi Yang 06 Jan '25

06 Jan '25
From: Nuno Sa <nuno.sa(a)analog.com>

stable inclusion
from stable-v6.6.64
commit e5cac32721997cb8bcb208a29f4598b3faf46338
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAMO
CVE: CVE-2024-56557

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit 3a4187ec454e19903fd15f6e1825a4b84e59a4cd upstream.

The AD7923 was updated to support devices with 8 channels, but the size
of tx_buf and ring_xfer was not increased accordingly, leading to a
potential buffer overflow in ad7923_update_scan_mode().

Fixes: 851644a60d20 ("iio: adc: ad7923: Add support for the ad7908/ad7918/ad7928")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nuno Sa <nuno.sa(a)analog.com>
Signed-off-by: Zicheng Qu <quzicheng(a)huawei.com>
Link: https://patch.msgid.link/20241029134637.2261336-1-quzicheng@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yi Yang <yiyang13(a)huawei.com>
---
 drivers/iio/adc/ad7923.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iio/adc/ad7923.c b/drivers/iio/adc/ad7923.c
index 9d6bf6d0927a..709ce2a50097 100644
--- a/drivers/iio/adc/ad7923.c
+++ b/drivers/iio/adc/ad7923.c
@@ -48,7 +48,7 @@ struct ad7923_state {
 	struct spi_device *spi;
-	struct spi_transfer ring_xfer[5];
+	struct spi_transfer ring_xfer[9];
 	struct spi_transfer scan_single_xfer[2];
 	struct spi_message ring_msg;
 	struct spi_message scan_single_msg;
@@ -64,7 +64,7 @@ struct ad7923_state {
 	 * Length = 8 channels + 4 extra for 8 byte timestamp
 	 */
 	__be16 rx_buf[12] __aligned(IIO_DMA_MINALIGN);
-	__be16 tx_buf[4];
+	__be16 tx_buf[8];
 };
 struct ad7923_chip_info {
--
2.25.1
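The sizing rule behind this fix can be sketched in userspace. This is only an illustration, not the driver code: it assumes one command word per enabled channel plus one trailing transfer, which is consistent with the new sizes in the patch (tx_buf[8], ring_xfer[9]). The names MAX_CHANNELS, MAX_XFERS, struct scan_state and fill_scan() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 8                  /* hypothetical: widest supported part */
#define MAX_XFERS    (MAX_CHANNELS + 1) /* assumed: one per channel + one trailing */

struct scan_state {
	uint16_t tx_buf[MAX_CHANNELS]; /* one command word per enabled channel */
	int xfer_src[MAX_XFERS];       /* stand-in for struct spi_transfer */
};

/*
 * Fill one slot per channel set in mask. Returns the number of transfer
 * slots used, or -1 if the mask enables more channels than the buffers hold.
 */
int fill_scan(struct scan_state *st, uint32_t mask)
{
	size_t len = 0;
	unsigned int ch;

	assert(st != NULL);
	for (ch = 0; ch < 32; ch++) {
		if (!(mask & (UINT32_C(1) << ch)))
			continue;
		if (len >= MAX_CHANNELS)
			return -1; /* would overflow tx_buf */
		st->tx_buf[len] = (uint16_t)ch;  /* placeholder command word */
		st->xfer_src[len] = (int)len;    /* transfer len sends tx_buf[len] */
		len++;
	}
	st->xfer_src[len] = -1; /* trailing transfer carries no command word */
	return (int)(len + 1);
}
```

With the old sizes (4 command words, 5 transfers), a mask enabling all 8 channels would have written past both arrays; deriving both sizes from one channel-count constant keeps them in step.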
[PATCH OLK-6.6] vfio/mlx5: Fix an unwind issue in mlx5vf_add_migration_pages()
by Yi Yang 06 Jan '25

06 Jan '25
From: Yishai Hadas <yishaih(a)nvidia.com>

mainline inclusion
from mainline-v6.13-rc1
commit 22e87bf3f77c18f5982c19ffe2732ef0c7a25f16
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEGF7
CVE: CVE-2024-56742

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Fix an unwind issue in mlx5vf_add_migration_pages().

If a set of pages is allocated but fails to be added to the SG table,
they need to be freed to prevent a memory leak.

Any pages successfully added to the SG table will be freed as part of
mlx5vf_free_data_buffer().

Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices")
Signed-off-by: Yishai Hadas <yishaih(a)nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241114095318.16556-2-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>

Conflicts:
	drivers/vfio/pci/mlx5/main.c
	drivers/vfio/pci/mlx5/cmd.c
[conflicts because 821b8f6bf8489 ("vfio/mlx5: Enforce PRE_COPY support") is not merged]
Signed-off-by: Yi Yang <yiyang13(a)huawei.com>
---
 drivers/vfio/pci/mlx5/main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 5cf2b491d15a..54bb74433ec1 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -71,6 +71,7 @@ int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 	unsigned long filled;
 	unsigned int to_fill;
 	int ret;
+	int i;
 	to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list));
 	page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL_ACCOUNT);
@@ -91,7 +92,7 @@ int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 			GFP_KERNEL_ACCOUNT);
 		if (ret)
-			goto err;
+			goto err_append;
 		buf->allocated_length += filled * PAGE_SIZE;
 		/* clean input for another bulk allocation */
 		memset(page_list, 0, filled * sizeof(*page_list));
@@ -102,6 +103,9 @@ int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 	kvfree(page_list);
 	return 0;
+err_append:
+	for (i = filled - 1; i >= 0; i--)
+		__free_page(page_list[i]);
 err:
 	kvfree(page_list);
 	return ret;
--
2.25.1
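The unwind pattern in this patch can be shown with a small userspace sketch. It is an analogy under stated assumptions, not the vfio code: malloc() stands in for page allocation, the append_ok flag simulates whether adding the batch to the table succeeds, and page_alloc(), page_free(), add_pages() and the live_pages counter are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

static int live_pages; /* pages allocated and not yet freed */

static void *page_alloc(void)
{
	void *p = malloc(64);

	if (p)
		live_pages++;
	return p;
}

static void page_free(void *p)
{
	if (p) {
		free(p);
		live_pages--;
	}
}

/*
 * Allocate npages into a scratch batch, then hand them to table.
 * On failure, only the pages of the current batch are released here;
 * pages already in table belong to the table's own teardown.
 */
int add_pages(void **table, int *table_len, int npages, int append_ok)
{
	void **batch;
	int filled = 0;
	int i;

	assert(table != NULL && table_len != NULL);
	if (npages <= 0)
		return -1;
	batch = calloc((size_t)npages, sizeof(*batch));
	if (!batch)
		return -1;

	for (; filled < npages; filled++) {
		batch[filled] = page_alloc();
		if (!batch[filled])
			goto err_append;
	}
	if (!append_ok) /* simulated failure to append to the table */
		goto err_append;

	for (i = 0; i < filled; i++)
		table[(*table_len)++] = batch[i];
	free(batch);
	return 0;

err_append:
	for (i = filled - 1; i >= 0; i--)
		page_free(batch[i]);
	free(batch);
	return -1;
}
```

The point of the separate err_append label is ownership: until the append succeeds, the batch is owned by this function and must be freed by it.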
[PATCH OLK-6.6] wifi: ath10k: avoid NULL pointer error during sdio remove
by Liu Mingrui 06 Jan '25

06 Jan '25
From: Kang Yang <quic_kangyang(a)quicinc.com>

mainline inclusion
from mainline-v6.13-rc1
commit 95c38953cb1ecf40399a676a1f85dfe2b5780a9a
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAPA
CVE: CVE-2024-56599

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

When running 'rmmod ath10k', ath10k_sdio_remove() frees the sdio
workqueue with destroy_workqueue(). But if CONFIG_INIT_ON_FREE_DEFAULT_ON
is set to yes, a kernel panic happens:

Call trace:
 destroy_workqueue+0x1c/0x258
 ath10k_sdio_remove+0x84/0x94
 sdio_bus_remove+0x50/0x16c
 device_release_driver_internal+0x188/0x25c
 device_driver_detach+0x20/0x2c

This is because during 'rmmod ath10k', ath10k_sdio_remove() calls
ath10k_core_destroy() before destroy_workqueue(). wiphy_dev_release()
is eventually called from ath10k_core_destroy(). This function frees
struct cfg80211_registered_device *rdev and all its members, including
wiphy, dev and the pointer to the sdio workqueue. The pointer to the
sdio workqueue is then set to NULL because of
CONFIG_INIT_ON_FREE_DEFAULT_ON.

After the device release, destroy_workqueue() uses the NULL pointer and
the kernel panics.

Call trace:
ath10k_sdio_remove
  ->ath10k_core_unregister
    ……
    ->ath10k_core_stop
      ->ath10k_hif_stop
        ->ath10k_sdio_irq_disable
    ->ath10k_hif_power_down
      ->del_timer_sync(&ar_sdio->sleep_timer)
  ->ath10k_core_destroy
    ->ath10k_mac_destroy
      ->ieee80211_free_hw
        ->wiphy_free
    ……
          ->wiphy_dev_release
  ->destroy_workqueue

destroy_workqueue() needs to be called before ath10k_core_destroy():
free the workqueue buffer first, then let ath10k_core_destroy() free the
pointer to the workqueue. This order matches the error path order in
ath10k_sdio_probe().

No work will be queued on the sdio workqueue between its destruction and
the call to ath10k_core_destroy(). Based on the call stack above, the
reason is:
Only ath10k_sdio_sleep_timer_handler(), ath10k_sdio_hif_tx_sg() and
ath10k_sdio_irq_disable() queue work on the sdio workqueue.
The sleep timer is deleted before ath10k_core_destroy(), in
ath10k_hif_power_down().
ath10k_sdio_irq_disable() is only called in ath10k_hif_stop().
ath10k_core_unregister() calls ath10k_hif_power_down() to stop the hif
bus, so ath10k_sdio_hif_tx_sg() won't be called anymore.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00189

Signed-off-by: Kang Yang <quic_kangyang(a)quicinc.com>
Tested-by: David Ruth <druth(a)chromium.org>
Reviewed-by: David Ruth <druth(a)chromium.org>
Link: https://patch.msgid.link/20241008022246.1010-1-quic_kangyang@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson(a)quicinc.com>

Conflicts:
	drivers/net/wireless/ath/ath10k/sdio.c
Signed-off-by: Liu Mingrui <liumingrui(a)huawei.com>
---
 drivers/net/wireless/ath/ath10k/sdio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index 56fbcfb80bf8..850d999615a2 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2004-2011 Atheros Communications Inc.
  * Copyright (c) 2011-2012,2017 Qualcomm Atheros, Inc.
  * Copyright (c) 2016-2017 Erik Stromdahl <erik.stromdahl(a)gmail.com>
+ * Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights reserved.
  */
 #include <linux/module.h>
@@ -2647,9 +2648,9 @@ static void ath10k_sdio_remove(struct sdio_func *func)
 	netif_napi_del(&ar->napi);
-	ath10k_core_destroy(ar);
-
 	destroy_workqueue(ar_sdio->workqueue);
+
+	ath10k_core_destroy(ar);
 }
 static const struct sdio_device_id ath10k_sdio_devices[] = {
--
2.25.1
[PATCH openEuler-22.03-LTS-SP1 0/1] backport mainline bugfix patch
by Lin Yujun 06 Jan '25

06 Jan '25
Jinjie Ruan (1):
  [Backport] genirq/msi: Fix off-by-one error in msi_domain_alloc()

 kernel/irq/msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.34.1
[PATCH OLK-5.10] usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer
by Liu Mingrui 06 Jan '25

06 Jan '25
From: Lianqin Hu <hulianqin(a)vivo.com>

stable inclusion
from stable-v5.10.232
commit 28b3c03a6790de1f6f2683919ad657840f0f0f58
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAO8
CVE: CVE-2024-56670

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit 4cfbca86f6a8b801f3254e0e3c8f2b1d2d64be2b upstream.

In some extreme cases the u_serial driver is accessed by multiple
threads: Thread A executes the open operation and calls gs_open(),
while Thread B executes the disconnect operation and calls
gserial_disconnect(), which sets the port->port_usb pointer to NULL.

E.g.
    Thread A                                 Thread B
    gs_open()                                gadget_unbind_driver()
    gs_start_io()                            composite_disconnect()
    gs_start_rx()                            gserial_disconnect()
    ...                                      ...
    spin_unlock(&port->port_lock)
    status = usb_ep_queue()                  spin_lock(&port->port_lock)
    spin_lock(&port->port_lock)              port->port_usb = NULL
    gs_free_requests(port->port_usb->in)     spin_unlock(&port->port_lock)
    Crash

This causes thread A to access a null pointer (port->port_usb is null)
when calling the gs_free_requests function, causing a crash.

If port_usb is NULL, the release request will be skipped as it
will be done by gserial_disconnect.

So add a null pointer check to gs_start_io before attempting
to access the value of the pointer port->port_usb.

Call trace:
 gs_start_io+0x164/0x25c
 gs_open+0x108/0x13c
 tty_open+0x314/0x638
 chrdev_open+0x1b8/0x258
 do_dentry_open+0x2c4/0x700
 vfs_open+0x2c/0x3c
 path_openat+0xa64/0xc60
 do_filp_open+0xb8/0x164
 do_sys_openat2+0x84/0xf0
 __arm64_sys_openat+0x70/0x9c
 invoke_syscall+0x58/0x114
 el0_svc_common+0x80/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x38/0x68

Fixes: c1dca562be8a ("usb gadget: split out serial core")
Cc: stable(a)vger.kernel.org
Suggested-by: Prashanth K <quic_prashk(a)quicinc.com>
Signed-off-by: Lianqin Hu <hulianqin(a)vivo.com>
Acked-by: Prashanth K <quic_prashk(a)quicinc.com>
Link: https://lore.kernel.org/r/TYUPR06MB62178DC3473F9E1A537DCD02D2362@TYUPR06MB6…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Liu Mingrui <liumingrui(a)huawei.com>
---
 drivers/usb/gadget/function/u_serial.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index 2caccbb6e014..b3cb3bef06d2 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -569,9 +569,12 @@ static int gs_start_io(struct gs_port *port)
 		 * we didn't in gs_start_tx() */
 		tty_wakeup(port->port.tty);
 	} else {
-		gs_free_requests(ep, head, &port->read_allocated);
-		gs_free_requests(port->port_usb->in, &port->write_pool,
-			&port->write_allocated);
+		/* Free reqs only if we are still connected */
+		if (port->port_usb) {
+			gs_free_requests(ep, head, &port->read_allocated);
+			gs_free_requests(port->port_usb->in, &port->write_pool,
+				&port->write_allocated);
+		}
 		status = -EIO;
 	}
--
2.25.1
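The re-check described in this commit message can be illustrated with a single-threaded userspace sketch. It is not the u_serial code: the lock is omitted, the concurrent disconnect is simulated by a callback that clears the pointer during the queue step, and struct usb_link, struct port, start_io() and both queue callbacks are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct usb_link {
	int in_ep;
};

struct port {
	struct usb_link *usb; /* set to NULL by disconnect */
	int requests_freed;   /* counts cleanup performed by start_io() */
};

/* Hypothetical queue step that fails while the port stays connected. */
int queue_fails(struct port *p)
{
	(void)p;
	return -1;
}

/* Hypothetical queue step that fails while a disconnect wins the race. */
int queue_fails_and_disconnects(struct port *p)
{
	p->usb = NULL;
	return -1;
}

int start_io(struct port *p, int (*queue)(struct port *))
{
	assert(p != NULL && queue != NULL);

	/* In the driver the lock is dropped around this call, so p->usb may change. */
	if (queue(p) == 0)
		return 0;

	/* Free requests only if still connected; otherwise disconnect owns cleanup. */
	if (p->usb)
		p->requests_freed++;
	return -EIO;
}
```

Any pointer that was valid before a lock was dropped has to be treated as unknown once the lock is taken again, which is why the check sits on the failure path rather than at function entry.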
[PATCH OLK-5.10 0/1] backport mainline bugfix patch
by Lin Yujun 06 Jan '25

06 Jan '25
Jinjie Ruan (1):
  [Backport] genirq/msi: Fix off-by-one error in msi_domain_alloc()

 kernel/irq/msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.34.1
[PATCH OLK-6.6] RDMA/mlx5: Move events notifier registration to be after device registration
by Tengda Wu 06 Jan '25

06 Jan '25
From: Patrisious Haddad <phaddad(a)nvidia.com>

stable inclusion
from stable-v6.6.64
commit 921fcf2971a1e8d3b904ba2c2905b96f4ec3d4ad
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEADY
CVE: CVE-2024-53224

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit ede132a5cf559f3ab35a4c28bac4f4a6c20334d8 ]

Move pkey change work initialization and cleanup from the device
resources stage to the notifier stage, since this is the stage that
handles these work events.

Fix a race between the device deregistration and pkey change work by
moving MLX5_IB_STAGE_DEVICE_NOTIFIER to be after MLX5_IB_STAGE_IB_REG
in order to ensure that the notifier is deregistered before the device
during cleanup. This ensures that no work items execute after the
device has been unregistered, which can cause the panic below.

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 630071 Comm: kworker/1:2 Kdump: loaded Tainted: G W OE --------- --- 5.14.0-162.6.1.el9_1.x86_64 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 02/27/2023
Workqueue: events pkey_change_handler [mlx5_ib]
RIP: 0010:setup_qp+0x38/0x1f0 [mlx5_ib]
Code: ee 41 54 45 31 e4 55 89 f5 53 48 89 fb 48 83 ec 20 8b 77 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 48 8b 07 48 8d 4c 24 16 <4c> 8b 38 49 8b 87 80 0b 00 00 4c 89 ff 48 8b 80 08 05 00 00 8b 40
RSP: 0018:ffffbcc54068be20 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff954054494128 RCX: ffffbcc54068be36
RDX: ffff954004934000 RSI: 0000000000000001 RDI: ffff954054494128
RBP: 0000000000000023 R08: ffff954001be2c20 R09: 0000000000000001
R10: ffff954001be2c20 R11: ffff9540260133c0 R12: 0000000000000000
R13: 0000000000000023 R14: 0000000000000000 R15: ffff9540ffcb0905
FS: 0000000000000000(0000) GS:ffff9540ffc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000010625c001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mlx5_ib_gsi_pkey_change+0x20/0x40 [mlx5_ib]
 process_one_work+0x1e8/0x3c0
 worker_thread+0x50/0x3b0
 ? rescuer_thread+0x380/0x380
 kthread+0x149/0x170
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x22/0x30
Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_fwctl(OE) fwctl(OE) ib_uverbs(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlx_compat(OE) psample mlxfw(OE) tls knem(OE) netconsole nfsv3 nfs_acl nfs lockd grace fscache netfs qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common rapl hv_balloon hv_utils i2c_piix4 pcspkr joydev fuse ext4 mbcache jbd2 sr_mod sd_mod cdrom t10_pi sg ata_generic pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper drm_kms_helper hv_storvsc syscopyarea hv_netvsc sysfillrect sysimgblt hid_hyperv fb_sys_fops scsi_transport_fc hyperv_keyboard drm ata_piix crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel hv_vmbus serio_raw [last unloaded: ib_core]
CR2: 0000000000000000
---[ end trace f6f8be4eae12f7bc ]---

Fixes: 7722f47e71e5 ("IB/mlx5: Create GSI transmission QPs when P_Key table is changed")
Signed-off-by: Patrisious Haddad <phaddad(a)nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur(a)nvidia.com>
Link: https://patch.msgid.link/d271ceeff0c08431b3cbbbb3e2d416f09b6d1621.173149694…
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Pu Lehui <pulehui(a)huawei.com>
---
 drivers/infiniband/hw/mlx5/main.c    | 40 +++++++++++++---------------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  2 +-
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index bc38af6cda6e..c510484e024b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2899,7 +2899,6 @@ int mlx5_ib_dev_res_srq_init(struct mlx5_ib_dev *dev)
 static int mlx5_ib_dev_res_init(struct mlx5_ib_dev *dev)
 {
 	struct mlx5_ib_resources *devr = &dev->devr;
-	int port;
 	int ret;
 	if (!MLX5_CAP_GEN(dev->mdev, xrc))
@@ -2915,10 +2914,6 @@ static int mlx5_ib_dev_res_init(struct mlx5_ib_dev *dev)
 		return ret;
 	}
-	for (port = 0; port < ARRAY_SIZE(devr->ports); ++port)
-		INIT_WORK(&devr->ports[port].pkey_change_work,
-			  pkey_change_handler);
-
 	mutex_init(&devr->cq_lock);
 	mutex_init(&devr->srq_lock);
@@ -2928,16 +2923,6 @@ static int mlx5_ib_dev_res_init(struct mlx5_ib_dev *dev)
 static void mlx5_ib_dev_res_cleanup(struct mlx5_ib_dev *dev)
 {
 	struct mlx5_ib_resources *devr = &dev->devr;
-	int port;
-
-	/*
-	 * Make sure no change P_Key work items are still executing.
-	 *
-	 * At this stage, the mlx5_ib_event should be unregistered
-	 * and it ensures that no new works are added.
-	 */
-	for (port = 0; port < ARRAY_SIZE(devr->ports); ++port)
-		cancel_work_sync(&devr->ports[port].pkey_change_work);
 	/* After s0/s1 init, they are not unset during the device lifetime. */
 	if (devr->s1) {
@@ -4201,6 +4186,13 @@ static void mlx5_ib_stage_delay_drop_cleanup(struct mlx5_ib_dev *dev)
 static int mlx5_ib_stage_dev_notifier_init(struct mlx5_ib_dev *dev)
 {
+	struct mlx5_ib_resources *devr = &dev->devr;
+	int port;
+
+	for (port = 0; port < ARRAY_SIZE(devr->ports); ++port)
+		INIT_WORK(&devr->ports[port].pkey_change_work,
+			  pkey_change_handler);
+
 	dev->mdev_events.notifier_call = mlx5_ib_event;
 	mlx5_notifier_register(dev->mdev, &dev->mdev_events);
@@ -4211,8 +4203,14 @@ static int mlx5_ib_stage_dev_notifier_init(struct mlx5_ib_dev *dev)
 static void mlx5_ib_stage_dev_notifier_cleanup(struct mlx5_ib_dev *dev)
 {
+	struct mlx5_ib_resources *devr = &dev->devr;
+	int port;
+
 	mlx5r_macsec_event_unregister(dev);
 	mlx5_notifier_unregister(dev->mdev, &dev->mdev_events);
+
+	for (port = 0; port < ARRAY_SIZE(devr->ports); ++port)
+		cancel_work_sync(&devr->ports[port].pkey_change_work);
 }
 void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
@@ -4286,9 +4284,6 @@ static const struct mlx5_ib_profile pf_profile = {
 	STAGE_CREATE(MLX5_IB_STAGE_DEVICE_RESOURCES,
 		     mlx5_ib_dev_res_init,
 		     mlx5_ib_dev_res_cleanup),
-	STAGE_CREATE(MLX5_IB_STAGE_DEVICE_NOTIFIER,
-		     mlx5_ib_stage_dev_notifier_init,
-		     mlx5_ib_stage_dev_notifier_cleanup),
 	STAGE_CREATE(MLX5_IB_STAGE_ODP,
 		     mlx5_ib_odp_init_one,
 		     mlx5_ib_odp_cleanup_one),
@@ -4313,6 +4308,9 @@ static const struct mlx5_ib_profile pf_profile = {
 	STAGE_CREATE(MLX5_IB_STAGE_IB_REG,
 		     mlx5_ib_stage_ib_reg_init,
 		     mlx5_ib_stage_ib_reg_cleanup),
+	STAGE_CREATE(MLX5_IB_STAGE_DEVICE_NOTIFIER,
+		     mlx5_ib_stage_dev_notifier_init,
+		     mlx5_ib_stage_dev_notifier_cleanup),
 	STAGE_CREATE(MLX5_IB_STAGE_POST_IB_REG_UMR,
 		     mlx5_ib_stage_post_ib_reg_umr_init,
 		     NULL),
@@ -4349,9 +4347,6 @@ const struct mlx5_ib_profile raw_eth_profile = {
 	STAGE_CREATE(MLX5_IB_STAGE_DEVICE_RESOURCES,
 		     mlx5_ib_dev_res_init,
 		     mlx5_ib_dev_res_cleanup),
-	STAGE_CREATE(MLX5_IB_STAGE_DEVICE_NOTIFIER,
-		     mlx5_ib_stage_dev_notifier_init,
-		     mlx5_ib_stage_dev_notifier_cleanup),
 	STAGE_CREATE(MLX5_IB_STAGE_COUNTERS,
 		     mlx5_ib_counters_init,
 		     mlx5_ib_counters_cleanup),
@@ -4373,6 +4368,9 @@ const struct mlx5_ib_profile raw_eth_profile = {
 	STAGE_CREATE(MLX5_IB_STAGE_IB_REG,
 		     mlx5_ib_stage_ib_reg_init,
 		     mlx5_ib_stage_ib_reg_cleanup),
+	STAGE_CREATE(MLX5_IB_STAGE_DEVICE_NOTIFIER,
+		     mlx5_ib_stage_dev_notifier_init,
+		     mlx5_ib_stage_dev_notifier_cleanup),
 	STAGE_CREATE(MLX5_IB_STAGE_POST_IB_REG_UMR,
 		     mlx5_ib_stage_post_ib_reg_umr_init,
 		     NULL),
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 1c83d132197f..94678e5c59dd 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -954,7 +954,6 @@ enum mlx5_ib_stages {
 	MLX5_IB_STAGE_QP,
 	MLX5_IB_STAGE_SRQ,
 	MLX5_IB_STAGE_DEVICE_RESOURCES,
-	MLX5_IB_STAGE_DEVICE_NOTIFIER,
 	MLX5_IB_STAGE_ODP,
 	MLX5_IB_STAGE_COUNTERS,
 	MLX5_IB_STAGE_CONG_DEBUGFS,
@@ -963,6 +962,7 @@ enum mlx5_ib_stages {
 	MLX5_IB_STAGE_PRE_IB_REG_UMR,
 	MLX5_IB_STAGE_WHITELIST_UID,
 	MLX5_IB_STAGE_IB_REG,
+	MLX5_IB_STAGE_DEVICE_NOTIFIER,
 	MLX5_IB_STAGE_POST_IB_REG_UMR,
 	MLX5_IB_STAGE_DELAY_DROP,
 	MLX5_IB_STAGE_RESTRACK,
--
2.34.1
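Why moving a stage later in the table makes it clean up earlier can be shown with a small sketch. It assumes, as the commit message implies, that stages are initialized in table order and cleaned up in reverse; the three stages, the order_log buffer, and the profile_add()/profile_remove() helpers are hypothetical stand-ins, not the mlx5_ib functions.

```c
#include <assert.h>
#include <string.h>

static char order_log[128]; /* records the order of init/cleanup calls */

static void log_step(const char *s)
{
	assert(strlen(order_log) + strlen(s) + 2 <= sizeof(order_log));
	strcat(order_log, s);
	strcat(order_log, " ");
}

static int res_init(void)          { log_step("+res"); return 0; }
static void res_cleanup(void)      { log_step("-res"); }
static int reg_init(void)          { log_step("+reg"); return 0; }
static void reg_cleanup(void)      { log_step("-reg"); }
static int notifier_init(void)     { log_step("+notifier"); return 0; }
static void notifier_cleanup(void) { log_step("-notifier"); }

struct stage {
	int (*init)(void);
	void (*cleanup)(void);
};

/* The notifier stage is listed after registration, so it is removed first. */
static const struct stage profile[] = {
	{ res_init, res_cleanup },
	{ reg_init, reg_cleanup },
	{ notifier_init, notifier_cleanup },
};

#define NUM_STAGES ((int)(sizeof(profile) / sizeof(profile[0])))

/* Clean up stages [0, stage) in reverse order. */
void profile_remove(int stage)
{
	while (stage-- > 0)
		if (profile[stage].cleanup)
			profile[stage].cleanup();
}

/* Initialize all stages in order; unwind the completed ones on failure. */
int profile_add(void)
{
	int i;

	for (i = 0; i < NUM_STAGES; i++) {
		if (profile[i].init && profile[i].init() != 0) {
			profile_remove(i);
			return -1;
		}
	}
	return 0;
}
```

Under that reverse-order assumption, the notifier is unregistered and its pending work cancelled before the registration stage is torn down, which is the ordering the patch is after.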
[PATCH OLK-5.10] memcg: fix soft lockup in the OOM process
by Chen Ridong 06 Jan '25

06 Jan '25
maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBE42N

Reference: https://lore.kernel.org/linux-kernel/8cf29751-7c71-52ff-5492-0019ca7b0e02@g…

----------------------------------------

A soft lockup was found in a product where about 56,000 tasks were in
the OOM cgroup; the kernel was traversing them when the soft lockup was
triggered.

watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 vprintk_emit+0x193/0x280
 printk+0x52/0x6e
 dump_task+0x114/0x130
 mem_cgroup_scan_tasks+0x76/0x100
 dump_header+0x1fe/0x210
 oom_kill_process+0xd1/0x100
 out_of_memory+0x125/0x570
 mem_cgroup_out_of_memory+0xb5/0xd0
 try_charge+0x720/0x770
 mem_cgroup_try_charge+0x86/0x180
 mem_cgroup_try_charge_delay+0x1c/0x40
 do_anonymous_page+0xb5/0x390
 handle_mm_fault+0xc4/0x1f0

This happens because thousands of processes are in the OOM cgroup and it
takes a long time to traverse all of them. As a result, this leads to a
soft lockup in the OOM process.

To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
function every 1024 iterations. For global OOM, call
'touch_softlockup_watchdog' every 1024 iterations to avoid this issue.

Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
Signed-off-by: Chen Ridong <chenridong(a)huawei.com>
---
 mm/memcontrol.c | 7 ++++++-
 mm/oom_kill.c   | 8 +++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3e928581ab47..f5b1322ac77e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1202,6 +1202,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 {
 	struct mem_cgroup *iter;
 	int ret = 0;
+	int i = 0;
 	BUG_ON(memcg == root_mem_cgroup);
@@ -1210,8 +1211,12 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 		struct task_struct *task;
 		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
-		while (!ret && (task = css_task_iter_next(&it)))
+		while (!ret && (task = css_task_iter_next(&it))) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				cond_resched();
 			ret = fn(task, arg);
+		}
 		css_task_iter_end(&it);
 		if (ret) {
 			mem_cgroup_iter_break(memcg, iter);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9d595265bbf5..fe5f4f20c5c8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
 #include <linux/kthread.h>
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
+#include <linux/nmi.h>
 #include <asm/tlb.h>
 #include "internal.h"
@@ -479,10 +480,15 @@ static void dump_tasks(struct oom_control *oc)
 		memcg_print_bad_task(oc);
 	} else {
 		struct task_struct *p;
+		int i = 0;
 		rcu_read_lock();
-		for_each_process(p)
+		for_each_process(p) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				touch_softlockup_watchdog();
 			dump_task(p, oc);
+		}
 		rcu_read_unlock();
 	}
 }
--
2.34.1
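The throttling test used in the diff, (++i & 1023) == 0, fires once every 1024 iterations because 1023 masks the low ten bits of the counter. A minimal userspace sketch of that arithmetic follows; yield_cpu() is a hypothetical stand-in for cond_resched() that only counts calls, and scan_tasks() is likewise an illustrative name.

```c
#include <assert.h>

static unsigned long yields; /* number of times the loop gave up the CPU */

/* Hypothetical stand-in for cond_resched(); it only counts invocations. */
static void yield_cpu(void)
{
	yields++;
}

/* Walk ntasks items and return how many times the loop yielded. */
unsigned long scan_tasks(unsigned long ntasks)
{
	unsigned long before = yields;
	unsigned long n;
	unsigned int i = 0;

	for (n = 0; n < ntasks; n++) {
		/* True when the low 10 bits of i are zero: every 1024th pass. */
		if ((++i & 1023) == 0)
			yield_cpu();
		/* per-task work would go here */
	}
	return yields - before;
}
```

For the roughly 56,000 tasks mentioned in the report, this yields 54 times (56000 / 1024, rounded down), so the cost of the check is negligible next to the per-task work.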
[PATCH openEuler-22.03-LTS-SP1] memcg: fix soft lockup in the OOM process
by Chen Ridong 06 Jan '25

06 Jan '25
maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBE42N

Reference: https://lore.kernel.org/linux-kernel/8cf29751-7c71-52ff-5492-0019ca7b0e02@g…

----------------------------------------

A soft lockup was found in a product where about 56,000 tasks were in
the OOM cgroup; the kernel was traversing them when the soft lockup was
triggered.

watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 vprintk_emit+0x193/0x280
 printk+0x52/0x6e
 dump_task+0x114/0x130
 mem_cgroup_scan_tasks+0x76/0x100
 dump_header+0x1fe/0x210
 oom_kill_process+0xd1/0x100
 out_of_memory+0x125/0x570
 mem_cgroup_out_of_memory+0xb5/0xd0
 try_charge+0x720/0x770
 mem_cgroup_try_charge+0x86/0x180
 mem_cgroup_try_charge_delay+0x1c/0x40
 do_anonymous_page+0xb5/0x390
 handle_mm_fault+0xc4/0x1f0

This happens because thousands of processes are in the OOM cgroup and it
takes a long time to traverse all of them. As a result, this leads to a
soft lockup in the OOM process.

To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
function every 1024 iterations. For global OOM, call
'touch_softlockup_watchdog' every 1024 iterations to avoid this issue.

Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
Signed-off-by: Chen Ridong <chenridong(a)huawei.com>
---
 mm/memcontrol.c | 7 ++++++-
 mm/oom_kill.c   | 8 +++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3e928581ab47..f5b1322ac77e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1202,6 +1202,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 {
 	struct mem_cgroup *iter;
 	int ret = 0;
+	int i = 0;
 	BUG_ON(memcg == root_mem_cgroup);
@@ -1210,8 +1211,12 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 		struct task_struct *task;
 		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
-		while (!ret && (task = css_task_iter_next(&it)))
+		while (!ret && (task = css_task_iter_next(&it))) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				cond_resched();
 			ret = fn(task, arg);
+		}
 		css_task_iter_end(&it);
 		if (ret) {
 			mem_cgroup_iter_break(memcg, iter);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9d595265bbf5..fe5f4f20c5c8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
 #include <linux/kthread.h>
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
+#include <linux/nmi.h>
 #include <asm/tlb.h>
 #include "internal.h"
@@ -479,10 +480,15 @@ static void dump_tasks(struct oom_control *oc)
 		memcg_print_bad_task(oc);
 	} else {
 		struct task_struct *p;
+		int i = 0;
 		rcu_read_lock();
-		for_each_process(p)
+		for_each_process(p) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				touch_softlockup_watchdog();
 			dump_task(p, oc);
+		}
 		rcu_read_unlock();
 	}
 }
--
2.34.1
[PATCH OLK-6.6] memcg: fix soft lockup in the OOM process
by Chen Ridong 06 Jan '25

06 Jan '25
maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBE42N

Reference: https://lore.kernel.org/linux-kernel/8cf29751-7c71-52ff-5492-0019ca7b0e02@g…

----------------------------------------

A soft lockup was found in a product where about 56,000 tasks were in
the OOM cgroup; the kernel was traversing them when the soft lockup was
triggered.

watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 vprintk_emit+0x193/0x280
 printk+0x52/0x6e
 dump_task+0x114/0x130
 mem_cgroup_scan_tasks+0x76/0x100
 dump_header+0x1fe/0x210
 oom_kill_process+0xd1/0x100
 out_of_memory+0x125/0x570
 mem_cgroup_out_of_memory+0xb5/0xd0
 try_charge+0x720/0x770
 mem_cgroup_try_charge+0x86/0x180
 mem_cgroup_try_charge_delay+0x1c/0x40
 do_anonymous_page+0xb5/0x390
 handle_mm_fault+0xc4/0x1f0

This happens because thousands of processes are in the OOM cgroup and it
takes a long time to traverse all of them. As a result, this leads to a
soft lockup in the OOM process.

To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
function every 1024 iterations. For global OOM, call
'touch_softlockup_watchdog' every 1024 iterations to avoid this issue.

Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
Signed-off-by: Chen Ridong <chenridong(a)huawei.com>
---
 mm/memcontrol.c | 7 ++++++-
 mm/oom_kill.c   | 8 +++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9bc0819629344..606a481afe2e2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1322,6 +1322,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 {
 	struct mem_cgroup *iter;
 	int ret = 0;
+	int i = 0;
 	BUG_ON(mem_cgroup_is_root(memcg));
@@ -1330,8 +1331,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 		struct task_struct *task;
 		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
-		while (!ret && (task = css_task_iter_next(&it)))
+		while (!ret && (task = css_task_iter_next(&it))) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				cond_resched();
 			ret = fn(task, arg);
+		}
 		css_task_iter_end(&it);
 		if (ret) {
 			mem_cgroup_iter_break(memcg, iter);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dc932c5837479..f8a45a224ca2a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,6 +44,7 @@
 #include <linux/kthread.h>
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
+#include <linux/nmi.h>
 #include <asm/tlb.h>
 #include "internal.h"
@@ -482,10 +483,15 @@ static void dump_tasks(struct oom_control *oc)
 		memcg_print_bad_task(oc);
 	} else {
 		struct task_struct *p;
+		int i = 0;
 		rcu_read_lock();
-		for_each_process(p)
+		for_each_process(p) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				touch_softlockup_watchdog();
 			dump_task(p, oc);
+		}
 		rcu_read_unlock();
 	}
 }
--
2.34.1