mailweb.openeuler.org
Kernel

kernel@openeuler.org

  • 44 participants
  • 18674 discussions
[PATCH openEuler-22.03-LTS 1/4] net/bnx2x: Prevent access to a freed page in page_pool
by Baogen Shang 29 Apr '24

From: Thinh Tran <thinhtr(a)linux.ibm.com>

stable inclusion
from stable-v5.10.215
commit 8eebff95ce9558be66a36aa7cfb43223f3ab4699
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J6AL
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

-------------------------

[ Upstream commit d27e2da94a42655861ca4baea30c8cd65546f25d ]

Fix race condition leading to system crash during EEH error handling

During EEH error recovery, the bnx2x driver's transmit timeout logic
could cause a race condition when handling reset tasks. The
bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(),
which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload()
SGEs are freed using bnx2x_free_rx_sge_range(). However, this could
overlap with the EEH driver's attempt to reset the device using
bnx2x_io_slot_reset(), which also tries to free SGEs. This race
condition can result in system crashes due to accessing freed memory
locations in bnx2x_free_rx_sge()

  static inline void bnx2x_free_rx_sge(struct bnx2x *bp,
				struct bnx2x_fastpath *fp, u16 index)
  {
	struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
	struct page *page = sw_buf->page;
  ....

where sw_buf was set to NULL after the call to dma_unmap_page()
by the preceding thread.

    EEH: Beginning: 'slot_reset'
    PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
    bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
    bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
    bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
    Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
    BUG: Kernel NULL pointer dereference on read at 0x00000000
    Faulting instruction address: 0xc0080000025065fc
    Oops: Kernel access of bad area, sig: 11 [#1]
    .....
    Call Trace:
    [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
    [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
    [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
    [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
    [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
    [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
    [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64

To solve this issue, we need to verify page pool allocations before
freeing.

Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")
Signed-off-by: Thinh Tran <thinhtr(a)linux.ibm.com>
Reviewed-by: Jiri Pirko <jiri(a)nvidia.com>
Link: https://lore.kernel.org/r/20240315205535.1321-1-thinhtr@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Baogen Shang <baogen.shang(a)windriver.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index d8b1824c334d..0bc1367fd649 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -1002,9 +1002,6 @@ static inline void bnx2x_set_fw_mac_addr(__le16 *fw_hi, __le16 *fw_mid,
 static inline void bnx2x_free_rx_mem_pool(struct bnx2x *bp,
 					   struct bnx2x_alloc_pool *pool)
 {
-	if (!pool->page)
-		return;
-
 	put_page(pool->page);
 
 	pool->page = NULL;
@@ -1015,6 +1012,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
 {
 	int i;
 
+	if (!fp->page_pool.page)
+		return;
+
 	if (fp->mode == TPA_MODE_DISABLED)
 		return;
 
--
2.33.0
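Editor's note: the effect of moving the NULL check from bnx2x_free_rx_mem_pool() up into bnx2x_free_rx_sge_range() can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the demo_* types and functions are hypothetical stand-ins, not the real bnx2x structures, and the model is single-threaded, so it shows only why a second teardown pass becomes a no-op, not the concurrency itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures (not the bnx2x types). */
struct demo_pool {
	void *page;
};

struct demo_fastpath {
	struct demo_pool page_pool;
	void *rx_pages[4];
	int frees;	/* counts how many ring entries were released */
};

/* Release every ring entry, but only if the pool is still populated. */
static void demo_free_rx_range(struct demo_fastpath *fp, int last)
{
	int i;

	/* Guard at the top of the range walk: nothing allocated, or a
	 * previous caller already tore everything down.
	 */
	if (!fp->page_pool.page)
		return;

	for (i = 0; i < last; i++) {
		free(fp->rx_pages[i]);
		fp->rx_pages[i] = NULL;
		fp->frees++;
	}

	free(fp->page_pool.page);
	fp->page_pool.page = NULL;	/* marks the range as released */
}
```

With the guard placed before the loop, a second call returns immediately instead of walking ring entries that were already released.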
[PATCH openEuler-22.03-LTS 0/4] backport important fix for
by Baogen Shang 29 Apr '24

From: Baogen Shang <baogen.shang(a)windriver.com>

David Christensen (1):
  net/tg3: resolve deadlock in tg3_reset_task() during EEH

Dinghao Liu (1):
  net: bnxt: fix a potential use-after-free in bnxt_init_tc

Thinh Tran (2):
  net/bnx2x: Prevent access to a freed page in page_pool
  net/tg3: fix race condition in tg3_reset_task()

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |  6 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c    |  1 +
 drivers/net/ethernet/broadcom/tg3.c             | 17 +++++++++++++----
 3 files changed, 17 insertions(+), 7 deletions(-)

--
2.33.0
[openeuler:OLK-6.6 7604/7624] drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:371:3: error: a randomized struct can only be initialized with a designated initializer
by kernel test robot 29 Apr '24

tree:   https://gitee.com/openeuler/kernel.git OLK-6.6
head:   cc726712137756f27af36c01e3fd7f9f260f639c
commit: 4213ff7957de370c1cfe528c2bad1eb2e499038a [7604/7624] net/ethernet/huawei/hinic3: Add the CQM on which the RDMA depends
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20240429/202404291452.urCRP2Hn-lkp@…)
compiler: clang version 18.1.4 (https://github.com/llvm/llvm-project e6c3289804a67ea0bb6a86fadbe454dd93b8d855)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240429/202404291452.urCRP2Hn-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404291452.urCRP2Hn-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:371:3: error: a randomized struct can only be initialized with a designated initializer
     371 |         {check_for_use_node_alloc, cqm_buf_use_node_alloc_page},
         |         ^
   drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:372:3: error: a randomized struct can only be initialized with a designated initializer
     372 |         {check_for_nouse_node_alloc, cqm_buf_unused_node_alloc_page}
         |         ^
   drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:376:3: error: a randomized struct can only be initialized with a designated initializer
     376 |         {check_use_non_vram, cqm_buf_free_page_common}
         |         ^
>> drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:382:25: error: invalid application of 'sizeof' to an incomplete type 'const struct malloc_memory[]'
     382 |         u32 malloc_funcs_num = ARRAY_SIZE(g_malloc_funcs);
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/kernel.h:57:32: note: expanded from macro 'ARRAY_SIZE'
      57 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                ^~~~~
>> drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c:399:23: error: invalid application of 'sizeof' to an incomplete type 'const struct free_memory[]'
     399 |         u32 free_funcs_num = ARRAY_SIZE(g_free_funcs);
         |                              ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/kernel.h:57:32: note: expanded from macro 'ARRAY_SIZE'
      57 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                ^~~~~
   5 errors generated.


vim +371 drivers/net/ethernet/huawei/hinic3/cqm/cqm_bitmap_table.c

   369
   370	static const struct malloc_memory g_malloc_funcs[] = {
 > 371		{check_for_use_node_alloc, cqm_buf_use_node_alloc_page},
   372		{check_for_nouse_node_alloc, cqm_buf_unused_node_alloc_page}
   373	};
   374
   375	static const struct free_memory g_free_funcs[] = {
   376		{check_use_non_vram, cqm_buf_free_page_common}
   377	};
   378
   379	static s32 cqm_buf_alloc_page(struct tag_cqm_handle *cqm_handle, struct tag_cqm_buf *buf)
   380	{
   381		struct hinic3_hwdev *handle = cqm_handle->ex_handle;
 > 382		u32 malloc_funcs_num = ARRAY_SIZE(g_malloc_funcs);
   383		u32 i;
   384
   385		for (i = 0; i < malloc_funcs_num; i++) {
   386			if (g_malloc_funcs[i].check_alloc_mode &&
   387			    g_malloc_funcs[i].malloc_func &&
   388			    g_malloc_funcs[i].check_alloc_mode(handle, buf))
   389				return g_malloc_funcs[i].malloc_func(handle, buf);
   390		}
   391
   392		cqm_err(handle->dev_hdl, "Unknown alloc mode\n");
   393
   394		return CQM_FAIL;
   395	}
   396
   397	static void cqm_buf_free_page(struct tag_cqm_buf *buf)
   398	{
 > 399		u32 free_funcs_num = ARRAY_SIZE(g_free_funcs);
   400		u32 i;
   401
   402		for (i = 0; i < free_funcs_num; i++) {
   403			if (g_free_funcs[i].check_alloc_mode &&
   404			    g_free_funcs[i].free_func &&
   405			    g_free_funcs[i].check_alloc_mode(NULL, buf))
   406				return g_free_funcs[i].free_func(buf);
   407		}
   408	}
   409

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
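Editor's note: the first three errors ask for designated initializers. The usual remedy is to name each member in the table entries; the two 'sizeof' errors look like follow-on failures of the rejected array definitions, though that is an inference from the log rather than something the report states. The sketch below shows the initializer style on a reduced dispatch table. All demo_* names and the simplified callback signatures are hypothetical; the member names check_alloc_mode and malloc_func are taken from the listing above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced types; the real hinic3 callbacks take other arguments. */
struct demo_buf {
	int use_node;
};

struct demo_malloc_memory {
	int (*check_alloc_mode)(const struct demo_buf *buf);
	int (*malloc_func)(struct demo_buf *buf);
};

static int check_use_node(const struct demo_buf *buf) { return buf->use_node; }
static int check_no_node(const struct demo_buf *buf) { return !buf->use_node; }
static int alloc_with_node(struct demo_buf *buf) { (void)buf; return 1; }
static int alloc_without_node(struct demo_buf *buf) { (void)buf; return 2; }

/* Each member is named, so the table does not depend on member order. */
static const struct demo_malloc_memory demo_malloc_funcs[] = {
	{ .check_alloc_mode = check_use_node, .malloc_func = alloc_with_node },
	{ .check_alloc_mode = check_no_node,  .malloc_func = alloc_without_node },
};

#define DEMO_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Walk the table and run the first allocator whose check accepts buf. */
static int demo_alloc(struct demo_buf *buf)
{
	size_t i;

	for (i = 0; i < DEMO_ARRAY_SIZE(demo_malloc_funcs); i++) {
		if (demo_malloc_funcs[i].check_alloc_mode(buf))
			return demo_malloc_funcs[i].malloc_func(buf);
	}
	return -1;	/* no entry matched */
}
```

Positional initializers silently depend on the declared member order, which is exactly what structure layout randomization changes; naming the members removes that dependency.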
[PATCH openEuler-22.03-LTS] amdkfd: use calloc instead of kzalloc to avoid integer overflow
by Baogen Shang 29 Apr '24

From: Dave Airlie <airlied(a)redhat.com>

stable inclusion
from stable-v5.10.215
commit fcbd99b3c73309107e3be71f20dff9414df64f91
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J6AL
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

-------------------------

commit 3b0daecfeac0103aba8b293df07a0cbaf8b43f29 upstream.

This uses calloc instead of doing the multiplication which might
overflow.

Cc: stable(a)vger.kernel.org
Signed-off-by: Dave Airlie <airlied(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Baogen Shang <baogen.shang(a)windriver.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8cc51cec988a..799a91a064a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -959,8 +959,8 @@ static int kfd_ioctl_get_process_apertures_new(struct file *filp,
 	 * nodes, but not more than args->num_of_nodes as that is
 	 * the amount of memory allocated by user
 	 */
-	pa = kzalloc((sizeof(struct kfd_process_device_apertures) *
-				args->num_of_nodes), GFP_KERNEL);
+	pa = kcalloc(args->num_of_nodes, sizeof(struct kfd_process_device_apertures),
+		     GFP_KERNEL);
 	if (!pa)
 		return -ENOMEM;
 
--
2.33.0
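Editor's note: the reason a calloc-style call is safer than an open-coded multiplication can be shown in userspace. The sketch below is not the kernel's kcalloc(); demo_calloc_checked() is a hypothetical helper that performs the same kind of overflow check before allocating zeroed memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n zeroed elements of the given size, refusing requests whose
 * total byte count would wrap around size_t.
 */
static void *demo_calloc_checked(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

With an unchecked `n * size`, a caller-controlled count can wrap the product to a small value, so the allocation succeeds but is far smaller than the code that fills it expects.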
[PATCH openEuler-1.0-LTS] binder: check offset alignment in binder_get_object()
by Lin Yujun 29 Apr '24

From: Carlos Llamas <cmllamas(a)google.com>

mainline inclusion
from mainline-v6.9-rc5
commit aaef73821a3b0194a01bd23ca77774f704a04d40
category: bugfix
bugzilla: 189856
CVE: CVE-2024-26926

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Commit 6d98eb95b450 ("binder: avoid potential data leakage when copying
txn") introduced changes to how binder objects are copied. In doing so,
it unintentionally removed an offset alignment check done through calls
to binder_alloc_copy_from_buffer() -> check_buffer().

These calls were replaced in binder_get_object() with copy_from_user(),
so now an explicit offset alignment check is needed here. This avoids
later complications when unwinding the objects gets harder.

It is worth noting this check existed prior to commit 7a67a39320df
("binder: add function to copy binder object from buffer"), likely
removed due to redundancy at the time.

Fixes: 6d98eb95b450 ("binder: avoid potential data leakage when copying txn")
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20240330190115.1877819-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lin Yujun <linyujun809(a)huawei.com>
---
 drivers/android/binder.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f98d84c5c695..723e5a919c20 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2077,8 +2077,10 @@ static size_t binder_get_object(struct binder_proc *proc,
 	size_t object_size = 0;
 
 	read_size = min_t(size_t, sizeof(*object), buffer->data_size - offset);
-	if (offset > buffer->data_size || read_size < sizeof(*hdr))
+	if (offset > buffer->data_size || read_size < sizeof(*hdr) ||
+	    !IS_ALIGNED(offset, sizeof(u32)))
 		return 0;
+
 	if (u) {
 		if (copy_from_user(object, u + offset, read_size))
 			return 0;
--
2.34.1
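Editor's note: the validation the patch restores has three parts: the offset must lie inside the buffer, enough bytes must remain for an object header, and the offset must be 32-bit aligned. A userspace sketch of that predicate follows. The names demo_offset_ok() and DEMO_IS_ALIGNED() are hypothetical, and the bounds test is ordered so that the subtraction cannot underflow, which differs slightly from the layout of the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's IS_ALIGNED(); valid when a is a power of two. */
#define DEMO_IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Returns 1 if an object header of hdr_size bytes may be read at offset. */
static int demo_offset_ok(size_t offset, size_t data_size, size_t hdr_size)
{
	size_t avail;

	if (offset > data_size)
		return 0;	/* offset points past the buffer */

	avail = data_size - offset;
	if (avail < hdr_size)
		return 0;	/* not enough bytes left for a header */

	return DEMO_IS_ALIGNED(offset, sizeof(uint32_t));
}
```

For a 64-byte buffer and a 4-byte header, offset 8 passes, offset 6 fails on alignment, and offsets 64 and 100 fail on bounds.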
[PATCH v3 openEuler-1.0-LTS 0/1] openEuler-1.0-LTS: bugfix for mm
by Wupeng Ma 29 Apr '24

From: Ma Wupeng <mawupeng1(a)huawei.com>

mm/madvise: fix potential pte_unmap_unlock pte error.

Changelog since v2:
- fix style problem.

Miaohe Lin (1):
  mm/madvise: fix potential pte_unmap_unlock pte error

 mm/madvise.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
2.25.1
[PATCH openEuler-1.0-LTS] PCI/IOV: Improve performance of creating VFs concurrently
by Jialin Zhang 29 Apr '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J08D

--------------------------------

Previous commit 700a05bd8dba ("PCI/IOV: Add pci_sriov_numvfs_lock to
support enable pci sriov concurrently") reduced the performance of
creating VFs that belong to different PFs. Fix it by checking whether a
new bus will be created, and taking pci_sriov_numvfs_lock only in that
case.

Fixes: 700a05bd8dba ("PCI/IOV: Add pci_sriov_numvfs_lock to support enable pci sriov concurrently")
Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com>
---
 drivers/pci/iov.c       | 40 +++++++++++++++++++++++++++++++++++-----
 drivers/pci/pci-sysfs.c |  4 ----
 include/linux/pci.h     |  8 ++++++++
 3 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c5f3cd4ed766..8d1f1e436d1a 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -18,6 +18,8 @@
 
 #define VIRTFN_ID_LEN	16
 
+static DEFINE_MUTEX(pci_sriov_numvfs_lock);
+
 int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
 	if (!dev->is_physfn)
@@ -212,6 +214,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
 	return rc;
 }
 
+int pci_iov_add_virtfn_locked(struct pci_dev *dev, int id)
+{
+	int rc;
+
+	mutex_lock(&pci_sriov_numvfs_lock);
+	rc = pci_iov_add_virtfn(dev, id);
+	mutex_unlock(&pci_sriov_numvfs_lock);
+	return rc;
+}
+
 void pci_iov_remove_virtfn(struct pci_dev *dev, int id)
 {
 	char buf[VIRTFN_ID_LEN];
@@ -241,6 +253,13 @@ void pci_iov_remove_virtfn(struct pci_dev *dev, int id)
 	pci_dev_put(dev);
 }
 
+void pci_iov_remove_virtfn_locked(struct pci_dev *dev, int id)
+{
+	mutex_lock(&pci_sriov_numvfs_lock);
+	pci_iov_remove_virtfn(dev, id);
+	mutex_unlock(&pci_sriov_numvfs_lock);
+}
+
 int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 {
 	return 0;
@@ -337,7 +356,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	pci_cfg_access_unlock(dev);
 
 	for (i = 0; i < initial; i++) {
-		rc = pci_iov_add_virtfn(dev, i);
+		if (dev->bus->number != pci_iov_virtfn_bus(dev, i))
+			rc = pci_iov_add_virtfn_locked(dev, i);
+		else
+			rc = pci_iov_add_virtfn(dev, i);
 		if (rc)
 			goto failed;
 	}
@@ -348,8 +370,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	return 0;
 
 failed:
-	while (i--)
-		pci_iov_remove_virtfn(dev, i);
+	while (i--) {
+		if (dev->bus->number != pci_iov_virtfn_bus(dev, i))
+			pci_iov_remove_virtfn_locked(dev, i);
+		else
+			pci_iov_remove_virtfn(dev, i);
+	}
 
 err_pcibios:
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
@@ -375,8 +401,12 @@ static void sriov_disable(struct pci_dev *dev)
 	if (!iov->num_VFs)
 		return;
 
-	for (i = 0; i < iov->num_VFs; i++)
-		pci_iov_remove_virtfn(dev, i);
+	for (i = 0; i < iov->num_VFs; i++) {
+		if (dev->bus->number != pci_iov_virtfn_bus(dev, i))
+			pci_iov_remove_virtfn_locked(dev, i);
+		else
+			pci_iov_remove_virtfn(dev, i);
+	}
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index e35c2f0ee28c..48c56cb08652 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -31,8 +31,6 @@
 
 static int sysfs_initialized;	/* = 0 */
 
-static DEFINE_MUTEX(pci_sriov_numvfs_lock);
-
 /* show configuration fields */
 #define pci_config_attr(field, format_string)				\
 static ssize_t								\
@@ -606,7 +604,6 @@ static ssize_t sriov_numvfs_store(struct device *dev,
 	if (num_vfs > pci_sriov_get_totalvfs(pdev))
 		return -ERANGE;
 
-	mutex_lock(&pci_sriov_numvfs_lock);
 	device_lock(&pdev->dev);
 
 	if (num_vfs == pdev->sriov->num_VFs)
@@ -643,7 +640,6 @@ static ssize_t sriov_numvfs_store(struct device *dev,
 
 exit:
 	device_unlock(&pdev->dev);
-	mutex_unlock(&pci_sriov_numvfs_lock);
 
 	if (ret < 0)
 		return ret;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index bc49349fcc53..f4bc2a7122f4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2046,7 +2046,9 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_iov_add_virtfn(struct pci_dev *dev, int id);
+int pci_iov_add_virtfn_locked(struct pci_dev *dev, int id);
 void pci_iov_remove_virtfn(struct pci_dev *dev, int id);
+void pci_iov_remove_virtfn_locked(struct pci_dev *dev, int id);
 int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
@@ -2074,8 +2076,14 @@ static inline int pci_iov_add_virtfn(struct pci_dev *dev, int id)
 {
 	return -ENOSYS;
 }
+static inline int pci_iov_add_virtfn_locked(struct pci_dev *dev, int id)
+{
+	return -ENOSYS;
+}
 static inline void pci_iov_remove_virtfn(struct pci_dev *dev,
 					 int id) { }
+static inline void pci_iov_remove_virtfn_locked(struct pci_dev *dev,
+						int id) { }
 static inline void pci_disable_sriov(struct pci_dev *dev) { }
 static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
 static inline int pci_vfs_assigned(struct pci_dev *dev)
--
2.25.1
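Editor's note: the patch narrows the scope of the global lock from the whole sriov_numvfs_store() call to only those VFs whose bus number differs from the PF's. The single-threaded sketch below counts how often the lock would be taken. Everything here is hypothetical: the demo_* names, the counter that stands in for mutex_lock()/mutex_unlock(), and in particular the bus layout in demo_virtfn_bus(), which is an invented rule and not how pci_iov_virtfn_bus() computes the bus.

```c
#include <assert.h>

/* Hypothetical device model: VFs spill onto the next bus every
 * vfs_per_bus IDs. This layout rule is an assumption for illustration.
 */
struct demo_dev {
	int bus_number;
	int vfs_per_bus;
};

/* Stands in for mutex_lock()/mutex_unlock() on the global lock. */
static int demo_lock_acquisitions;

static int demo_virtfn_bus(const struct demo_dev *dev, int vf_id)
{
	return dev->bus_number + vf_id / dev->vfs_per_bus;
}

static void demo_add_virtfn(struct demo_dev *dev, int id)
{
	(void)dev;
	(void)id;	/* the real function registers the VF device */
}

static void demo_add_virtfn_locked(struct demo_dev *dev, int id)
{
	demo_lock_acquisitions++;	/* lock held only around this VF */
	demo_add_virtfn(dev, id);
}

/* Take the lock only for VFs that land on a bus other than the PF's. */
static void demo_enable(struct demo_dev *dev, int nr_virtfn)
{
	int i;

	for (i = 0; i < nr_virtfn; i++) {
		if (dev->bus_number != demo_virtfn_bus(dev, i))
			demo_add_virtfn_locked(dev, i);
		else
			demo_add_virtfn(dev, i);
	}
}
```

In this model a PF on bus 3 with four VFs per bus takes the lock for only two of six VFs, so PFs whose VFs all stay on the PF's own bus never contend on the global lock.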
[PATCH openEuler-22.03-LTS 4/4] net/tg3: fix race condition in tg3_reset_task()
by Baogen Shang 29 Apr '24

From: Thinh Tran <thinhtr(a)linux.vnet.ibm.com>

stable inclusion
from stable-v5.10.215
commit 1059aa41c5a84abfab4cc7371d6b5ff2b30b6c2d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J6AL
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

-------------------------

[ Upstream commit 16b55b1f2269962fb6b5154b8bf43f37c9a96637 ]

When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:

   enum {
	/* I/O channel is in normal state */
	pci_channel_io_normal = (__force pci_channel_state_t) 1,

	/* I/O to channel is blocked */
	pci_channel_io_frozen = (__force pci_channel_state_t) 2,

	/* PCI card is dead */
	pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
   };

If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.

EEH driver gets error event
--> eeh_set_channel_state()
    and set device to one of
    error state above           scheduler: tg3_reset_task() get
                                returned error from tg3_init_hw()
                             --> dev_close() shuts down the interface
tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device

To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_reset_task() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.

Adding the same checking in tg3_dump_state() to avoid dumping all
device registers when the PCI channel is not in the normal state.

Signed-off-by: Thinh Tran <thinhtr(a)linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi(a)ibm.com>
Reviewed-by: David Christensen <drc(a)linux.vnet.ibm.com>
Reviewed-by: Michael Chan <michael.chan(a)broadcom.com>
Link: https://lore.kernel.org/r/20231201001911.656-1-thinhtr@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Baogen Shang <baogen.shang(a)windriver.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 07abbc18728b..67de7b1deab1 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6454,6 +6454,14 @@ static void tg3_dump_state(struct tg3 *tp)
 	int i;
 	u32 *regs;
 
+	/* If it is a PCI error, all registers will be 0xffff,
+	 * we don't dump them out, just report the error and return
+	 */
+	if (tp->pdev->error_state != pci_channel_io_normal) {
+		netdev_err(tp->dev, "PCI channel ERROR!\n");
+		return;
+	}
+
 	regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC);
 	if (!regs)
 		return;
@@ -11186,7 +11194,8 @@ static void tg3_reset_task(struct work_struct *work)
 	rtnl_lock();
 	tg3_full_lock(tp, 0);
 
-	if (tp->pcierr_recovery || !netif_running(tp->dev)) {
+	if (tp->pcierr_recovery || !netif_running(tp->dev) ||
+	    tp->pdev->error_state != pci_channel_io_normal) {
 		tg3_flag_clear(tp, RESET_TASK_PENDING);
 		tg3_full_unlock(tp);
 		rtnl_unlock();
--
2.33.0
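Editor's note: the decision made at the top of tg3_reset_task() after this patch reduces to a three-way test. The sketch below models it in userspace. The demo_* names are hypothetical; only the three channel-state values and the shape of the condition are taken from the patch, and locking is left out entirely.

```c
#include <assert.h>

/* Mirrors the three channel states quoted in the commit message. */
enum demo_channel_state {
	DEMO_IO_NORMAL = 1,
	DEMO_IO_FROZEN = 2,
	DEMO_IO_PERM_FAILURE = 3,
};

/* Hypothetical reduced device state (not struct tg3). */
struct demo_nic {
	enum demo_channel_state error_state;
	int pcierr_recovery;	/* EEH recovery already in progress */
	int running;		/* interface is up */
	int reset_pending;
	int resets_done;
};

/* Driver-initiated reset: bail out and leave recovery to the EEH side
 * unless the channel is in the normal state.
 */
static void demo_reset_task(struct demo_nic *nic)
{
	if (nic->pcierr_recovery || !nic->running ||
	    nic->error_state != DEMO_IO_NORMAL) {
		nic->reset_pending = 0;
		return;
	}

	nic->resets_done++;	/* stands in for the actual hardware reset */
	nic->reset_pending = 0;
}
```

A frozen or permanently failed channel therefore clears the pending flag without touching the device, which matches the patch's intent of leaving recovery to the EEH callbacks.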
[PATCH openEuler-22.03-LTS 3/4] net/tg3: resolve deadlock in tg3_reset_task() during EEH
by Baogen Shang 29 Apr '24

From: David Christensen <drc(a)linux.vnet.ibm.com>

stable inclusion
from stable-v5.10.215
commit 62a0806eb4d2318874563f79c8fdd6bfe34ceddd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J6AL
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

-------------------------

[ Upstream commit 6c4ca03bd890566d873e3593b32d034bf2f5a087 ]

During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:

crash> foreach UN bt
...
PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
...
 #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
 #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
 #7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...

PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
...
 #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
 #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c00000003721fc60] process_one_work at c00000000019e5c4
...

PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
...
 #4 [c000000037247bc0] rtnl_lock at c000000000c940d8
 #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c000000037247c60] process_one_work at c00000000019e5c4
...

PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
...:1

 #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
 #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...

Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks. If tg3_reset_task() should happen to execute between
the times when tg3_io_error_detected() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.

Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:

crash> foreach UN bt
PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
...
 #4 [c000000006867a60] rtnl_lock at c000000000c940d8
 #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
 #6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
...
 #3 [c000000036c1bb70] msleep at c000000000259e6c
 #4 [c000000036c1bba0] napi_disable at c000000000c6b848
 #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
 #6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...

This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.

Fixes: db84bf43ef23 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc(a)linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi(a)broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Baogen Shang <baogen.shang(a)windriver.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 215a02ca4688..07abbc18728b 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -11186,7 +11186,7 @@ static void tg3_reset_task(struct work_struct *work)
 	rtnl_lock();
 	tg3_full_lock(tp, 0);
 
-	if (!netif_running(tp->dev)) {
+	if (tp->pcierr_recovery || !netif_running(tp->dev)) {
 		tg3_flag_clear(tp, RESET_TASK_PENDING);
 		tg3_full_unlock(tp);
 		rtnl_unlock();
@@ -18188,6 +18188,9 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
 
 	netdev_info(netdev, "PCI I/O error detected\n");
 
+	/* Want to make sure that the reset task doesn't run */
+	tg3_reset_task_cancel(tp);
+
 	rtnl_lock();
 
 	/* Could be second call or maybe we don't have netdev yet */
@@ -18204,9 +18207,6 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
 
 	tg3_timer_stop(tp);
 
-	/* Want to make sure that the reset task doesn't run */
-	tg3_reset_task_cancel(tp);
-
 	netif_device_detach(netdev);
 
 	/* Clean up software state, even if MMIO is blocked */
--
2.33.0
[PATCH openEuler-22.03-LTS 2/4] net: bnxt: fix a potential use-after-free in bnxt_init_tc
by Baogen Shang 29 Apr '24

From: Dinghao Liu <dinghao.liu(a)zju.edu.cn>

stable inclusion
from stable-v5.10.215
commit 49809af89c07c51eecba64abb013c78ff6812156
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J6AL
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

-------------------------

[ Upstream commit d007caaaf052f82ca2340d4c7b32d04a3f5dbf3f ]

When flow_indr_dev_register() fails, bnxt_init_tc will free
bp->tc_info through kfree(). However, the caller function
bnxt_init_one() will ignore this failure and call
bnxt_shutdown_tc() on failure of bnxt_dl_register(), where
a use-after-free happens. Fix this issue by setting
bp->tc_info to NULL after kfree().

Fixes: 627c89d00fb9 ("bnxt_en: flow_offload: offload tunnel decap rules via indirect callbacks")
Signed-off-by: Dinghao Liu <dinghao.liu(a)zju.edu.cn>
Reviewed-by: Pavan Chebbi <pavan.chebbi(a)broadcom.com>
Reviewed-by: Michael Chan <michael.chan(a)broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur(a)broadcom.com>
Link: https://lore.kernel.org/r/20231204024004.8245-1-dinghao.liu@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Baogen Shang <baogen.shang(a)windriver.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 3e9b1f59e381..775d0b7521ca 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -2061,6 +2061,7 @@ int bnxt_init_tc(struct bnxt *bp)
 	rhashtable_destroy(&tc_info->flow_table);
 free_tc_info:
 	kfree(tc_info);
+	bp->tc_info = NULL;
 	return rc;
 }
 
--
2.33.0
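Editor's note: the one-line fix follows the common "clear the pointer after freeing it" pattern, so that a later teardown path can tell nothing is left to release. The userspace sketch below uses hypothetical demo_* names and assumes, as the commit message implies, that the shutdown path returns early when the pointer is NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced structures (not the bnxt types). */
struct demo_tc_info {
	int flows;
};

struct demo_bp {
	struct demo_tc_info *tc_info;
};

/* Set up tc state; fail_register simulates a late registration failure. */
static int demo_init_tc(struct demo_bp *bp, int fail_register)
{
	struct demo_tc_info *tc_info = calloc(1, sizeof(*tc_info));

	if (!tc_info)
		return -1;
	bp->tc_info = tc_info;

	if (fail_register) {
		free(tc_info);
		bp->tc_info = NULL;	/* without this, bp keeps a dangling pointer */
		return -1;
	}
	return 0;
}

/* Teardown is safe to call even after a failed init. */
static void demo_shutdown_tc(struct demo_bp *bp)
{
	if (!bp->tc_info)
		return;
	free(bp->tc_info);
	bp->tc_info = NULL;
}
```

If the failure path did not reset `bp->tc_info`, the shutdown function would free the same block a second time.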