August 2024 - Kernel - mailweb.openeuler.org

[PATCH openEuler-22.03-LTS-SP1] ibmvnic: don't release napi in __ibmvnic_open()
by Guo Mengqi 27 Aug '24

27 Aug '24

From: Sukadev Bhattiprolu <sukadev(a)linux.ibm.com> mainline inclusion from mainline-v5.17-rc4 commit 61772b0908c640d0309c40f7d41d062ca4e979fa category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IADGL6 CVE: CVE-2022-48811 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- If __ibmvnic_open() encounters an error such as when setting link state, it calls release_resources() which frees the napi structures needlessly. Instead, have __ibmvnic_open() only clean up the work it did so far (i.e. disable napi and irqs) and leave the rest to the callers. If caller of __ibmvnic_open() is ibmvnic_open(), it should release the resources immediately. If the caller is do_reset() or do_hard_reset(), they will release the resources on the next reset. This fixes following crash that occurred when running the drmgr command several times to add/remove a vnic interface: [102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq [102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq [102056] ibmvnic 30000003 env3: Replenished 8 pools Kernel attempted to read user page (10) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000010 Faulting instruction address: 0xc000000000a3c840 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries ... CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e2e37 #1 Workqueue: events_long __ibmvnic_reset [ibmvnic] NIP: c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820 REGS: c0000000548e37e0 TRAP: 0300 Not tainted (5.16.0-rc5-autotest-g6441998e2e37) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28248484 XER: 00000004 CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000 ... NIP [c000000000a3c840] napi_enable+0x20/0xc0 LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic] Call Trace: [c0000000548e3a80] [0000000000000006] 0x6 (unreliable) [c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic] [c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic] [c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570 [c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660 [c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0 [c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010 38a0fff6 e92d1100 f9210028 39200000 <e9030010> f9010020 60420000 e9210020 ---[ end trace 5f8033b08fd27706 ]--- Fixes: ed651a10875f ("ibmvnic: Updated reset handling") Reported-by: Abdul Haleem <abdhalee(a)linux.vnet.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev(a)linux.ibm.com> Reviewed-by: Dany Madden <drt(a)linux.ibm.com> Link: https://lore.kernel.org/r/20220208001918.900602-1-sukadev@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> Conflicts: drivers/net/ethernet/ibm/ibmvnic.c [commit bbd809305bc7 invocate release_rx_pools() if resources initialization failed, which not merged lead to conflicts] Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com> --- drivers/net/ethernet/ibm/ibmvnic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 48757a59fadd..6e0092937017 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -109,6 +109,7 @@ static void release_crq_queue(struct ibmvnic_adapter *); static int __ibmvnic_set_mac(struct net_device *, u8 *); static int init_crq_queue(struct ibmvnic_adapter *adapter); static int send_query_phys_parms(struct ibmvnic_adapter *adapter); +static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter); struct ibmvnic_stat { char name[ETH_GSTRING_LEN]; @@ -1167,7 +1168,7 @@ static int __ibmvnic_open(struct net_device *netdev) rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP); if (rc) { ibmvnic_napi_disable(adapter); - release_resources(adapter); + ibmvnic_disable_irqs(adapter); return rc; } @@ -1203,7 +1204,6 @@ static int ibmvnic_open(struct net_device *netdev) rc = init_resources(adapter); if (rc) { netdev_err(netdev, "failed to initialize resources\n"); - release_resources(adapter); goto out; } } @@ -1219,6 +1219,10 @@ static int ibmvnic_open(struct net_device *netdev) adapter->state = VNIC_OPEN; rc = 0; } + + if (rc) + release_resources(adapter); + return rc; } -- 2.17.1

2 1

[PATCH openEuler-22.03-LTS-SP1] nvme-fabrics: use reserved tag for reg read/write command
by Guo Mengqi 27 Aug '24

27 Aug '24

From: Chunguang Xu <chunguang.xu(a)shopee.com> mainline inclusion from mainline-v6.10-rc3 commit 7dc3bfcb4c9cc58970fff6aaa48172cb224d85aa category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEKB CVE: CVE-2024-41082 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In some scenarios, if too many commands are issued by nvme command in the same time by user tasks, this may exhaust all tags of admin_q. If a reset (nvme reset or IO timeout) occurs before these commands finish, reconnect routine may fail to update nvme regs due to insufficient tags, which will cause kernel hang forever. In order to workaround this issue, maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved tags. This maybe safe for nvmf: 1. For the disable ctrl path, we will not issue connect command 2. For the enable ctrl / fw activate path, since connect and reg_xx() are called serially. So the reserved tags may still be enough while reg_xx() use reserved tags. Signed-off-by: Chunguang Xu <chunguang.xu(a)shopee.com> Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com> Reviewed-by: Christoph Hellwig <hch(a)lst.de> Signed-off-by: Keith Busch <kbusch(a)kernel.org> Conflicts: drivers/nvme/host/fabrics.c [Ma Wupeng: BLK_MQ_REQ_RESERVED is replaced by NVME_SUBMIT_RESERVED in v6.8] Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> --- drivers/nvme/host/fabrics.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index 7015fba2e512..2687454351bc 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -151,7 +151,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val) cmd.prop_get.offset = cpu_to_le32(off); ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0, - NVME_QID_ANY, 0, 0, false); + NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false); if (ret >= 0) *val = le64_to_cpu(res.u64); @@ -198,7 +198,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val) cmd.prop_get.offset = cpu_to_le32(off); ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0, - NVME_QID_ANY, 0, 0, false); + NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false); if (ret >= 0) *val = le64_to_cpu(res.u64); @@ -244,7 +244,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val) cmd.prop_set.value = cpu_to_le64(val); ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0, 0, - NVME_QID_ANY, 0, 0, false); + NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false); if (unlikely(ret)) dev_err(ctrl->device, "Property Set error: %d, offset %#x\n", -- 2.17.1

2 1

[PATCH OLK-5.10] nvme-fabrics: use reserved tag for reg read/write command
by Guo Mengqi 27 Aug '24

27 Aug '24

From: Chunguang Xu <chunguang.xu(a)shopee.com> mainline inclusion from mainline-v6.10-rc3 commit 7dc3bfcb4c9cc58970fff6aaa48172cb224d85aa category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEKB CVE: CVE-2024-41082 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In some scenarios, if too many commands are issued by nvme command in the same time by user tasks, this may exhaust all tags of admin_q. If a reset (nvme reset or IO timeout) occurs before these commands finish, reconnect routine may fail to update nvme regs due to insufficient tags, which will cause kernel hang forever. In order to workaround this issue, maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved tags. This maybe safe for nvmf: 1. For the disable ctrl path, we will not issue connect command 2. For the enable ctrl / fw activate path, since connect and reg_xx() are called serially. So the reserved tags may still be enough while reg_xx() use reserved tags. Signed-off-by: Chunguang Xu <chunguang.xu(a)shopee.com> Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com> Reviewed-by: Christoph Hellwig <hch(a)lst.de> Signed-off-by: Keith Busch <kbusch(a)kernel.org> Conflicts: drivers/nvme/host/fabrics.c [Ma Wupeng: BLK_MQ_REQ_RESERVED is replaced by NVME_SUBMIT_RESERVED in v6.8] Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> --- drivers/nvme/host/fabrics.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index 7015fba2e512..2687454351bc 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -151,7 +151,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val) cmd.prop_get.offset = cpu_to_le32(off); ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0, - NVME_QID_ANY, 0, 0, false); + NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false); if (ret >= 0) *val = le64_to_cpu(res.u64); @@ -198,7 +198,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val) cmd.prop_get.offset = cpu_to_le32(off); ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0, - NVME_QID_ANY, 0, 0, false); + NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false); if (ret >= 0) *val = le64_to_cpu(res.u64); @@ -244,7 +244,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val) cmd.prop_set.value = cpu_to_le64(val); ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0, 0, - NVME_QID_ANY, 0, 0, false); + NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false); if (unlikely(ret)) dev_err(ctrl->device, "Property Set error: %d, offset %#x\n", -- 2.17.1

2 1

[PATCH OLK-5.10] drm/radeon: check bo_va->bo is non-NULL before using it
by Guo Mengqi 27 Aug '24

27 Aug '24

From: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> stable inclusion from stable-v6.6.42 commit f13c96e0e325a057c03f8a47734adb360e112efe category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEMD CVE: CVE-2024-41060 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 6fb15dcbcf4f212930350eaee174bb60ed40a536 ] The call to radeon_vm_clear_freed might clear bo_va->bo, so we have to check it before dereferencing it. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> Acked-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Conflicts: drivers/gpu/drm/radeon/radeon_gem.c [Ma Wupeng: Context conflict] Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> --- drivers/gpu/drm/radeon/radeon_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 75053917d213..51b6f38b5c47 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -582,7 +582,7 @@ static void radeon_gem_va_update_vm(struct radeon_device *rdev, if (r) goto error_unlock; - if (bo_va->it.start) + if (bo_va->it.start && bo_va->bo) r = radeon_vm_bo_update(rdev, bo_va, &bo_va->bo->tbo.mem); error_unlock: -- 2.17.1

2 1

[PATCH openEuler-22.03-LTS-SP1] drm/radeon: check bo_va->bo is non-NULL before using it
by Guo Mengqi 27 Aug '24

27 Aug '24

From: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> stable inclusion from stable-v6.6.42 commit f13c96e0e325a057c03f8a47734adb360e112efe category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEMD CVE: CVE-2024-41060 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 6fb15dcbcf4f212930350eaee174bb60ed40a536 ] The call to radeon_vm_clear_freed might clear bo_va->bo, so we have to check it before dereferencing it. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> Acked-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Conflicts: drivers/gpu/drm/radeon/radeon_gem.c [Ma Wupeng: Context conflict] Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> --- drivers/gpu/drm/radeon/radeon_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index e5c4271e64ed..1c9c6e49cb22 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -584,7 +584,7 @@ static void radeon_gem_va_update_vm(struct radeon_device *rdev, if (r) goto error_unlock; - if (bo_va->it.start) + if (bo_va->it.start && bo_va->bo) r = radeon_vm_bo_update(rdev, bo_va, &bo_va->bo->tbo.mem); error_unlock: -- 2.17.1

2 1

[PATCH openEuler-22.03-LTS-SP1] tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc
by Xiang Yang 27 Aug '24

27 Aug '24

From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com> stable inclusion from stable-v5.10.215 commit ada28eb4b9561aab93942f3224a2e41d76fe57fa category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9S26U CVE: CVE-2023-52880 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- commit 67c37756898a5a6b2941a13ae7260c89b54e0d88 upstream. Any unprivileged user can attach N_GSM0710 ldisc, but it requires CAP_NET_ADMIN to create a GSM network anyway. Require initial namespace CAP_NET_ADMIN to do that. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com> Link: https://lore.kernel.org/r/20230731185942.279611-1-cascardo@canonical.com Cc: Salvatore Bonaccorso <carnil(a)debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Xiang Yang <xiangyang3(a)huawei.com> --- drivers/tty/n_gsm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 6cd87e71fecc..96a5e0f91c15 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2619,6 +2619,9 @@ static int gsmld_open(struct tty_struct *tty) struct gsm_mux *gsm; int ret; + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + if (tty->ops->write == NULL) return -EINVAL; -- 2.34.1

2 1

[openeuler:openEuler-1.0-LTS 13475/23564] net/can/j1939/.tmp_socket.o: warning: objtool: j1939_sk_queue_activate_next()+0x89: unreachable instruction
by kernel test robot 27 Aug '24

27 Aug '24

tree: https://gitee.com/openeuler/kernel.git openEuler-1.0-LTS head: 73a86ad60aabf460a4c8a0f96a1b2ddb98522fd8 commit: e0c2ee9edcacfdda0e806166eaec1310cdf9e324 [13475/23564] can: add support of SAE J1939 protocol config: x86_64-buildonly-randconfig-004-20240825 (https://download.01.org/0day-ci/archive/20240827/202408272140.3C5OLBQP-lkp@…) compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408272140.3C5OLBQP-lkp@…) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp(a)intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408272140.3C5OLBQP-lkp@intel.com/ All warnings (new ones prefixed by >>): >> net/can/j1939/.tmp_socket.o: warning: objtool: j1939_sk_queue_activate_next()+0x89: unreachable instruction -- >> net/can/j1939/.tmp_transport.o: warning: objtool: j1939_xtp_abort_to_errno()+0xea: unreachable instruction -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

1 0

[PATCH OLK-5.10 0/2] Fix mainline patchs
by Luo Gengkun 27 Aug '24

27 Aug '24

Breno Leitao (1): perf/x86/amd/core: Always clear status for idx Sandipan Das (1): perf/x86/amd/core: Fix overflow reset on hotplug arch/x86/events/amd/core.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) -- 2.34.1

2 3

[PATCH openEuler-1.0-LTS] net/ipv6: Fix the RT cache flush via sysctl using a previous delay
by Liu Jian 27 Aug '24

27 Aug '24

From: Petr Pavlu <petr.pavlu(a)suse.com> stable inclusion from stable-v4.19.317 commit ebde6e8a52c68dc45b4ae354e279ba74788579e7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAMPH5 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… ------------------------------------------------- [ Upstream commit 14a20e5b4ad998793c5f43b0330d9e1388446cf3 ] The net.ipv6.route.flush system parameter takes a value which specifies a delay used during the flush operation for aging exception routes. The written value is however not used in the currently requested flush and instead utilized only in the next one. A problem is that ipv6_sysctl_rtcache_flush() first reads the old value of net->ipv6.sysctl.flush_delay into a local delay variable and then calls proc_dointvec() which actually updates the sysctl based on the provided input. Fix the problem by switching the order of the two operations. Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.") Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com> Reviewed-by: David Ahern <dsahern(a)kernel.org> Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Liu Jian <liujian56(a)huawei.com> --- net/ipv6/route.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index e7ff0f40867b..907a5e507819 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5186,12 +5186,12 @@ int ipv6_sysctl_rtcache_flush(struct ctl_table *ctl, int write, if (!write) return -EINVAL; - net = (struct net *)ctl->extra1; - delay = net->ipv6.sysctl.flush_delay; ret = proc_dointvec(ctl, write, buffer, lenp, ppos); if (ret) return ret; + net = (struct net *)ctl->extra1; + delay = net->ipv6.sysctl.flush_delay; fib6_run_gc(delay <= 0 ? 0 : (unsigned long)delay, net, delay > 0); return 0; } -- 2.34.1

2 1

[PATCH openEuler-1.0-LTS] af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
by Dong Chenchen 27 Aug '24

27 Aug '24

From: Eric Dumazet <edumazet(a)google.com> stable inclusion from stable-v4.19.316 commit 4f3ae7d846b4565c0b80d65ed607c3277bc984d4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAMPH5 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 581073f626e387d3e7eed55c48c8495584ead7ba ] trafgen performance considerably sank on hosts with many cores after the blamed commit. packet_read_pending() is very expensive, and calling it in af_packet fast path defeats Daniel intent in commit b013840810c2 ("packet: use percpu mmap tx frame pending refcount") tpacket_destruct_skb() makes room for one packet, we can immediately wakeup a producer, no need to completely drain the tx ring. Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET") Signed-off-by: Eric Dumazet <edumazet(a)google.com> Cc: Neil Horman <nhorman(a)tuxdriver.com> Cc: Daniel Borkmann <daniel(a)iogearbox.net> Reviewed-by: Willem de Bruijn <willemb(a)google.com> Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com> --- net/packet/af_packet.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index c64bcba21e47..e26ee0486ded 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2437,8 +2437,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb) ts = __packet_set_timestamp(po, ph, skb); __packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts); - if (!packet_read_pending(&po->tx_ring)) - complete(&po->skb_completion); + complete(&po->skb_completion); } sock_wfree(skb); -- 2.25.1

2 1