From: Chunguang Xu <chunguang.xu(a)shopee.com>
mainline inclusion
from mainline-v6.10-rc3
commit 7dc3bfcb4c9cc58970fff6aaa48172cb224d85aa
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEKB
CVE: CVE-2024-41082
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
In some scenarios, if too many commands are issued by nvme command in
the same time by user tasks, this may exhaust all tags of admin_q. If
a reset (nvme reset or IO timeout) occurs before these commands finish,
reconnect routine may fail to update nvme regs due to insufficient tags,
which will cause kernel hang forever. In order to workaround this issue,
maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved
tags. This maybe safe for nvmf:
1. For the disable ctrl path, we will not issue connect command
2. For the enable ctrl / fw activate path, since connect and reg_xx()
are called serially.
So the reserved tags may still be enough while reg_xx() use reserved tags.
Signed-off-by: Chunguang Xu <chunguang.xu(a)shopee.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Conflicts:
drivers/nvme/host/fabrics.c
[Ma Wupeng: BLK_MQ_REQ_RESERVED is replaced by NVME_SUBMIT_RESERVED in v6.8]
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
---
drivers/nvme/host/fabrics.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 7015fba2e512..2687454351bc 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -151,7 +151,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
cmd.prop_get.offset = cpu_to_le32(off);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0,
- NVME_QID_ANY, 0, 0, false);
+ NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false);
if (ret >= 0)
*val = le64_to_cpu(res.u64);
@@ -198,7 +198,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
cmd.prop_get.offset = cpu_to_le32(off);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0,
- NVME_QID_ANY, 0, 0, false);
+ NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false);
if (ret >= 0)
*val = le64_to_cpu(res.u64);
@@ -244,7 +244,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
cmd.prop_set.value = cpu_to_le64(val);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0, 0,
- NVME_QID_ANY, 0, 0, false);
+ NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false);
if (unlikely(ret))
dev_err(ctrl->device,
"Property Set error: %d, offset %#x\n",
--
2.17.1
From: Chunguang Xu <chunguang.xu(a)shopee.com>
mainline inclusion
from mainline-v6.10-rc3
commit 7dc3bfcb4c9cc58970fff6aaa48172cb224d85aa
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAGEKB
CVE: CVE-2024-41082
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
In some scenarios, if too many commands are issued by nvme command in
the same time by user tasks, this may exhaust all tags of admin_q. If
a reset (nvme reset or IO timeout) occurs before these commands finish,
reconnect routine may fail to update nvme regs due to insufficient tags,
which will cause kernel hang forever. In order to workaround this issue,
maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved
tags. This maybe safe for nvmf:
1. For the disable ctrl path, we will not issue connect command
2. For the enable ctrl / fw activate path, since connect and reg_xx()
are called serially.
So the reserved tags may still be enough while reg_xx() use reserved tags.
Signed-off-by: Chunguang Xu <chunguang.xu(a)shopee.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Conflicts:
drivers/nvme/host/fabrics.c
[Ma Wupeng: BLK_MQ_REQ_RESERVED is replaced by NVME_SUBMIT_RESERVED in v6.8]
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
---
drivers/nvme/host/fabrics.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 7015fba2e512..2687454351bc 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -151,7 +151,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
cmd.prop_get.offset = cpu_to_le32(off);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0,
- NVME_QID_ANY, 0, 0, false);
+ NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false);
if (ret >= 0)
*val = le64_to_cpu(res.u64);
@@ -198,7 +198,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
cmd.prop_get.offset = cpu_to_le32(off);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0, 0,
- NVME_QID_ANY, 0, 0, false);
+ NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false);
if (ret >= 0)
*val = le64_to_cpu(res.u64);
@@ -244,7 +244,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
cmd.prop_set.value = cpu_to_le64(val);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0, 0,
- NVME_QID_ANY, 0, 0, false);
+ NVME_QID_ANY, 0, BLK_MQ_REQ_RESERVED, false);
if (unlikely(ret))
dev_err(ctrl->device,
"Property Set error: %d, offset %#x\n",
--
2.17.1
From: Petr Pavlu <petr.pavlu(a)suse.com>
stable inclusion
from stable-v4.19.317
commit ebde6e8a52c68dc45b4ae354e279ba74788579e7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAMPH5
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
-------------------------------------------------
[ Upstream commit 14a20e5b4ad998793c5f43b0330d9e1388446cf3 ]
The net.ipv6.route.flush system parameter takes a value which specifies
a delay used during the flush operation for aging exception routes. The
written value is however not used in the currently requested flush and
instead utilized only in the next one.
A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
of net->ipv6.sysctl.flush_delay into a local delay variable and then
calls proc_dointvec() which actually updates the sysctl based on the
provided input.
Fix the problem by switching the order of the two operations.
Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.")
Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com>
Reviewed-by: David Ahern <dsahern(a)kernel.org>
Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Liu Jian <liujian56(a)huawei.com>
---
net/ipv6/route.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e7ff0f40867b..907a5e507819 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5186,12 +5186,12 @@ int ipv6_sysctl_rtcache_flush(struct ctl_table *ctl, int write,
if (!write)
return -EINVAL;
- net = (struct net *)ctl->extra1;
- delay = net->ipv6.sysctl.flush_delay;
ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
if (ret)
return ret;
+ net = (struct net *)ctl->extra1;
+ delay = net->ipv6.sysctl.flush_delay;
fib6_run_gc(delay <= 0 ? 0 : (unsigned long)delay, net, delay > 0);
return 0;
}
--
2.34.1
From: Eric Dumazet <edumazet(a)google.com>
stable inclusion
from stable-v4.19.316
commit 4f3ae7d846b4565c0b80d65ed607c3277bc984d4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAMPH5
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 581073f626e387d3e7eed55c48c8495584ead7ba ]
trafgen performance considerably sank on hosts with many cores
after the blamed commit.
packet_read_pending() is very expensive, and calling it
in af_packet fast path defeats Daniel intent in commit
b013840810c2 ("packet: use percpu mmap tx frame pending refcount")
tpacket_destruct_skb() makes room for one packet, we can immediately
wakeup a producer, no need to completely drain the tx ring.
Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: Neil Horman <nhorman(a)tuxdriver.com>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
---
net/packet/af_packet.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c64bcba21e47..e26ee0486ded 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2437,8 +2437,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
ts = __packet_set_timestamp(po, ph, skb);
__packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts);
- if (!packet_read_pending(&po->tx_ring))
- complete(&po->skb_completion);
+ complete(&po->skb_completion);
}
sock_wfree(skb);
--
2.25.1