From: Florian Westphal fw@strlen.de
stable inclusion from stable-v4.19.272 commit 01687e35df44dd09cc6943306db35d9efc507907 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
[ Upstream commit e15d4cdf27cb0c1e977270270b2cea12e0955edd ]
Consider: client -----> conntrack ---> Host
client sends a SYN, but $Host is unreachable/silent. Client eventually gives up and the conntrack entry will time out.
However, if the client is restarted with same addr/port pair, it may prevent the conntrack entry from timing out.
This is noticeable when the existing conntrack entry has no NAT transformation or an outdated one and port reuse happens either on client or due to a NAT middlebox.
This change prevents refresh of the timeout for SYN retransmits, so entry is going away after nf_conntrack_tcp_timeout_syn_sent seconds (default: 60).
Entry will be re-created on next connection attempt, but then nat rules will be evaluated again.
Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/netfilter/nf_conntrack_proto_tcp.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index aab532b8c8c6..1600f35bfd49 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -1089,6 +1089,16 @@ static int tcp_packet(struct nf_conn *ct, nf_ct_kill_acct(ct, ctinfo, skb); return NF_ACCEPT; } + + if (index == TCP_SYN_SET && old_state == TCP_CONNTRACK_SYN_SENT) { + /* do not renew timeout on SYN retransmit. + * + * Else port reuse by client or NAT middlebox can keep + * entry alive indefinitely (including nat info). + */ + return NF_ACCEPT; + } + /* ESTABLISHED without SEEN_REPLY, i.e. mid-connection * pickup with loose=1. Avoid large ESTABLISHED timeout. */
From: Paolo Abeni pabeni@redhat.com
stable inclusion from stable-v4.19.272 commit ad0dfe9bcf0d78e699c7efb64c90ed062dc48bea category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
[ Upstream commit 71ab9c3e2253619136c31c89dbb2c69305cc89b1 ]
If net_assign_generic() fails, the current error path in ops_init() tries to clear the gen pointer slot. Anyway, in such error path, the gen pointer itself has not been modified yet, and the existing and accessed one is smaller than the accessed index, causing an out-of-bounds error:
BUG: KASAN: slab-out-of-bounds in ops_init+0x2de/0x320 Write of size 8 at addr ffff888109124978 by task modprobe/1018
CPU: 2 PID: 1018 Comm: modprobe Not tainted 6.2.0-rc2.mptcp_ae5ac65fbed5+ #1641 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6a/0x9f print_address_description.constprop.0+0x86/0x2b5 print_report+0x11b/0x1fb kasan_report+0x87/0xc0 ops_init+0x2de/0x320 register_pernet_operations+0x2e4/0x750 register_pernet_subsys+0x24/0x40 tcf_register_action+0x9f/0x560 do_one_initcall+0xf9/0x570 do_init_module+0x190/0x650 load_module+0x1fa5/0x23c0 __do_sys_finit_module+0x10d/0x1b0 do_syscall_64+0x58/0x80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f42518f778d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cb 56 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff96869688 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 00005568ef7f7c90 RCX: 00007f42518f778d RDX: 0000000000000000 RSI: 00005568ef41d796 RDI: 0000000000000003 RBP: 00005568ef41d796 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000 R13: 00005568ef7f7d30 R14: 0000000000040000 R15: 0000000000000000 </TASK>
This change addresses the issue by skipping the gen pointer de-reference in the mentioned error-path.
Found by code inspection and verified with explicit error injection on a kasan-enabled kernel.
Fixes: d266935ac43d ("net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Simon Horman simon.horman@corigine.com Link: https://lore.kernel.org/r/cec4e0f3bb2c77ac03a6154a8508d3930beb5f0f.167415434... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/net_namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 123094869b34..975cf8cebc34 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -132,12 +132,12 @@ static int ops_init(const struct pernet_operations *ops, struct net *net) return 0;
if (ops->id && ops->size) { -cleanup: ng = rcu_dereference_protected(net->gen, lockdep_is_held(&pernet_ops_rwsem)); ng->ptr[*ops->id] = NULL; }
+cleanup: kfree(data);
out:
From: Li RongQing lirongqing@baidu.com
stable inclusion from stable-v4.19.272 commit bc6199018f7b1b1fca9d6d3a746101d72859c332 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
[ Upstream commit 0041195d55bc38df6b574cc8c36dcf2266fbee39 ]
The type of hash::nelems has been changed from size_t to atom_t which in fact is int, so not need to check if BITS_PER_LONG, that is bit number of size_t, is bigger than 32
and rht_grow_above_max() will be called to check if hashtable is too big, ensure it can not bigger than 1<<31
Signed-off-by: Zhang Yu zhangyu31@baidu.com Signed-off-by: Li RongQing lirongqing@baidu.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: c1bb9484e3b0 ("netlink: annotate data races around nlk->portid") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/netlink/af_netlink.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 7621c2b1d39e..71cffa4b1dcc 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -578,11 +578,6 @@ static int netlink_insert(struct sock *sk, u32 portid) if (nlk_sk(sk)->bound) goto err;
- err = -ENOMEM; - if (BITS_PER_LONG > 32 && - unlikely(atomic_read(&table->hash.nelems) >= UINT_MAX)) - goto err; - nlk_sk(sk)->portid = portid; sock_hold(sk);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.272 commit a737d39273e636ebcbcce2b04d088fe0b1effe87 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
[ Upstream commit c1bb9484e3b05166880da8574504156ccbd0549e ]
syzbot reminds us netlink_getname() runs locklessly [1]
This first patch annotates the race against nlk->portid.
Following patches take care of the remaining races.
[1] BUG: KCSAN: data-race in netlink_getname / netlink_insert
write to 0xffff88814176d310 of 4 bytes by task 2315 on cpu 1: netlink_insert+0xf1/0x9a0 net/netlink/af_netlink.c:583 netlink_autobind+0xae/0x180 net/netlink/af_netlink.c:856 netlink_sendmsg+0x444/0x760 net/netlink/af_netlink.c:1895 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x38f/0x500 net/socket.c:2476 ___sys_sendmsg net/socket.c:2530 [inline] __sys_sendmsg+0x19a/0x230 net/socket.c:2559 __do_sys_sendmsg net/socket.c:2568 [inline] __se_sys_sendmsg net/socket.c:2566 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2566 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88814176d310 of 4 bytes by task 2316 on cpu 0: netlink_getname+0xcd/0x1a0 net/netlink/af_netlink.c:1144 __sys_getsockname+0x11d/0x1b0 net/socket.c:2026 __do_sys_getsockname net/socket.c:2041 [inline] __se_sys_getsockname net/socket.c:2038 [inline] __x64_sys_getsockname+0x3e/0x50 net/socket.c:2038 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0xc9a49780
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 2316 Comm: syz-executor.2 Not tainted 6.2.0-rc3-syzkaller-00030-ge8f60cd7db24-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/netlink/af_netlink.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 71cffa4b1dcc..372e83f8c5ed 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -578,7 +578,9 @@ static int netlink_insert(struct sock *sk, u32 portid) if (nlk_sk(sk)->bound) goto err;
- nlk_sk(sk)->portid = portid; + /* portid can be read locklessly from netlink_getname(). */ + WRITE_ONCE(nlk_sk(sk)->portid, portid); + sock_hold(sk);
err = __netlink_insert(table, sk); @@ -1132,7 +1134,8 @@ static int netlink_getname(struct socket *sock, struct sockaddr *addr, nladdr->nl_pid = nlk->dst_portid; nladdr->nl_groups = netlink_group_mask(nlk->dst_group); } else { - nladdr->nl_pid = nlk->portid; + /* Paired with WRITE_ONCE() in netlink_insert() */ + nladdr->nl_pid = READ_ONCE(nlk->portid); netlink_lock_table(); nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0; netlink_unlock_table();
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.272 commit 71bd90357ee93482922ea1ec73cd72271a825fec category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
[ Upstream commit 004db64d185a5f23dfb891d7701e23713b2420ee ]
netlink_getname(), netlink_sendmsg() and netlink_getsockbyportid() can read nlk->dst_portid and nlk->dst_group while another thread is changing them.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/netlink/af_netlink.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 372e83f8c5ed..0df7057036fe 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1090,8 +1090,9 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
if (addr->sa_family == AF_UNSPEC) { sk->sk_state = NETLINK_UNCONNECTED; - nlk->dst_portid = 0; - nlk->dst_group = 0; + /* dst_portid and dst_group can be read locklessly */ + WRITE_ONCE(nlk->dst_portid, 0); + WRITE_ONCE(nlk->dst_group, 0); return 0; } if (addr->sa_family != AF_NETLINK) @@ -1113,8 +1114,9 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
if (err == 0) { sk->sk_state = NETLINK_CONNECTED; - nlk->dst_portid = nladdr->nl_pid; - nlk->dst_group = ffs(nladdr->nl_groups); + /* dst_portid and dst_group can be read locklessly */ + WRITE_ONCE(nlk->dst_portid, nladdr->nl_pid); + WRITE_ONCE(nlk->dst_group, ffs(nladdr->nl_groups)); }
return err; @@ -1131,8 +1133,9 @@ static int netlink_getname(struct socket *sock, struct sockaddr *addr, nladdr->nl_pad = 0;
if (peer) { - nladdr->nl_pid = nlk->dst_portid; - nladdr->nl_groups = netlink_group_mask(nlk->dst_group); + /* Paired with WRITE_ONCE() in netlink_connect() */ + nladdr->nl_pid = READ_ONCE(nlk->dst_portid); + nladdr->nl_groups = netlink_group_mask(READ_ONCE(nlk->dst_group)); } else { /* Paired with WRITE_ONCE() in netlink_insert() */ nladdr->nl_pid = READ_ONCE(nlk->portid); @@ -1162,8 +1165,9 @@ static struct sock *netlink_getsockbyportid(struct sock *ssk, u32 portid)
/* Don't bother queuing skb if kernel socket has no input function */ nlk = nlk_sk(sock); + /* dst_portid can be changed in netlink_connect() */ if (sock->sk_state == NETLINK_CONNECTED && - nlk->dst_portid != nlk_sk(ssk)->portid) { + READ_ONCE(nlk->dst_portid) != nlk_sk(ssk)->portid) { sock_put(sock); return ERR_PTR(-ECONNREFUSED); } @@ -1875,8 +1879,9 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) goto out; netlink_skb_flags |= NETLINK_SKB_DST; } else { - dst_portid = nlk->dst_portid; - dst_group = nlk->dst_group; + /* Paired with WRITE_ONCE() in netlink_connect() */ + dst_portid = READ_ONCE(nlk->dst_portid); + dst_group = READ_ONCE(nlk->dst_group); }
/* Paired with WRITE_ONCE() in netlink_insert() */
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.272 commit 4988b4ad0f7614d68ff5882b3e89fe37daa2b25b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
[ Upstream commit 9b663b5cbb15b494ef132a3c937641c90646eb73 ]
netlink_getsockbyportid() reads sk_state while a concurrent netlink_connect() can change its value.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/netlink/af_netlink.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 0df7057036fe..544727a6f2df 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1089,7 +1089,8 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr, return -EINVAL;
if (addr->sa_family == AF_UNSPEC) { - sk->sk_state = NETLINK_UNCONNECTED; + /* paired with READ_ONCE() in netlink_getsockbyportid() */ + WRITE_ONCE(sk->sk_state, NETLINK_UNCONNECTED); /* dst_portid and dst_group can be read locklessly */ WRITE_ONCE(nlk->dst_portid, 0); WRITE_ONCE(nlk->dst_group, 0); @@ -1113,7 +1114,8 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr, err = netlink_autobind(sock);
if (err == 0) { - sk->sk_state = NETLINK_CONNECTED; + /* paired with READ_ONCE() in netlink_getsockbyportid() */ + WRITE_ONCE(sk->sk_state, NETLINK_CONNECTED); /* dst_portid and dst_group can be read locklessly */ WRITE_ONCE(nlk->dst_portid, nladdr->nl_pid); WRITE_ONCE(nlk->dst_group, ffs(nladdr->nl_groups)); @@ -1165,8 +1167,8 @@ static struct sock *netlink_getsockbyportid(struct sock *ssk, u32 portid)
/* Don't bother queuing skb if kernel socket has no input function */ nlk = nlk_sk(sock); - /* dst_portid can be changed in netlink_connect() */ - if (sock->sk_state == NETLINK_CONNECTED && + /* dst_portid and sk_state can be changed in netlink_connect() */ + if (READ_ONCE(sock->sk_state) == NETLINK_CONNECTED && READ_ONCE(nlk->dst_portid) != nlk_sk(ssk)->portid) { sock_put(sock); return ERR_PTR(-ECONNREFUSED);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.272 commit ef050cf5fb70d995a0d03244e25179b7c66a924a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
[ Upstream commit 1d1d63b612801b3f0a39b7d4467cad0abd60e5c8 ]
if (!type) continue; if (type > RTAX_MAX) return -EINVAL; ... metrics[type - 1] = val;
@type being used as an array index, we need to prevent cpu speculation or risk leaking kernel memory content.
Fixes: 6cf9dfd3bd62 ("net: fib: move metrics parsing to a helper") Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20230120133040.3623463-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/metrics.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c index 04311f7067e2..9a6b01d85cd0 100644 --- a/net/ipv4/metrics.c +++ b/net/ipv4/metrics.c @@ -1,4 +1,5 @@ #include <linux/netlink.h> +#include <linux/nospec.h> #include <linux/rtnetlink.h> #include <linux/types.h> #include <net/ip.h> @@ -24,6 +25,7 @@ int ip_metrics_convert(struct net *net, struct nlattr *fc_mx, int fc_mx_len, if (type > RTAX_MAX) return -EINVAL;
+ type = array_index_nospec(type, RTAX_MAX + 1); if (type == RTAX_CC_ALGO) { char tmp[TCP_CA_NAME_MAX];
From: Thomas Gleixner tglx@linutronix.de
stable inclusion from stable-v4.19.272 commit 496975d1a2937f4baadf3d985991b13fc4fc4f27 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
commit 5fa55950729d0762a787451dc52862c3f850f859 upstream.
Baoquan reported that after triggering a crash the subsequent crash-kernel fails to boot about half of the time. It triggers a NULL pointer dereference in the periodic tick code.
This happens because the legacy timer interrupt (IRQ0) is resent in software which happens in soft interrupt (tasklet) context. In this context get_irq_regs() returns NULL which leads to the NULL pointer dereference.
The reason for the resend is a spurious APIC interrupt on the IRQ0 vector which is captured and leads to a resend when the legacy timer interrupt is enabled. This is wrong because the legacy PIC interrupts are level triggered and therefore should never be resent in software, but nothing ever sets the IRQ_LEVEL flag on those interrupts, so the core code does not know about their trigger type.
Ensure that IRQ_LEVEL is set when the legacy PCI interrupts are set up.
Fixes: a4633adcdbc1 ("[PATCH] genirq: add genirq sw IRQ-retrigger") Reported-by: Baoquan He bhe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Baoquan He bhe@redhat.com Link: https://lore.kernel.org/r/87mt6rjrra.ffs@tglx Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- arch/x86/kernel/i8259.c | 1 + arch/x86/kernel/irqinit.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c index 519649ddf100..89c7975b88a7 100644 --- a/arch/x86/kernel/i8259.c +++ b/arch/x86/kernel/i8259.c @@ -114,6 +114,7 @@ static void make_8259A_irq(unsigned int irq) disable_irq_nosync(irq); io_apic_irqs &= ~(1<<irq); irq_set_chip_and_handler(irq, &i8259A_chip, handle_level_irq); + irq_set_status_flags(irq, IRQ_LEVEL); enable_irq(irq); lapic_assign_legacy_vector(irq, true); } diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index a0693b71cfc1..f2c215e1f64c 100644 --- a/arch/x86/kernel/irqinit.c +++ b/arch/x86/kernel/irqinit.c @@ -72,8 +72,10 @@ void __init init_ISA_irqs(void)
legacy_pic->init(0);
- for (i = 0; i < nr_legacy_irqs(); i++) + for (i = 0; i < nr_legacy_irqs(); i++) { irq_set_chip_and_handler(i, chip, handle_level_irq); + irq_set_status_flags(i, IRQ_LEVEL); + } }
void __init init_IRQ(void)
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
stable inclusion from stable-v4.19.273 commit 86bd9f9d11a24bfd072f38c43f4b2664c54a2c2c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
commit 31352811e13dc2313f101b890fd4b1ce760b5fe7 upstream.
__dma_rx_complete() is called from two places: - Through the DMA completion callback dma_rx_complete() - From serial8250_rx_dma_flush() after IIR_RLSI or IIR_RX_TIMEOUT The former does not hold port's lock during __dma_rx_complete() which allows these two to race and potentially insert the same data twice.
Extend port's lock coverage in dma_rx_complete() to prevent the race and check if the DMA Rx is still pending completion before calling into __dma_rx_complete().
Reported-by: Gilles BULOZ gilles.buloz@kontron.com Tested-by: Gilles BULOZ gilles.buloz@kontron.com Fixes: 9ee4b83e51f7 ("serial: 8250: Add support for dmaengine") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230130114841.25749-2-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/serial/8250/8250_dma.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c index bfa1a857f3ff..871c40c655ec 100644 --- a/drivers/tty/serial/8250/8250_dma.c +++ b/drivers/tty/serial/8250/8250_dma.c @@ -61,6 +61,18 @@ static void __dma_rx_complete(void *param) tty_flip_buffer_push(tty_port); }
+static void dma_rx_complete(void *param) +{ + struct uart_8250_port *p = param; + struct uart_8250_dma *dma = p->dma; + unsigned long flags; + + spin_lock_irqsave(&p->port.lock, flags); + if (dma->rx_running) + __dma_rx_complete(p); + spin_unlock_irqrestore(&p->port.lock, flags); +} + int serial8250_tx_dma(struct uart_8250_port *p) { struct uart_8250_dma *dma = p->dma; @@ -126,7 +138,7 @@ int serial8250_rx_dma(struct uart_8250_port *p) return -EBUSY;
dma->rx_running = 1; - desc->callback = __dma_rx_complete; + desc->callback = dma_rx_complete; desc->callback_param = p;
dma->rx_cookie = dmaengine_submit(desc);
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
stable inclusion from stable-v4.19.273 commit 81f0e0be230792cd5779333feadd06ad7c820ef3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
commit 57e9af7831dcf211c5c689c2a6f209f4abdf0bce upstream.
As DMA Rx can be completed from two places, it is possible that DMA Rx completes before DMA completion callback had a chance to complete it. Once the previous DMA Rx has been completed, a new one can be started on the next UART interrupt. The following race is possible (uart_unlock_and_check_sysrq_irqrestore() replaced with spin_unlock_irqrestore() for simplicity/clarity):
CPU0 CPU1 dma_rx_complete() serial8250_handle_irq() spin_lock_irqsave(&port->lock) handle_rx_dma() serial8250_rx_dma_flush() __dma_rx_complete() dma->rx_running = 0 // Complete DMA Rx spin_unlock_irqrestore(&port->lock)
serial8250_handle_irq() spin_lock_irqsave(&port->lock) handle_rx_dma() serial8250_rx_dma() dma->rx_running = 1 // Setup a new DMA Rx spin_unlock_irqrestore(&port->lock)
spin_lock_irqsave(&port->lock) // sees dma->rx_running = 1 __dma_rx_complete() dma->rx_running = 0 // Incorrectly complete // running DMA Rx
This race seems somewhat theoretical to occur for real but handle it correctly regardless. Check what is the DMA status before complething anything in __dma_rx_complete().
Reported-by: Gilles BULOZ gilles.buloz@kontron.com Tested-by: Gilles BULOZ gilles.buloz@kontron.com Fixes: 9ee4b83e51f7 ("serial: 8250: Add support for dmaengine") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230130114841.25749-3-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/serial/8250/8250_dma.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c index 871c40c655ec..c8b17070faf4 100644 --- a/drivers/tty/serial/8250/8250_dma.c +++ b/drivers/tty/serial/8250/8250_dma.c @@ -48,15 +48,23 @@ static void __dma_rx_complete(void *param) struct uart_8250_dma *dma = p->dma; struct tty_port *tty_port = &p->port.state->port; struct dma_tx_state state; + enum dma_status dma_status; int count;
- dma->rx_running = 0; - dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state); + /* + * New DMA Rx can be started during the completion handler before it + * could acquire port's lock and it might still be ongoing. Don't to + * anything in such case. + */ + dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state); + if (dma_status == DMA_IN_PROGRESS) + return;
count = dma->rx_size - state.residue;
tty_insert_flip_string(tty_port, dma->rx_buf, count); p->port.icount.rx += count; + dma->rx_running = 0;
tty_flip_buffer_push(tty_port); }
From: Toke Høiland-Jørgensen toke@redhat.com
stable inclusion from stable-v4.19.273 commit 96d9b11d688ced4c9e16312b1b95570e86b73260 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
commit d1c362e1dd68a421cf9033404cf141a4ab734a5d upstream.
The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work.
For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex if the FIB lookup itself succeeds.
Signed-off-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: David Ahern dsahern@gmail.com Link: https://lore.kernel.org/bpf/20201009184234.134214-1-toke@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/filter.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/filter.c b/net/core/filter.c index c24c7cddeb8e..cd9fba7fe78e 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4240,7 +4240,6 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, memcpy(params->smac, dev->dev_addr, ETH_ALEN); params->h_vlan_TCI = 0; params->h_vlan_proto = 0; - params->ifindex = dev->ifindex;
return 0; } @@ -4338,6 +4337,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, params->ipv4_dst = nh->nh_gw;
params->rt_metric = res.fi->fib_priority; + params->ifindex = dev->ifindex;
/* xdp and cls_bpf programs are run in RCU-bh so * rcu_read_lock_bh is not needed here @@ -4452,6 +4452,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
dev = f6i->fib6_nh.nh_dev; params->rt_metric = f6i->fib6_metric; + params->ifindex = dev->ifindex;
/* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is * not needed here. Can not use __ipv6_neigh_lookup_noref here
From: Seth Jenkins sethjenkins@google.com
stable inclusion from stable-v4.19.273 commit d8dca1bfe9adcae38b35add64977818c0c13dd22 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
commit 81e9d6f8647650a7bead74c5f926e29970e834d1 upstream.
Commit e4a0d3e720e7 ("aio: Make it possible to remap aio ring") introduced a null-deref if mremap is called on an old aio mapping after fork as mm->ioctx_table will be set to NULL.
[jmoyer@redhat.com: fix 80 column issue] Link: https://lkml.kernel.org/r/x49sffq4nvg.fsf@segfault.boston.devel.redhat.com Fixes: e4a0d3e720e7 ("aio: Make it possible to remap aio ring") Signed-off-by: Seth Jenkins sethjenkins@google.com Signed-off-by: Jeff Moyer jmoyer@redhat.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Benjamin LaHaise bcrl@kvack.org Cc: Jann Horn jannh@google.com Cc: Pavel Emelyanov xemul@parallels.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/aio.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/aio.c b/fs/aio.c index d221260b8f30..b77f0959f0e8 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -332,6 +332,9 @@ static int aio_ring_mremap(struct vm_area_struct *vma) spin_lock(&mm->ioctx_lock); rcu_read_lock(); table = rcu_dereference(mm->ioctx_table); + if (!table) + goto out_unlock; + for (i = 0; i < table->nr; i++) { struct kioctx *ctx;
@@ -345,6 +348,7 @@ static int aio_ring_mremap(struct vm_area_struct *vma) } }
+out_unlock: rcu_read_unlock(); spin_unlock(&mm->ioctx_lock); return res;
From: Guillaume Nault gnault@redhat.com
stable inclusion from stable-v4.19.273 commit b0f76723a05ab652cc8181b9dffbf76ec28a3be3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
commit e010ae08c71fda8be3d6bda256837795a0b3ea41 upstream.
Take into account the IPV6_TCLASS socket option (DSCP) in ip6_datagram_flow_key_init(). Otherwise fib6_rule_match() can't properly match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0 ip -6 rule add dsfield 0x04 table 100
echo test | socat - UDP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host") because the fib-rule doesn't jump to table 100 and the lookup ends up being done in the main table.
Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.") Signed-off-by: Guillaume Nault gnault@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/datagram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index 5c52fc1cb25b..4c21abb34203 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -54,7 +54,7 @@ static void ip6_datagram_flow_key_init(struct flowi6 *fl6, struct sock *sk) fl6->flowi6_mark = sk->sk_mark; fl6->fl6_dport = inet->inet_dport; fl6->fl6_sport = inet->inet_sport; - fl6->flowlabel = np->flow_label; + fl6->flowlabel = ip6_make_flowinfo(np->tclass, np->flow_label); fl6->flowi6_uid = sk->sk_uid;
if (!fl6->flowi6_oif)
From: Guillaume Nault gnault@redhat.com
stable inclusion from stable-v4.19.273 commit 1c9df9775dd84ed765895035b00df4b9c0749ff6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
--------------------------------
commit 8230680f36fd1525303d1117768c8852314c488c upstream.
Take into account the IPV6_TCLASS socket option (DSCP) in tcp_v6_connect(). Otherwise fib6_rule_match() can't properly match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0 ip -6 rule add dsfield 0x04 table 100
echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host") because the fib-rule doesn't jump to table 100 and the lookup ends up being done in the main table.
Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.") Signed-off-by: Guillaume Nault gnault@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/tcp_ipv6.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 79164360b635..8a257a1bc5b1 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -259,6 +259,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, fl6.flowi6_proto = IPPROTO_TCP; fl6.daddr = sk->sk_v6_daddr; fl6.saddr = saddr ? *saddr : np->saddr; + fl6.flowlabel = ip6_make_flowinfo(np->tclass, np->flow_label); fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; fl6.fl6_dport = usin->sin6_port;
From: Mike Kravetz mike.kravetz@oracle.com
stable inclusion from stable-v4.19.275 commit 400723777e17164aec1510f0f2c630ae2eee8a48 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 3489dbb696d25602aea8c3e669a6d43b76bd5358 upstream.
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".
This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue.
[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/
This patch (of 2):
A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table.
page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added.
[akpm@linux-foundation.org: simplification, per David] [akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Peter Xu peterx@redhat.com Cc: David Hildenbrand david@redhat.com Cc: James Houghton jthoughton@google.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Muchun Song songmuchun@bytedance.com Cc: Naoya Horiguchi naoya.horiguchi@linux.dev Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: include/linux/hugetlb.h
Signed-off-by: Ze Zuo zuoze1@huawei.com Reviewed-by: yongqiang Liu liuyongqiang13@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/proc/task_mmu.c | 4 +--- include/linux/hugetlb.h | 13 +++++++++++++ 2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index cfe2c4e32533..93815b2b8440 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -709,9 +709,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, page = device_private_entry_to_page(swpent); } if (page) { - int mapcount = page_mapcount(page); - - if (mapcount >= 2) + if (page_mapcount(page) >= 2 || hugetlb_pmd_shared(pte)) mss->shared_hugetlb += huge_page_size(hstate_vma(vma)); else mss->private_hugetlb += huge_page_size(hstate_vma(vma)); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index dfd9a8c945e1..868aea82db2d 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -7,6 +7,7 @@ #include <linux/fs.h> #include <linux/hugetlb_inline.h> #include <linux/cgroup.h> +#include <linux/page_ref.h> #include <linux/list.h> #include <linux/kref.h> #include <asm/pgtable.h> @@ -784,4 +785,16 @@ static inline int hugetlb_insert__hugepage_pte_by_pa(struct mm_struct *mm, pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, int writable); #endif
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE +static inline bool hugetlb_pmd_shared(pte_t *pte) +{ + return page_count(virt_to_page(pte)) > 1; +} +#else +static inline bool hugetlb_pmd_shared(pte_t *pte) +{ + return false; +} +#endif + #endif /* _LINUX_HUGETLB_H */
From: Mike Kravetz mike.kravetz@oracle.com
stable inclusion from stable-v4.19.275 commit 2cfc6d164b11297ca3511dc7d3cd9f2d530fc301 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6KOHU CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 73bdf65ea74857d7fb2ec3067a3cec0e261b1462 upstream.
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to move pages shared with another process to a different node. page_mapcount
1 is being used to determine if a hugetlb page is shared. However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. As a result, hugetlb pages shared by multiple processes and mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found consider the page shared.
Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Peter Xu peterx@redhat.com Acked-by: David Hildenbrand david@redhat.com Cc: James Houghton jthoughton@google.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Muchun Song songmuchun@bytedance.com Cc: Naoya Horiguchi naoya.horiguchi@linux.dev Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Ze Zuo zuoze1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- mm/mempolicy.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 42f71113698d..4cac46d56f38 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -612,7 +612,8 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask, goto unlock; /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */ if (flags & (MPOL_MF_MOVE_ALL) || - (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) + (flags & MPOL_MF_MOVE && page_mapcount(page) == 1 && + !hugetlb_pmd_shared(pte))) isolate_huge_page(page, qp->pagelist); unlock: spin_unlock(ptl);