Benedict Wong (1): xfrm: Check if_id in inbound policy/secpath match
Cambda Zhu (1): tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set
Eric Dumazet (7): netlink: annotate accesses to nlk->cb_running net: annotate sk->sk_err write from do_recvmmsg() net: datagram: fix data-races in datagram_poll() vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit() af_packet: do not use READ_ONCE() in packet_bind() rfs: annotate lockless accesses to sk->sk_rxhash rfs: annotate lockless accesses to RFS sock flow table
Gavrilov Ilia (1): ipv6: Fix out-of-bounds access in ipv6_find_tlv()
Guillaume Nault (1): ping6: Fix send to link-local addresses with VRF.
Hangyu Hua (1): net: sched: fix possible refcount leak in tc_chain_tmplt_add()
Kuniyuki Iwashima (4): af_unix: Fix a data race of sk->sk_receive_queue->qlen. af_unix: Fix data races around sk->sk_shutdown. udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). af_packet: Fix data-races of pkt_sk(sk)->num.
Nick Child (1): net: Catch invalid index in XPS mapping
Nicolas Dichtel (1): ipv{4,6}/raw: fix output xfrm lookup wrt protocol
Paolo Abeni (1): tcp: factor out __tcp_close() helper
Pratyush Yadav (1): net: fix skb leak in __skb_tstamp_tx()
Tobias Brunner (1): af_key: Reject optional tunnel/BEET mode templates in outbound policies
Vladislav Efanov (1): udp6: Fix race condition in udp6_sendmsg & connect
include/linux/netdevice.h | 7 +++++-- include/net/ip.h | 2 ++ include/net/sock.h | 18 +++++++++++++----- include/net/tcp.h | 1 + include/uapi/linux/in.h | 2 ++ net/8021q/vlan_dev.c | 4 ++-- net/core/datagram.c | 15 ++++++++++----- net/core/dev.c | 8 ++++++-- net/core/skbuff.c | 4 +++- net/core/sock.c | 2 +- net/ipv4/ip_sockglue.c | 12 +++++++++++- net/ipv4/raw.c | 5 ++++- net/ipv4/tcp.c | 12 +++++++++--- net/ipv4/udplite.c | 2 ++ net/ipv6/exthdrs_core.c | 2 ++ net/ipv6/ping.c | 3 ++- net/ipv6/raw.c | 3 ++- net/ipv6/udplite.c | 2 ++ net/key/af_key.c | 12 ++++++++---- net/netlink/af_netlink.c | 8 ++++---- net/packet/af_packet.c | 8 +++++--- net/packet/diag.c | 2 +- net/sched/cls_api.c | 1 + net/socket.c | 2 +- net/unix/af_unix.c | 22 +++++++++++++--------- net/xfrm/xfrm_policy.c | 11 ++++++----- 26 files changed, 118 insertions(+), 52 deletions(-)
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.284 commit 840a647499b093621167de56ffa8756dfc69f242 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit a939d14919b799e6fff8a9c80296ca229ba2f8a4 ]
Both netlink_recvmsg() and netlink_native_seq_show() read nlk->cb_running locklessly. Use READ_ONCE() there.
Add corresponding WRITE_ONCE() to netlink_dump() and __netlink_dump_start()
syzbot reported: BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg
write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0: __netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399 netlink_dump_start include/linux/netlink.h:308 [inline] rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] sock_write_iter+0x1aa/0x230 net/socket.c:1138 call_write_iter include/linux/fs.h:1851 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x463/0x760 fs/read_write.c:584 ksys_write+0xeb/0x1a0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x42/0x50 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1: netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022 sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017 ____sys_recvmsg+0x2db/0x310 net/socket.c:2718 ___sys_recvmsg net/socket.c:2762 [inline] do_recvmmsg+0x2e5/0x710 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00 -> 0x01
Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/netlink/af_netlink.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 544727a6f2df..dadfc86ff9c0 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2005,7 +2005,7 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
skb_free_datagram(sk, skb);
- if (nlk->cb_running && + if (READ_ONCE(nlk->cb_running) && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) { ret = netlink_dump(sk); if (ret) { @@ -2287,7 +2287,7 @@ static int netlink_dump(struct sock *sk) if (cb->done) cb->done(cb);
- nlk->cb_running = false; + WRITE_ONCE(nlk->cb_running, false); module = cb->module; skb = cb->skb; mutex_unlock(nlk->cb_mutex); @@ -2347,7 +2347,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, goto error_put; }
- nlk->cb_running = true; + WRITE_ONCE(nlk->cb_running, true); nlk->dump_done_errno = INT_MAX;
mutex_unlock(nlk->cb_mutex); @@ -2651,7 +2651,7 @@ static int netlink_seq_show(struct seq_file *seq, void *v) nlk->groups ? (u32)nlk->groups[0] : 0, sk_rmem_alloc_get(s), sk_wmem_alloc_get(s), - nlk->cb_running, + READ_ONCE(nlk->cb_running), refcount_read(&s->sk_refcnt), atomic_read(&s->sk_drops), sock_i_ino(s)
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.284 commit 640bce625ccf667a1a80262175a81108889ac41d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit e05a5f510f26607616fecdd4ac136310c8bea56b ]
do_recvmmsg() can write to sk->sk_err from multiple threads.
As said before, many other points reading or writing sk_err need annotations.
Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/socket.c b/net/socket.c index 553045441a2c..2eebc9274f6a 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2686,7 +2686,7 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, * error to return on the next call or if the * app asks about it using getsockopt(SO_ERROR). */ - sock->sk->sk_err = -err; + WRITE_ONCE(sock->sk->sk_err, -err); } out_put: fput_light(sock->file, fput_needed);
From: Paolo Abeni pabeni@redhat.com
stable inclusion from stable-v4.19.284 commit e62806034c409b87e8fa00b829c059ba8face62c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 77c3c95637526f1e4330cc9a4b2065f668c2c4fe ]
unlocked version of protocol level close, will be used by MPTCP to allow decouple orphaning and subflow level close.
Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: e14cadfd80d7 ("tcp: add annotations around sk->sk_shutdown accesses") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/tcp.h | 1 + net/ipv4/tcp.c | 9 +++++++-- 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index 12741aed95b5..3abc1e200fc1 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -391,6 +391,7 @@ void tcp_update_metrics(struct sock *sk); void tcp_init_metrics(struct sock *sk); void tcp_metrics_init(void); bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst); +void __tcp_close(struct sock *sk, long timeout); void tcp_close(struct sock *sk, long timeout); void tcp_init_sock(struct sock *sk); void tcp_init_transfer(struct sock *sk, int bpf_op); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index cb118905740f..a12f44b9c0f5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2343,13 +2343,12 @@ bool tcp_check_oom(struct sock *sk, int shift) return too_many_orphans || out_of_socket_memory; }
-void tcp_close(struct sock *sk, long timeout) +void __tcp_close(struct sock *sk, long timeout) { struct sk_buff *skb; int data_was_unread = 0; int state;
- lock_sock(sk); sk->sk_shutdown = SHUTDOWN_MASK;
if (sk->sk_state == TCP_LISTEN) { @@ -2510,6 +2509,12 @@ void tcp_close(struct sock *sk, long timeout) out: bh_unlock_sock(sk); local_bh_enable(); +} + +void tcp_close(struct sock *sk, long timeout) +{ + lock_sock(sk); + __tcp_close(sk, timeout); release_sock(sk); sock_put(sk); }
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.284 commit 32b304def00874955ce614261d15ea3836aa7947 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 5bca1d081f44c9443e61841842ce4e9179d327b6 ]
datagram_poll() runs locklessly, we should add READ_ONCE() annotations while reading sk->sk_err, sk->sk_shutdown and sk->sk_state.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230509173131.3263780-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/datagram.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/net/core/datagram.c b/net/core/datagram.c index 865a8cb7b0bd..6ba82eb14b46 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -837,18 +837,21 @@ __poll_t datagram_poll(struct file *file, struct socket *sock, { struct sock *sk = sock->sk; __poll_t mask; + u8 shutdown;
sock_poll_wait(file, sock, wait); mask = 0;
/* exceptional events? */ - if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue)) + if (READ_ONCE(sk->sk_err) || + !skb_queue_empty_lockless(&sk->sk_error_queue)) mask |= EPOLLERR | (sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
- if (sk->sk_shutdown & RCV_SHUTDOWN) + shutdown = READ_ONCE(sk->sk_shutdown); + if (shutdown & RCV_SHUTDOWN) mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM; - if (sk->sk_shutdown == SHUTDOWN_MASK) + if (shutdown == SHUTDOWN_MASK) mask |= EPOLLHUP;
/* readable? */ @@ -857,10 +860,12 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
/* Connection-based need to check for termination and startup */ if (connection_based(sk)) { - if (sk->sk_state == TCP_CLOSE) + int state = READ_ONCE(sk->sk_state); + + if (state == TCP_CLOSE) mask |= EPOLLHUP; /* connection hasn't started yet? */ - if (sk->sk_state == TCP_SYN_SENT) + if (state == TCP_SYN_SENT) return mask; }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.284 commit 80508eceeb8073fe61dd80de1c07d28de9f4d57e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 679ed006d416ea0cecfe24a99d365d1dea69c683 ]
KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg() updates qlen under the queue lock and sendmsg() checks qlen under unix_state_sock(), not the queue lock, so the reader side needs READ_ONCE().
BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer
write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0: __skb_unlink include/linux/skbuff.h:2347 [inline] __skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197 __skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263 __unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452 unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549 sock_recvmsg_nosec net/socket.c:1019 [inline] ____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720 ___sys_recvmsg+0xc8/0x150 net/socket.c:2764 do_recvmmsg+0x182/0x560 net/socket.c:2858 __sys_recvmmsg net/socket.c:2937 [inline] __do_sys_recvmmsg net/socket.c:2960 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1: skb_queue_len include/linux/skbuff.h:2127 [inline] unix_recvq_full net/unix/af_unix.c:229 [inline] unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445 unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x148/0x160 net/socket.c:747 ____sys_sendmsg+0x20e/0x620 net/socket.c:2503 ___sys_sendmsg+0xc6/0x140 net/socket.c:2557 __sys_sendmmsg+0x11d/0x370 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x0000000b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Michal Kubiak michal.kubiak@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index e79c32942796..0b2d466fb858 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1232,7 +1232,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo)
sched = !sock_flag(other, SOCK_DEAD) && !(other->sk_shutdown & RCV_SHUTDOWN) && - unix_recvq_full(other); + unix_recvq_full_lockless(other);
unix_state_unlock(other);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.284 commit 1c488f4e95b498c977fbeae784983eb4cf6085e8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit e1d09c2c2f5793474556b60f83900e088d0d366d ]
KCSAN found a data race around sk->sk_shutdown where unix_release_sock() and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll() and unix_dgram_poll() read it locklessly.
We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE().
BUG: KCSAN: data-race in unix_poll / unix_release_sock
write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0: unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631 unix_release+0x59/0x80 net/unix/af_unix.c:1042 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1397 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1: unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170 sock_poll+0xcf/0x2b0 net/socket.c:1385 vfs_poll include/linux/poll.h:88 [inline] ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855 ep_send_events fs/eventpoll.c:1694 [inline] ep_poll fs/eventpoll.c:1823 [inline] do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258 __do_sys_epoll_wait fs/eventpoll.c:2270 [inline] __se_sys_epoll_wait fs/eventpoll.c:2265 [inline] __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x00 -> 0x03
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Michal Kubiak michal.kubiak@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/unix/af_unix.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0b2d466fb858..b0dcbb08e60d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -536,7 +536,7 @@ static void unix_release_sock(struct sock *sk, int embrion) /* Clear state */ unix_state_lock(sk); sock_orphan(sk); - sk->sk_shutdown = SHUTDOWN_MASK; + WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK); path = u->path; u->path.dentry = NULL; u->path.mnt = NULL; @@ -554,7 +554,7 @@ static void unix_release_sock(struct sock *sk, int embrion) if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) { unix_state_lock(skpair); /* No more writes */ - skpair->sk_shutdown = SHUTDOWN_MASK; + WRITE_ONCE(skpair->sk_shutdown, SHUTDOWN_MASK); if (!skb_queue_empty(&sk->sk_receive_queue) || embrion) skpair->sk_err = ECONNRESET; unix_state_unlock(skpair); @@ -2551,7 +2551,7 @@ static int unix_shutdown(struct socket *sock, int mode) ++mode;
unix_state_lock(sk); - sk->sk_shutdown |= mode; + WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | mode); other = unix_peer(sk); if (other) sock_hold(other); @@ -2568,7 +2568,7 @@ static int unix_shutdown(struct socket *sock, int mode) if (mode&SEND_SHUTDOWN) peer_mode |= RCV_SHUTDOWN; unix_state_lock(other); - other->sk_shutdown |= peer_mode; + WRITE_ONCE(other->sk_shutdown, other->sk_shutdown | peer_mode); unix_state_unlock(other); other->sk_state_change(other); if (peer_mode == SHUTDOWN_MASK) @@ -2687,16 +2687,18 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa { struct sock *sk = sock->sk; __poll_t mask; + u8 shutdown;
sock_poll_wait(file, sock, wait); mask = 0; + shutdown = READ_ONCE(sk->sk_shutdown);
/* exceptional events? */ if (sk->sk_err) mask |= EPOLLERR; - if (sk->sk_shutdown == SHUTDOWN_MASK) + if (shutdown == SHUTDOWN_MASK) mask |= EPOLLHUP; - if (sk->sk_shutdown & RCV_SHUTDOWN) + if (shutdown & RCV_SHUTDOWN) mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
/* readable? */ @@ -2724,18 +2726,20 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock, struct sock *sk = sock->sk, *other; unsigned int writable; __poll_t mask; + u8 shutdown;
sock_poll_wait(file, sock, wait); mask = 0; + shutdown = READ_ONCE(sk->sk_shutdown);
/* exceptional events? */ if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue)) mask |= EPOLLERR | (sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
- if (sk->sk_shutdown & RCV_SHUTDOWN) + if (shutdown & RCV_SHUTDOWN) mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM; - if (sk->sk_shutdown == SHUTDOWN_MASK) + if (shutdown == SHUTDOWN_MASK) mask |= EPOLLHUP;
/* readable? */
From: Nick Child nnac123@linux.ibm.com
stable inclusion from stable-v4.19.284 commit 3e750733375cb59cd049de7b38713122cec995c4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 5dd0dfd55baec0742ba8f5625a0dd064aca7db16 ]
When setting the XPS value of a TX queue, warn the user once if the index of the queue is greater than the number of allocated TX queues.
Previously, this scenario went uncaught. In the best case, it resulted in unnecessary allocations. In the worst case, it resulted in out-of-bounds memory references through calls to `netdev_get_tx_queue( dev, index)`. Therefore, it is important to inform the user but not worth returning an error and risk downing the netdevice.
Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Piotr Raczynski piotr.raczynski@intel.com Link: https://lore.kernel.org/r/20230321150725.127229-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/dev.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c index 65e767c9661a..e40d9d98baea 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2303,6 +2303,8 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask, bool active = false; unsigned int nr_ids;
+ WARN_ON_ONCE(index >= dev->num_tx_queues); + if (dev->num_tc) { /* Do not allow XPS on subordinate device directly */ num_tc = dev->num_tc;
From: Tobias Brunner tobias@strongswan.org
stable inclusion from stable-v4.19.284 commit d45a4270a5581ebceeec47e42d9def945dc51e90 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit cf3128a7aca55b2eefb68281d44749c683bdc96f ]
xfrm_state_find() uses `encap_family` of the current template with the passed local and remote addresses to find a matching state. If an optional tunnel or BEET mode template is skipped in a mixed-family scenario, there could be a mismatch causing an out-of-bounds read as the addresses were not replaced to match the family of the next template.
While there are theoretical use cases for optional templates in outbound policies, the only practical one is to skip IPComp states in inbound policies if uncompressed packets are received that are handled by an implicitly created IPIP state instead.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Tobias Brunner tobias@strongswan.org Acked-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/key/af_key.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/key/af_key.c b/net/key/af_key.c index 6a170a37d9e2..fe7829afea1a 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1950,7 +1950,8 @@ static u32 gen_reqid(struct net *net) }
static int -parse_ipsecrequest(struct xfrm_policy *xp, struct sadb_x_ipsecrequest *rq) +parse_ipsecrequest(struct xfrm_policy *xp, struct sadb_x_policy *pol, + struct sadb_x_ipsecrequest *rq) { struct net *net = xp_net(xp); struct xfrm_tmpl *t = xp->xfrm_vec + xp->xfrm_nr; @@ -1968,9 +1969,12 @@ parse_ipsecrequest(struct xfrm_policy *xp, struct sadb_x_ipsecrequest *rq) if ((mode = pfkey_mode_to_xfrm(rq->sadb_x_ipsecrequest_mode)) < 0) return -EINVAL; t->mode = mode; - if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_USE) + if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_USE) { + if ((mode == XFRM_MODE_TUNNEL || mode == XFRM_MODE_BEET) && + pol->sadb_x_policy_dir == IPSEC_DIR_OUTBOUND) + return -EINVAL; t->optional = 1; - else if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_UNIQUE) { + } else if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_UNIQUE) { t->reqid = rq->sadb_x_ipsecrequest_reqid; if (t->reqid > IPSEC_MANUAL_REQID_MAX) t->reqid = 0; @@ -2012,7 +2016,7 @@ parse_ipsecrequests(struct xfrm_policy *xp, struct sadb_x_policy *pol) rq->sadb_x_ipsecrequest_len < sizeof(*rq)) return -EINVAL;
- if ((err = parse_ipsecrequest(xp, rq)) < 0) + if ((err = parse_ipsecrequest(xp, pol, rq)) < 0) return err; len -= rq->sadb_x_ipsecrequest_len; rq = (void*)((u8*)rq + rq->sadb_x_ipsecrequest_len);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.284 commit 0a7c08aeca3e531772b83a224ec9b997c6fc4b2c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit dacab578c7c6cd06c50c89dfa36b0e0f10decd4e ]
syzbot triggered the following splat [1], sending an empty message through pppoe_sendmsg().
When VLAN_FLAG_REORDER_HDR flag is set, vlan_dev_hard_header() does not push extra bytes for the VLAN header, because vlan is offloaded.
Unfortunately vlan_dev_hard_start_xmit() first reads veth->h_vlan_proto before testing (vlan->flags & VLAN_FLAG_REORDER_HDR).
We need to swap the two conditions.
[1] BUG: KMSAN: uninit-value in vlan_dev_hard_start_xmit+0x171/0x7f0 net/8021q/vlan_dev.c:111 vlan_dev_hard_start_xmit+0x171/0x7f0 net/8021q/vlan_dev.c:111 __netdev_start_xmit include/linux/netdevice.h:4883 [inline] netdev_start_xmit include/linux/netdevice.h:4897 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x253/0xa20 net/core/dev.c:3596 __dev_queue_xmit+0x3c7f/0x5ac0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3053 [inline] pppoe_sendmsg+0xa93/0xb80 drivers/net/ppp/pppoe.c:900 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0xa24/0xe40 net/socket.c:2501 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555 __sys_sendmmsg+0x411/0xa50 net/socket.c:2641 __do_sys_sendmmsg net/socket.c:2670 [inline] __se_sys_sendmmsg net/socket.c:2667 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2667 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at: slab_post_alloc_hook+0x12d/0xb60 mm/slab.h:774 slab_alloc_node mm/slub.c:3452 [inline] kmem_cache_alloc_node+0x543/0xab0 mm/slub.c:3497 kmalloc_reserve+0x148/0x470 net/core/skbuff.c:520 __alloc_skb+0x3a7/0x850 net/core/skbuff.c:606 alloc_skb include/linux/skbuff.h:1277 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2583 pppoe_sendmsg+0x3af/0xb80 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0xa24/0xe40 net/socket.c:2501 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555 __sys_sendmmsg+0x411/0xa50 net/socket.c:2641 __do_sys_sendmmsg net/socket.c:2670 [inline] __se_sys_sendmmsg net/socket.c:2667 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2667 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
CPU: 0 PID: 29770 Comm: syz-executor.0 Not tainted 6.3.0-rc6-syzkaller-gc478e5b17829 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/8021q/vlan_dev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 52e33fd9c9bf..3bdcb74c2ecc 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -114,8 +114,8 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb, * NOTE: THIS ASSUMES DIX ETHERNET, SPECIFICALLY NOT SUPPORTING * OTHER THINGS LIKE FDDI/TokenRing/802.3 SNAPs... */ - if (veth->h_vlan_proto != vlan->vlan_proto || - vlan->flags & VLAN_FLAG_REORDER_HDR) { + if (vlan->flags & VLAN_FLAG_REORDER_HDR || + veth->h_vlan_proto != vlan->vlan_proto) { u16 vlan_tci; vlan_tci = vlan->vlan_id; vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.284 commit cc56de054d828935aa37734b479f82fa34b5f9bd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
commit ad42a35bdfc6d3c0fc4cb4027d7b2757ce665665 upstream.
syzbot reported [0] a null-ptr-deref in sk_get_rmem0() while using IPPROTO_UDPLITE (0x88):
14:25:52 executing program 1: r0 = socket$inet6(0xa, 0x80002, 0x88)
We had a similar report [1] for probably sk_memory_allocated_add() in __sk_mem_raise_allocated(), and commit c915fe13cbaa ("udplite: fix NULL pointer dereference") fixed it by setting .memory_allocated for udplite_prot and udplitev6_prot.
To fix the variant, we need to set either .sysctl_wmem_offset or .sysctl_rmem.
Now UDP and UDPLITE share the same value for .memory_allocated, so we use the same .sysctl_wmem_offset for UDP and UDPLITE.
[0]: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 6829 Comm: syz-executor.1 Not tainted 6.4.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/28/2023 RIP: 0010:sk_get_rmem0 include/net/sock.h:2907 [inline] RIP: 0010:__sk_mem_raise_allocated+0x806/0x17a0 net/core/sock.c:3006 Code: c1 ea 03 80 3c 02 00 0f 85 23 0f 00 00 48 8b 44 24 08 48 8b 98 38 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 0f 8d 6f 0a 00 00 8b RSP: 0018:ffffc90005d7f450 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004d92000 RDX: 0000000000000000 RSI: ffffffff88066482 RDI: ffffffff8e2ccbb8 RBP: ffff8880173f7000 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000030000 R13: 0000000000000001 R14: 0000000000000340 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9800000(0063) knlGS:00000000f7f1cb40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 000000002e82f000 CR3: 0000000034ff0000 CR4: 00000000003506f0 Call Trace: <TASK> __sk_mem_schedule+0x6c/0xe0 net/core/sock.c:3077 udp_rmem_schedule net/ipv4/udp.c:1539 [inline] __udp_enqueue_schedule_skb+0x776/0xb30 net/ipv4/udp.c:1581 __udpv6_queue_rcv_skb net/ipv6/udp.c:666 [inline] udpv6_queue_rcv_one_skb+0xc39/0x16c0 net/ipv6/udp.c:775 udpv6_queue_rcv_skb+0x194/0xa10 net/ipv6/udp.c:793 __udp6_lib_mcast_deliver net/ipv6/udp.c:906 [inline] __udp6_lib_rcv+0x1bda/0x2bd0 net/ipv6/udp.c:1013 ip6_protocol_deliver_rcu+0x2e7/0x1250 net/ipv6/ip6_input.c:437 ip6_input_finish+0x150/0x2f0 net/ipv6/ip6_input.c:482 NF_HOOK include/linux/netfilter.h:303 [inline] NF_HOOK include/linux/netfilter.h:297 [inline] ip6_input+0xa0/0xd0 net/ipv6/ip6_input.c:491 ip6_mc_input+0x40b/0xf50 net/ipv6/ip6_input.c:585 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] NF_HOOK include/linux/netfilter.h:297 [inline] ipv6_rcv+0x250/0x380 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5491 __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5605 netif_receive_skb_internal net/core/dev.c:5691 [inline] netif_receive_skb+0x133/0x7a0 net/core/dev.c:5750 tun_rx_batched+0x4b3/0x7a0 drivers/net/tun.c:1553 tun_get_user+0x2452/0x39c0 drivers/net/tun.c:1989 tun_chr_write_iter+0xdf/0x200 drivers/net/tun.c:2035 call_write_iter include/linux/fs.h:1868 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x945/0xd50 fs/read_write.c:584 ksys_write+0x12b/0x250 fs/read_write.c:637 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178 do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203 entry_SYSENTER_compat_after_hwframe+0x70/0x82 RIP: 0023:0xf7f21579 Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 RSP: 002b:00000000f7f1c590 EFLAGS: 00000282 ORIG_RAX: 0000000000000004 RAX: ffffffffffffffda RBX: 00000000000000c8 RCX: 0000000020000040 RDX: 0000000000000083 RSI: 00000000f734e000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in:
Link: https://lore.kernel.org/netdev/CANaxB-yCk8hhP68L4Q2nFOJht8sqgXGGQO2AftpHs0u1... [1] Fixes: 850cbaddb52d ("udp: use it's own memory accounting schema") Reported-by: syzbot+444ca0907e96f7c5e48b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=444ca0907e96f7c5e48b Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230523163305.66466-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/udplite.c | 2 ++ net/ipv6/udplite.c | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c index 6beab353bc8b..27173549b000 100644 --- a/net/ipv4/udplite.c +++ b/net/ipv4/udplite.c @@ -64,6 +64,8 @@ struct proto udplite_prot = { .get_port = udp_v4_get_port, .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, + .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), + .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_udp_rmem_min), .obj_size = sizeof(struct udp_sock), .h.udp_table = &udplite_table, #ifdef CONFIG_COMPAT diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c index f15b8305d87b..a26a4b5da09c 100644 --- a/net/ipv6/udplite.c +++ b/net/ipv6/udplite.c @@ -58,6 +58,8 @@ struct proto udplitev6_prot = { .get_port = udp_v6_get_port, .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, + .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), + .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_udp_rmem_min), .obj_size = sizeof(struct udp6_sock), .h.udp_table = &udplite_table, #ifdef CONFIG_COMPAT
From: Pratyush Yadav ptyadav@amazon.de
stable inclusion from stable-v4.19.284 commit 779332447108545ef04682ea29af5f85c0202aee category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
commit 8a02fb71d7192ff1a9a47c9d937624966c6e09af upstream.
Commit 50749f2dd685 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.") added a call to skb_orphan_frags_rx() to fix leaks with zerocopy skbs. But it ended up adding a leak of its own. When skb_orphan_frags_rx() fails, the function just returns, leaking the skb it just cloned. Free it before returning.
This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc.
Fixes: 50749f2dd685 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.") Signed-off-by: Pratyush Yadav ptyadav@amazon.de Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20230522153020.32422-1-ptyadav@amazon.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/skbuff.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5ae62d743357..b7314a6cf8c2 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4446,8 +4446,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, } else { skb = skb_clone(orig_skb, GFP_ATOMIC);
- if (skb_orphan_frags_rx(skb, GFP_ATOMIC)) + if (skb_orphan_frags_rx(skb, GFP_ATOMIC)) { + kfree_skb(skb); return; + } } if (!skb) return;
From: Gavrilov Ilia Ilia.Gavrilov@infotecs.ru
stable inclusion from stable-v4.19.284 commit 04bf69e3de435d793a203aacc4b774f8f9f2baeb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
commit 878ecb0897f4737a4c9401f3523fd49589025671 upstream.
optlen is fetched without checking whether there is more than one byte to parse. It can lead to out-of-bounds access.
Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: c61a40432509 ("[IPV6]: Find option offset by type.") Signed-off-by: Gavrilov Ilia Ilia.Gavrilov@infotecs.ru Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/exthdrs_core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c index ae365df8abf7..f356d3049143 100644 --- a/net/ipv6/exthdrs_core.c +++ b/net/ipv6/exthdrs_core.c @@ -142,6 +142,8 @@ int ipv6_find_tlv(const struct sk_buff *skb, int offset, int type) optlen = 1; break; default: + if (len < 2) + goto bad; optlen = nh[offset + 1] + 2; if (optlen > len) goto bad;
From: Nicolas Dichtel nicolas.dichtel@6wind.com
stable inclusion from stable-v4.19.285 commit 15c11db30e5a21473a0faca0df428d8f83c1fac1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
commit 3632679d9e4f879f49949bb5b050e0de553e4739 upstream.
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the protocol field of the flow structure, build by raw_sendmsg() / rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that the userland could specify which protocol is used.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/ip.h | 2 ++ include/uapi/linux/in.h | 2 ++ net/ipv4/ip_sockglue.c | 12 +++++++++++- net/ipv4/raw.c | 5 ++++- net/ipv6/raw.c | 3 ++- 5 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h index a711bafa62ce..6c8c2552a95f 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -72,6 +72,7 @@ struct ipcm_cookie { __be32 addr; int oif; struct ip_options_rcu *opt; + __u8 protocol; __u8 ttl; __s16 tos; char priority; @@ -91,6 +92,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, ipcm->sockc.tsflags = inet->sk.sk_tsflags; ipcm->oif = inet->sk.sk_bound_dev_if; ipcm->addr = inet->inet_saddr; + ipcm->protocol = inet->inet_num; }
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb)) diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index 2a66ab49f14d..b4f95eb8cdcd 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -154,6 +154,8 @@ struct in_addr { #define MCAST_MSFILTER 48 #define IP_MULTICAST_ALL 49 #define IP_UNICAST_IF 50 +#define IP_LOCAL_PORT_RANGE 51 +#define IP_PROTOCOL 52
#define MCAST_EXCLUDE 0 #define MCAST_INCLUDE 1 diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index aa3fd61818c4..0c528f642eb8 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -316,7 +316,14 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc, ipc->tos = val; ipc->priority = rt_tos2priority(ipc->tos); break; - + case IP_PROTOCOL: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) + return -EINVAL; + val = *(int *)CMSG_DATA(cmsg); + if (val < 1 || val > 255) + return -EINVAL; + ipc->protocol = val; + break; default: return -EINVAL; } @@ -1524,6 +1531,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname, case IP_MINTTL: val = inet->min_ttl; break; + case IP_PROTOCOL: + val = inet_sk(sk)->inet_num; + break; default: release_sock(sk); return -ENOPROTOOPT; diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index a24f94abe14f..754c0fae499a 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -562,6 +562,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) }
ipcm_init_sk(&ipc, inet); + /* Keep backward compat */ + if (hdrincl) + ipc.protocol = IPPROTO_RAW;
if (msg->msg_controllen) { err = ip_cmsg_send(sk, msg, &ipc, false); @@ -629,7 +632,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE, - hdrincl ? IPPROTO_RAW : sk->sk_protocol, + hdrincl ? ipc.protocol : sk->sk_protocol, inet_sk_flowi_flags(sk) | (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0), daddr, saddr, 0, 0, sk->sk_uid); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index cc42dc14c33a..6aa852285cd2 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -831,7 +831,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (!proto) proto = inet->inet_num; - else if (proto != inet->inet_num) + else if (proto != inet->inet_num && + inet->inet_num != IPPROTO_RAW) return -EINVAL;
if (proto > 255)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.285 commit 51bddc67b48d22aa99a4e9d72d68a9b8376d907a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 822b5a1c17df7e338b9f05d1cfe5764e37c7f74f ]
syzkaller found a data race of pkt_sk(sk)->num.
The value is changed under lock_sock() and po->bind_lock, so we need READ_ONCE() to access pkt_sk(sk)->num without these locks in packet_bind_spkt(), packet_bind(), and sk_diag_fill().
Note that WRITE_ONCE() is already added by commit c7d2ef5dd4b0 ("net/packet: annotate accesses to po->bind").
BUG: KCSAN: data-race in packet_bind / packet_do_bind
write (marked) to 0xffff88802ffd1cee of 2 bytes by task 7322 on cpu 0: packet_do_bind+0x446/0x640 net/packet/af_packet.c:3236 packet_bind+0x99/0xe0 net/packet/af_packet.c:3321 __sys_bind+0x19b/0x1e0 net/socket.c:1803 __do_sys_bind net/socket.c:1814 [inline] __se_sys_bind net/socket.c:1812 [inline] __x64_sys_bind+0x40/0x50 net/socket.c:1812 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff88802ffd1cee of 2 bytes by task 7318 on cpu 1: packet_bind+0xbf/0xe0 net/packet/af_packet.c:3322 __sys_bind+0x19b/0x1e0 net/socket.c:1803 __do_sys_bind net/socket.c:1814 [inline] __se_sys_bind net/socket.c:1812 [inline] __x64_sys_bind+0x40/0x50 net/socket.c:1812 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x0300 -> 0x0000
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 7318 Comm: syz-executor.4 Not tainted 6.3.0-13380-g7fddb5b5300c #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 96ec6327144e ("packet: Diag core and basic socket info dumping") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20230524232934.50950-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/packet/af_packet.c | 4 ++-- net/packet/diag.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index aef6ab487287..73711353c04d 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3214,7 +3214,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, memcpy(name, uaddr->sa_data, sizeof(uaddr->sa_data)); name[sizeof(uaddr->sa_data)] = 0;
- return packet_do_bind(sk, name, 0, pkt_sk(sk)->num); + return packet_do_bind(sk, name, 0, READ_ONCE(pkt_sk(sk)->num)); }
static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) @@ -3232,7 +3232,7 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len return -EINVAL;
return packet_do_bind(sk, NULL, sll->sll_ifindex, - sll->sll_protocol ? : pkt_sk(sk)->num); + sll->sll_protocol ? : READ_ONCE(pkt_sk(sk)->num)); }
static struct proto packet_proto = { diff --git a/net/packet/diag.c b/net/packet/diag.c index d9f912ad23df..ecabf78d29b8 100644 --- a/net/packet/diag.c +++ b/net/packet/diag.c @@ -142,7 +142,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, rp = nlmsg_data(nlh); rp->pdiag_family = AF_PACKET; rp->pdiag_type = sk->sk_type; - rp->pdiag_num = ntohs(po->num); + rp->pdiag_num = ntohs(READ_ONCE(po->num)); rp->pdiag_ino = sk_ino; sock_diag_save_cookie(sk, rp->pdiag_cookie);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.285 commit 4a94996e1d730f40b78e5b3fcbc62a83e1390f23 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 6ffc57ea004234d9373c57b204fd10370a69f392 ]
A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt()
This is better handled by reading pkt_sk(sk)->num later in packet_do_bind() while appropriate lock is held.
READ_ONCE() in writers are often an evidence of something being wrong.
Fixes: 822b5a1c17df ("af_packet: Fix data-races of pkt_sk(sk)->num.") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Willem de Bruijn willemb@google.com Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230526154342.2533026-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/packet/af_packet.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 73711353c04d..332b1f7a8924 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3112,6 +3112,9 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
lock_sock(sk); spin_lock(&po->bind_lock); + if (!proto) + proto = po->num; + rcu_read_lock();
if (po->fanout) { @@ -3214,7 +3217,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, memcpy(name, uaddr->sa_data, sizeof(uaddr->sa_data)); name[sizeof(uaddr->sa_data)] = 0;
- return packet_do_bind(sk, name, 0, READ_ONCE(pkt_sk(sk)->num)); + return packet_do_bind(sk, name, 0, 0); }
static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) @@ -3231,8 +3234,7 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len if (sll->sll_family != AF_PACKET) return -EINVAL;
- return packet_do_bind(sk, NULL, sll->sll_ifindex, - sll->sll_protocol ? : READ_ONCE(pkt_sk(sk)->num)); + return packet_do_bind(sk, NULL, sll->sll_ifindex, sll->sll_protocol); }
static struct proto packet_proto = {
From: Cambda Zhu cambda@linux.alibaba.com
stable inclusion from stable-v4.19.285 commit 70ffc7579752eb34bcf9a42bbfea71fc79275e4c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 34dfde4ad87b84d21278a7e19d92b5b2c68e6c4d ]
This patch replaces the tp->mss_cache check in getting TCP_MAXSEG with tp->rx_opt.user_mss check for CLOSE/LISTEN sock. Since tp->mss_cache is initialized with TCP_MSS_DEFAULT, checking if it's zero is probably a bug.
With this change, getting TCP_MAXSEG before connecting will return default MSS normally, and return user_mss if user_mss is set.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Jack Yang mingliang@linux.alibaba.com Suggested-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/netdev/CANn89i+3kL9pYtkxkwxwNMzvC_w3LNUum_2=3u+UyLBm... Signed-off-by: Cambda Zhu cambda@linux.alibaba.com Link: https://lore.kernel.org/netdev/14D45862-36EA-4076-974C-EA67513C92F6@linux.al... Reviewed-by: Jason Xing kerneljasonxing@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20230527040317.68247-1-cambda@linux.alibaba.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a12f44b9c0f5..79a0d908eb3f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3374,7 +3374,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, switch (optname) { case TCP_MAXSEG: val = tp->mss_cache; - if (!val && ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))) + if (tp->rx_opt.user_mss && + ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))) val = tp->rx_opt.user_mss; if (tp->repair) val = tp->rx_opt.mss_clamp;
From: Vladislav Efanov VEfanov@ispras.ru
stable inclusion from stable-v4.19.285 commit d328ad5e09f5d74af87e2ff4298617a88b51edd1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 448a5ce1120c5bdbce1f1ccdabcd31c7d029f328 ]
Syzkaller got the following report: BUG: KASAN: use-after-free in sk_setup_caps+0x621/0x690 net/core/sock.c:2018 Read of size 8 at addr ffff888027f82780 by task syz-executor276/3255
The function sk_setup_caps (called by ip6_sk_dst_store_flow-> ip6_dst_store) referenced already freed memory as this memory was freed by parallel task in udpv6_sendmsg->ip6_sk_dst_lookup_flow-> sk_dst_check.
task1 (connect) task2 (udp6_sendmsg) sk_setup_caps->sk_dst_set | | sk_dst_check-> | sk_dst_set | dst_release sk_setup_caps references | to already freed dst_entry|
The reason for this race condition is: sk_setup_caps() keeps using the dst after transferring the ownership to the dst cache.
Found by Linux Verification Center (linuxtesting.org) with syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Vladislav Efanov VEfanov@ispras.ru Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 02516e424e94..b18c54ba80b5 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1817,7 +1817,6 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst) { u32 max_segs = 1;
- sk_dst_set(sk, dst); sk->sk_route_caps = dst->dev->features | sk->sk_route_forced_caps; if (sk->sk_route_caps & NETIF_F_GSO) sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE; @@ -1832,6 +1831,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst) } } sk->sk_gso_max_segs = max_segs; + sk_dst_set(sk, dst); } EXPORT_SYMBOL_GPL(sk_setup_caps);
From: Benedict Wong benedictwong@google.com
stable inclusion from stable-v4.19.285 commit d3d8a6999c0839b54edfd6d7772a580ea3608f75 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 8680407b6f8f5fba59e8f1d63c869abc280f04df ]
This change ensures that if configured in the policy, the if_id set in the policy and secpath states match during the inbound policy check. Without this, there is potential for ambiguity where entries in the secpath differing by only the if_id could be mismatched.
Notably, this is checked in the outbound direction when resolving templates to SAs, but not on the inbound path when matching SAs and policies.
Test: Tested against Android kernel unit tests & CTS Signed-off-by: Benedict Wong benedictwong@google.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/xfrm/xfrm_policy.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index e02421ca9609..8c133ecec9ab 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -2246,7 +2246,7 @@ xfrm_secpath_reject(int idx, struct sk_buff *skb, const struct flowi *fl)
static inline int xfrm_state_ok(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x, - unsigned short family) + unsigned short family, u32 if_id) { if (xfrm_state_kern(x)) return tmpl->optional && !xfrm_state_addr_cmp(tmpl, x, tmpl->encap_family); @@ -2257,7 +2257,8 @@ xfrm_state_ok(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x, (tmpl->allalgs || (tmpl->aalgos & (1<<x->props.aalgo)) || !(xfrm_id_proto_match(tmpl->id.proto, IPSEC_PROTO_ANY))) && !(x->props.mode != XFRM_MODE_TRANSPORT && - xfrm_state_addr_cmp(tmpl, x, family)); + xfrm_state_addr_cmp(tmpl, x, family)) && + (if_id == 0 || if_id == x->if_id); }
/* @@ -2269,7 +2270,7 @@ xfrm_state_ok(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x, */ static inline int xfrm_policy_ok(const struct xfrm_tmpl *tmpl, const struct sec_path *sp, int start, - unsigned short family) + unsigned short family, u32 if_id) { int idx = start;
@@ -2279,7 +2280,7 @@ xfrm_policy_ok(const struct xfrm_tmpl *tmpl, const struct sec_path *sp, int star } else start = -1; for (; idx < sp->len; idx++) { - if (xfrm_state_ok(tmpl, sp->xvec[idx], family)) + if (xfrm_state_ok(tmpl, sp->xvec[idx], family, if_id)) return ++idx; if (sp->xvec[idx]->props.mode != XFRM_MODE_TRANSPORT) { if (start == -1) @@ -2456,7 +2457,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, * are implied between each two transformations. */ for (i = xfrm_nr-1, k = 0; i >= 0; i--) { - k = xfrm_policy_ok(tpp[i], sp, k, family); + k = xfrm_policy_ok(tpp[i], sp, k, family, if_id); if (k < 0) { if (k < -1) /* "-2 - errored_index" returned */
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.286 commit 9b3907c57b044d4709fcadd7b0690004fb310962 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 1e5c647c3f6d4f8497dedcd226204e1880e0ffb3 ]
Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash.
This also prevents a (smart ?) compiler to remove the condition in:
if (sk->sk_rxhash != newval) sk->sk_rxhash = newval;
We need the condition to avoid dirtying a shared cache line.
Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/sock.h | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index bac4ae940a89..2ca00fe89cf9 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1018,8 +1018,12 @@ static inline void sock_rps_record_flow(const struct sock *sk) * OR an additional socket flag * [1] : sk_state and sk_prot are in the same cache line. */ - if (sk->sk_state == TCP_ESTABLISHED) - sock_rps_record_flow_hash(sk->sk_rxhash); + if (sk->sk_state == TCP_ESTABLISHED) { + /* This READ_ONCE() is paired with the WRITE_ONCE() + * from sock_rps_save_rxhash() and sock_rps_reset_rxhash(). + */ + sock_rps_record_flow_hash(READ_ONCE(sk->sk_rxhash)); + } } #endif } @@ -1028,15 +1032,19 @@ static inline void sock_rps_save_rxhash(struct sock *sk, const struct sk_buff *skb) { #ifdef CONFIG_RPS - if (unlikely(sk->sk_rxhash != skb->hash)) - sk->sk_rxhash = skb->hash; + /* The following WRITE_ONCE() is paired with the READ_ONCE() + * here, and another one in sock_rps_record_flow(). + */ + if (unlikely(READ_ONCE(sk->sk_rxhash) != skb->hash)) + WRITE_ONCE(sk->sk_rxhash, skb->hash); #endif }
static inline void sock_rps_reset_rxhash(struct sock *sk) { #ifdef CONFIG_RPS - sk->sk_rxhash = 0; + /* Paired with READ_ONCE() in sock_rps_record_flow() */ + WRITE_ONCE(sk->sk_rxhash, 0); #endif }
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.286 commit 28dbabef5bb865c5a573ebee831a4b9f3544e2e1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 5c3b74a92aa285a3df722bf6329ba7ccf70346d6 ]
Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.
This also prevents a (smart ?) compiler to remove the condition in:
if (table->ents[index] != newval) table->ents[index] = newval;
We need the condition to avoid dirtying a shared cache line.
Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/linux/netdevice.h | 7 +++++-- net/core/dev.c | 6 ++++-- 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 752df39590ba..437cd78fdb9a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -714,8 +714,11 @@ static inline void rps_record_sock_flow(struct rps_sock_flow_table *table, /* We only give a hint, preemption can change CPU under us */ val |= raw_smp_processor_id();
- if (table->ents[index] != val) - table->ents[index] = val; + /* The following WRITE_ONCE() is paired with the READ_ONCE() + * here, and another one in get_rps_cpu(). + */ + if (READ_ONCE(table->ents[index]) != val) + WRITE_ONCE(table->ents[index], val); } }
diff --git a/net/core/dev.c b/net/core/dev.c index e40d9d98baea..e7cc6630a8b6 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4061,8 +4061,10 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, u32 next_cpu; u32 ident;
- /* First check into global flow table if there is a match */ - ident = sock_flow_table->ents[hash & sock_flow_table->mask]; + /* First check into global flow table if there is a match. + * This READ_ONCE() pairs with WRITE_ONCE() from rps_record_sock_flow(). + */ + ident = READ_ONCE(sock_flow_table->ents[hash & sock_flow_table->mask]); if ((ident ^ hash) & ~rps_cpu_mask) goto try_rps;
From: Hangyu Hua hbh25y@gmail.com
stable inclusion from stable-v4.19.286 commit 3cc54473badeb4cdb3798a079480114f0738432f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 44f8baaf230c655c249467ca415b570deca8df77 ]
try_module_get will be called in tcf_proto_lookup_ops. So module_put needs to be called to drop the refcount if ops don't implement the required function.
Fixes: 9f407f1768d3 ("net: sched: introduce chain templates") Signed-off-by: Hangyu Hua hbh25y@gmail.com Reviewed-by: Larysa Zaremba larysa.zaremba@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/sched/cls_api.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 3ca18e3a2be9..f94cbe706d33 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1840,6 +1840,7 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net, return PTR_ERR(ops); if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump) { NL_SET_ERR_MSG(extack, "Chain templates are not supported with specified classifier"); + module_put(ops->owner); return -EOPNOTSUPP; }
From: Guillaume Nault gnault@redhat.com
stable inclusion from stable-v4.19.287 commit 17c46e7f299b1acc2688c12e7b0c129a3ad46fdd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF CVE: NA
--------------------------------
[ Upstream commit 91ffd1bae1dafbb9e34b46813f5b058581d9144d ]
Ping sockets can't send packets when they're bound to a VRF master device and the output interface is set to a slave device.
For example, when net.ipv4.ping_group_range is properly set, so that ping6 can use ping sockets, the following kind of commands fails: $ ip vrf exec red ping6 fe80::854:e7ff:fe88:4bf1%eth1
What happens is that sk->sk_bound_dev_if is set to the VRF master device, but 'oif' is set to the real output device. Since both are set but different, ping_v6_sendmsg() sees their value as inconsistent and fails.
Fix this by allowing 'oif' to be a slave device of ->sk_bound_dev_if.
This fixes the following kselftest failure: $ ./fcnal-test.sh -t ipv6_ping [...] TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/b6191f90-ffca-dbca-7d06-88a9788def9c@alu.uniz... Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Fixes: 5e457896986e ("net: ipv6: Fix ping to link-local addresses.") Signed-off-by: Guillaume Nault gnault@redhat.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/6c8b53108816a8d0d5705ae37bdc5a8322b5e3d9.168615384... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/ping.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 5c9be8594483..e065f49a4ae3 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -101,7 +101,8 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) addr_type = ipv6_addr_type(daddr); if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) || (addr_type & IPV6_ADDR_MAPPED) || - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if)) + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if && + l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if)) return -EINVAL;
/* TODO: use ip6_datagram_send_ctl to get options from cmsg */