From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.282 commit 9577d9f0fbf1eea95f61b02627572c401499aa32 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
commit 21985f43376cee092702d6cb963ff97a9d2ede68 upstream.
Commit 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") forgot to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6 socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is changed to udp_prot and ->destroy() never cleans it up, resulting in a memory leak.
This is due to the discrepancy between inet6_destroy_sock() and IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM to remove the difference.
However, this is not enough for now because rxpmtu can be changed without lock_sock() after commit 03485f2adcde ("udpv6: Add lockless sendmsg() support"). We will fix this case in the following patch.
Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy() in the future.
Fixes: 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/ipv6.h | 1 + net/ipv6/af_inet6.c | 6 ++++++ net/ipv6/ipv6_sockglue.c | 20 ++++++++------------ 3 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 4c2e40882e88..f33be9653aaa 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1040,6 +1040,7 @@ void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err, __be16 port, void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info); void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);
+void inet6_cleanup_sock(struct sock *sk); int inet6_release(struct socket *sock); int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); int inet6_getname(struct socket *sock, struct sockaddr *uaddr, diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index a6be1ad6994a..d285db413ceb 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -505,6 +505,12 @@ void inet6_destroy_sock(struct sock *sk) } EXPORT_SYMBOL_GPL(inet6_destroy_sock);
+void inet6_cleanup_sock(struct sock *sk) +{ + inet6_destroy_sock(sk); +} +EXPORT_SYMBOL_GPL(inet6_cleanup_sock); + /* * This does both peername and sockname. */ diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index dccbc71d7f8e..441febd19674 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -180,9 +180,6 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, if (optlen < sizeof(int)) goto e_inval; if (val == PF_INET) { - struct ipv6_txoptions *opt; - struct sk_buff *pktopt; - if (sk->sk_type == SOCK_RAW) break;
@@ -213,7 +210,6 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; }
- fl6_free_socklist(sk); __ipv6_sock_mc_close(sk); __ipv6_sock_ac_close(sk);
@@ -251,14 +247,14 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sk->sk_socket->ops = &inet_dgram_ops; sk->sk_family = PF_INET; } - opt = xchg((__force struct ipv6_txoptions **)&np->opt, - NULL); - if (opt) { - atomic_sub(opt->tot_len, &sk->sk_omem_alloc); - txopt_put(opt); - } - pktopt = xchg(&np->pktoptions, NULL); - kfree_skb(pktopt); + + /* Disable all options not to allocate memory anymore, + * but there is still a race. See the lockless path + * in udpv6_sendmsg() and ipv6_local_rxpmtu(). + */ + np->rxopt.all = 0; + + inet6_cleanup_sock(sk);
/* * ... and add it to the refcnt debug socks count
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.282 commit 1d8a87d6b6ee29fd37e8ecd67d45aef76bacc88b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
commit d38afeec26ed4739c640bf286c270559aab2ba5f upstream.
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak:
setsockopt(IPV6_ADDRFORM) sendmsg() +-----------------------+ +-------+ - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...) - sockopt_lock_sock(sk) ^._ called via udpv6_prot - lock_sock(sk) before WRITE_ONCE() - WRITE_ONCE(sk->sk_prot, &tcp_prot) - inet6_destroy_sock() - if (!corkreq) - sockopt_release_sock(sk) - ip6_make_skb(sk, ...) - release_sock(sk) ^._ lockless fast path for the non-corking case
- __ip6_append_data(sk, ...) - ipv6_local_rxpmtu(sk, ...) - xchg(&np->rxpmtu, skb) ^._ rxpmtu is never freed.
- goto out_no_dst;
- lock_sock(sk)
For now, rxpmtu is only the case, but not to miss the future change and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that we can clean up IPv6 resources finally.
We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next.
Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/ipv6.h | 1 + include/net/udp.h | 2 +- include/net/udplite.h | 8 -------- net/ipv4/udp.c | 9 ++++++--- net/ipv4/udplite.c | 8 ++++++++ net/ipv6/af_inet6.c | 8 +++++++- net/ipv6/udp.c | 15 ++++++++++++++- net/ipv6/udp_impl.h | 1 + net/ipv6/udplite.c | 9 ++++++++- 9 files changed, 46 insertions(+), 15 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index f33be9653aaa..0c883249814c 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1041,6 +1041,7 @@ void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info); void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);
void inet6_cleanup_sock(struct sock *sk); +void inet6_sock_destruct(struct sock *sk); int inet6_release(struct socket *sock); int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); int inet6_getname(struct socket *sock, struct sockaddr *uaddr, diff --git a/include/net/udp.h b/include/net/udp.h index 7f3067172e50..bd27e9dd6b84 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -265,7 +265,7 @@ static inline bool udp_sk_bound_dev_eq(struct net *net, int bound_dev_if, }
/* net/ipv4/udp.c */ -void udp_destruct_sock(struct sock *sk); +void udp_destruct_common(struct sock *sk); void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len); int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb); void udp_skb_destructor(struct sock *sk, struct sk_buff *skb); diff --git a/include/net/udplite.h b/include/net/udplite.h index 9185e45b997f..c59ba86668af 100644 --- a/include/net/udplite.h +++ b/include/net/udplite.h @@ -24,14 +24,6 @@ static __inline__ int udplite_getfrag(void *from, char *to, int offset, return copy_from_iter_full(to, len, &msg->msg_iter) ? 0 : -EFAULT; }
-/* Designate sk as UDP-Lite socket */ -static inline int udplite_sk_init(struct sock *sk) -{ - udp_init_sock(sk); - udp_sk(sk)->pcflag = UDPLITE_BIT; - return 0; -} - /* * Checksumming routines */ diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index b9b6b62ef16f..dfe166871ea4 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1455,7 +1455,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb) } EXPORT_SYMBOL_GPL(__udp_enqueue_schedule_skb);
-void udp_destruct_sock(struct sock *sk) +void udp_destruct_common(struct sock *sk) { /* reclaim completely the forward allocated memory */ struct udp_sock *up = udp_sk(sk); @@ -1468,10 +1468,14 @@ void udp_destruct_sock(struct sock *sk) kfree_skb(skb); } udp_rmem_release(sk, total, 0, true); +} +EXPORT_SYMBOL_GPL(udp_destruct_common);
+static void udp_destruct_sock(struct sock *sk) +{ + udp_destruct_common(sk); inet_sock_destruct(sk); } -EXPORT_SYMBOL_GPL(udp_destruct_sock);
int udp_init_sock(struct sock *sk) { @@ -1479,7 +1483,6 @@ int udp_init_sock(struct sock *sk) sk->sk_destruct = udp_destruct_sock; return 0; } -EXPORT_SYMBOL_GPL(udp_init_sock);
void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len) { diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c index 8545457752fb..6beab353bc8b 100644 --- a/net/ipv4/udplite.c +++ b/net/ipv4/udplite.c @@ -20,6 +20,14 @@ struct udp_table udplite_table __read_mostly; EXPORT_SYMBOL(udplite_table);
+/* Designate sk as UDP-Lite socket */ +static int udplite_sk_init(struct sock *sk) +{ + udp_init_sock(sk); + udp_sk(sk)->pcflag = UDPLITE_BIT; + return 0; +} + static int udplite_rcv(struct sk_buff *skb) { return __udp4_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE); diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index d285db413ceb..38f82257579d 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -107,6 +107,12 @@ static __inline__ struct ipv6_pinfo *inet6_sk_generic(struct sock *sk) return (struct ipv6_pinfo *)(((u8 *)sk) + offset); }
+void inet6_sock_destruct(struct sock *sk) +{ + inet6_cleanup_sock(sk); + inet_sock_destruct(sk); +} + static int inet6_create(struct net *net, struct socket *sock, int protocol, int kern) { @@ -199,7 +205,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol, inet->hdrincl = 1; }
- sk->sk_destruct = inet_sock_destruct; + sk->sk_destruct = inet6_sock_destruct; sk->sk_family = PF_INET6; sk->sk_protocol = protocol;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index cd317a6b73fc..49e57cda20c0 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -66,6 +66,19 @@ static bool udp6_lib_exact_dif_match(struct net *net, struct sk_buff *skb) return false; }
+static void udpv6_destruct_sock(struct sock *sk) +{ + udp_destruct_common(sk); + inet6_sock_destruct(sk); +} + +int udpv6_init_sock(struct sock *sk) +{ + skb_queue_head_init(&udp_sk(sk)->reader_queue); + sk->sk_destruct = udpv6_destruct_sock; + return 0; +} + static u32 udp6_ehashfn(const struct net *net, const struct in6_addr *laddr, const u16 lport, @@ -1595,7 +1608,7 @@ struct proto udpv6_prot = { .connect = ip6_datagram_connect, .disconnect = udp_disconnect, .ioctl = udp_ioctl, - .init = udp_init_sock, + .init = udpv6_init_sock, .destroy = udpv6_destroy_sock, .setsockopt = udpv6_setsockopt, .getsockopt = udpv6_getsockopt, diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h index 7903e21c178b..e5d067b09ccf 100644 --- a/net/ipv6/udp_impl.h +++ b/net/ipv6/udp_impl.h @@ -12,6 +12,7 @@ int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int); void __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int, __be32, struct udp_table *);
+int udpv6_init_sock(struct sock *sk); int udp_v6_get_port(struct sock *sk, unsigned short snum);
int udpv6_getsockopt(struct sock *sk, int level, int optname, diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c index 5000ad6878e6..f15b8305d87b 100644 --- a/net/ipv6/udplite.c +++ b/net/ipv6/udplite.c @@ -15,6 +15,13 @@ #include <linux/proc_fs.h> #include "udp_impl.h"
+static int udplitev6_sk_init(struct sock *sk) +{ + udpv6_init_sock(sk); + udp_sk(sk)->pcflag = UDPLITE_BIT; + return 0; +} + static int udplitev6_rcv(struct sk_buff *skb) { return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE); @@ -40,7 +47,7 @@ struct proto udplitev6_prot = { .connect = ip6_datagram_connect, .disconnect = udp_disconnect, .ioctl = udp_ioctl, - .init = udplite_sk_init, + .init = udplitev6_sk_init, .destroy = udpv6_destroy_sock, .setsockopt = udpv6_setsockopt, .getsockopt = udpv6_getsockopt,
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.282 commit e1820a934398d35a5925bcf61316983c90857f7d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
commit b5fc29233d28be7a3322848ebe73ac327559cdb9 upstream.
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources.
Now we can remove unnecessary inet6_destroy_sock() calls in sk->sk_prot->destroy().
DCCP and SCTP have their own sk->sk_destruct() function, so we change them separately in the following patches.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/ping.c | 6 ------ net/ipv6/raw.c | 2 -- net/ipv6/tcp_ipv6.c | 8 +------- net/ipv6/udp.c | 2 -- net/l2tp/l2tp_ip6.c | 2 -- 5 files changed, 1 insertion(+), 19 deletions(-)
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 23ae01715e7b..5c9be8594483 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -27,11 +27,6 @@ #include <linux/proc_fs.h> #include <net/ping.h>
-static void ping_v6_destroy(struct sock *sk) -{ - inet6_destroy_sock(sk); -} - /* Compatibility glue so we can support IPv6 when it's compiled as a module */ static int dummy_ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) @@ -175,7 +170,6 @@ struct proto pingv6_prot = { .owner = THIS_MODULE, .init = ping_init_sock, .close = ping_close, - .destroy = ping_v6_destroy, .connect = ip6_datagram_connect_v6_only, .disconnect = __udp_disconnect, .setsockopt = ipv6_setsockopt, diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 8ce6414edd88..cc42dc14c33a 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -1258,8 +1258,6 @@ static void raw6_destroy(struct sock *sk) lock_sock(sk); ip6_flush_pending_frames(sk); release_sock(sk); - - inet6_destroy_sock(sk); }
static int rawv6_init_sk(struct sock *sk) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 8a257a1bc5b1..63b0784cbdea 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1785,12 +1785,6 @@ static int tcp_v6_init_sock(struct sock *sk) return 0; }
-static void tcp_v6_destroy_sock(struct sock *sk) -{ - tcp_v4_destroy_sock(sk); - inet6_destroy_sock(sk); -} - #ifdef CONFIG_PROC_FS /* Proc filesystem TCPv6 sock list dumping. */ static void get_openreq6(struct seq_file *seq, @@ -1983,7 +1977,7 @@ struct proto tcpv6_prot = { .accept = inet_csk_accept, .ioctl = tcp_ioctl, .init = tcp_v6_init_sock, - .destroy = tcp_v6_destroy_sock, + .destroy = tcp_v4_destroy_sock, .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 49e57cda20c0..4d8a61eac755 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1503,8 +1503,6 @@ void udpv6_destroy_sock(struct sock *sk) if (encap_destroy) encap_destroy(sk); } - - inet6_destroy_sock(sk); }
/* diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index 2ff25c445b82..28a4c25c8164 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -272,8 +272,6 @@ static void l2tp_ip6_destroy_sock(struct sock *sk)
if (tunnel) l2tp_tunnel_delete(tunnel); - - inet6_destroy_sock(sk); }
static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.282 commit b165119e6cc96ceaeea061ecca0750f0052aa2c8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
commit 1651951ebea54970e0bda60c638fc2eee7a6218f upstream.
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources.
DCCP sets its own sk->sk_destruct() in the dccp_init_sock(), and DCCPv6 socket shares it by calling the same init function via dccp_v6_init_sock().
To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we export it and set dccp_v6_sk_destruct() in the init function.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/dccp/dccp.h | 1 + net/dccp/ipv6.c | 15 ++++++++------- net/dccp/proto.c | 8 +++++++- net/ipv6/af_inet6.c | 1 + 4 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index f91e3816806b..eee3c576b30b 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -291,6 +291,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb, int dccp_rcv_established(struct sock *sk, struct sk_buff *skb, const struct dccp_hdr *dh, const unsigned int len);
+void dccp_destruct_common(struct sock *sk); int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized); void dccp_destroy_sock(struct sock *sk);
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 2cd3508a3786..9973a808b688 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1001,6 +1001,12 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = { #endif };
+static void dccp_v6_sk_destruct(struct sock *sk) +{ + dccp_destruct_common(sk); + inet6_sock_destruct(sk); +} + /* NOTE: A lot of things set to zero explicitly by call to * sk_alloc() so need not be done here. */ @@ -1013,17 +1019,12 @@ static int dccp_v6_init_sock(struct sock *sk) if (unlikely(!dccp_v6_ctl_sock_initialized)) dccp_v6_ctl_sock_initialized = 1; inet_csk(sk)->icsk_af_ops = &dccp_ipv6_af_ops; + sk->sk_destruct = dccp_v6_sk_destruct; }
return err; }
-static void dccp_v6_destroy_sock(struct sock *sk) -{ - dccp_destroy_sock(sk); - inet6_destroy_sock(sk); -} - static struct timewait_sock_ops dccp6_timewait_sock_ops = { .twsk_obj_size = sizeof(struct dccp6_timewait_sock), }; @@ -1046,7 +1047,7 @@ static struct proto dccp_v6_prot = { .accept = inet_csk_accept, .get_port = inet_csk_get_port, .shutdown = dccp_shutdown, - .destroy = dccp_v6_destroy_sock, + .destroy = dccp_destroy_sock, .orphan_count = &dccp_orphan_count, .max_header = MAX_DCCP_HEADER, .obj_size = sizeof(struct dccp6_sock), diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 43733accf58e..c8eab427fa76 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -174,12 +174,18 @@ const char *dccp_packet_name(const int type)
EXPORT_SYMBOL_GPL(dccp_packet_name);
-static void dccp_sk_destruct(struct sock *sk) +void dccp_destruct_common(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk);
ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); dp->dccps_hc_tx_ccid = NULL; +} +EXPORT_SYMBOL_GPL(dccp_destruct_common); + +static void dccp_sk_destruct(struct sock *sk) +{ + dccp_destruct_common(sk); inet_sock_destruct(sk); }
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 38f82257579d..8a6387d3eb0f 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -112,6 +112,7 @@ void inet6_sock_destruct(struct sock *sk) inet6_cleanup_sock(sk); inet_sock_destruct(sk); } +EXPORT_SYMBOL_GPL(inet6_sock_destruct);
static int inet6_create(struct net *net, struct socket *sock, int protocol, int kern)
From: Johannes Berg johannes.berg@intel.com
stable inclusion from stable-v4.19.283 commit 2702b67f59d455072a08dc40312f9b090d4dec04 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
commit 675751bb20634f981498c7d66161584080cc061e upstream.
If something was written to the buffer just before destruction, it may be possible (maybe not in a real system, but it did happen in ARCH=um with time-travel) to destroy the ringbuffer before the IRQ work ran, leading this KASAN report (or a crash without KASAN):
BUG: KASAN: slab-use-after-free in irq_work_run_list+0x11a/0x13a Read of size 8 at addr 000000006d640a48 by task swapper/0
CPU: 0 PID: 0 Comm: swapper Tainted: G W O 6.3.0-rc1 #7 Stack: 60c4f20f 0c203d48 41b58ab3 60f224fc 600477fa 60f35687 60c4f20f 601273dd 00000008 6101eb00 6101eab0 615be548 Call Trace: [<60047a58>] show_stack+0x25e/0x282 [<60c609e0>] dump_stack_lvl+0x96/0xfd [<60c50d4c>] print_report+0x1a7/0x5a8 [<603078d3>] kasan_report+0xc1/0xe9 [<60308950>] __asan_report_load8_noabort+0x1b/0x1d [<60232844>] irq_work_run_list+0x11a/0x13a [<602328b4>] irq_work_tick+0x24/0x34 [<6017f9dc>] update_process_times+0x162/0x196 [<6019f335>] tick_sched_handle+0x1a4/0x1c3 [<6019fd9e>] tick_sched_timer+0x79/0x10c [<601812b9>] __hrtimer_run_queues.constprop.0+0x425/0x695 [<60182913>] hrtimer_interrupt+0x16c/0x2c4 [<600486a3>] um_timer+0x164/0x183 [...]
Allocated by task 411: save_stack_trace+0x99/0xb5 stack_trace_save+0x81/0x9b kasan_save_stack+0x2d/0x54 kasan_set_track+0x34/0x3e kasan_save_alloc_info+0x25/0x28 ____kasan_kmalloc+0x8b/0x97 __kasan_kmalloc+0x10/0x12 __kmalloc+0xb2/0xe8 load_elf_phdrs+0xee/0x182 [...]
The buggy address belongs to the object at 000000006d640800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 584 bytes inside of freed 1024-byte region [000000006d640800, 000000006d640c00)
Add the appropriate irq_work_sync() so the work finishes before the buffers are destroyed.
Prior to the commit in the Fixes tag below, there was only a single global IRQ work, so this issue didn't exist.
Link: https://lore.kernel.org/linux-trace-kernel/20230427175920.a76159263122.I8295...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Fixes: 15693458c4bc ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/trace/ring_buffer.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 4610265ad6ca..3f098e32a7df 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1325,6 +1325,8 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer) struct list_head *head = cpu_buffer->pages; struct buffer_page *bpage, *tmp;
+ irq_work_sync(&cpu_buffer->irq_work.work); + free_buffer_page(cpu_buffer->reader_page);
if (head) { @@ -1430,6 +1432,8 @@ ring_buffer_free(struct ring_buffer *buffer)
cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
+ irq_work_sync(&buffer->irq_work.work); + for_each_buffer_cpu(buffer, cpu) rb_free_cpu_buffer(buffer->buffers[cpu]);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.283 commit 36a320c3e2fe960eb5eb56ee31c2a07f55d4000a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit ee5675ecdf7a4e713ed21d98a70c2871d6ebed01 ]
syzbot/KCAN reported that po->origdev can be read while another thread is changing its value.
We can avoid this splat by converting this field to an actual bit.
Following patches will convert remaining 1bit fields.
Fixes: 80feaacb8a64 ("[AF_PACKET]: Add option to return orig_dev to userspace.") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/packet/af_packet.c | 10 ++++------ net/packet/diag.c | 2 +- net/packet/internal.h | 22 +++++++++++++++++++++- 3 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 9dc974649a1a..3bac5494fd62 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2104,7 +2104,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, sll = &PACKET_SKB_CB(skb)->sa.ll; sll->sll_hatype = dev->type; sll->sll_pkttype = skb->pkt_type; - if (unlikely(po->origdev)) + if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) sll->sll_ifindex = orig_dev->ifindex; else sll->sll_ifindex = dev->ifindex; @@ -2366,7 +2366,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, sll->sll_hatype = dev->type; sll->sll_protocol = skb->protocol; sll->sll_pkttype = skb->pkt_type; - if (unlikely(po->origdev)) + if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) sll->sll_ifindex = orig_dev->ifindex; else sll->sll_ifindex = dev->ifindex; @@ -3835,9 +3835,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
- lock_sock(sk); - po->origdev = !!val; - release_sock(sk); + packet_sock_flag_set(po, PACKET_SOCK_ORIGDEV, val); return 0; } case PACKET_VNET_HDR: @@ -3974,7 +3972,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, val = po->auxdata; break; case PACKET_ORIGDEV: - val = po->origdev; + val = packet_sock_flag(po, PACKET_SOCK_ORIGDEV); break; case PACKET_VNET_HDR: val = po->has_vnet_hdr; diff --git a/net/packet/diag.c b/net/packet/diag.c index 7ef1c881ae74..bf5928e5df03 100644 --- a/net/packet/diag.c +++ b/net/packet/diag.c @@ -24,7 +24,7 @@ static int pdiag_put_info(const struct packet_sock *po, struct sk_buff *nlskb) pinfo.pdi_flags |= PDI_RUNNING; if (po->auxdata) pinfo.pdi_flags |= PDI_AUXDATA; - if (po->origdev) + if (packet_sock_flag(po, PACKET_SOCK_ORIGDEV)) pinfo.pdi_flags |= PDI_ORIGDEV; if (po->has_vnet_hdr) pinfo.pdi_flags |= PDI_VNETHDR; diff --git a/net/packet/internal.h b/net/packet/internal.h index 907f4cd2a718..23649010e958 100644 --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -115,9 +115,9 @@ struct packet_sock { int copy_thresh; spinlock_t bind_lock; struct mutex pg_vec_lock; + unsigned long flags; unsigned int running; /* bind_lock must be held */ unsigned int auxdata:1, /* writer must hold sock lock */ - origdev:1, has_vnet_hdr:1, tp_loss:1, tp_tx_has_off:1; @@ -143,4 +143,24 @@ static struct packet_sock *pkt_sk(struct sock *sk) return (struct packet_sock *)sk; }
+enum packet_sock_flags { + PACKET_SOCK_ORIGDEV, +}; + +static inline void packet_sock_flag_set(struct packet_sock *po, + enum packet_sock_flags flag, + bool val) +{ + if (val) + set_bit(flag, &po->flags); + else + clear_bit(flag, &po->flags); +} + +static inline bool packet_sock_flag(const struct packet_sock *po, + enum packet_sock_flags flag) +{ + return test_bit(flag, &po->flags); +} + #endif
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.283 commit e70e38104e5ecd6717f46f054592ba2683c5c7c3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit fd53c297aa7b077ae98a3d3d2d3aa278a1686ba6 ]
po->auxdata can be read while another thread is changing its value, potentially raising KCSAN splat.
Convert it to PACKET_SOCK_AUXDATA flag.
Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/packet/af_packet.c | 8 +++----- net/packet/diag.c | 2 +- net/packet/internal.h | 4 ++-- 3 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 3bac5494fd62..5d069715bd02 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3438,7 +3438,7 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, memcpy(msg->msg_name, &PACKET_SKB_CB(skb)->sa, copy_len); }
- if (pkt_sk(sk)->auxdata) { + if (packet_sock_flag(pkt_sk(sk), PACKET_SOCK_AUXDATA)) { struct tpacket_auxdata aux;
aux.tp_status = TP_STATUS_USER; @@ -3821,9 +3821,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
- lock_sock(sk); - po->auxdata = !!val; - release_sock(sk); + packet_sock_flag_set(po, PACKET_SOCK_AUXDATA, val); return 0; } case PACKET_ORIGDEV: @@ -3969,7 +3967,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
break; case PACKET_AUXDATA: - val = po->auxdata; + val = packet_sock_flag(po, PACKET_SOCK_AUXDATA); break; case PACKET_ORIGDEV: val = packet_sock_flag(po, PACKET_SOCK_ORIGDEV); diff --git a/net/packet/diag.c b/net/packet/diag.c index bf5928e5df03..d9f912ad23df 100644 --- a/net/packet/diag.c +++ b/net/packet/diag.c @@ -22,7 +22,7 @@ static int pdiag_put_info(const struct packet_sock *po, struct sk_buff *nlskb) pinfo.pdi_flags = 0; if (po->running) pinfo.pdi_flags |= PDI_RUNNING; - if (po->auxdata) + if (packet_sock_flag(po, PACKET_SOCK_AUXDATA)) pinfo.pdi_flags |= PDI_AUXDATA; if (packet_sock_flag(po, PACKET_SOCK_ORIGDEV)) pinfo.pdi_flags |= PDI_ORIGDEV; diff --git a/net/packet/internal.h b/net/packet/internal.h index 23649010e958..2b2b85dadf8e 100644 --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -117,8 +117,7 @@ struct packet_sock { struct mutex pg_vec_lock; unsigned long flags; unsigned int running; /* bind_lock must be held */ - unsigned int auxdata:1, /* writer must hold sock lock */ - has_vnet_hdr:1, + unsigned int has_vnet_hdr:1, /* writer must hold sock lock */ tp_loss:1, tp_tx_has_off:1; int pressure; @@ -145,6 +144,7 @@ static struct packet_sock *pkt_sk(struct sock *sk)
enum packet_sock_flags { PACKET_SOCK_ORIGDEV, + PACKET_SOCK_AUXDATA, };
static inline void packet_sock_flag_set(struct packet_sock *po,
From: Nicolai Stange nstange@suse.de
stable inclusion from stable-v4.19.283 commit f1943e5703861f89f4376596e3d28d0dd52c5ead category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit 559edd47cce4cc407d606b4d7f376822816fd4b8 ]
Now that drbg_prepare_hrng() doesn't do anything but to instantiate a jitterentropy crypto_rng instance, it looks a little odd to have the related error handling at its only caller, drbg_instantiate().
Move the handling of jitterentropy allocation failures from drbg_instantiate() close to the allocation itself in drbg_prepare_hrng().
There is no change in behaviour.
Signed-off-by: Nicolai Stange nstange@suse.de Reviewed-by: Stephan Müller smueller@chronox.de Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Stable-dep-of: 686cd976b6dd ("crypto: drbg - Only fail when jent is unavailable in FIPS mode") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- crypto/drbg.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/crypto/drbg.c b/crypto/drbg.c index bc52d9562611..9cd9bb9d5d75 100644 --- a/crypto/drbg.c +++ b/crypto/drbg.c @@ -1425,6 +1425,14 @@ static int drbg_prepare_hrng(struct drbg_state *drbg) }
drbg->jent = crypto_alloc_rng("jitterentropy_rng", 0, 0); + if (IS_ERR(drbg->jent)) { + const int err = PTR_ERR(drbg->jent); + + drbg->jent = NULL; + if (fips_enabled || err != -ENOENT) + return err; + pr_info("DRBG: Continuing without Jitter RNG\n"); + }
/* * Require frequent reseeds until the seed source is fully @@ -1486,14 +1494,6 @@ static int drbg_instantiate(struct drbg_state *drbg, struct drbg_string *pers, if (ret) goto free_everything;
- if (IS_ERR(drbg->jent)) { - ret = PTR_ERR(drbg->jent); - drbg->jent = NULL; - if (fips_enabled || ret != -ENOENT) - goto free_everything; - pr_info("DRBG: Continuing without Jitter RNG\n"); - } - reseed = false; }
From: Herbert Xu herbert@gondor.apana.org.au
stable inclusion from stable-v4.19.283 commit 1fd247c1ded58f9bc1130fe4b26fb187fa1af55d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit 686cd976b6ddedeeb1a1fb09ba53a891d3cc9a03 ]
When jent initialisation fails for any reason other than ENOENT, the entire drbg fails to initialise, even when we're not in FIPS mode. This is wrong because we can still use the kernel RNG when we're not in FIPS mode.
Change it so that it only fails when we are in FIPS mode.
Fixes: 57225e679788 ("crypto: drbg - Use callback API for random readiness") Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Reviewed-by: Stephan Mueller smueller@chronox.de Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- crypto/drbg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/crypto/drbg.c b/crypto/drbg.c index 9cd9bb9d5d75..ca6765eeb990 100644 --- a/crypto/drbg.c +++ b/crypto/drbg.c @@ -1429,7 +1429,7 @@ static int drbg_prepare_hrng(struct drbg_state *drbg) const int err = PTR_ERR(drbg->jent);
drbg->jent = NULL; - if (fips_enabled || err != -ENOENT) + if (fips_enabled) return err; pr_info("DRBG: Continuing without Jitter RNG\n"); }
From: Ziyang Xuan william.xuanziyang@huawei.com
stable inclusion from stable-v4.19.283 commit 022ea4374c319690c804706bda9dc42946d1556d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit 99e5acae193e369b71217efe6f1dad42f3f18815 ]
Like commit ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()"). icmphdr does not in skb linear region under the scenario of SOCK_RAW socket. Access icmp_hdr(skb)->type directly will trigger the uninit variable access bug.
Use a local variable icmp_type to carry the correct value in different scenarios.
Fixes: 96793b482540 ("[IPV4]: Add ICMPMsgStats MIB (RFC 4293)") Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/ip_output.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 1a6c05908584..f7ee0378782d 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1442,9 +1442,19 @@ struct sk_buff *__ip_make_skb(struct sock *sk, cork->dst = NULL; skb_dst_set(skb, &rt->dst);
- if (iph->protocol == IPPROTO_ICMP) - icmp_out_count(net, ((struct icmphdr *) - skb_transport_header(skb))->type); + if (iph->protocol == IPPROTO_ICMP) { + u8 icmp_type; + + /* For such sockets, transhdrlen is zero when do ip_append_data(), + * so icmphdr does not in skb linear region and can not get icmp_type + * by icmp_hdr(skb)->type. + */ + if (sk->sk_type == SOCK_RAW && !inet_sk(sk)->hdrincl) + icmp_type = fl4->fl4_icmp_type; + else + icmp_type = icmp_hdr(skb)->type; + icmp_out_count(net, icmp_type); + }
ip_cork_release(cork); out:
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.283 commit 1f69c086b20e27763af28145981435423f088268 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit 50749f2dd6854a41830996ad302aef2ffaf011d8 ]
syzkaller reported [0] memory leaks of an UDP socket and ZEROCOPY skbs. We can reproduce the problem with these sequences:
sk = socket(AF_INET, SOCK_DGRAM, 0) sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE) sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1) sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53)) sk.close()
sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets skb->cb->ubuf.refcnt to 1, and calls sock_hold(). Here, struct ubuf_info_msgzc indirectly holds a refcnt of the socket. When the skb is sent, __skb_tstamp_tx() clones it and puts the clone into the socket's error queue with the TX timestamp.
When the original skb is received locally, skb_copy_ubufs() calls skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt. This additional count is decremented while freeing the skb, but struct ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is not called.
The last refcnt is not released unless we retrieve the TX timestamped skb by recvmsg(). Since we clear the error queue in inet_sock_destruct() after the socket's refcnt reaches 0, there is a circular dependency. If we close() the socket holding such skbs, we never call sock_put() and leak the count, sk, and skb.
TCP has the same problem, and commit e0c8bccd40fc ("net: stream: purge sk_error_queue in sk_stream_kill_queues()") tried to fix it by calling skb_queue_purge() during close(). However, there is a small chance that skb queued in a qdisc or device could be put into the error queue after the skb_queue_purge() call.
In __skb_tstamp_tx(), the cloned skb should not have a reference to the ubuf to remove the circular dependency, but skb_clone() does not call skb_copy_ubufs() for zerocopy skb. So, we need to call skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs().
[0]: BUG: memory leak unreferenced object 0xffff88800c6d2d00 (size 1152): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00 ................ 02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024 [<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083 [<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline] [<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245 [<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515 [<00000000b9b11231>] sock_create net/socket.c:1566 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline] [<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636 [<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline] [<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline] [<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
BUG: memory leak unreferenced object 0xffff888017633a00 (size 240): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff .........-m..... backtrace: [<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497 [<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline] [<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596 [<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline] [<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370 [<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037 [<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652 [<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253 [<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819 [<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline] [<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734 [<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117 [<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline] [<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline] [<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Fixes: b5947e5d1e71 ("udp: msg_zerocopy") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/skbuff.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 7f501dff4501..5ae62d743357 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4445,6 +4445,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, skb = alloc_skb(0, GFP_ATOMIC); } else { skb = skb_clone(orig_skb, GFP_ATOMIC); + + if (skb_orphan_frags_rx(skb, GFP_ATOMIC)) + return; } if (!skb) return;
From: Miquel Raynal miquel.raynal@bootlin.com
stable inclusion from stable-v4.19.283 commit d72e2dc104e65827798e116f9e4853b85488d3e8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit b19a4266c52de78496fe40f0b37580a3b762e67d ]
The helper generating an OF based modalias (of_device_get_modalias()) works fine, but due to the use of snprintf() internally it needs a buffer one byte longer than what should be needed just for the entire string (excluding the '\0'). Most users of this helper are sysfs hooks providing the modalias string to users. They all provide a PAGE_SIZE buffer which is way above the number of bytes required to fit the modalias string and hence do not suffer from this issue.
There is another user though, of_device_request_module(), which is only called by drivers/usb/common/ulpi.c. This request module function is faulty, but maybe because in most cases there is an alternative, ULPI driver users have not noticed it.
In this function, of_device_get_modalias() is called twice. The first time without buffer just to get the number of bytes required by the modalias string (excluding the null byte), and a second time, after buffer allocation, to fill the buffer. The allocation asks for an additional byte, in order to store the trailing '\0'. However, the buffer *length* provided to of_device_get_modalias() excludes this extra byte. The internal use of snprintf() with a length that is exactly the number of bytes to be written has the effect of using the last available byte to store a '\0', which then smashes the last character of the modalias string.
Provide the actual size of the buffer to of_device_get_modalias() to fix this issue.
Note: the "str[size - 1] = '\0';" line is not really needed as snprintf will anyway end the string with a null byte, but there is a possibility that this function might be called on a struct device_node without compatible, in this case snprintf() would not be executed. So we keep it just to avoid possible unbounded strings.
Cc: Stephen Boyd sboyd@kernel.org Cc: Peter Chen peter.chen@kernel.org Fixes: 9c829c097f2f ("of: device: Support loading a module with OF based modalias") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Reviewed-by: Rob Herring robh@kernel.org Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20230404172148.82422-2-srinivas.kandagatla@linaro.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/of/device.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/of/device.c b/drivers/of/device.c index 258742830e36..566d8af05157 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -258,12 +258,15 @@ int of_device_request_module(struct device *dev) if (size < 0) return size;
- str = kmalloc(size + 1, GFP_KERNEL); + /* Reserve an additional byte for the trailing '\0' */ + size++; + + str = kmalloc(size, GFP_KERNEL); if (!str) return -ENOMEM;
of_device_get_modalias(dev, str, size); - str[size] = '\0'; + str[size - 1] = '\0'; ret = request_module(str); kfree(str);
From: Yang Jihong yangjihong1@huawei.com
stable inclusion from stable-v4.19.283 commit 6805c1fcbe37989dd6fa4d545d04ccbea5e69d1f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit 15def34e2635ab7e0e96f1bc32e1b69609f14942 ]
commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling") introduces a change in throttling threshold judgment. Before this, compare hwc->interrupts and max_samples_per_tick, then increase hwc->interrupts by 1, but this commit reverses order of these two behaviors, causing the semantics of max_samples_per_tick to change. In literal sense of "max_samples_per_tick", if hwc->interrupts == max_samples_per_tick, it should not be throttled, therefore, the judgment condition should be changed to "hwc->interrupts > max_samples_per_tick".
In fact, this may cause the hardlockup to fail, The minimum value of max_samples_per_tick may be 1, in this case, the return value of __perf_event_account_interrupt function is 1. As a result, nmi_watchdog gets throttled, which would stop PMU (Use x86 architecture as an example, see x86_pmu_handle_irq).
Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling") Signed-off-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20230227023508.102230-1-yangjihong1@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/events/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index 0dd284509aa7..2f835dcea703 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7865,8 +7865,8 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle) hwc->interrupts = 1; } else { hwc->interrupts++; - if (unlikely(throttle - && hwc->interrupts >= max_samples_per_tick)) { + if (unlikely(throttle && + hwc->interrupts > max_samples_per_tick)) { __this_cpu_inc(perf_throttled_count); tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS); hwc->interrupts = MAX_INTERRUPTS;
From: Frederic Weisbecker frederic@kernel.org
stable inclusion from stable-v4.19.283 commit fe2ae32a7ec9fa64f61993b808f25315b9996b03 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit 01b4c39901e087ceebae2733857248de81476bd8 ]
If a nohz_full CPU is looping in the kernel, the scheduling-clock tick might nevertheless remain disabled. In !PREEMPT kernels, this can prevent RCU's attempts to enlist the aid of that CPU's executions of cond_resched(), which can in turn result in an arbitrarily delayed grace period and thus an OOM. RCU therefore needs a way to enable a holdout nohz_full CPU's scheduler-clock interrupt.
This commit therefore provides a new TICK_DEP_BIT_RCU value which RCU can pass to tick_dep_set_cpu() and friends to force on the scheduler-clock interrupt for a specified CPU or task. In some cases, rcutorture needs to turn on the scheduler-clock tick, so this commit also exports the relevant symbols to GPL-licensed modules.
Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Paul E. McKenney paulmck@kernel.org Stable-dep-of: 58d766824264 ("tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/linux/tick.h | 7 ++++++- include/trace/events/timer.h | 3 ++- kernel/time/tick-sched.c | 7 +++++++ 3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/linux/tick.h b/include/linux/tick.h index 76acb48acdb7..2b150a5686aa 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -108,7 +108,8 @@ enum tick_dep_bits { TICK_DEP_BIT_POSIX_TIMER = 0, TICK_DEP_BIT_PERF_EVENTS = 1, TICK_DEP_BIT_SCHED = 2, - TICK_DEP_BIT_CLOCK_UNSTABLE = 3 + TICK_DEP_BIT_CLOCK_UNSTABLE = 3, + TICK_DEP_BIT_RCU = 4 };
#define TICK_DEP_MASK_NONE 0 @@ -116,6 +117,7 @@ enum tick_dep_bits { #define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS) #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED) #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE) +#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
#ifdef CONFIG_NO_HZ_COMMON extern bool tick_nohz_enabled; @@ -263,6 +265,9 @@ static inline bool tick_nohz_full_enabled(void) { return false; } static inline bool tick_nohz_full_cpu(int cpu) { return false; } static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
+static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { } +static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { } + static inline void tick_dep_set(enum tick_dep_bits bit) { } static inline void tick_dep_clear(enum tick_dep_bits bit) { } static inline void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit) { } diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index a57e4ee989d6..350b046e7576 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -362,7 +362,8 @@ TRACE_EVENT(itimer_expire, tick_dep_name(POSIX_TIMER) \ tick_dep_name(PERF_EVENTS) \ tick_dep_name(SCHED) \ - tick_dep_name_end(CLOCK_UNSTABLE) + tick_dep_name(CLOCK_UNSTABLE) \ + tick_dep_name_end(RCU)
#undef tick_dep_name #undef tick_dep_mask_name diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 03f940089431..59c28d912b5f 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -195,6 +195,11 @@ static bool check_tick_dependency(atomic_t *dep) return true; }
+ if (val & TICK_DEP_MASK_RCU) { + trace_tick_stop(0, TICK_DEP_MASK_RCU); + return true; + } + return false; }
@@ -321,6 +326,7 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) preempt_enable(); } } +EXPORT_SYMBOL_GPL(tick_nohz_dep_set_cpu);
void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { @@ -328,6 +334,7 @@ void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
atomic_andnot(BIT(bit), &ts->tick_dep_mask); } +EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
/* * Set a per-task tick dependency. Posix CPU timers need this in order to elapse
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.283 commit 0a607752b4ef5d48d81b1a1be6ac010f991fdf63 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
[ Upstream commit 6a341729fb31b4c5df9f74f24b4b1c98410c9b87 ]
syzkaller reported a warning below [0].
We can reproduce it by sending 0-byte data from the (AF_PACKET, SOCK_PACKET) socket via some devices whose dev->hard_header_len is 0.
struct sockaddr_pkt addr = { .spkt_family = AF_PACKET, .spkt_device = "tun0", }; int fd;
fd = socket(AF_PACKET, SOCK_PACKET, 0); sendto(fd, NULL, 0, 0, (struct sockaddr *)&addr, sizeof(addr));
We have a similar fix for the (AF_PACKET, SOCK_RAW) socket as commit dc633700f00f ("net/af_packet: check len when min_header_len equals to 0").
Let's add the same test for the SOCK_PACKET socket.
[0]: skb_assert_len WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 skb_assert_len include/linux/skbuff.h:2552 [inline] WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 __dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159 Modules linked in: CPU: 1 PID: 19945 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_assert_len include/linux/skbuff.h:2552 [inline] RIP: 0010:__dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159 Code: 89 de e8 1d a2 85 fd 84 db 75 21 e8 64 a9 85 fd 48 c7 c6 80 2a 1f 86 48 c7 c7 c0 06 1f 86 c6 05 23 cf 27 04 01 e8 fa ee 56 fd <0f> 0b e8 43 a9 85 fd 0f b6 1d 0f cf 27 04 31 ff 89 de e8 e3 a1 85 RSP: 0018:ffff8880217af6e0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90001133000 RDX: 0000000000040000 RSI: ffffffff81186922 RDI: 0000000000000001 RBP: ffff8880217af8b0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888030045640 R13: ffff8880300456b0 R14: ffff888030045650 R15: ffff888030045718 FS: 00007fc5864da640(0000) GS:ffff88806cd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020005740 CR3: 000000003f856003 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> dev_queue_xmit include/linux/netdevice.h:3085 [inline] packet_sendmsg_spkt+0xc4b/0x1230 net/packet/af_packet.c:2066 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x1b4/0x200 net/socket.c:747 ____sys_sendmsg+0x331/0x970 net/socket.c:2503 ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2557 __sys_sendmmsg+0x18c/0x430 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x9c/0x100 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fc58791de5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007fc5864d9cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007fc58791de5d RDX: 0000000000000001 RSI: 0000000020005740 RDI: 0000000000000004 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc58797e530 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- skb len=0 headroom=16 headlen=0 tailroom=304 mac=(16,0) net=(16,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 dev name=sit0 feat=0x00000006401d7869 sk family=17 type=10 proto=0
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/packet/af_packet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 5d069715bd02..aef6ab487287 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1954,7 +1954,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, goto retry; }
- if (!dev_validate_header(dev, skb->data, len)) { + if (!dev_validate_header(dev, skb->data, len) || !skb->len) { err = -EINVAL; goto out_unlock; }
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
stable inclusion from stable-v4.19.283 commit 3271859c2d2709515db4ad7a6cbdd3617da04405 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
If userspace races tcsetattr() with a write, the drained condition might not be guaranteed by the kernel. There is a race window after checking Tx is empty before tty_set_termios() takes termios_rwsem for write. During that race window, more characters can be queued by a racing writer.
Any ongoing transmission might produce garbage during HW's ->set_termios() call. The intent of TCSADRAIN/FLUSH seems to be preventing such a character corruption. If those flags are set, take tty's write lock to stop any writer before performing the lower layer Tx empty check and wait for the pending characters to be sent (if any).
The initial wait for all-writers-done must be placed outside of tty's write lock to avoid deadlock which makes it impossible to use tty_wait_until_sent(). The write lock is retried if a racing write is detected.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230317113318.31327-2-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org (cherry picked from commit 094fb49a2d0d6827c86d2e0840873e6db0c491d2) Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/tty_io.c | 4 ++-- drivers/tty/tty_ioctl.c | 45 ++++++++++++++++++++++++++++++----------- include/linux/tty.h | 2 ++ 3 files changed, 37 insertions(+), 14 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index e7917270ef61..58c3f7b92b0d 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -875,13 +875,13 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, return i; }
-static void tty_write_unlock(struct tty_struct *tty) +void tty_write_unlock(struct tty_struct *tty) { mutex_unlock(&tty->atomic_write_lock); wake_up_interruptible_poll(&tty->write_wait, EPOLLOUT); }
-static int tty_write_lock(struct tty_struct *tty, int ndelay) +int tty_write_lock(struct tty_struct *tty, int ndelay) { if (!mutex_trylock(&tty->atomic_write_lock)) { if (ndelay) diff --git a/drivers/tty/tty_ioctl.c b/drivers/tty/tty_ioctl.c index d99fec44036c..095c8780e210 100644 --- a/drivers/tty/tty_ioctl.c +++ b/drivers/tty/tty_ioctl.c @@ -397,21 +397,42 @@ static int set_termios(struct tty_struct *tty, void __user *arg, int opt) tmp_termios.c_ispeed = tty_termios_input_baud_rate(&tmp_termios); tmp_termios.c_ospeed = tty_termios_baud_rate(&tmp_termios);
- ld = tty_ldisc_ref(tty); + if (opt & (TERMIOS_FLUSH|TERMIOS_WAIT)) { +retry_write_wait: + retval = wait_event_interruptible(tty->write_wait, !tty_chars_in_buffer(tty)); + if (retval < 0) + return retval;
- if (ld != NULL) { - if ((opt & TERMIOS_FLUSH) && ld->ops->flush_buffer) - ld->ops->flush_buffer(tty); - tty_ldisc_deref(ld); - } + if (tty_write_lock(tty, 0) < 0) + goto retry_write_wait;
- if (opt & TERMIOS_WAIT) { - tty_wait_until_sent(tty, 0); - if (signal_pending(current)) - return -ERESTARTSYS; - } + /* Racing writer? */ + if (tty_chars_in_buffer(tty)) { + tty_write_unlock(tty); + goto retry_write_wait; + } + + ld = tty_ldisc_ref(tty); + if (ld != NULL) { + if ((opt & TERMIOS_FLUSH) && ld->ops->flush_buffer) + ld->ops->flush_buffer(tty); + tty_ldisc_deref(ld); + } + + if ((opt & TERMIOS_WAIT) && tty->ops->wait_until_sent) { + tty->ops->wait_until_sent(tty, 0); + if (signal_pending(current)) { + tty_write_unlock(tty); + return -ERESTARTSYS; + } + } + + tty_set_termios(tty, &tmp_termios);
- tty_set_termios(tty, &tmp_termios); + tty_write_unlock(tty); + } else { + tty_set_termios(tty, &tmp_termios); + }
/* FIXME: Arguably if tmp_termios == tty->termios AND the actual requested termios was not tmp_termios then we may diff --git a/include/linux/tty.h b/include/linux/tty.h index 0f37c8dc710f..cd05edd330f5 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -478,6 +478,8 @@ extern void __stop_tty(struct tty_struct *tty); extern void stop_tty(struct tty_struct *tty); extern void __start_tty(struct tty_struct *tty); extern void start_tty(struct tty_struct *tty); +void tty_write_unlock(struct tty_struct *tty); +int tty_write_lock(struct tty_struct *tty, int ndelay); extern int tty_register_driver(struct tty_driver *driver); extern int tty_unregister_driver(struct tty_driver *driver); extern struct device *tty_register_device(struct tty_driver *driver,
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
stable inclusion from stable-v4.19.283 commit 3f9cab5766daa1c1e5b389cd12b6e717ce95852f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
There's a potential race before THRE/TEMT deasserts when DMA Tx is starting up (or the next batch of continuous Tx is being submitted). This can lead to misdetecting Tx empty condition.
It is entirely normal for THRE/TEMT to be set for some time after the DMA Tx had been setup in serial8250_tx_dma(). As Tx side is definitely not empty at that point, it seems incorrect for serial8250_tx_empty() claim Tx is empty.
Fix the race by also checking in serial8250_tx_empty() whether there's DMA Tx active.
Note: This fix only addresses in-kernel race mainly to make using TCSADRAIN/FLUSH robust. Userspace can still cause other races but they seem userspace concurrency control problems.
Fixes: 9ee4b83e51f74 ("serial: 8250: Add support for dmaengine") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230317113318.31327-3-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org (cherry picked from commit 146a37e05d620cef4ad430e5d1c9c077fe6fa76f) Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/serial/8250/8250.h | 12 ++++++++++++ drivers/tty/serial/8250/8250_port.c | 12 +++++++++--- 2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h index ebfb0bd5bef5..8c8aa3b9c298 100644 --- a/drivers/tty/serial/8250/8250.h +++ b/drivers/tty/serial/8250/8250.h @@ -217,6 +217,13 @@ extern int serial8250_rx_dma(struct uart_8250_port *); extern void serial8250_rx_dma_flush(struct uart_8250_port *); extern int serial8250_request_dma(struct uart_8250_port *); extern void serial8250_release_dma(struct uart_8250_port *); + +static inline bool serial8250_tx_dma_running(struct uart_8250_port *p) +{ + struct uart_8250_dma *dma = p->dma; + + return dma && dma->tx_running; +} #else static inline int serial8250_tx_dma(struct uart_8250_port *p) { @@ -232,6 +239,11 @@ static inline int serial8250_request_dma(struct uart_8250_port *p) return -1; } static inline void serial8250_release_dma(struct uart_8250_port *p) { } + +static inline bool serial8250_tx_dma_running(struct uart_8250_port *p) +{ + return false; +} #endif
static inline int ns16550a_goto_highspeed(struct uart_8250_port *up) diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index 238900b1f995..e7d7b2dd406c 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -1962,19 +1962,25 @@ static int serial8250_tx_threshold_handle_irq(struct uart_port *port) static unsigned int serial8250_tx_empty(struct uart_port *port) { struct uart_8250_port *up = up_to_u8250p(port); + unsigned int result = 0; unsigned long flags; unsigned int lsr;
serial8250_rpm_get(up);
spin_lock_irqsave(&port->lock, flags); - lsr = serial_port_in(port, UART_LSR); - up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS; + if (!serial8250_tx_dma_running(up)) { + lsr = serial_port_in(port, UART_LSR); + up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS; + + if ((lsr & BOTH_EMPTY) == BOTH_EMPTY) + result = TIOCSER_TEMT; + } spin_unlock_irqrestore(&port->lock, flags);
serial8250_rpm_put(up);
- return (lsr & BOTH_EMPTY) == BOTH_EMPTY ? TIOCSER_TEMT : 0; + return result; }
unsigned int serial8250_do_get_mctrl(struct uart_port *port)
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
stable inclusion from stable-v4.19.283 commit 09b28fe9ff2fce03efc7d71dc79b58a49b01d0e9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
commit 85e3e7fbbb720b9897fba9a99659e31cbd1c082e upstream.
[This patch implements subset of original commit 85e3e7fbbb72 ("printk: remove NMI tracking") where commit 1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock") depends on, for commit 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") was backported to stable.]
All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough.
There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing:
arch/arm/kernel/smp.c arch/powerpc/kexec/crash.c kernel/trace/trace.c
For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective.
For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk().
Signed-off-by: John Ogness john.ogness@linutronix.de [pmladek@suse.com: Remove special handling in ftrace.c completely. Signed-off-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de [penguin-kernel: Copy only printk_deferred_{enter,safe}() definition ] Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Conflicts: include/linux/printk.h Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/linux/printk.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/include/linux/printk.h b/include/linux/printk.h index 895a46da8827..c2e583c0b303 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -570,4 +570,23 @@ static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type, } #endif
+#ifdef CONFIG_PRINTK +extern void printk_safe_enter(void); +extern void printk_safe_exit(void); +/* + * The printk_deferred_enter/exit macros are available only as a hack for + * some code paths that need to defer all printk console printing. Interrupts + * must be disabled for the deferred duration. + */ +#define printk_deferred_enter printk_safe_enter +#define printk_deferred_exit printk_safe_exit +#else +static inline void printk_deferred_enter(void) +{ +} +static inline void printk_deferred_exit(void) +{ +} +#endif + #endif
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
stable inclusion from stable-v4.19.283 commit 90c4e02baef3eed8640c9375bd0e75bddd0ec08d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
--------------------------------
commit 1007843a91909a4995ee78a538f62d8665705b66 upstream.
syzbot is reporting circular locking dependency which involves zonelist_update_seq seqlock [1], for this lock is checked by memory allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0 ---- __build_all_zonelists() { write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd // e.g. timer interrupt handler runs at this moment some_timer_func() { kmalloc(GFP_ATOMIC) { __alloc_pages_slowpath() { read_seqbegin(&zonelist_update_seq) { // spins forever because zonelist_update_seq.seqcount is odd } } } } // e.g. timer interrupt handler finishes write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even }
This deadlock scenario can be easily eliminated by not calling read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation requests. But Michal Hocko does not know whether we should go with this approach.
Another deadlock scenario which syzbot is reporting is a race between kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with port->lock held and printk() from __build_all_zonelists() with zonelist_update_seq held.
CPU0 CPU1 ---- ---- pty_write() { tty_insert_flip_string_and_push_buffer() { __build_all_zonelists() { write_seqlock(&zonelist_update_seq); build_zonelists() { printk() { vprintk() { vprintk_default() { vprintk_emit() { console_unlock() { console_flush_all() { console_emit_next_record() { con->write() = serial8250_console_write() { spin_lock_irqsave(&port->lock, flags); tty_insert_flip_string() { tty_insert_flip_string_fixed_flag() { __tty_buffer_request_room() { tty_buffer_alloc() { kmalloc(GFP_ATOMIC | __GFP_NOWARN) { __alloc_pages_slowpath() { zonelist_iter_begin() { read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held } } } } } } } } spin_unlock_irqrestore(&port->lock, flags); // message is printed to console spin_unlock_irqrestore(&port->lock, flags); } } } } } } } } } write_sequnlock(&zonelist_update_seq); } } }
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount being odd by disabling synchronous printk(), latency at read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. do not record unnecessary locking dependency) from interrupt context is still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA... Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2] Reported-by: syzbot syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1] Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Acked-by: Michal Hocko mhocko@suse.com Acked-by: Mel Gorman mgorman@techsingularity.net Cc: Petr Mladek pmladek@suse.com Cc: David Hildenbrand david@redhat.com Cc: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: John Ogness john.ogness@linutronix.de Cc: Patrick Daly quic_pdaly@quicinc.com Cc: Sergey Senozhatsky senozhatsky@chromium.org Cc: Steven Rostedt rostedt@goodmis.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- mm/page_alloc.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2084b912efa8..fc8be4b00125 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5826,7 +5826,21 @@ static void __build_all_zonelists(void *data) int nid; int __maybe_unused cpu; pg_data_t *self = data; + unsigned long flags;
+ /* + * Explicitly disable this CPU's interrupts before taking seqlock + * to prevent any IRQ handler from calling into the page allocator + * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock. + */ + local_irq_save(flags); + /* + * Explicitly disable this CPU's synchronous printk() before taking + * seqlock to prevent any printk() from trying to hold port->lock, for + * tty_insert_flip_string_and_push_buffer() on other CPU might be + * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held. + */ + printk_deferred_enter(); write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA @@ -5861,6 +5875,8 @@ static void __build_all_zonelists(void *data) }
write_sequnlock(&zonelist_update_seq); + printk_deferred_exit(); + local_irq_restore(flags); }
static noinline void __init
From: Corey Minyard cminyard@mvista.com
stable inclusion from stable-v4.19.283 commit 13b3a05b5b03b4fce5a9ea03dc91159dea1f6ef9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 8230831c43a328c2be6d28c65d3f77e14c59986b ]
Rename the SSIF_IDLE() to IS_SSIF_IDLE(), since that is more clear, and rename SSIF_NORMAL to SSIF_IDLE, since that's more accurate.
Cc: stable@vger.kernel.org Signed-off-by: Corey Minyard cminyard@mvista.com Stable-dep-of: 6d2555cde291 ("ipmi: fix SSIF not responding under certain cond.") Signed-off-by: Sasha Levin sashal@kernel.org
conflict: drivers/char/ipmi/ipmi_ssif.c
Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/char/ipmi/ipmi_ssif.c | 46 +++++++++++++++++------------------ 1 file changed, 23 insertions(+), 23 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index 08edcb516375..0a098c29f247 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -95,7 +95,7 @@ #define SSIF_WATCH_TIMEOUT_JIFFIES msecs_to_jiffies(SSIF_WATCH_TIMEOUT_MSEC)
enum ssif_intf_state { - SSIF_NORMAL, + SSIF_IDLE, SSIF_GETTING_FLAGS, SSIF_GETTING_EVENTS, SSIF_CLEARING_FLAGS, @@ -103,8 +103,8 @@ enum ssif_intf_state { /* FIXME - add watchdog stuff. */ };
-#define SSIF_IDLE(ssif) ((ssif)->ssif_state == SSIF_NORMAL \ - && (ssif)->curr_msg == NULL) +#define IS_SSIF_IDLE(ssif) ((ssif)->ssif_state == SSIF_IDLE \ + && (ssif)->curr_msg == NULL)
/* * Indexes into stats[] in ssif_info below. @@ -350,9 +350,9 @@ static void return_hosed_msg(struct ssif_info *ssif_info,
/* * Must be called with the message lock held. This will release the - * message lock. Note that the caller will check SSIF_IDLE and start a - * new operation, so there is no need to check for new messages to - * start in here. + * message lock. Note that the caller will check IS_SSIF_IDLE and + * start a new operation, so there is no need to check for new + * messages to start in here. */ static void start_clear_flags(struct ssif_info *ssif_info, unsigned long *flags) { @@ -369,7 +369,7 @@ static void start_clear_flags(struct ssif_info *ssif_info, unsigned long *flags)
if (start_send(ssif_info, msg, 3) != 0) { /* Error, just go to normal state. */ - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; } }
@@ -384,7 +384,7 @@ static void start_flag_fetch(struct ssif_info *ssif_info, unsigned long *flags) mb[0] = (IPMI_NETFN_APP_REQUEST << 2); mb[1] = IPMI_GET_MSG_FLAGS_CMD; if (start_send(ssif_info, mb, 2) != 0) - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; }
static void check_start_send(struct ssif_info *ssif_info, unsigned long *flags, @@ -395,7 +395,7 @@ static void check_start_send(struct ssif_info *ssif_info, unsigned long *flags,
flags = ipmi_ssif_lock_cond(ssif_info, &oflags); ssif_info->curr_msg = NULL; - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; ipmi_ssif_unlock_cond(ssif_info, flags); ipmi_free_smi_msg(msg); } @@ -409,7 +409,7 @@ static void start_event_fetch(struct ssif_info *ssif_info, unsigned long *flags)
msg = ipmi_alloc_smi_msg(); if (!msg) { - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; ipmi_ssif_unlock_cond(ssif_info, flags); return; } @@ -432,7 +432,7 @@ static void start_recv_msg_fetch(struct ssif_info *ssif_info,
msg = ipmi_alloc_smi_msg(); if (!msg) { - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; ipmi_ssif_unlock_cond(ssif_info, flags); return; } @@ -450,9 +450,9 @@ static void start_recv_msg_fetch(struct ssif_info *ssif_info,
/* * Must be called with the message lock held. This will release the - * message lock. Note that the caller will check SSIF_IDLE and start a - * new operation, so there is no need to check for new messages to - * start in here. + * message lock. Note that the caller will check IS_SSIF_IDLE and + * start a new operation, so there is no need to check for new + * messages to start in here. */ static void handle_flags(struct ssif_info *ssif_info, unsigned long *flags) { @@ -468,7 +468,7 @@ static void handle_flags(struct ssif_info *ssif_info, unsigned long *flags) /* Events available. */ start_event_fetch(ssif_info, flags); else { - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; ipmi_ssif_unlock_cond(ssif_info, flags); } } @@ -580,7 +580,7 @@ static void watch_timeout(struct timer_list *t) if (ssif_info->need_watch) { mod_timer(&ssif_info->watch_timer, jiffies + SSIF_WATCH_TIMEOUT_JIFFIES); - if (SSIF_IDLE(ssif_info)) { + if (IS_SSIF_IDLE(ssif_info)) { start_flag_fetch(ssif_info, flags); /* Releases lock */ return; } @@ -777,7 +777,7 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, }
switch (ssif_info->ssif_state) { - case SSIF_NORMAL: + case SSIF_IDLE: ipmi_ssif_unlock_cond(ssif_info, flags); if (!msg) break; @@ -795,7 +795,7 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, * Error fetching flags, or invalid length, * just give up for now. */ - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; ipmi_ssif_unlock_cond(ssif_info, flags); pr_warn(PFX "Error getting flags: %d %d, %x\n", result, len, (len >= 3) ? data[2] : 0); @@ -826,7 +826,7 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, pr_warn(PFX "Invalid response clearing flags: %x %x\n", data[0], data[1]); } - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; ipmi_ssif_unlock_cond(ssif_info, flags); break;
@@ -879,7 +879,7 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, }
flags = ipmi_ssif_lock_cond(ssif_info, &oflags); - if (SSIF_IDLE(ssif_info) && !ssif_info->stopping) { + if (IS_SSIF_IDLE(ssif_info) && !ssif_info->stopping) { if (ssif_info->req_events) start_event_fetch(ssif_info, flags); else if (ssif_info->req_flags) @@ -1048,7 +1048,7 @@ static void start_next_msg(struct ssif_info *ssif_info, unsigned long *flags) unsigned long oflags;
restart: - if (!SSIF_IDLE(ssif_info)) { + if (!IS_SSIF_IDLE(ssif_info)) { ipmi_ssif_unlock_cond(ssif_info, flags); return; } @@ -1264,7 +1264,7 @@ static void shutdown_ssif(void *send_info) dev_set_drvdata(&ssif_info->client->dev, NULL);
/* make sure the driver is not looking for flags any more. */ - while (ssif_info->ssif_state != SSIF_NORMAL) + while (ssif_info->ssif_state != SSIF_IDLE) schedule_timeout(1);
ssif_info->stopping = true; @@ -1646,7 +1646,7 @@ static int ssif_probe(struct i2c_client *client, const struct i2c_device_id *id) }
spin_lock_init(&ssif_info->lock); - ssif_info->ssif_state = SSIF_NORMAL; + ssif_info->ssif_state = SSIF_IDLE; timer_setup(&ssif_info->retry_timer, retry_timeout, 0); timer_setup(&ssif_info->watch_timer, watch_timeout, 0);
From: Zhang Yuchen zhangyuchen.lcr@bytedance.com
stable inclusion from stable-v4.19.283 commit ba810999356bffa4627985123c15327c692318e5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 6d2555cde2918409b0331560e66f84a0ad4849c6 ]
The ipmi communication is not restored after a specific version of BMC is upgraded on our server. The ipmi driver does not respond after printing the following log:
ipmi_ssif: Invalid response getting flags: 1c 1
I found that after entering this branch, ssif_info->ssif_state always holds SSIF_GETTING_FLAGS and never return to IDLE.
As a result, the driver cannot be loaded, because the driver status is checked during the unload process and must be IDLE in shutdown_ssif():
while (ssif_info->ssif_state != SSIF_IDLE) schedule_timeout(1);
The process trigger this problem is:
1. One msg timeout and next msg start send, and call ssif_set_need_watch().
2. ssif_set_need_watch()->watch_timeout()->start_flag_fetch() change ssif_state to SSIF_GETTING_FLAGS.
3. In msg_done_handler() ssif_state == SSIF_GETTING_FLAGS, if an error message is received, the second branch does not modify the ssif_state.
4. All retry action need IS_SSIF_IDLE() == True. Include retry action in watch_timeout(), msg_done_handler(). Sending msg does not work either. SSIF_IDLE is also checked in start_next_msg().
5. The only thing that can be triggered in the SSIF driver is watch_timeout(), after destory_user(), this timer will stop too.
So, if enter this branch, the ssif_state will remain SSIF_GETTING_FLAGS and can't send msg, no timer started, can't unload.
We did a comparative test before and after adding this patch, and the result is effective.
Fixes: 259307074bfc ("ipmi: Add SMBus interface driver (SSIF)")
Cc: stable@vger.kernel.org Signed-off-by: Zhang Yuchen zhangyuchen.lcr@bytedance.com Message-Id: 20230412074907.80046-1-zhangyuchen.lcr@bytedance.com Signed-off-by: Corey Minyard minyard@acm.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/char/ipmi/ipmi_ssif.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index 0a098c29f247..3160f6565f33 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -802,9 +802,9 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, } else if (data[0] != (IPMI_NETFN_APP_REQUEST | 1) << 2 || data[1] != IPMI_GET_MSG_FLAGS_CMD) { /* - * Don't abort here, maybe it was a queued - * response to a previous command. + * Recv error response, give up. */ + ssif_info->ssif_state = SSIF_IDLE; ipmi_ssif_unlock_cond(ssif_info, flags); pr_warn(PFX "Invalid response getting flags: %x %x\n", data[0], data[1]);