From: Greg Kroah-Hartman gregkh@linuxfoundation.org
stable inclusion from linux-4.19.228 commit cf94a8a3ff3d7aa06742f1beb5166fa45ba72122
--------------------------------
commit c9d967b2ce40d71e968eb839f36c936b8a9cf1ea upstream.
The buffer handling in pm_show_wakelocks() is tricky, and hopefully correct. Ensure it really is correct by using sysfs_emit_at(), which handles all of the tricky string handling logic in a PAGE_SIZE buffer for us automatically, as this is a sysfs file being read from.
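As an aside, here is a rough userspace stand-in for the pattern sysfs_emit_at() encapsulates; the emit_at() helper below is purely illustrative and not the kernel API. It appends formatted text at a running offset while never writing past a single PAGE_SIZE buffer.

#include <stdarg.h>
#include <stdio.h>

#define PAGE_SIZE 4096

/* Illustrative stand-in for sysfs_emit_at(): append at offset 'at',
 * never write past one PAGE_SIZE buffer, and return the number of
 * characters actually added. */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	int len;

	if (at < 0 || at >= PAGE_SIZE)
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf + at, PAGE_SIZE - at, fmt, args);
	va_end(args);

	return len < PAGE_SIZE - at ? len : PAGE_SIZE - at - 1;
}

int main(void)
{
	char page[PAGE_SIZE];
	int len = 0;

	len += emit_at(page, len, "%s ", "wakelock_a");
	len += emit_at(page, len, "%s ", "wakelock_b");
	len += emit_at(page, len, "\n");
	printf("%s", page);	/* prints "wakelock_a wakelock_b " plus newline */
	return 0;
}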
Reviewed-by: Lee Jones lee.jones@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- kernel/power/wakelock.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/kernel/power/wakelock.c b/kernel/power/wakelock.c index 4210152e56f0..aad7c8fcb22f 100644 --- a/kernel/power/wakelock.c +++ b/kernel/power/wakelock.c @@ -39,23 +39,19 @@ ssize_t pm_show_wakelocks(char *buf, bool show_active) { struct rb_node *node; struct wakelock *wl; - char *str = buf; - char *end = buf + PAGE_SIZE; + int len = 0;
mutex_lock(&wakelocks_lock);
for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) { wl = rb_entry(node, struct wakelock, node); if (wl->ws.active == show_active) - str += scnprintf(str, end - str, "%s ", wl->name); + len += sysfs_emit_at(buf, len, "%s ", wl->name); } - if (str > buf) - str--; - - str += scnprintf(str, end - str, "\n"); + len += sysfs_emit_at(buf, len, "\n");
mutex_unlock(&wakelocks_lock); - return (str - buf); + return len; }
#if CONFIG_PM_WAKELOCKS_LIMIT > 0
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from linux-4.19.228 commit a622b1fdfe1c725a6e9cf34d04e92f5c36a99606
--------------------------------
commit 4e1860a3863707e8177329c006d10f9e37e097a8 upstream.
IP fragments do not come with the transport header, hence skip bogus layer 4 checksum updates.
Fixes: 1814096980bb ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields") Reported-and-tested-by: Steffen Weinreich steve@weinreich.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/netfilter/nft_payload.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index b1a9f330a51f..fd87216bc0a9 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -194,6 +194,9 @@ static int nft_payload_l4csum_offset(const struct nft_pktinfo *pkt, struct sk_buff *skb, unsigned int *l4csum_offset) { + if (pkt->xt.fragoff) + return -1; + switch (pkt->tprot) { case IPPROTO_TCP: *l4csum_offset = offsetof(struct tcphdr, check);
From: Robert Hancock robert.hancock@calian.com
stable inclusion from linux-4.19.228 commit ff6cb8fc12edee1a4cfe5f2915b06a4d6a5f9d1f
--------------------------------
commit d06b1cf28297e27127d3da54753a3a01a2fa2f28 upstream.
8250_of supports a reg-offset property which is intended to handle cases where the device registers start at an offset inside the region of memory allocated to the device. The Xilinx 16550 UART, for which this support was initially added, requires this. However, the code did not adjust the overall size of the mapped region accordingly, causing the driver to request an area of memory past the end of the device's allocation. For example, if the UART was allocated an address of 0xb0130000, size of 0x10000 and reg-offset of 0x1000 in the device tree, the region of memory reserved was b0131000-b0140fff, which caused the driver for the region starting at b0140000 to fail to probe.
Fix this by subtracting reg-offset from the mapped region size.
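For clarity, a small standalone C sketch of the arithmetic using the values from the example above (illustrative only, not the driver code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Values from the device-tree example in the commit message. */
	uint64_t mapbase = 0xb0130000, mapsize = 0x10000, reg_offset = 0x1000;

	/* Old behaviour: only the base was shifted. */
	printf("before fix: %#llx-%#llx\n",
	       (unsigned long long)(mapbase + reg_offset),
	       (unsigned long long)(mapbase + reg_offset + mapsize - 1));

	/* Fixed behaviour: shrink the size by the same offset. */
	mapbase += reg_offset;
	mapsize -= reg_offset;
	printf("after fix:  %#llx-%#llx\n",
	       (unsigned long long)mapbase,
	       (unsigned long long)(mapbase + mapsize - 1));
	return 0;
}

The program prints 0xb0131000-0xb0140fff before the fix (overlapping the neighbouring device at 0xb0140000) and 0xb0131000-0xb013ffff after it.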
Fixes: b912b5e2cfb3 ([POWERPC] Xilinx: of_serial support for Xilinx uart 16550.) Cc: stable stable@vger.kernel.org Signed-off-by: Robert Hancock robert.hancock@calian.com Link: https://lore.kernel.org/r/20220112194214.881844-1-robert.hancock@calian.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/tty/serial/8250/8250_of.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_of.c b/drivers/tty/serial/8250/8250_of.c index 2488de1c4bc4..dcd2105e2b60 100644 --- a/drivers/tty/serial/8250/8250_of.c +++ b/drivers/tty/serial/8250/8250_of.c @@ -104,8 +104,17 @@ static int of_platform_serial_setup(struct platform_device *ofdev, port->mapsize = resource_size(&resource);
/* Check for shifted address mapping */ - if (of_property_read_u32(np, "reg-offset", &prop) == 0) + if (of_property_read_u32(np, "reg-offset", &prop) == 0) { + if (prop >= port->mapsize) { + dev_warn(&ofdev->dev, "reg-offset %u exceeds region size %pa\n", + prop, &port->mapsize); + ret = -EINVAL; + goto err_unprepare; + } + port->mapbase += prop; + port->mapsize -= prop; + }
port->iotype = UPIO_MEM; if (of_property_read_u32(np, "reg-io-width", &prop) == 0) {
From: Valentin Caron valentin.caron@foss.st.com
stable inclusion from linux-4.19.228 commit a41398a5a1a0dde794c65ef1fa6b2323e2db68bc
--------------------------------
commit 037b91ec7729524107982e36ec4b40f9b174f7a2 upstream.
x_char is ignored by stm32_usart_start_tx() when xmit buffer is empty.
Fix start_tx condition to allow x_char to be sent.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver") Cc: stable stable@vger.kernel.org Signed-off-by: Erwan Le Ray erwan.leray@foss.st.com Signed-off-by: Valentin Caron valentin.caron@foss.st.com Link: https://lore.kernel.org/r/20220111164441.6178-3-valentin.caron@foss.st.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/tty/serial/stm32-usart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c index e8d7a7bb4339..f761e1038c34 100644 --- a/drivers/tty/serial/stm32-usart.c +++ b/drivers/tty/serial/stm32-usart.c @@ -506,7 +506,7 @@ static void stm32_start_tx(struct uart_port *port) { struct circ_buf *xmit = &port->state->xmit;
- if (uart_circ_empty(xmit)) + if (uart_circ_empty(xmit) && !port->x_char) return;
stm32_transmit_chars(port);
From: "daniel.starke@siemens.com" daniel.starke@siemens.com
stable inclusion from linux-4.19.228 commit 47bc9be8d6d94a25a598b4056078925dd6d15851
--------------------------------
commit 8838b2af23caf1ff0610caef2795d6668a013b2d upstream.
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010. See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.a... The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to the newer 27.010 here. Chapter 5.2.7.3 states that DC1 (XON) and DC3 (XOFF) are the control characters defined in ISO/IEC 646. These shall be quoted if seen in the data stream to avoid interpretation as flow control characters.
ISO/IEC 646 refers to the set of ISO standards described as the ISO 7-bit coded character set for information interchange. Its final version is also known as ITU T.50. See https://www.itu.int/rec/T-REC-T.50-199209-I/en
To abide by the standard, DC1 and DC3 must be quoted correctly when they appear as data bytes rather than as control characters. The current implementation already tries to enforce this but fails to catch all defined cases. 3GPP 27.010 chapter 5.2.7.3 clearly states that the most significant bit shall be ignored for DC1 and DC3 handling. The current implementation handles only the case where the most significant bit is 0; cases in which DC1 and DC3 have the most significant bit set to 1 are left unhandled.
This patch fixes this by masking the data bytes with ISO_IEC_646_MASK (only the 7 least significant bits set 1) before comparing them with XON (a.k.a. DC1) and XOFF (a.k.a. DC3) when testing which byte values need quotation via byte stuffing.
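A minimal standalone illustration of the quoting test (not the n_gsm code itself): with the 0x7F mask, both 0x11/0x91 (DC1/XON) and 0x13/0x93 (DC3/XOFF) are caught, while an ordinary byte such as 0x41 passes through.

#include <stdio.h>

#define XON		 0x11
#define XOFF		 0x13
#define ISO_IEC_646_MASK 0x7F

/* Would this data byte need escaping under 3GPP 27.010 chapter 5.2.7.3? */
static int needs_quoting(unsigned char c)
{
	return (c & ISO_IEC_646_MASK) == XON ||
	       (c & ISO_IEC_646_MASK) == XOFF;
}

int main(void)
{
	unsigned char samples[] = { 0x11, 0x91, 0x13, 0x93, 0x41 };

	for (unsigned int i = 0; i < sizeof(samples); i++)
		printf("0x%02x -> %s\n", samples[i],
		       needs_quoting(samples[i]) ? "quote" : "pass");
	return 0;
}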
Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke daniel.starke@siemens.com Link: https://lore.kernel.org/r/20220120101857.2509-1-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/tty/n_gsm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 86b7e20ffd7f..7de54c78a7b9 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -313,6 +313,7 @@ static struct tty_driver *gsm_tty_driver; #define GSM1_ESCAPE_BITS 0x20 #define XON 0x11 #define XOFF 0x13 +#define ISO_IEC_646_MASK 0x7F
static const struct tty_port_operations gsm_port_ops;
@@ -531,7 +532,8 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) int olen = 0; while (len--) { if (*input == GSM1_SOF || *input == GSM1_ESCAPE - || *input == XON || *input == XOFF) { + || (*input & ISO_IEC_646_MASK) == XON + || (*input & ISO_IEC_646_MASK) == XOFF) { *output++ = GSM1_ESCAPE; *output++ = *input++ ^ GSM1_ESCAPE_BITS; olen++;
From: Ido Schimmel idosch@nvidia.com
stable inclusion from linux-4.19.228 commit e3e01ed702c7b5f5d306c2ef566d30b84aa57126
--------------------------------
commit 6cee105e7f2ced596373951d9ea08dacc3883c68 upstream.
The warning messages can be invoked from the data path for every packet transmitted through an ip6gre netdev, leading to high CPU utilization.
Fix that by rate limiting the messages.
Fixes: 09c6bbf090ec ("[IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime") Reported-by: Maksym Yaremchuk maksymy@nvidia.com Tested-by: Maksym Yaremchuk maksymy@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Amit Cohen amcohen@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv6/ip6_tunnel.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 35c127c3eee7..b647a4037679 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1005,14 +1005,14 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t,
if (unlikely(!ipv6_chk_addr_and_flags(net, laddr, ldev, false, 0, IFA_F_TENTATIVE))) - pr_warn("%s xmit: Local address not yet configured!\n", - p->name); + pr_warn_ratelimited("%s xmit: Local address not yet configured!\n", + p->name); else if (!(p->flags & IP6_TNL_F_ALLOW_LOCAL_REMOTE) && !ipv6_addr_is_multicast(raddr) && unlikely(ipv6_chk_addr_and_flags(net, raddr, ldev, true, 0, IFA_F_TENTATIVE))) - pr_warn("%s xmit: Routing loop! Remote address found on this node!\n", - p->name); + pr_warn_ratelimited("%s xmit: Routing loop! Remote address found on this node!\n", + p->name); else ret = 1; rcu_read_unlock();
From: Xin Long lucien.xin@gmail.com
stable inclusion from linux-4.19.228 commit a921507836877c4e28084f7bfd35ac2608321268
--------------------------------
commit 2afc3b5a31f9edf3ef0f374f5d70610c79c93a42 upstream.
When 'ping' changes to use a PING socket instead of a RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
the selftest 'router_broadcast.sh' will fail, as a command such as
# ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b
can't receive the response skb via the PING socket. It's caused by a mismatch between sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket, as dif becomes vrf-h1 when the receiving device's master is set to vrf-h1.
This patch fixes this regression by also checking sk_bound_dev_if against sdif, so that the packets can still be received even if the socket is bound not to the vrf device but to the real iif.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv4/ping.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 862744c28548..276442443d32 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -225,7 +225,8 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident) continue; }
- if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif) + if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif && + sk->sk_bound_dev_if != inet_sdif(skb)) continue;
sock_hold(sk);
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.228 commit eb04c6d1ec67e30f3aa5ef82112cbfdbddfd4f65
--------------------------------
commit 23f57406b82de51809d5812afd96f210f8b627f3 upstream.
ip_select_ident_segs() has been very conservative about using the connected socket private generator only for packets with IP_DF set, claiming it was needed for some VJ compression implementations.
As mentioned in this referenced document, this can be abused. (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
Before switching to pure random IPID generation and possibly hurting some workloads, let's use the private inet socket generator.
Not only will this remove one vulnerability, it will also improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT.
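For illustration, a userspace sketch of the per-socket generator this relies on (a toy structure, not the kernel's inet_sock): each connected socket keeps its own counter and advances it by the number of segments sent, mirroring the inet_id += segs logic visible in the diff below.

#include <stdint.h>
#include <stdio.h>

struct fake_sock {
	uint16_t inet_id;	/* private per-socket IPID counter */
};

/* Pick the IPID for a packet covering 'segs' segments. */
static uint16_t select_ident(struct fake_sock *sk, int segs)
{
	uint16_t id = sk->inet_id;

	sk->inet_id += segs;	/* advance for the next packet */
	return id;
}

int main(void)
{
	struct fake_sock sk = { .inet_id = 100 };

	printf("%u\n", select_ident(&sk, 1));	/* 100, counter -> 101 */
	printf("%u\n", select_ident(&sk, 3));	/* 101, counter -> 104 */
	printf("%u\n", select_ident(&sk, 1));	/* 104, counter -> 105 */
	return 0;
}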
Fixes: 73f156a6e8c1 ("inetpeer: get rid of ip_id_count") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Reported-by: Ray Che xijiache@gmail.com Cc: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- include/net/ip.h | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h index 17766586dfa0..621eeb9f142e 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -437,19 +437,18 @@ static inline void ip_select_ident_segs(struct net *net, struct sk_buff *skb, { struct iphdr *iph = ip_hdr(skb);
+ /* We had many attacks based on IPID, use the private + * generator as much as we can. + */ + if (sk && inet_sk(sk)->inet_daddr) { + iph->id = htons(inet_sk(sk)->inet_id); + inet_sk(sk)->inet_id += segs; + return; + } if ((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) { - /* This is only to work around buggy Windows95/2000 - * VJ compression implementations. If the ID field - * does not change, they drop every other packet in - * a TCP stream using header compression. - */ - if (sk && inet_sk(sk)->inet_daddr) { - iph->id = htons(inet_sk(sk)->inet_id); - inet_sk(sk)->inet_id += segs; - } else { - iph->id = 0; - } + iph->id = 0; } else { + /* Unfortunately we need the big hammer to get a suitable IPID */ __ip_select_ident(net, iph, segs); } }
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.228 commit 243e898452adcf6055c9b2632becef7c59f02874
--------------------------------
commit aafc2e3285c2d7a79b7ee15221c19fbeca7b1509 upstream.
struct fib6_node's fn_sernum field can be read while other threads change it.
Add READ_ONCE()/WRITE_ONCE() annotations.
Do not change existing smp barriers in fib6_get_cookie_safe() and __fib6_update_sernum_upto_root()
syzbot reported:
BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket
write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
 fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
 fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
 fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
 fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
 __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
 fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
 rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
 addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
 addrconf_dad_work+0x908/0x1170
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x1bf/0x1e0 kernel/kthread.c:359
 ret_from_fork+0x1f/0x30
read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
 fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
 rt6_get_cookie include/net/ip6_fib.h:306 [inline]
 ip6_dst_store include/net/ip6_route.h:234 [inline]
 inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
 inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
 __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
 tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
 tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
 mptcp_push_release net/mptcp/protocol.c:1491 [inline]
 __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
 mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 kernel_sendmsg+0x97/0xd0 net/socket.c:745
 sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
 inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
 kernel_sendpage+0x187/0x200 net/socket.c:3492
 sock_sendpage+0x5a/0x70 net/socket.c:1007
 pipe_to_sendpage+0x128/0x160 fs/splice.c:364
 splice_from_pipe_feed fs/splice.c:418 [inline]
 __splice_from_pipe+0x207/0x500 fs/splice.c:562
 splice_from_pipe fs/splice.c:597 [inline]
 generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x80/0xa0 fs/splice.c:936
 splice_direct_to_actor+0x345/0x650 fs/splice.c:891
 do_splice_direct+0x106/0x190 fs/splice.c:979
 do_sendfile+0x675/0xc40 fs/read_write.c:1245
 __do_sys_sendfile64 fs/read_write.c:1310 [inline]
 __se_sys_sendfile64 fs/read_write.c:1296 [inline]
 __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000026f -> 0x00000271
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
The Fixes tag I chose is probably arbitrary, I do not think we need to backport this patch to older kernels.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- include/net/ip6_fib.h | 2 +- net/ipv6/ip6_fib.c | 23 +++++++++++++---------- net/ipv6/route.c | 2 +- 3 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 62c936230cc8..b4fea9dd589d 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -243,7 +243,7 @@ static inline bool fib6_get_cookie_safe(const struct fib6_info *f6i, fn = rcu_dereference(f6i->fib6_node);
if (fn) { - *cookie = fn->fn_sernum; + *cookie = READ_ONCE(fn->fn_sernum); /* pairs with smp_wmb() in fib6_update_sernum_upto_root() */ smp_rmb(); status = true; diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index e0e464b72c1f..5ff67cb8b6ac 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -112,7 +112,7 @@ void fib6_update_sernum(struct net *net, struct fib6_info *f6i) fn = rcu_dereference_protected(f6i->fib6_node, lockdep_is_held(&f6i->fib6_table->tb6_lock)); if (fn) - fn->fn_sernum = fib6_new_sernum(net); + WRITE_ONCE(fn->fn_sernum, fib6_new_sernum(net)); }
/* @@ -544,12 +544,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, spin_unlock_bh(&table->tb6_lock); if (res > 0) { cb->args[4] = 1; - cb->args[5] = w->root->fn_sernum; + cb->args[5] = READ_ONCE(w->root->fn_sernum); } } else { - if (cb->args[5] != w->root->fn_sernum) { + int sernum = READ_ONCE(w->root->fn_sernum); + if (cb->args[5] != sernum) { /* Begin at the root if the tree changed */ - cb->args[5] = w->root->fn_sernum; + cb->args[5] = sernum; w->state = FWS_INIT; w->node = w->root; w->skip = w->count; @@ -1203,7 +1204,7 @@ static void __fib6_update_sernum_upto_root(struct fib6_info *rt, /* paired with smp_rmb() in rt6_get_cookie_safe() */ smp_wmb(); while (fn) { - fn->fn_sernum = sernum; + WRITE_ONCE(fn->fn_sernum, sernum); fn = rcu_dereference_protected(fn->parent, lockdep_is_held(&rt->fib6_table->tb6_lock)); } @@ -1983,8 +1984,8 @@ static int fib6_clean_node(struct fib6_walker *w) };
if (c->sernum != FIB6_NO_SERNUM_CHANGE && - w->node->fn_sernum != c->sernum) - w->node->fn_sernum = c->sernum; + READ_ONCE(w->node->fn_sernum) != c->sernum) + WRITE_ONCE(w->node->fn_sernum, c->sernum);
if (!c->func) { WARN_ON_ONCE(c->sernum == FIB6_NO_SERNUM_CHANGE); @@ -2332,7 +2333,7 @@ static void ipv6_route_seq_setup_walk(struct ipv6_route_iter *iter, iter->w.state = FWS_INIT; iter->w.node = iter->w.root; iter->w.args = iter; - iter->sernum = iter->w.root->fn_sernum; + iter->sernum = READ_ONCE(iter->w.root->fn_sernum); INIT_LIST_HEAD(&iter->w.lh); fib6_walker_link(net, &iter->w); } @@ -2360,8 +2361,10 @@ static struct fib6_table *ipv6_route_seq_next_table(struct fib6_table *tbl,
static void ipv6_route_check_sernum(struct ipv6_route_iter *iter) { - if (iter->sernum != iter->w.root->fn_sernum) { - iter->sernum = iter->w.root->fn_sernum; + int sernum = READ_ONCE(iter->w.root->fn_sernum); + + if (iter->sernum != sernum) { + iter->sernum = sernum; iter->w.state = FWS_INIT; iter->w.node = iter->w.root; WARN_ON(iter->w.skip); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 80dd55c436fa..8efb03cc6f24 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2320,7 +2320,7 @@ static void ip6_link_failure(struct sk_buff *skb) if (from) { fn = rcu_dereference(from->fib6_node); if (fn && (rt->rt6i_flags & RTF_DEFAULT)) - fn->fn_sernum = -1; + WRITE_ONCE(fn->fn_sernum, -1); } } rcu_read_unlock();
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from linux-4.19.228 commit ddeea0002d17984d66bb63f9b66b96fd604756ad
--------------------------------
[ Upstream commit 204975036b34f55237bc44c8a302a88468ef21b5 ]
Creating a hard link is required by POSIX to update the file ctime, so ensure that the file data is synced to disk so that we don't clobber the updated ctime by writing back after creating the hard link.
Fixes: 9f7682728728 ("NFS: Move the delegation return down into nfs4_proc_link()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/dir.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 67ece39c7800..ede35472280d 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -2052,6 +2052,8 @@ nfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
trace_nfs_link_enter(inode, dir, dentry); d_drop(dentry); + if (S_ISREG(inode->i_mode)) + nfs_sync_inode(inode); error = NFS_PROTO(dir)->link(inode, dir, &dentry->d_name); if (error == 0) { ihold(inode);
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from linux-4.19.228 commit 8b5c9de150c62f686ff274039b97fe3a2297ada6
--------------------------------
[ Upstream commit 6ff9d99bb88faebf134ca668842349d9718e5464 ]
Renaming a file is required by POSIX to update the file ctime, so ensure that the file data is synced to disk so that we don't clobber the updated ctime by writing back after the rename.
Fixes: f2c2c552f119 ("NFS: Move delegation recall into the NFSv4 callback for rename_setup()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/dir.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index ede35472280d..72e5ae8cc4ff 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -2142,6 +2142,8 @@ int nfs_rename(struct inode *old_dir, struct dentry *old_dentry, } }
+ if (S_ISREG(old_inode->i_mode)) + nfs_sync_inode(old_inode); task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry, NULL); if (IS_ERR(task)) { error = PTR_ERR(task);
From: Marek Behún kabel@kernel.org
stable inclusion from linux-4.19.228 commit 67d271760b037ce0806d687ee6057edc8afd4205
--------------------------------
[ Upstream commit cbda1b16687580d5beee38273f6241ae3725960c ]
Commit bafbdd527d56 ("phylib: Add device reset GPIO support") added call to phy_device_reset(phydev) after the put_device() call in phy_detach().
The comment before the put_device() call says that the phydev might go away with put_device().
Fix potential use-after-free by calling phy_device_reset() before put_device().
Fixes: bafbdd527d56 ("phylib: Add device reset GPIO support") Signed-off-by: Marek Behún kabel@kernel.org Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20220119162748.32418-1-kabel@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/net/phy/phy_device.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 1117355313bc..2de2cbed7c5f 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1154,6 +1154,9 @@ void phy_detach(struct phy_device *phydev) phydev->mdio.dev.driver == &genphy_driver.mdiodrv.driver) device_release_driver(&phydev->mdio.dev);
+ /* Assert the reset signal */ + phy_device_reset(phydev, 1); + /* * The phydev might go away on the put_device() below, so avoid * a use-after-free bug by reading the underlying bus first. @@ -1163,9 +1166,6 @@ void phy_detach(struct phy_device *phydev) put_device(&phydev->mdio.dev); if (ndev_owner != bus->owner) module_put(bus->owner); - - /* Assert the reset signal */ - phy_device_reset(phydev, 1); } EXPORT_SYMBOL(phy_detach);
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.228 commit 3edc526019751c551a05b3ef465d07678fd239de
--------------------------------
[ Upstream commit 153a0d187e767c68733b8e9f46218eb1f41ab902 ]
For some reason, raw_bind() forgot to lock the socket.
BUG: KCSAN: data-race in __ip4_datagram_connect / raw_bind
write to 0xffff8881170d4308 of 4 bytes by task 5466 on cpu 0:
 raw_bind+0x1b0/0x250 net/ipv4/raw.c:739
 inet_bind+0x56/0xa0 net/ipv4/af_inet.c:443
 __sys_bind+0x14b/0x1b0 net/socket.c:1697
 __do_sys_bind net/socket.c:1708 [inline]
 __se_sys_bind net/socket.c:1706 [inline]
 __x64_sys_bind+0x3d/0x50 net/socket.c:1706
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881170d4308 of 4 bytes by task 5468 on cpu 1:
 __ip4_datagram_connect+0xb7/0x7b0 net/ipv4/datagram.c:39
 ip4_datagram_connect+0x2a/0x40 net/ipv4/datagram.c:89
 inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
 __sys_connect_file net/socket.c:1900 [inline]
 __sys_connect+0x197/0x1b0 net/socket.c:1917
 __do_sys_connect net/socket.c:1927 [inline]
 __se_sys_connect net/socket.c:1924 [inline]
 __x64_sys_connect+0x3d/0x50 net/socket.c:1924
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0x0003007f
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 5468 Comm: syz-executor.5 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv4/raw.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index bc3da7796e8a..a24f94abe14f 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -724,6 +724,7 @@ static int raw_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) int ret = -EINVAL; int chk_addr_ret;
+ lock_sock(sk); if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_in)) goto out;
@@ -743,7 +744,9 @@ static int raw_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) inet->inet_saddr = 0; /* Use device */ sk_dst_reset(sk); ret = 0; -out: return ret; +out: + release_sock(sk); + return ret; }
/*
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.228 commit 08ed3cabc07e9f6035466e33fbdfad1978e05e06
--------------------------------
[ Upstream commit 970a5a3ea86da637471d3cd04d513a0755aba4bf ]
In commit 431280eebed9 ("ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state") we took care of some ctl packets sent by TCP.
It turns out we need to use a similar strategy for SYNACK packets.
By default, they carry IP_DF and IPID==0, but there are ways to make them use the hashed IP ident generator, and they can thus be used to build off-path attacks. (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
One way is to force (before the listener is started): echo 1 >/proc/sys/net/ipv4/ip_no_pmtu_disc
Another way is to use a forged ICMP_FRAG_NEEDED message with a very small MTU (like 68) to force a false return from ip_dont_fragment().
In this patch, ip_build_and_send_pkt() uses the following heuristics (a standalone sketch of the resulting decision follows the list).
1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore can use IP_DF regardless of the listener or route pmtu setting.
2) In case the SYNACK packet is bigger than IPV4_MIN_MTU, we use prandom_u32() generator instead of the IPv4 hashed ident one.
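A standalone sketch of the resulting decision (illustrative only; rand() stands in for prandom_u32(), and only the TCP/SYNACK branch is modelled):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define IPV4_MIN_MTU 68

/* Userspace sketch, not kernel code: pick the IPID for a locally
 * generated control packet such as a SYNACK. */
static uint16_t synack_ipid(unsigned int pkt_len, int dont_fragment)
{
	if (pkt_len <= IPV4_MIN_MTU || dont_fragment)
		return 0;		/* DF set, IPID not needed */

	/* Too big for unconditional DF: use an unpredictable value
	 * instead of the hashed per-destination generator. */
	return (uint16_t)rand();
}

int main(void)
{
	printf("small SYNACK: id=%u\n", synack_ipid(60, 0));
	printf("fat SYNACK, no DF: id=%u\n", synack_ipid(120, 0));
	return 0;
}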
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Ray Che xijiache@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Cc: Geoff Alexander alexandg@cs.unm.edu Cc: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv4/ip_output.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index e63905f7f6f9..1a6c05908584 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -160,12 +160,19 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, iph->daddr = (opt && opt->opt.srr ? opt->opt.faddr : daddr); iph->saddr = saddr; iph->protocol = sk->sk_protocol; - if (ip_dont_fragment(sk, &rt->dst)) { + /* Do not bother generating IPID for small packets (eg SYNACK) */ + if (skb->len <= IPV4_MIN_MTU || ip_dont_fragment(sk, &rt->dst)) { iph->frag_off = htons(IP_DF); iph->id = 0; } else { iph->frag_off = 0; - __ip_select_ident(net, iph, 1); + /* TCP packets here are SYNACK with fat IPv4/TCP options. + * Avoid using the hashed IP ident generator. + */ + if (sk->sk_protocol == IPPROTO_TCP) + iph->id = (__force __be16)prandom_u32(); + else + __ip_select_ident(net, iph, 1); }
if (opt && opt->opt.optlen) {
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.228 commit a01e60a1ec6bef9be471fb7182a33c6d6f124e93
--------------------------------
commit c6f6f2444bdbe0079e41914a35081530d0409963 upstream.
While looking at one unrelated syzbot bug, I found that the replay logic in __rtnl_newlink() can potentially trigger a use-after-free.
It is better to clear master_dev and m_ops inside the loop, in case we have to replay it.
Fixes: ba7d49b1f0f8 ("rtnetlink: provide api for getting and setting slave info") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20220201012106.216495-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/core/rtnetlink.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 83de32e34bb5..6cd0dbe42baa 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2936,9 +2936,9 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, { struct net *net = sock_net(skb->sk); const struct rtnl_link_ops *ops; - const struct rtnl_link_ops *m_ops = NULL; + const struct rtnl_link_ops *m_ops; struct net_device *dev; - struct net_device *master_dev = NULL; + struct net_device *master_dev; struct ifinfomsg *ifm; char kind[MODULE_NAME_LEN]; char ifname[IFNAMSIZ]; @@ -2973,6 +2973,8 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, dev = NULL; }
+ master_dev = NULL; + m_ops = NULL; if (dev) { master_dev = netdev_master_upper_dev_get(dev); if (master_dev)
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.228 commit 1f0c712832907baef27d13721b3ca2e515d39ecf
--------------------------------
commit e42e70ad6ae2ae511a6143d2e8da929366e58bd9 upstream.
When packet_setsockopt( PACKET_FANOUT_DATA ) reads po->fanout, no lock is held, meaning that another thread can change po->fanout.
Given that po->fanout can only be set once during the socket lifetime (it is only cleared from fanout_release()), we can use READ_ONCE()/WRITE_ONCE() to document the race.
BUG: KCSAN: data-race in packet_setsockopt / packet_setsockopt
write to 0xffff88813ae8e300 of 8 bytes by task 14653 on cpu 0:
 fanout_add net/packet/af_packet.c:1791 [inline]
 packet_setsockopt+0x22fe/0x24a0 net/packet/af_packet.c:3931
 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
 __do_sys_setsockopt net/socket.c:2191 [inline]
 __se_sys_setsockopt net/socket.c:2188 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88813ae8e300 of 8 bytes by task 14654 on cpu 1:
 packet_setsockopt+0x691/0x24a0 net/packet/af_packet.c:3935
 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
 __do_sys_setsockopt net/socket.c:2191 [inline]
 __se_sys_setsockopt net/socket.c:2188 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000000000000000 -> 0xffff888106f8c000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 14654 Comm: syz-executor.3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 47dceb8ecdc1 ("packet: add classic BPF fanout mode") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Reported-by: syzbot syzkaller@googlegroups.com Link: https://lore.kernel.org/r/20220201022358.330621-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/packet/af_packet.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index fbf7d5ef83c7..bacb1bbe53e2 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1729,7 +1729,10 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags) err = -ENOSPC; if (refcount_read(&match->sk_ref) < PACKET_FANOUT_MAX) { __dev_remove_pack(&po->prot_hook); - po->fanout = match; + + /* Paired with packet_setsockopt(PACKET_FANOUT_DATA) */ + WRITE_ONCE(po->fanout, match); + po->rollover = rollover; rollover = NULL; refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1); @@ -3871,7 +3874,8 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv } case PACKET_FANOUT_DATA: { - if (!po->fanout) + /* Paired with the WRITE_ONCE() in fanout_add() */ + if (!READ_ONCE(po->fanout)) return -EINVAL;
return fanout_set_data(po, optval, optlen);
From: "Martin K. Petersen" martin.petersen@oracle.com
stable inclusion from linux-4.19.228 commit d21b6bbc78442bf490e7215ed620ec0878856a0e
--------------------------------
commit b13e0c71856817fca67159b11abac350e41289f5 upstream.
Commit 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update integrity seed") added code to update the integrity seed value when advancing a bio. However, it failed to take into account that the integrity interval might be larger than the 512-byte block layer sector size. This broke bio splitting on PI devices with 4KB logical blocks.
The seed value should be advanced by bio_integrity_intervals() and not the number of sectors.
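As a worked example (standalone sketch, assuming a 4KB protection interval): advancing by 64KB of data is 128 sectors but only 16 integrity intervals, so the seed must move by 16.

#include <stdint.h>
#include <stdio.h>

/* Userspace sketch of the arithmetic, not the block-layer code. */
static uint64_t integrity_intervals(unsigned int interval_bytes,
				    uint64_t sectors)
{
	return sectors / (interval_bytes >> 9);	/* 512-byte sectors */
}

int main(void)
{
	uint64_t bytes_done = 64 * 1024;
	uint64_t sectors = bytes_done >> 9;	/* 128 */

	printf("advance seed by %llu (not %llu)\n",
	       (unsigned long long)integrity_intervals(4096, sectors),
	       (unsigned long long)sectors);
	return 0;
}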
Cc: Dmitry Monakhov dmonakhov@openvz.org Cc: stable@vger.kernel.org Fixes: 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update integrity seed") Tested-by: Dmitry Ivanov dmitry.ivanov2@hpe.com Reported-by: Alexey Lyashkov alexey.lyashkov@hpe.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Link: https://lore.kernel.org/r/20220204034209.4193-1-martin.petersen@oracle.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- block/bio-integrity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 0b96220d0efd..2e22a3f7466a 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -399,7 +399,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done) struct blk_integrity *bi = blk_get_integrity(bio->bi_disk); unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
- bip->bip_iter.bi_sector += bytes_done >> 9; + bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9); bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes); } EXPORT_SYMBOL(bio_integrity_advance);
From: Guoqing Jiang guoqing.jiang@linux.dev
stable inclusion from linux-4.19.228 commit a31cb1f0fb6caf46ffe88c41252b6b7a4ee062d9
--------------------------------
commit 99e675d473eb8cf2deac1376a0f840222fc1adcf upstream.
After commit e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated"), in the tear-down path fn is only freed when allocation of ir_domain fails, though it should also be freed when dmar_enable_qi() returns an error.
Besides freeing fn, the irq_domain and ir_msi_domain need to be removed as well if intel_setup_irq_remapping() fails to enable queued invalidation.
Improve the rewinding path by adding out_free_ir_domain and out_free_fwnode labels, per Baolu's suggestion.
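For reference, a generic sketch of this rewinding idiom in plain C (malloc/free stand in for the real allocations; this is not the IOMMU code): each later failure jumps to a label that tears down everything set up before it, in reverse order of construction.

#include <stdlib.h>

static int setup_example(int fail_late)
{
	void *bitmap, *fwnode, *ir_domain;

	bitmap = malloc(64);
	if (!bitmap)
		return -1;

	fwnode = malloc(16);
	if (!fwnode)
		goto out_free_bitmap;

	ir_domain = malloc(32);
	if (!ir_domain)
		goto out_free_fwnode;

	/* e.g. enabling queued invalidation fails after both were set up */
	if (fail_late)
		goto out_free_ir_domain;

	return 0;	/* success path keeps everything */

out_free_ir_domain:
	free(ir_domain);
out_free_fwnode:
	free(fwnode);
out_free_bitmap:
	free(bitmap);
	return -1;
}

int main(void)
{
	(void)setup_example(1);	/* exercises the unwind path */
	return 0;
}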
Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated") Suggested-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Guoqing Jiang guoqing.jiang@linux.dev Link: https://lore.kernel.org/r/20220119063640.16864-1-guoqing.jiang@linux.dev Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220128031002.2219155-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/iommu/intel_irq_remapping.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c index 967450bd421a..276005451914 100644 --- a/drivers/iommu/intel_irq_remapping.c +++ b/drivers/iommu/intel_irq_remapping.c @@ -539,7 +539,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu) irq_domain_free_fwnode(fn); if (!iommu->ir_domain) { pr_err("IR%d: failed to allocate irqdomain\n", iommu->seq_id); - goto out_free_bitmap; + goto out_free_fwnode; } iommu->ir_msi_domain = arch_create_remap_msi_irq_domain(iommu->ir_domain, @@ -563,7 +563,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
if (dmar_enable_qi(iommu)) { pr_err("Failed to enable queued invalidation\n"); - goto out_free_bitmap; + goto out_free_ir_domain; } }
@@ -587,6 +587,14 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
return 0;
+out_free_ir_domain: + if (iommu->ir_msi_domain) + irq_domain_remove(iommu->ir_msi_domain); + iommu->ir_msi_domain = NULL; + irq_domain_remove(iommu->ir_domain); + iommu->ir_domain = NULL; +out_free_fwnode: + irq_domain_free_fwnode(fn); out_free_bitmap: kfree(bitmap); out_free_pages:
From: Joerg Roedel jroedel@suse.de
stable inclusion from linux-4.19.228 commit bd2a6021b8778bf671a2f9739241543398a7c853
--------------------------------
commit 9b45a7738eec52bf0f5d8d3d54e822962781c5f2 upstream.
The polling loop for the register change in iommu_ga_log_enable() needs to have a udelay() in it. Otherwise the CPU might be faster than the IOMMU hardware and wrongly trigger the WARN_ON() further down the code stream. Use 10us for the udelay(), as there is some hardware where activation of the GA log can take more than 100ms.
A future optimization should move the activation check of the GA log to the point where it gets used for the first time. But that is a bigger change and not suitable for a fix.
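A hedged userspace sketch of the polling pattern (usleep() standing in for udelay(); the register, mask, and timeout values are made up for illustration):

#include <stdio.h>
#include <unistd.h>

#define LOOP_TIMEOUT 100000

/* Give the hardware time to flip the status bit instead of spinning
 * as fast as the CPU allows. */
static int wait_for_status(volatile int *status_reg, int mask)
{
	int i;

	for (i = 0; i < LOOP_TIMEOUT; i++) {
		if (*status_reg & mask)
			return 0;	/* bit observed */
		usleep(10);		/* ~udelay(10) in the driver */
	}
	return -1;			/* timed out */
}

int main(void)
{
	volatile int fake_reg = 0x1;	/* pretend the bit is already set */

	return wait_for_status(&fake_reg, 0x1) ? 1 : 0;
}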
Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log") Signed-off-by: Joerg Roedel jroedel@suse.de Link: https://lore.kernel.org/r/20220204115537.3894-1-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/iommu/amd_iommu_init.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 1e9a5da562f0..29bf91e50d41 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -30,6 +30,7 @@ #include <linux/iommu.h> #include <linux/kmemleak.h> #include <linux/mem_encrypt.h> +#include <linux/iopoll.h> #include <asm/pci-direct.h> #include <asm/iommu.h> #include <asm/gart.h> @@ -769,6 +770,7 @@ static int iommu_ga_log_enable(struct amd_iommu *iommu) status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET); if (status & (MMIO_STATUS_GALOG_RUN_MASK)) break; + udelay(10); }
if (i >= LOOP_TIMEOUT)
From: Ritesh Harjani riteshh@linux.ibm.com
stable inclusion from linux-4.19.228 commit e9368c941a26098a199ee357f0370d49ac30b47f
--------------------------------
commit 897026aaa73eb2517dfea8d147f20ddb0b813044 upstream.
While running "./check -I 200 generic/475" it sometimes gives below kernel BUG(). Ideally we should not call ext4_write_inline_data() if ext4_create_inline_data() has failed.
<log snip> [73131.453234] kernel BUG at fs/ext4/inline.c:223!
<code snip>
212 static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
213 				    void *buffer, loff_t pos, unsigned int len)
214 {
<...>
223 	BUG_ON(!EXT4_I(inode)->i_inline_off);
224 	BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
This patch handles the error and prints an emergency message warning of potential data loss for the given inode (since we could not restore the original inline_data due to some previous error).
[ 9571.070313] EXT4-fs (dm-0): error restoring inline_data for inode -- potential data loss! (inode 1703982, error -30)
Reported-by: Eric Whitney enwlinux@gmail.com Signed-off-by: Ritesh Harjani riteshh@linux.ibm.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/9f4cd7dfd54fa58ff27270881823d94ddf78dd07.164241699... Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/ext4/inline.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index f36936bcd260..abb71ba0b1e6 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -1126,7 +1126,15 @@ static void ext4_restore_inline_data(handle_t *handle, struct inode *inode, struct ext4_iloc *iloc, void *buf, int inline_size) { - ext4_create_inline_data(handle, inode, inline_size); + int ret; + + ret = ext4_create_inline_data(handle, inode, inline_size); + if (ret) { + ext4_msg(inode->i_sb, KERN_EMERG, + "error restoring inline_data for inode -- potential data loss! (inode %lu, error %d)", + inode->i_ino, ret); + return; + } ext4_write_inline_data(inode, iloc, buf, 0, inline_size); ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); }
From: Xiaoke Wang xkernel.wang@foxmail.com
stable inclusion from linux-4.19.230 commit e9fa71bab4de31155eebb3074e29f49d6748fd1a
--------------------------------
commit 83230351c523b04ff8a029a4bdf97d881ecb96fc upstream.
audit_log_start() returns an audit_buffer pointer on success or NULL on error, so it is better to check its return value.
Fixes: 3323eec921ef ("integrity: IMA as an integrity service provider") Signed-off-by: Xiaoke Wang xkernel.wang@foxmail.com Cc: stable@vger.kernel.org Reviewed-by: Paul Moore paul@paul-moore.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- security/integrity/integrity_audit.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/security/integrity/integrity_audit.c b/security/integrity/integrity_audit.c index 82c98f7d217e..d03fbdfc972e 100644 --- a/security/integrity/integrity_audit.c +++ b/security/integrity/integrity_audit.c @@ -39,6 +39,8 @@ void integrity_audit_msg(int audit_msgno, struct inode *inode, return;
ab = audit_log_start(audit_context(), GFP_KERNEL, audit_msgno); + if (!ab) + return; audit_log_format(ab, "pid=%d uid=%u auid=%u ses=%u", task_pid_nr(current), from_kuid(&init_user_ns, current_cred()->uid),
From: Stefan Berger stefanb@linux.ibm.com
stable inclusion from linux-4.19.230 commit 2c3beddb39ebe4411497629442f594133f29e0cd
--------------------------------
commit f7333b9572d0559e00352a926c92f29f061b4569 upstream.
The removal of ima_dir currently fails since ima_policy still exists, so remove the ima_policy file before removing the directory.
Fixes: 4af4662fa4a9 ("integrity: IMA policy") Signed-off-by: Stefan Berger stefanb@linux.ibm.com Cc: stable@vger.kernel.org Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- security/integrity/ima/ima_fs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c index 604cdac63d84..38bd565b9da9 100644 --- a/security/integrity/ima/ima_fs.c +++ b/security/integrity/ima/ima_fs.c @@ -497,12 +497,12 @@ int __init ima_fs_init(void)
return 0; out: + securityfs_remove(ima_policy); securityfs_remove(violations); securityfs_remove(runtime_measurements_count); securityfs_remove(ascii_runtime_measurements); securityfs_remove(binary_runtime_measurements); securityfs_remove(ima_symlink); securityfs_remove(ima_dir); - securityfs_remove(ima_policy); return -1; }
From: Roberto Sassu roberto.sassu@huawei.com
stable inclusion from linux-4.19.230 commit 4d9eb5b2ef21c496597f09355d9d6f5508731e98
--------------------------------
commit bb8e52e4906f148c2faf6656b5106cf7233e9301 upstream.
Commit c2426d2ad5027 ("ima: added support for new kernel cmdline parameter ima_template_fmt") introduced an additional check on the ima_template variable to avoid multiple template selection.
Unfortunately, ima_template could be also set by the setup function of the ima_hash= parameter, when it calls ima_template_desc_current(). This causes attempts to choose a new template with ima_template= or with ima_template_fmt=, after ima_hash=, to be ignored.
Achieve the goal of the commit mentioned above with a new static variable, template_setup_done, so that template selection requests after ima_hash= are no longer ignored.
Finally, call ima_init_template_list(), if not already done, to initialize the list of templates before lookup_template_desc() is called.
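A toy sketch of the resulting behaviour (the function names are made up and this is not the IMA code): only the explicit template parameters set the shared done flag, so a default picked as a side effect of ima_hash= can still be overridden.

#include <stdio.h>

static const char *chosen_template;
static int template_setup_done;

static void handle_ima_template(const char *name)
{
	if (template_setup_done)
		return;			/* a template was already chosen explicitly */
	chosen_template = name;
	template_setup_done = 1;
}

static void handle_ima_hash(const char *algo)
{
	/* May pick a default template internally, but must NOT set
	 * template_setup_done, so a later ima_template= still wins. */
	if (!chosen_template)
		chosen_template = "ima-ng";
	(void)algo;
}

int main(void)
{
	handle_ima_hash("sha256");	 /* picks a default */
	handle_ima_template("ima-sig");	 /* still allowed to override */
	printf("%s\n", chosen_template); /* prints "ima-sig" */
	return 0;
}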
Reported-by: Guo Zihua guozihua@huawei.com Signed-off-by: Roberto Sassu roberto.sassu@huawei.com Cc: stable@vger.kernel.org Fixes: c2426d2ad5027 ("ima: added support for new kernel cmdline parameter ima_template_fmt") Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- security/integrity/ima/ima_template.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c index 30db39b23804..4dfdccce497b 100644 --- a/security/integrity/ima/ima_template.c +++ b/security/integrity/ima/ima_template.c @@ -31,6 +31,7 @@ static struct ima_template_desc builtin_templates[] = {
static LIST_HEAD(defined_templates); static DEFINE_SPINLOCK(template_list); +static int template_setup_done;
static struct ima_template_field supported_fields[] = { {.field_id = "d", .field_init = ima_eventdigest_init, @@ -57,10 +58,11 @@ static int __init ima_template_setup(char *str) struct ima_template_desc *template_desc; int template_len = strlen(str);
- if (ima_template) + if (template_setup_done) return 1;
- ima_init_template_list(); + if (!ima_template) + ima_init_template_list();
/* * Verify that a template with the supplied name exists. @@ -84,6 +86,7 @@ static int __init ima_template_setup(char *str) }
ima_template = template_desc; + template_setup_done = 1; return 1; } __setup("ima_template=", ima_template_setup); @@ -92,7 +95,7 @@ static int __init ima_template_fmt_setup(char *str) { int num_templates = ARRAY_SIZE(builtin_templates);
- if (ima_template) + if (template_setup_done) return 1;
if (template_desc_init_fields(str, NULL, NULL) < 0) { @@ -103,6 +106,7 @@ static int __init ima_template_fmt_setup(char *str)
builtin_templates[num_templates - 1].fmt = str; ima_template = builtin_templates + num_templates - 1; + template_setup_done = 1;
return 1; }
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from linux-4.19.230 commit 57e13bdd9634a48c0bdbf988858144350321c0bf
--------------------------------
commit 468d126dab45718feeb728319be20bd869a5eaa7 upstream.
For some long forgotten reason, the nfs_client cl_flags field is initialised in nfs_get_client() instead of being initialised at allocation time. This quirk was harmless until we moved the call to nfs_create_rpc_client().
Fixes: dd99e9f98fbf ("NFSv4: Initialise connection to the server in nfs4_alloc_client()") Cc: stable@vger.kernel.org # 4.8.x Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c index b8adf3467786..7d02dc52209d 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -180,6 +180,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init) INIT_LIST_HEAD(&clp->cl_superblocks); clp->cl_rpcclient = ERR_PTR(-EINVAL);
+ clp->cl_flags = cl_init->init_flags; clp->cl_proto = cl_init->proto; clp->cl_net = get_net(cl_init->net);
@@ -427,7 +428,6 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init) list_add_tail(&new->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); - new->cl_flags = cl_init->init_flags; return rpc_ops->init_client(new, cl_init); }
From: Olga Kornievskaia kolga@netapp.com
stable inclusion from linux-4.19.230 commit 1789f59f1779c99d0756b40036c62a7519f53543
--------------------------------
[ Upstream commit 2c52c8376db7160a1dd8a681c61c9258405ef143 ]
When the bitmask of the attributes doesn't include the security label, don't bother printing it. Since the label might not be null terminated, adjust the printing format accordingly.
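A small standalone example of the printing detail (illustrative values only): because the label buffer may not be NUL-terminated, a length-bounded %.*s must be used instead of a plain %s.

#include <stdio.h>

int main(void)
{
	/* A security label received off the wire, not NUL-terminated. */
	char label[] = { 's', 'y', 's', 't', 'e', 'm', '_', 'u' };
	int len = sizeof(label);

	/* %.*s stops after 'len' bytes, so no terminator is required. */
	printf("label=%.*s, len=%d\n", len, label, len);
	return 0;
}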
Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/nfs4xdr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index feebe80f2092..982fac53e0c6 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -4300,10 +4300,11 @@ static int decode_attr_security_label(struct xdr_stream *xdr, uint32_t *bitmap, } else printk(KERN_WARNING "%s: label too long (%u)!\n", __func__, len); + if (label && label->label) + dprintk("%s: label=%.*s, len=%d, PI=%d, LFS=%d\n", + __func__, label->len, (char *)label->label, + label->len, label->pi, label->lfs); } - if (label && label->label) - dprintk("%s: label=%s, len=%d, PI=%d, LFS=%d\n", __func__, - (char *)label->label, label->len, label->pi, label->lfs); return status;
out_overflow:
From: Xiaoke Wang xkernel.wang@foxmail.com
stable inclusion from linux-4.19.230 commit 2c9587f72ff4b502c2fd15eb3ccff388eae12d07
--------------------------------
[ Upstream commit fbd2057e5329d3502a27491190237b6be52a1cb6 ]
kstrdup() returns NULL when an internal memory allocation fails, so it is better to check its return value in order to catch the error in time.
Signed-off-by: Xiaoke Wang xkernel.wang@foxmail.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/nfs4client.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index 36282e48cff6..60999b261666 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -1276,8 +1276,11 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname, } nfs_put_client(clp);
- if (server->nfs_client->cl_hostname == NULL) + if (server->nfs_client->cl_hostname == NULL) { server->nfs_client->cl_hostname = kstrdup(hostname, GFP_KERNEL); + if (server->nfs_client->cl_hostname == NULL) + return -ENOMEM; + } nfs_server_insert_lists(server);
return nfs_probe_destination(server);
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from linux-4.19.230 commit e6b0f9177c43ff9ad0f1a6dd3639c383373911b7
--------------------------------
[ Upstream commit b05bf5c63b326ce1da84ef42498d8e0e292e694c ]
When decode_devicenotify_args() exits with no entries, we need to ensure that the struct cb_devicenotifyargs is initialised to { 0, NULL } in order to avoid problems in nfs4_callback_devicenotify().
Reported-by: rtm@csail.mit.edu Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/callback.h | 2 +- fs/nfs/callback_proc.c | 2 +- fs/nfs/callback_xdr.c | 18 +++++++++--------- 3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h index 8f34daf85f70..5d5227ce4d91 100644 --- a/fs/nfs/callback.h +++ b/fs/nfs/callback.h @@ -168,7 +168,7 @@ struct cb_devicenotifyitem { };
struct cb_devicenotifyargs { - int ndevs; + uint32_t ndevs; struct cb_devicenotifyitem *devs; };
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c index bcc51f131a49..868d66ed8bcf 100644 --- a/fs/nfs/callback_proc.c +++ b/fs/nfs/callback_proc.c @@ -364,7 +364,7 @@ __be32 nfs4_callback_devicenotify(void *argp, void *resp, struct cb_process_state *cps) { struct cb_devicenotifyargs *args = argp; - int i; + uint32_t i; __be32 res = 0; struct nfs_client *clp = cps->clp; struct nfs_server *server = NULL; diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index 57558a8d92e9..76aa1b456c52 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -268,11 +268,9 @@ __be32 decode_devicenotify_args(struct svc_rqst *rqstp, void *argp) { struct cb_devicenotifyargs *args = argp; + uint32_t tmp, n, i; __be32 *p; __be32 status = 0; - u32 tmp; - int n, i; - args->ndevs = 0;
/* Num of device notifications */ p = read_buf(xdr, sizeof(uint32_t)); @@ -281,7 +279,7 @@ __be32 decode_devicenotify_args(struct svc_rqst *rqstp, goto out; } n = ntohl(*p++); - if (n <= 0) + if (n == 0) goto out; if (n > ULONG_MAX / sizeof(*args->devs)) { status = htonl(NFS4ERR_BADXDR); @@ -339,19 +337,21 @@ __be32 decode_devicenotify_args(struct svc_rqst *rqstp, dev->cbd_immediate = 0; }
- args->ndevs++; - dprintk("%s: type %d layout 0x%x immediate %d\n", __func__, dev->cbd_notify_type, dev->cbd_layout_type, dev->cbd_immediate); } + args->ndevs = n; + dprintk("%s: ndevs %d\n", __func__, args->ndevs); + return 0; +err: + kfree(args->devs); out: + args->devs = NULL; + args->ndevs = 0; dprintk("%s: status %d ndevs %d\n", __func__, ntohl(status), args->ndevs); return status; -err: - kfree(args->devs); - goto out; }
static __be32 decode_sessionid(struct xdr_stream *xdr,
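The important property of the rewritten decoder is that args is only populated once decoding has fully succeeded, and every early exit leaves { 0, NULL } behind for the caller. A hedged userspace sketch of that shape; the stream, the parse_u32() helper and the structures are all invented for the example:
------------------------< decode-publish-on-success.c >------------------------
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct notify_item {
        uint32_t type;
};

struct notify_args {
        uint32_t ndevs;
        struct notify_item *devs;
};

/* invented stand-in for the XDR stream: a count followed by that many items */
struct stream {
        const uint32_t *p;
        size_t left;
};

static int parse_u32(struct stream *s, uint32_t *out)
{
        if (s->left == 0)
                return -1;
        *out = *s->p++;
        s->left--;
        return 0;
}

static int decode_notify_args(struct stream *s, struct notify_args *args)
{
        struct notify_item *devs = NULL;
        uint32_t n, i;
        int status = 0;

        if (parse_u32(s, &n)) {
                status = -1;
                goto out;
        }
        if (n == 0)
                goto out;               /* zero entries: report { 0, NULL } */
        devs = calloc(n, sizeof(*devs));
        if (!devs) {
                status = -1;
                goto out;
        }
        for (i = 0; i < n; i++) {
                if (parse_u32(s, &devs[i].type)) {
                        status = -1;
                        goto err;
                }
        }
        args->ndevs = n;                /* success: publish the decoded result */
        args->devs = devs;
        return 0;
err:
        free(devs);
out:
        args->ndevs = 0;                /* every other path leaves a consistent state */
        args->devs = NULL;
        return status;
}

int main(void)
{
        const uint32_t words[] = { 2, 7, 9 };   /* count of two, then two items */
        struct stream s = { words, 3 };
        struct notify_args args;

        if (decode_notify_args(&s, &args) == 0)
                printf("decoded %u items\n", args.ndevs);
        free(args.devs);
        return 0;
}
------------------------------------------------------------------------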
From: Olga Kornievskaia kolga@netapp.com
stable inclusion from linux-4.19.230 commit 152f7db416c4bfcc8fc01e55cae60f63489580fa
--------------------------------
[ Upstream commit 90e12a3191040bd3854d3e236c35921e4e92a044 ]
Remove the check for the zero length fs_locations reply in the xdr decoding, and instead check for that in the migration code.
Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/nfs4state.c | 3 +++ fs/nfs/nfs4xdr.c | 2 -- 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 5ff516104699..f53f8f17629b 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2064,6 +2064,9 @@ static int nfs4_try_migration(struct nfs_server *server, struct rpc_cred *cred) }
result = -NFS4ERR_NXIO; + if (!locations->nlocations) + goto out; + if (!(locations->fattr.valid & NFS_ATTR_FATTR_V4_LOCATIONS)) { dprintk("<-- %s: No fs_locations data, migration skipped\n", __func__); diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index 982fac53e0c6..e8917fc393a4 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -3753,8 +3753,6 @@ static int decode_attr_fs_locations(struct xdr_stream *xdr, uint32_t *bitmap, st if (unlikely(!p)) goto out_overflow; n = be32_to_cpup(p); - if (n <= 0) - goto out_eio; for (res->nlocations = 0; res->nlocations < n; res->nlocations++) { u32 m; struct nfs4_fs_location *loc;
From: Olga Kornievskaia kolga@netapp.com
stable inclusion from linux-4.19.230 commit 42dc3cf3174ec714732825638947e26923d9ea2a
--------------------------------
[ Upstream commit f5b27cc6761e27ee6387a24df1a99ca77b360fea ]
Make nfs_parse_server_name available outside of nfs4namespace.c.
Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/nfs4_fs.h | 3 ++- fs/nfs/nfs4namespace.c | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index 4f3b5add052b..d22176d87448 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -277,7 +277,8 @@ struct vfsmount *nfs4_submount(struct nfs_server *, struct dentry *, struct nfs_fh *, struct nfs_fattr *); int nfs4_replace_transport(struct nfs_server *server, const struct nfs4_fs_locations *locations); - +size_t nfs_parse_server_name(char *string, size_t len, struct sockaddr *sa, + size_t salen, struct net *net); /* nfs4proc.c */ extern int nfs4_handle_exception(struct nfs_server *, int, struct nfs4_exception *); extern int nfs4_async_handle_error(struct rpc_task *task, diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c index 24f06dcc2b08..936c412be28e 100644 --- a/fs/nfs/nfs4namespace.c +++ b/fs/nfs/nfs4namespace.c @@ -121,8 +121,8 @@ static int nfs4_validate_fspath(struct dentry *dentry, return 0; }
-static size_t nfs_parse_server_name(char *string, size_t len, - struct sockaddr *sa, size_t salen, struct net *net) +size_t nfs_parse_server_name(char *string, size_t len, struct sockaddr *sa, + size_t salen, struct net *net) { ssize_t ret;
From: ZouMingzhe mingzhe.zou@easystack.cn
stable inclusion from linux-4.19.230 commit ded80123b84253ecc6c6cfd1fbbbd99f51c984ec
--------------------------------
[ Upstream commit a861790afaa8b6369eee8a88c5d5d73f5799c0c6 ]
iscsit_tpg_check_network_portal() has nested for_each loops and is supposed to return true when a match is found. However, the outer tpg loop still continues after the inner tpg_np loop has exited. If the matching tpg is not the last one, the match value can be overwritten.
Break the outer loop after finding a match and make sure the np under each tpg is unique.
Link: https://lore.kernel.org/r/20220111054742.19582-1-mingzhe.zou@easystack.cn Signed-off-by: ZouMingzhe mingzhe.zou@easystack.cn Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/target/iscsi/iscsi_target_tpg.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c index 101d62105c93..f3671ffdf149 100644 --- a/drivers/target/iscsi/iscsi_target_tpg.c +++ b/drivers/target/iscsi/iscsi_target_tpg.c @@ -451,6 +451,9 @@ static bool iscsit_tpg_check_network_portal( break; } spin_unlock(&tpg->tpg_np_lock); + + if (match) + break; } spin_unlock(&tiqn->tiqn_tpg_lock);
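The bug class is easy to reproduce outside the target code: a match flag that is reassigned by an inner loop must also break the outer loop, or a later iteration can overwrite it. A small standalone C sketch, with data structures invented for the example:
------------------------< nested-loop-break-sketch.c >------------------------
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* invented stand-in for a tpg and its list of portals */
struct group {
        const char **names;
        int count;
};

static bool find_name(const struct group *groups, int ngroups, const char *want)
{
        bool match = false;

        for (int g = 0; g < ngroups; g++) {
                for (int i = 0; i < groups[g].count; i++) {
                        match = (strcmp(groups[g].names[i], want) == 0);
                        if (match)
                                break;  /* leaves the inner loop only */
                }
                if (match)
                        break;  /* the fix: without this, the next group re-runs
                                 * the inner loop and can reset match to false */
        }
        return match;
}

int main(void)
{
        static const char *a[] = { "10.0.0.1", "10.0.0.2" };
        static const char *b[] = { "192.168.0.1" };
        struct group groups[] = { { a, 2 }, { b, 1 } };

        printf("%s\n", find_name(groups, 2, "10.0.0.2") ? "found" : "not found");
        return 0;
}
------------------------------------------------------------------------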
From: Daniel Borkmann daniel@iogearbox.net
stable inclusion from linux-4.19.230 commit 07e7f7cc619d15645e45d04b1c99550c6d292e9c
--------------------------------
commit 08389d888287c3823f80b0216766b71e17f0aba5 upstream.
Add a kconfig knob which allows for unprivileged bpf to be disabled by default. If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.
This still allows a transition of 2 -> {0,1} through an admin. Similarly, this also still keeps 1 -> {1} behavior intact, so that once set to permanently disabled, it cannot be undone aside from a reboot.
We've also added extra2 with max of 2 for the procfs handler, so that an admin still has a chance to toggle between 0 <-> 2.
Either way, as an additional alternative, applications can make use of CAP_BPF that we added a while ago.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765... [fllinden@amazon.com: backported to 4.19] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- Documentation/sysctl/kernel.txt | 21 +++++++++++++++++++++ init/Kconfig | 10 ++++++++++ kernel/bpf/syscall.c | 3 ++- kernel/sysctl.c | 29 +++++++++++++++++++++++++---- 4 files changed, 58 insertions(+), 5 deletions(-)
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 7e33d039ea17..1fa03508ff5a 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -95,6 +95,7 @@ show up in /proc/sys/kernel: - sysctl_writes_strict - tainted - threads-max +- unprivileged_bpf_disabled - unknown_nmi_panic - watchdog - watchdog_thresh @@ -1072,6 +1073,26 @@ available RAM pages threads-max is reduced accordingly.
==============================================================
+unprivileged_bpf_disabled: + +Writing 1 to this entry will disable unprivileged calls to bpf(); +once disabled, calling bpf() without CAP_SYS_ADMIN will return +-EPERM. Once set to 1, this can't be cleared from the running kernel +anymore. + +Writing 2 to this entry will also disable unprivileged calls to bpf(), +however, an admin can still change this setting later on, if needed, by +writing 0 or 1 to this entry. + +If BPF_UNPRIV_DEFAULT_OFF is enabled in the kernel config, then this +entry will default to 2 instead of 0. + + 0 - Unprivileged calls to bpf() are enabled + 1 - Unprivileged calls to bpf() are disabled without recovery + 2 - Unprivileged calls to bpf() are disabled + +============================================================== + unknown_nmi_panic:
The value in this file affects behavior of handling NMI. When the diff --git a/init/Kconfig b/init/Kconfig index 1a0b15c5a82b..cef0e5fdc901 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1539,6 +1539,16 @@ config BPF_JIT_ALWAYS_ON Enables BPF JIT and removes BPF interpreter to avoid speculative execution of BPF instructions by the interpreter
+config BPF_UNPRIV_DEFAULT_OFF + bool "Disable unprivileged BPF by default" + depends on BPF_SYSCALL + help + Disables unprivileged BPF by default by setting the corresponding + /proc/sys/kernel/unprivileged_bpf_disabled knob to 2. An admin can + still reenable it by setting it to 0 later on, or permanently + disable it by setting it to 1 (from which no other transition to + 0 is possible anymore). + config USERFAULTFD bool "Enable userfaultfd() system call" select ANON_INODES diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 30253fb4254e..b11852ea1822 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -48,7 +48,8 @@ static DEFINE_SPINLOCK(prog_idr_lock); static DEFINE_IDR(map_idr); static DEFINE_SPINLOCK(map_idr_lock);
-int sysctl_unprivileged_bpf_disabled __read_mostly; +int sysctl_unprivileged_bpf_disabled __read_mostly = + IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 2 : 0;
static const struct bpf_map_ops * const bpf_map_types[] = { #define BPF_PROG_TYPE(_id, _ops) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 5445dead9911..d2eebbcdff52 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -257,6 +257,28 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
#endif
+#ifdef CONFIG_BPF_SYSCALL +static int bpf_unpriv_handler(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + int ret, unpriv_enable = *(int *)table->data; + bool locked_state = unpriv_enable == 1; + struct ctl_table tmp = *table; + + if (write && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + tmp.data = &unpriv_enable; + ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); + if (write && !ret) { + if (locked_state && unpriv_enable != 1) + return -EPERM; + *(int *)table->data = unpriv_enable; + } + return ret; +} +#endif + static struct ctl_table kern_table[]; static struct ctl_table vm_table[]; static struct ctl_table fs_table[]; @@ -1247,10 +1269,9 @@ static struct ctl_table kern_table[] = { .data = &sysctl_unprivileged_bpf_disabled, .maxlen = sizeof(sysctl_unprivileged_bpf_disabled), .mode = 0644, - /* only handle a transition from default "0" to "1" */ - .proc_handler = proc_dointvec_minmax, - .extra1 = &one, - .extra2 = &one, + .proc_handler = bpf_unpriv_handler, + .extra1 = &zero, + .extra2 = &two, }, #endif #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
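The allowed transitions can be summarised as a tiny state check: the accepted range is clamped to 0..2, and value 1 is a one-way latch. The sketch below is a userspace illustration of that policy only, not the kernel handler, and the function name is invented:
------------------------< unpriv-bpf-transition-sketch.c >------------------------
#include <stdbool.h>
#include <stdio.h>

/* 0 = unprivileged bpf() enabled, 1 = disabled permanently (locked),
 * 2 = disabled, but an admin may still write 0 or 1 later */
static bool unpriv_bpf_transition_allowed(int cur, int next)
{
        if (next < 0 || next > 2)
                return false;   /* extra1/extra2 clamp the accepted range to 0..2 */
        if (cur == 1 && next != 1)
                return false;   /* once locked, only a reboot clears it */
        return true;
}

int main(void)
{
        printf("2 -> 0: %s\n", unpriv_bpf_transition_allowed(2, 0) ? "allowed" : "rejected");
        printf("2 -> 1: %s\n", unpriv_bpf_transition_allowed(2, 1) ? "allowed" : "rejected");
        printf("1 -> 0: %s\n", unpriv_bpf_transition_allowed(1, 0) ? "allowed" : "rejected");
        return 0;
}
------------------------------------------------------------------------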
From: Mahesh Bandewar maheshb@google.com
stable inclusion from linux-4.19.230 commit c92b23d934f008aea3ec3239d951f6c4cc0c4422
--------------------------------
[ Upstream commit 23de0d7b6f0e3f9a6283a882594c479949da1120 ]
When 802.3ad mode enables a participating port, it should update the slave-array. I have observed that the member links are participating and are part of the active aggregator while the traffic is egressing via only one member link (in a case where two links are participating). Via kprobes I discovered that the slave-arr had only one link added while the other participating link wasn't part of the slave-arr.
I couldn't see what caused that situation, but a simple code walk-through provided hints that the enable_port wasn't always associated with a slave-array update.
Fixes: ee6377147409 ("bonding: Simplify the xmit function for modes that use xmit_hash") Signed-off-by: Mahesh Bandewar maheshb@google.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Link: https://lore.kernel.org/r/20220207222901.1795287-1-maheshb@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/net/bonding/bond_3ad.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 93dfcef8afc4..035923876c61 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -1012,8 +1012,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr) if (port->aggregator && port->aggregator->is_active && !__port_is_enabled(port)) { - __enable_port(port); + *update_slave_arr = true; } } break; @@ -1760,6 +1760,7 @@ static void ad_agg_selection_logic(struct aggregator *agg, port = port->next_port_in_aggregator) { __enable_port(port); } + *update_slave_arr = true; } }
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.230 commit 12b6703e9546902c56b4b9048b893ad49d62bdd4
--------------------------------
[ Upstream commit 5611a00697c8ecc5aad04392bea629e9d6a20463 ]
ip[6]mr_free_table() can only be called under RTNL lock.
RTNL: assertion failed at net/core/dev.c (10367) WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367 Modules linked in: CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367 Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 <0f> 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4 R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000 FS: 00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509 ip6mr_free_table net/ipv6/ip6mr.c:389 [inline] ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline] ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline] ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298 ops_init+0xaf/0x470 net/core/net_namespace.c:140 setup_net+0x54f/0xbb0 net/core/net_namespace.c:331 copy_net_ns+0x318/0x760 net/core/net_namespace.c:475 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110 copy_namespaces+0x391/0x450 kernel/nsproxy.c:178 copy_process+0x2e0c/0x7300 kernel/fork.c:2167 kernel_clone+0xe7/0xab0 kernel/fork.c:2555 __do_sys_clone+0xc8/0x110 kernel/fork.c:2672 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f4ab89f9059 Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f. RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038 RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059 RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000 RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300 R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000 </TASK>
Fixes: f243e5a7859a ("ipmr,ip6mr: call ip6mr_free_table() on failure path") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Cong Wang cong.wang@bytedance.com Reported-by: syzbot syzkaller@googlegroups.com Link: https://lore.kernel.org/r/20220208053451.2885398-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv4/ipmr.c | 2 ++ net/ipv6/ip6mr.c | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index d235478d9ca3..2085af224a41 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -265,7 +265,9 @@ static int __net_init ipmr_rules_init(struct net *net) return 0;
err2: + rtnl_lock(); ipmr_free_table(mrt); + rtnl_unlock(); err1: fib_rules_unregister(ops); return err; diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 9a8923e6a8fa..d392a72cd784 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -251,7 +251,9 @@ static int __net_init ip6mr_rules_init(struct net *net) return 0;
err2: + rtnl_lock(); ip6mr_free_table(mrt); + rtnl_unlock(); err1: fib_rules_unregister(ops); return err;
From: Antoine Tenart atenart@kernel.org
stable inclusion from linux-4.19.230 commit 040e92ea3d7d6f27c1b71d6502e35c54a0939cb7
--------------------------------
[ Upstream commit cfc56f85e72f5b9c5c5be26dc2b16518d36a7868 ]
When uncloning an skb dst and its associated metadata a new dst+metadata is allocated and the tunnel information from the old metadata is copied over there.
The issue is the tunnel metadata has references to cached dst, which are copied along the way. When a dst+metadata refcount drops to 0 the metadata is freed including the cached dst entries. As they are also referenced in the initial dst+metadata, this ends up in UaFs.
In practice the above did not happen because of another issue, the dst+metadata was never freed because its refcount never dropped to 0 (this will be fixed in a subsequent patch).
Fix this by initializing the dst cache after copying the tunnel information from the old metadata to also unshare the dst cache.
Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") Cc: Paolo Abeni pabeni@redhat.com Reported-by: Vlad Buslov vladbu@nvidia.com Tested-by: Vlad Buslov vladbu@nvidia.com Signed-off-by: Antoine Tenart atenart@kernel.org Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- include/net/dst_metadata.h | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h index 14efa0ded75d..b997e0c1e362 100644 --- a/include/net/dst_metadata.h +++ b/include/net/dst_metadata.h @@ -123,6 +123,19 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, sizeof(struct ip_tunnel_info) + md_size); +#ifdef CONFIG_DST_CACHE + /* Unclone the dst cache if there is one */ + if (new_md->u.tun_info.dst_cache.cache) { + int ret; + + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); + if (ret) { + metadata_dst_free(new_md); + return ERR_PTR(ret); + } + } +#endif + skb_dst_drop(skb); dst_hold(&new_md->dst); skb_dst_set(skb, &new_md->dst);
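The underlying problem is generic: a byte-for-byte copy of a structure that owns a pointer leaves both copies sharing (and eventually double-freeing) the same allocation, so the unclone has to re-initialise that member. A minimal userspace sketch, with all names invented for the example:
------------------------< unclone-owned-pointer-sketch.c >------------------------
#include <stdlib.h>
#include <string.h>

/* invented stand-in for the tunnel metadata with an owned cache pointer */
struct tunnel_info {
        int id;
        void *cache;            /* owned allocation, released on destroy */
};

static int tunnel_info_unclone(struct tunnel_info *dst, const struct tunnel_info *src)
{
        memcpy(dst, src, sizeof(*dst));         /* copies the cache pointer too */

        if (src->cache) {
                /* give the copy its own cache instead of sharing (and later
                 * double-freeing) the original one */
                dst->cache = calloc(1, 64);
                if (!dst->cache)
                        return -1;              /* caller must discard the copy */
        }
        return 0;
}

static void tunnel_info_destroy(struct tunnel_info *ti)
{
        free(ti->cache);
        ti->cache = NULL;
}

int main(void)
{
        struct tunnel_info orig = { .id = 1, .cache = calloc(1, 64) };
        struct tunnel_info copy;

        if (tunnel_info_unclone(&copy, &orig) == 0)
                tunnel_info_destroy(&copy);     /* frees the copy's private cache */
        tunnel_info_destroy(&orig);             /* no double free */
        return 0;
}
------------------------------------------------------------------------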
From: Antoine Tenart atenart@kernel.org
stable inclusion from linux-4.19.230 commit 0be943916d781df2b652793bb2d3ae4f9624c10a
--------------------------------
[ Upstream commit 9eeabdf17fa0ab75381045c867c370f4cc75a613 ]
When uncloning an skb dst and its associated metadata, a new dst+metadata is allocated and later replaces the old one in the skb. This is helpful to have a non-shared dst+metadata attached to a specific skb.
The issue is the uncloned dst+metadata is initialized with a refcount of 1, which is increased to 2 before attaching it to the skb. When tun_dst_unclone returns, the dst+metadata is only referenced from a single place (the skb) while its refcount is 2. Its refcount will never drop to 0 (when the skb is consumed), leading to a memory leak.
Fix this by removing the call to dst_hold in tun_dst_unclone, as the dst+metadata refcount is already 1.
Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") Cc: Pravin B Shelar pshelar@ovn.org Reported-by: Vlad Buslov vladbu@nvidia.com Tested-by: Vlad Buslov vladbu@nvidia.com Signed-off-by: Antoine Tenart atenart@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- include/net/dst_metadata.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h index b997e0c1e362..adab27ba1ecb 100644 --- a/include/net/dst_metadata.h +++ b/include/net/dst_metadata.h @@ -137,7 +137,6 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) #endif
skb_dst_drop(skb); - dst_hold(&new_md->dst); skb_dst_set(skb, &new_md->dst); return new_md; }
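The leak pattern fixed here is also worth spelling out on its own: an object that is born with a refcount of 1 must not have an extra reference taken when it is handed to its only holder, otherwise the count can never reach zero. A toy single-threaded userspace refcount sketch; everything in it is invented for the example:
------------------------< refcount-leak-sketch.c >------------------------
#include <stdio.h>
#include <stdlib.h>

struct obj {
        int refcnt;
};

static struct obj *obj_alloc(void)
{
        struct obj *o = calloc(1, sizeof(*o));

        if (o)
                o->refcnt = 1;  /* the freshly allocated object holds one reference */
        return o;
}

static void obj_put(struct obj *o)
{
        if (--o->refcnt == 0) {
                printf("freed\n");
                free(o);
        }
}

int main(void)
{
        struct obj *o = obj_alloc();

        if (!o)
                return 1;
        /* Buggy variant: an extra o->refcnt++ ("hold") here would leave the
         * count at 2, and the put below would never free the object. */
        obj_put(o);             /* the only holder drops its reference: freed */
        return 0;
}
------------------------------------------------------------------------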
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.230 commit 8f0ea3777590c16222d4251a49e31dd376ff5ac1
--------------------------------
[ Upstream commit 68468d8c4cd4222a4ca1f185ab5a1c14480d078c ]
veth being NETIF_F_LLTX enabled, we need to be more careful whenever we read or write rq->rx_notify_masked.
BUG: KCSAN: data-race in veth_xmit / veth_xmit
write to 0xffff888133d9a9f8 of 1 bytes by task 23552 on cpu 0: __veth_xdp_flush drivers/net/veth.c:269 [inline] veth_xmit+0x307/0x470 drivers/net/veth.c:350 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53 NF_HOOK include/linux/netfilter.h:307 [inline] br_forward_finish net/bridge/br_forward.c:66 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242 br_dev_xmit+0x8b6/0x960 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 neigh_hh_output include/net/neighbour.h:525 [inline] neigh_output include/net/neighbour.h:539 [inline] ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:451 [inline] ip_local_out net/ipv4/ip_output.c:126 [inline] ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888133d9a9f8 of 1 bytes by task 23563 on cpu 1: __veth_xdp_flush drivers/net/veth.c:268 [inline] veth_xmit+0x2d6/0x470 drivers/net/veth.c:350 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53 NF_HOOK include/linux/netfilter.h:307 [inline] br_forward_finish net/bridge/br_forward.c:66 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242 br_dev_xmit+0x8b6/0x960 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 neigh_hh_output include/net/neighbour.h:525 [inline] neigh_output include/net/neighbour.h:539 [inline] ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:451 [inline] ip_local_out net/ipv4/ip_output.c:126 [inline] ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23563 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00064-gc36c04c2e132 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 948d4f214fde ("veth: Add driver XDP") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/net/veth.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 749faa6fcd82..ad098169d23f 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -152,9 +152,10 @@ static void __veth_xdp_flush(struct veth_rq *rq) { /* Write ptr_ring before reading rx_notify_masked */ smp_mb(); - if (!rq->rx_notify_masked) { - rq->rx_notify_masked = true; - napi_schedule(&rq->xdp_napi); + if (!READ_ONCE(rq->rx_notify_masked) && + napi_schedule_prep(&rq->xdp_napi)) { + WRITE_ONCE(rq->rx_notify_masked, true); + __napi_schedule(&rq->xdp_napi); } }
@@ -623,8 +624,10 @@ static int veth_poll(struct napi_struct *napi, int budget) /* Write rx_notify_masked before reading ptr_ring */ smp_store_mb(rq->rx_notify_masked, false); if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) { - rq->rx_notify_masked = true; - napi_schedule(&rq->xdp_napi); + if (napi_schedule_prep(&rq->xdp_napi)) { + WRITE_ONCE(rq->rx_notify_masked, true); + __napi_schedule(&rq->xdp_napi); + } } }
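As a rough userspace analogue of the race, a flag touched from two threads without a lock needs marked accesses; C11 relaxed atomics play the role that READ_ONCE()/WRITE_ONCE() play in the kernel. The scenario below is invented, models only the flag handling (not the NAPI scheduling), and should be built with -pthread; running the unmarked variant under -fsanitize=thread would typically be flagged, much like KCSAN flags the kernel code:
------------------------< marked-flag-access-sketch.c >------------------------
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool notify_masked;       /* analogue of rq->rx_notify_masked */
static atomic_int wakeups;

static void *xmit_thread(void *arg)
{
        (void)arg;
        for (int i = 0; i < 100000; i++) {
                /* plain, unmarked loads/stores here would be a data race on a
                 * lockless (NETIF_F_LLTX-style) transmit path */
                if (!atomic_load_explicit(&notify_masked, memory_order_relaxed)) {
                        atomic_store_explicit(&notify_masked, true,
                                              memory_order_relaxed);
                        atomic_fetch_add(&wakeups, 1);  /* "schedule NAPI" */
                        atomic_store_explicit(&notify_masked, false,
                                              memory_order_relaxed);
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, xmit_thread, NULL);
        pthread_create(&b, NULL, xmit_thread, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("wakeups: %d\n", atomic_load(&wakeups));
        return 0;
}
------------------------------------------------------------------------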
From: TATSUKAWA KOSUKE (立川 江介) tatsu-ab1@nec.com
stable inclusion from linux-4.19.230 commit 031ec2d7fabb5c01601490671666c55cf92e1b98
--------------------------------
commit c816b2e65b0e86b95011418cad334f0524fc33b8 upstream.
The poll man page says POLLRDNORM is equivalent to POLLIN when used as an event. $ man poll <snip> POLLRDNORM Equivalent to POLLIN.
However, in the n_tty driver, a poll() on POLLRDNORM does not return until timeout even if there is terminal input, whereas a poll() on POLLIN returns.
The following test program works until kernel-3.17, but the test stops in poll() after commit 57087d515441 ("tty: Fix spurious poll() wakeups").
[Steps to run test program]
$ cc -o test-pollrdnorm test-pollrdnorm.c
$ ./test-pollrdnorm
foo     <-- Type in something from the terminal followed by [RET].
        The string should be echoed back.
------------------------< test-pollrdnorm.c >------------------------
#include <stdio.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

void main(void)
{
        int n;
        unsigned char buf[8];
        struct pollfd fds[1] = {{ 0, POLLRDNORM, 0 }};

        n = poll(fds, 1, -1);
        if (n < 0)
                perror("poll");
        n = read(0, buf, 8);
        if (n < 0)
                perror("read");
        if (n > 0)
                write(1, buf, n);
}
------------------------------------------------------------------------
The attached patch fixes this problem. Many calls to wake_up_interruptible_poll() in the kernel source code already specify "POLLIN | POLLRDNORM".
Fixes: 57087d515441 ("tty: Fix spurious poll() wakeups") Cc: stable@vger.kernel.org Signed-off-by: Kosuke Tatsukawa tatsu-ab1@nec.com Link: https://lore.kernel.org/r/TYCPR01MB81901C0F932203D30E452B3EA5209@TYCPR01MB81... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/tty/n_tty.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 9369f5d2fdf5..a8d7ab7ee057 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1375,7 +1375,7 @@ n_tty_receive_char_special(struct tty_struct *tty, unsigned char c) put_tty_queue(c, ldata); smp_store_release(&ldata->canon_head, ldata->read_head); kill_fasync(&tty->fasync, SIGIO, POLL_IN); - wake_up_interruptible_poll(&tty->read_wait, EPOLLIN); + wake_up_interruptible_poll(&tty->read_wait, EPOLLIN | EPOLLRDNORM); return 0; } } @@ -1656,7 +1656,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp,
if (read_cnt(ldata)) { kill_fasync(&tty->fasync, SIGIO, POLL_IN); - wake_up_interruptible_poll(&tty->read_wait, EPOLLIN); + wake_up_interruptible_poll(&tty->read_wait, EPOLLIN | EPOLLRDNORM); } }
From: Kees Cook keescook@chromium.org
stable inclusion from linux-4.19.230 commit 255264d81da6edaf4cd4fab836d1ef3ba09af6aa
--------------------------------
commit 495ac3069a6235bfdf516812a2a9b256671bbdf9 upstream.
If seccomp tries to kill a process, it should never see that process again. To enforce this proactively, switch the mode to something impossible. If encountered: WARN, reject all syscalls, and attempt to kill the process again even harder.
Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Fixes: 8112c4f140fa ("seccomp: remove 2-phase API") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- kernel/seccomp.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c index fd023ac24e10..369b63ac7cf9 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -28,6 +28,9 @@ #include <linux/syscalls.h> #include <linux/sysctl.h>
+/* Not exposed in headers: strictly internal use only. */ +#define SECCOMP_MODE_DEAD (SECCOMP_MODE_FILTER + 1) + #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER #include <asm/syscall.h> #endif @@ -629,6 +632,7 @@ static void __secure_computing_strict(int this_syscall) #ifdef SECCOMP_DEBUG dump_stack(); #endif + current->seccomp.mode = SECCOMP_MODE_DEAD; seccomp_log(this_syscall, SIGKILL, SECCOMP_RET_KILL_THREAD, true); do_exit(SIGKILL); } @@ -743,6 +747,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, case SECCOMP_RET_KILL_THREAD: case SECCOMP_RET_KILL_PROCESS: default: + current->seccomp.mode = SECCOMP_MODE_DEAD; seccomp_log(this_syscall, SIGSYS, action, true); /* Dump core only if this is the last remaining thread. */ if (action == SECCOMP_RET_KILL_PROCESS || @@ -793,6 +798,11 @@ int __secure_computing(const struct seccomp_data *sd) return 0; case SECCOMP_MODE_FILTER: return __seccomp_filter(this_syscall, sd, false); + /* Surviving SECCOMP_RET_KILL_* must be proactively impossible. */ + case SECCOMP_MODE_DEAD: + WARN_ON_ONCE(1); + do_exit(SIGKILL); + return -1; default: BUG(); }
From: Song Liu song@kernel.org
stable inclusion from linux-4.19.230 commit 30d9f3cbe47e1018ddc8069ac5b5c9e66fbdf727
--------------------------------
commit 5f4e5ce638e6a490b976ade4a40017b40abb2da0 upstream.
There's list corruption on cgrp_cpuctx_list. This happens on the following path:
perf_cgroup_switch:
    list_for_each_entry(cgrp_cpuctx_list)
      cpu_ctx_sched_in
        ctx_sched_in
          ctx_pinned_sched_in
            merge_sched_in
              perf_cgroup_event_disable: remove the event from the list
Use list_for_each_entry_safe() to allow removing an entry during iteration.
Fixes: 058fe1c0440e ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events") Signed-off-by: Song Liu song@kernel.org Reviewed-by: Rik van Riel riel@surriel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- kernel/events/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index 00b22c820d1f..deba52307349 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -798,7 +798,7 @@ static DEFINE_PER_CPU(struct list_head, cgrp_cpuctx_list); */ static void perf_cgroup_switch(struct task_struct *task, int mode) { - struct perf_cpu_context *cpuctx; + struct perf_cpu_context *cpuctx, *tmp; struct list_head *list; unsigned long flags;
@@ -809,7 +809,7 @@ static void perf_cgroup_switch(struct task_struct *task, int mode) local_irq_save(flags);
list = this_cpu_ptr(&cgrp_cpuctx_list); - list_for_each_entry(cpuctx, list, cgrp_cpuctx_entry) { + list_for_each_entry_safe(cpuctx, tmp, list, cgrp_cpuctx_entry) { WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0);
perf_ctx_lock(cpuctx, cpuctx->task_ctx);
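The general rule illustrated by this fix: if the loop body may unlink the current element, the iterator has to fetch the next pointer before running the body, which is exactly what the _safe list variants do. A minimal singly-linked-list sketch in userspace C (not the kernel list implementation):
------------------------< safe-list-removal-sketch.c >------------------------
#include <stdio.h>
#include <stdlib.h>

struct node {
        int val;
        struct node *next;
};

/* delete the odd-valued nodes while walking the list: the next pointer is
 * saved before the current node may be freed */
static void remove_odd(struct node **head)
{
        struct node **pprev = head;
        struct node *cur, *next;

        for (cur = *head; cur; cur = next) {
                next = cur->next;       /* fetched before cur can go away */
                if (cur->val & 1) {
                        *pprev = next;  /* unlink and free the current node */
                        free(cur);
                } else {
                        pprev = &cur->next;
                }
        }
}

int main(void)
{
        struct node *head = NULL, **tail = &head;

        for (int i = 1; i <= 5; i++) {
                struct node *n = malloc(sizeof(*n));

                if (!n)
                        return 1;
                n->val = i;
                n->next = NULL;
                *tail = n;
                tail = &n->next;
        }
        remove_odd(&head);
        for (struct node *n = head; n; n = n->next)
                printf("%d ", n->val);          /* prints: 2 4 */
        printf("\n");
        while (head) {                          /* release what is left */
                struct node *n = head;

                head = head->next;
                free(n);
        }
        return 0;
}
------------------------------------------------------------------------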
From: Randy Dunlap rdunlap@infradead.org
stable inclusion from linux-4.19.231 commit 592b1b5ad3ee6cce1381111cfd69c6dc868050a3
--------------------------------
commit 6e8793674bb0d1135ca0e5c9f7e16fecbf815926 upstream.
There is a build error when using a kernel .config file from 'kernel test robot' for a different build problem:
hppa64-linux-ld: drivers/tty/serial/8250/8250_gsc.o: in function `.LC3': (.data.rel.ro+0x18): undefined reference to `iosapic_serial_irq'
when:
CONFIG_GSC=y
CONFIG_SERIO_GSCPS2=y
CONFIG_SERIAL_8250_GSC=y
CONFIG_PCI is not set,
and hence PCI_LBA is not set. IOSAPIC depends on PCI_LBA, so IOSAPIC is not set/enabled.
Make the use of iosapic_serial_irq() conditional to fix the build error.
Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: "James E.J. Bottomley" James.Bottomley@HansenPartnership.com Cc: Helge Deller deller@gmx.de Cc: linux-parisc@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-serial@vger.kernel.org Cc: Jiri Slaby jirislaby@kernel.org Cc: Johan Hovold johan@kernel.org Suggested-by: Helge Deller deller@gmx.de Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/tty/serial/8250/8250_gsc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_gsc.c b/drivers/tty/serial/8250/8250_gsc.c index 0809ae2aa9b1..51cc985216ff 100644 --- a/drivers/tty/serial/8250/8250_gsc.c +++ b/drivers/tty/serial/8250/8250_gsc.c @@ -26,7 +26,7 @@ static int __init serial_init_chip(struct parisc_device *dev) unsigned long address; int err;
-#ifdef CONFIG_64BIT +#if defined(CONFIG_64BIT) && defined(CONFIG_IOSAPIC) if (!dev->irq && (dev->id.sversion == 0xad)) dev->irq = iosapic_serial_irq(dev); #endif
From: "Darrick J. Wong" djwong@kernel.org
stable inclusion from linux-4.19.231 commit d5c33270b8b2c274eb8df7b92d166166102528c6
--------------------------------
[ Upstream commit 2719c7160dcfaae1f73a1c0c210ad3281c19022e ]
If we fail to synchronize the filesystem while preparing to freeze the fs, abort the freeze.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/super.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/super.c b/fs/super.c index 9fb4553c46e6..8dc26e23f1a3 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1404,11 +1404,9 @@ static void lockdep_sb_freeze_acquire(struct super_block *sb) percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_); }
-static void sb_freeze_unlock(struct super_block *sb) +static void sb_freeze_unlock(struct super_block *sb, int level) { - int level; - - for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--) + for (level--; level >= 0; level--) percpu_up_write(sb->s_writers.rw_sem + level); }
@@ -1479,7 +1477,14 @@ int freeze_super(struct super_block *sb) sb_wait_write(sb, SB_FREEZE_PAGEFAULT);
/* All writers are done so after syncing there won't be dirty data */ - sync_filesystem(sb); + ret = sync_filesystem(sb); + if (ret) { + sb->s_writers.frozen = SB_UNFROZEN; + sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT); + wake_up(&sb->s_writers.wait_unfrozen); + deactivate_locked_super(sb); + return ret; + }
/* Now wait for internal filesystem counter */ sb->s_writers.frozen = SB_FREEZE_FS; @@ -1491,7 +1496,7 @@ int freeze_super(struct super_block *sb) printk(KERN_ERR "VFS:Filesystem freeze failed\n"); sb->s_writers.frozen = SB_UNFROZEN; - sb_freeze_unlock(sb); + sb_freeze_unlock(sb, SB_FREEZE_FS); wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb); return ret; @@ -1542,7 +1547,7 @@ static int thaw_super_locked(struct super_block *sb) }
sb->s_writers.frozen = SB_UNFROZEN; - sb_freeze_unlock(sb); + sb_freeze_unlock(sb, SB_FREEZE_FS); out: wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb);
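The unwinding pattern is the interesting part of this change: the freeze path takes its locks level by level, and a failure partway through must release only the levels actually acquired, which is why sb_freeze_unlock() now takes the level reached. A hedged userspace sketch of that shape using pthread mutexes; the level count and names are invented, and it should be built with -pthread:
------------------------< partial-unlock-sketch.c >------------------------
#include <pthread.h>
#include <stdio.h>

#define FREEZE_LEVELS 3

static pthread_mutex_t freeze_lock[FREEZE_LEVELS] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
        PTHREAD_MUTEX_INITIALIZER,
};

static void freeze_unlock(int level)
{
        /* release only the levels that were actually acquired: level-1 .. 0 */
        for (level--; level >= 0; level--)
                pthread_mutex_unlock(&freeze_lock[level]);
}

static int do_freeze(int fail_at)
{
        for (int level = 0; level < FREEZE_LEVELS; level++) {
                if (level == fail_at) {         /* e.g. sync_filesystem() failed */
                        freeze_unlock(level);
                        return -1;
                }
                pthread_mutex_lock(&freeze_lock[level]);
        }
        return 0;
}

int main(void)
{
        if (do_freeze(2) < 0)
                printf("freeze aborted, partially taken locks released\n");
        if (do_freeze(FREEZE_LEVELS) == 0) {
                printf("frozen\n");
                freeze_unlock(FREEZE_LEVELS);   /* thaw: release everything */
        }
        return 0;
}
------------------------------------------------------------------------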
From: "Darrick J. Wong" djwong@kernel.org
stable inclusion from linux-4.19.231 commit a1a41571f06e2b66229738bd0c92c1d57dd793e2
--------------------------------
[ Upstream commit dd5532a4994bfda0386eb2286ec00758cee08444 ]
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs call, which means that quotacalls like Q_SYNC never see the error. This doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/quota/dquot.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 1d1d393f4208..ddb379abd919 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -687,9 +687,14 @@ int dquot_quota_sync(struct super_block *sb, int type) /* This is not very clever (and fast) but currently I don't know about * any other simple way of getting quota data to disk and we must get * them there for userspace to be visible... */ - if (sb->s_op->sync_fs) - sb->s_op->sync_fs(sb, 1); - sync_blockdev(sb->s_bdev); + if (sb->s_op->sync_fs) { + ret = sb->s_op->sync_fs(sb, 1); + if (ret) + return ret; + } + ret = sync_blockdev(sb->s_bdev); + if (ret) + return ret;
/* * Now when everything is written we can discard the pagecache so
From: Sagi Grimberg sagi@grimberg.me
stable inclusion from linux-4.19.231 commit a25e460fbb0340488d119fb2e28fe3f829b7417e
--------------------------------
[ Upstream commit 0fa0f99fc84e41057cbdd2efbfe91c6b2f47dd9d ]
Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl readiness for AER submission. This may lead to a use-after-free condition that was observed with nvme-tcp.
The race condition may happen in the following scenario:
1. driver executes its reset_ctrl_work
2. -> nvme_stop_ctrl - flushes ctrl async_event_work
3. ctrl sends AEN which is received by the host, which in turn schedules AEN handling
4. teardown admin queue (which releases the queue socket)
5. AEN processed, submits another AER, calling the driver to submit
6. driver attempts to send the cmd ==> use-after-free
In order to fix that, add ctrl state check to validate the ctrl is actually able to accept the AER submission.
This addresses the above race in controller resets because the driver during teardown should:
1. change ctrl state to RESETTING
2. flush async_event_work (as well as other async work elements)
So after 1,2, any other AER command will find the ctrl state to be RESETTING and bail out without submitting the AER.
Signed-off-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/nvme/host/core.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index feba1a34c15e..164ec45dac84 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3683,7 +3683,14 @@ static void nvme_async_event_work(struct work_struct *work) container_of(work, struct nvme_ctrl, async_event_work);
nvme_aen_uevent(ctrl); - ctrl->ops->submit_async_event(ctrl); + + /* + * The transport drivers must guarantee AER submission here is safe by + * flushing ctrl async_event_work after changing the controller state + * from LIVE and before freeing the admin queue. + */ + if (ctrl->state == NVME_CTRL_LIVE) + ctrl->ops->submit_async_event(ctrl); }
static bool nvme_ctrl_pp_status(struct nvme_ctrl *ctrl)
From: Guillaume Nault gnault@redhat.com
stable inclusion from linux-4.19.231 commit bbf7a1a2fc64d89896ba5eba494a40ca151a2675
--------------------------------
commit 23e7b1bfed61e301853b5e35472820d919498278 upstream.
Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"), clear the ECN bits from iph->tos when setting ->flowi4_tos. This ensures that the last bit of ->flowi4_tos is cleared, so ip_route_output_key_hash() isn't going to restrict the scope of the route lookup.
Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason to clear the high order bits.
Found by code inspection, compile tested only.
Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups") Signed-off-by: Guillaume Nault gnault@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org [sudip: manually backport to previous location] Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv4/xfrm4_policy.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 1e5e2e4be0b2..e85b5f57d3e9 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -17,6 +17,7 @@ #include <net/xfrm.h> #include <net/ip.h> #include <net/l3mdev.h> +#include <net/inet_ecn.h>
static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, int tos, int oif, @@ -126,7 +127,7 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) fl4->flowi4_proto = iph->protocol; fl4->daddr = reverse ? iph->saddr : iph->daddr; fl4->saddr = reverse ? iph->daddr : iph->saddr; - fl4->flowi4_tos = iph->tos; + fl4->flowi4_tos = iph->tos & ~INET_ECN_MASK;
if (!ip_is_fragment(iph)) { switch (iph->protocol) {
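A small standalone illustration of the two masks discussed above, using the values these macros have in the kernel headers (INET_ECN_MASK is 0x03, IPTOS_RT_MASK is 0x1c): clearing with ~INET_ECN_MASK drops only the two ECN bits, while IPTOS_RT_MASK would also strip the high-order bits of the tos byte.
------------------------< ecn-mask-sketch.c >------------------------
#include <stdint.h>
#include <stdio.h>

#define INET_ECN_MASK   0x03u   /* low two (ECN) bits of the tos byte */
#define IPTOS_RT_MASK   0x1cu

int main(void)
{
        uint8_t tos = 0xb9;     /* arbitrary tos with ECN and high bits set */

        printf("tos                  = 0x%02x\n", tos);
        printf("tos & ~INET_ECN_MASK = 0x%02x\n", tos & (uint8_t)~INET_ECN_MASK);
        printf("tos & IPTOS_RT_MASK  = 0x%02x\n", tos & IPTOS_RT_MASK);
        return 0;
}
------------------------------------------------------------------------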
From: "Eric W. Biederman" ebiederm@xmission.com
stable inclusion from linux-4.19.231 commit c6055df602b9e9ed9427b7bb9088c75cb5cf6350
--------------------------------
commit 1b5a42d9c85f0e731f01c8d1129001fd8531a8a0 upstream.
In the function bacct_add_task the code reading task->exit_code was introduced in commit f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats"), and it is not entirely clear what the taskstats interface is trying to return as only returning the exit_code of the first task in a process doesn't make a lot of sense.
As best as I can figure the intent is to return task->exit_code after a task exits. The field is returned with per task fields, so the exit_code of the entire process is not wanted. Only the value of the first task is returned so this is not a useful way to get the per task ptrace stop code. The ordinary case of returning this value is returning after a task exits, which also precludes use for getting a ptrace value.
It is common for the first task of a process to also be the last task of the process, so this field may have done something reasonable by accident in testing.
Make ac_exitcode a reliable per task value by always returning it for every exited task.
Setting ac_exitcode in a sensible manner makes it possible to continue to provide this value going forward.
Cc: Balbir Singh bsingharora@gmail.com Fixes: f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats") Link: https://lkml.kernel.org/r/20220103213312.9144-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com [sudip: adjust context] Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- kernel/tsacct.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/kernel/tsacct.c b/kernel/tsacct.c index 370724b45391..8d7c6d3f1daa 100644 --- a/kernel/tsacct.c +++ b/kernel/tsacct.c @@ -46,11 +46,10 @@ void bacct_add_tsk(struct user_namespace *user_ns, /* Convert to seconds for btime */ do_div(delta, USEC_PER_SEC); stats->ac_btime = get_seconds() - delta; - if (thread_group_leader(tsk)) { + if (tsk->flags & PF_EXITING) stats->ac_exitcode = tsk->exit_code; - if (tsk->flags & PF_FORKNOEXEC) - stats->ac_flag |= AFORK; - } + if (thread_group_leader(tsk) && (tsk->flags & PF_FORKNOEXEC)) + stats->ac_flag |= AFORK; if (tsk->flags & PF_SUPERPRIV) stats->ac_flag |= ASU; if (tsk->flags & PF_DUMPCORE)
From: Xin Long lucien.xin@gmail.com
stable inclusion from linux-4.19.231 commit b0e55a57df7deed5f0682b584e4da913db2f6e84
--------------------------------
commit 35a79e64de29e8d57a5989aac57611c0cd29e13e upstream.
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
There is another regression caused when matching sk_bound_dev_if and dif: the RAW socket uses inet_iif() while the PING socket lookup uses skb->dev->ifindex, so the commands below fail because of this:
# ip link add dummy0 type dummy
# ip link set dummy0 up
# ip addr add 192.168.111.1/24 dev dummy0
# ping -I dummy0 192.168.111.1 -c1
The issue was also reported on:
https://github.com/iputils/iputils/issues/104
But it was fixed in iputils in the wrong way, by not binding to the device when the destination IP is on that device, and this causes some of the kselftests to fail, as Jianlin noticed.
This patch is to use inet(6)_iif and inet(6)_sdif to get dif and sdif for PING socket, and keep consistent with RAW socket.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Jianlin Shi jishi@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv4/ping.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 276442443d32..28d171e6e10b 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -177,16 +177,23 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident) struct sock *sk = NULL; struct inet_sock *isk; struct hlist_nulls_node *hnode; - int dif = skb->dev->ifindex; + int dif, sdif;
if (skb->protocol == htons(ETH_P_IP)) { + dif = inet_iif(skb); + sdif = inet_sdif(skb); pr_debug("try to find: num = %d, daddr = %pI4, dif = %d\n", (int)ident, &ip_hdr(skb)->daddr, dif); #if IS_ENABLED(CONFIG_IPV6) } else if (skb->protocol == htons(ETH_P_IPV6)) { + dif = inet6_iif(skb); + sdif = inet6_sdif(skb); pr_debug("try to find: num = %d, daddr = %pI6c, dif = %d\n", (int)ident, &ipv6_hdr(skb)->daddr, dif); #endif + } else { + pr_err("ping: protocol(%x) is not supported\n", ntohs(skb->protocol)); + return NULL; }
read_lock_bh(&ping_table.lock); @@ -226,7 +233,7 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident) }
if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif && - sk->sk_bound_dev_if != inet_sdif(skb)) + sk->sk_bound_dev_if != sdif) continue;
sock_hold(sk);
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.231 commit e294bc65746b779e4ad50f95ed04926cf72d1454
--------------------------------
commit dcd54265c8bc14bd023815e36e2d5f9d66ee1fee upstream.
trace_napi_poll_hit() is reading stat->dev while another thread can write on it from dropmon_net_event()
Use READ_ONCE()/WRITE_ONCE() here; RCU rules are already properly enforced, we only have to take care of load/store tearing.
BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit
write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1: dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579 notifier_call_chain kernel/notifier.c:84 [inline] raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392 call_netdevice_notifiers_info net/core/dev.c:1919 [inline] call_netdevice_notifiers_extack net/core/dev.c:1931 [inline] call_netdevice_notifiers net/core/dev.c:1945 [inline] unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415 ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123 vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515 ops_exit_list net/core/net_namespace.c:173 [inline] cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30
read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0: trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292 trace_napi_poll include/trace/events/napi.h:14 [inline] __napi_poll+0x36b/0x3f0 net/core/dev.c:6366 napi_poll net/core/dev.c:6432 [inline] net_rx_action+0x29e/0x650 net/core/dev.c:6519 __do_softirq+0x158/0x2de kernel/softirq.c:558 do_softirq+0xb1/0xf0 kernel/softirq.c:459 __local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:394 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30
value changed: 0xffff88815883e000 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
Fixes: 4ea7e38696c7 ("dropmon: add ability to detect when hardware dropsrxpackets") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Neil Horman nhorman@tuxdriver.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/core/drop_monitor.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 3978a5e8d261..2ed600012640 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -219,13 +219,17 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
rcu_read_lock(); list_for_each_entry_rcu(new_stat, &hw_stats_list, list) { + struct net_device *dev; + /* * only add a note to our monitor buffer if: * 1) this is the dev we received on * 2) its after the last_rx delta * 3) our rx_dropped count has gone up */ - if ((new_stat->dev == napi->dev) && + /* Paired with WRITE_ONCE() in dropmon_net_event() */ + dev = READ_ONCE(new_stat->dev); + if ((dev == napi->dev) && (time_after(jiffies, new_stat->last_rx + dm_hw_check_delta)) && (napi->dev->stats.rx_dropped != new_stat->last_drop_val)) { trace_drop_common(NULL, NULL); @@ -340,7 +344,10 @@ static int dropmon_net_event(struct notifier_block *ev_block, mutex_lock(&trace_state_mutex); list_for_each_entry_safe(new_stat, tmp, &hw_stats_list, list) { if (new_stat->dev == dev) { - new_stat->dev = NULL; + + /* Paired with READ_ONCE() in trace_napi_poll_hit() */ + WRITE_ONCE(new_stat->dev, NULL); + if (trace_state == TRACE_OFF) { list_del_rcu(&new_stat->list); kfree_rcu(new_stat, rcu);
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.231 commit 4218e6995c19970aa7b32914be6c8e059837cbdf
--------------------------------
commit 9ceaf6f76b203682bb6100e14b3d7da4c0bedde8 upstream.
syzbot reported that two threads might write over agg_select_timer at the same time. Make agg_select_timer atomic to fix the races.
BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler
read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1: bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30
write to 0xffff8881242aea90 of 4 bytes by task 25910 on cpu 0: bond_3ad_initiate_agg_selection+0x18/0x30 drivers/net/bonding/bond_3ad.c:1998 bond_open+0x658/0x6f0 drivers/net/bonding/bond_main.c:3967 __dev_open+0x274/0x3a0 net/core/dev.c:1407 dev_open+0x54/0x190 net/core/dev.c:1443 bond_enslave+0xcef/0x3000 drivers/net/bonding/bond_main.c:1937 do_set_master net/core/rtnetlink.c:2532 [inline] do_setlink+0x94f/0x2500 net/core/rtnetlink.c:2736 __rtnl_newlink net/core/rtnetlink.c:3414 [inline] rtnl_newlink+0xfeb/0x13e0 net/core/rtnetlink.c:3529 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000050 -> 0x0000004f
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G W 5.17.0-rc4-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Jay Vosburgh j.vosburgh@gmail.com Cc: Veaceslav Falico vfalico@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/net/bonding/bond_3ad.c | 30 +++++++++++++++++++++++++----- include/net/bond_3ad.h | 2 +- 2 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 035923876c61..e3f814e83d9c 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -249,7 +249,7 @@ static inline int __check_agg_selection_timer(struct port *port) if (bond == NULL) return 0;
- return BOND_AD_INFO(bond).agg_select_timer ? 1 : 0; + return atomic_read(&BOND_AD_INFO(bond).agg_select_timer) ? 1 : 0; }
/** @@ -1965,7 +1965,7 @@ static void ad_marker_response_received(struct bond_marker *marker, */ void bond_3ad_initiate_agg_selection(struct bonding *bond, int timeout) { - BOND_AD_INFO(bond).agg_select_timer = timeout; + atomic_set(&BOND_AD_INFO(bond).agg_select_timer, timeout); }
/** @@ -2249,6 +2249,28 @@ void bond_3ad_update_ad_actor_settings(struct bonding *bond) spin_unlock_bh(&bond->mode_lock); }
+/** + * bond_agg_timer_advance - advance agg_select_timer + * @bond: bonding structure + * + * Return true when agg_select_timer reaches 0. + */ +static bool bond_agg_timer_advance(struct bonding *bond) +{ + int val, nval; + + while (1) { + val = atomic_read(&BOND_AD_INFO(bond).agg_select_timer); + if (!val) + return false; + nval = val - 1; + if (atomic_cmpxchg(&BOND_AD_INFO(bond).agg_select_timer, + val, nval) == val) + break; + } + return nval == 0; +} + /** * bond_3ad_state_machine_handler - handle state machines timeout * @bond: bonding struct to work on @@ -2284,9 +2306,7 @@ void bond_3ad_state_machine_handler(struct work_struct *work) if (!bond_has_slaves(bond)) goto re_arm;
- /* check if agg_select_timer timer after initialize is timed out */ - if (BOND_AD_INFO(bond).agg_select_timer && - !(--BOND_AD_INFO(bond).agg_select_timer)) { + if (bond_agg_timer_advance(bond)) { slave = bond_first_slave_rcu(bond); port = slave ? &(SLAVE_AD_INFO(slave)->port) : NULL;
diff --git a/include/net/bond_3ad.h b/include/net/bond_3ad.h index fc3111515f5c..732bc3b4606b 100644 --- a/include/net/bond_3ad.h +++ b/include/net/bond_3ad.h @@ -265,7 +265,7 @@ struct ad_system {
struct ad_bond_info { struct ad_system system; /* 802.3ad system structure */ - u32 agg_select_timer; /* Timer to select aggregator after all adapter's hand shakes */ + atomic_t agg_select_timer; /* Timer to select aggregator after all adapter's hand shakes */ u16 aggregator_identifier; };
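The cmpxchg loop in bond_agg_timer_advance() is a generic "decrement only if non-zero" pattern. Below is a small userspace sketch of the same logic with C11 atomics, under the assumption that several threads may race on the counter; the names are illustrative, not taken from the kernel.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Decrement *timer once, but only if it is non-zero.
 * Returns true exactly when this call brings it to zero. */
static bool timer_advance(atomic_int *timer)
{
	int val = atomic_load(timer);

	while (val != 0) {
		/* try to swap val -> val - 1; on failure val is reloaded */
		if (atomic_compare_exchange_weak(timer, &val, val - 1))
			return val - 1 == 0;
	}
	return false;	/* already expired, nothing left to do */
}

int main(void)
{
	atomic_int t = 3;

	while (!timer_advance(&t))
		;	/* tick until the countdown reaches zero */
	printf("timer expired, value now %d\n", atomic_load(&t));
	return 0;
}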
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from linux-4.19.231 commit 67552482aee7a139ed957595db8668d28ddc2c42
--------------------------------
commit e0caaf75d443e02e55e146fd75fe2efc8aed5540 upstream.
Commit ac795161c936 (NFSv4: Handle case where the lookup of a directory fails) [1], part of Linux since 5.17-rc2, introduced a regression where a symbolic link on an NFS mount to a directory on another NFS mount is not resolved the first time it is accessed.
Reported-by: Paul Menzel pmenzel@molgen.mpg.de Fixes: ac795161c936 ("NFSv4: Handle case where the lookup of a directory fails") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Tested-by: Donald Buczek buczek@molgen.mpg.de Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/dir.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 72e5ae8cc4ff..2ec602eccb80 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1640,14 +1640,14 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry, if (!res) { inode = d_inode(dentry); if ((lookup_flags & LOOKUP_DIRECTORY) && inode && - !S_ISDIR(inode->i_mode)) + !(S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))) res = ERR_PTR(-ENOTDIR); else if (inode && S_ISREG(inode->i_mode)) res = ERR_PTR(-EOPENSTALE); } else if (!IS_ERR(res)) { inode = d_inode(res); if ((lookup_flags & LOOKUP_DIRECTORY) && inode && - !S_ISDIR(inode->i_mode)) { + !(S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))) { dput(res); res = ERR_PTR(-ENOTDIR); } else if (inode && S_ISREG(inode->i_mode)) {
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from linux-4.19.231 commit d2ba21f271eb167ac2b0581598e55222b89f0f32
--------------------------------
commit d19e0183a88306acda07f4a01fedeeffe2a2a06b upstream.
The result of the writeback, whether it is an ENOSPC or an EIO, or anything else, does not inhibit the NFS client from reporting the correct file timestamps.
Fixes: 79566ef018f5 ("NFS: Getattr doesn't require data sync semantics") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/nfs/inode.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 6d5fedcd30d0..db52430dfbd7 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -787,12 +787,9 @@ int nfs_getattr(const struct path *path, struct kstat *stat, goto out_no_update;
/* Flush out writes to the server in order to update c/mtime. */ - if ((request_mask & (STATX_CTIME|STATX_MTIME)) && - S_ISREG(inode->i_mode)) { - err = filemap_write_and_wait(inode->i_mapping); - if (err) - goto out; - } + if ((request_mask & (STATX_CTIME | STATX_MTIME)) && + S_ISREG(inode->i_mode)) + filemap_write_and_wait(inode->i_mapping);
/* * We may force a getattr if the user cares about atime.
From: Bryan O'Donoghue bryan.odonoghue@linaro.org
stable inclusion from linux-4.19.231 commit 891484112f4ffd0b576e88407bbd3653abf3faba
--------------------------------
commit 5c23b3f965bc9ee696bf2ed4bdc54d339dd9a455 upstream.
Interacting with a NAND chip on an IPQ6018 I found that the qcomsmem NAND partition parser was returning -EPROBE_DEFER waiting for the main smem driver to load.
This caused the board to reset. Playing about with the probe() function shows that the problem lies in the core clock being switched off before the nandc_unalloc() routine has completed.
If we look at how qcom_nandc_remove() tears down allocated resources we see the expected order is
qcom_nandc_unalloc(nandc);
clk_disable_unprepare(nandc->aon_clk);
clk_disable_unprepare(nandc->core_clk);
dma_unmap_resource(&pdev->dev, nandc->base_dma, resource_size(res), DMA_BIDIRECTIONAL, 0);
Tweaking probe() to both bring up and tear down resources in that order removes the reset if we end up deferring elsewhere.
Fixes: c76b78d8ec05 ("mtd: nand: Qualcomm NAND controller driver") Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Reviewed-by: Manivannan Sadhasivam mani@kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20220103030316.58301-2-bryan.odonoghue@lin... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/mtd/nand/raw/qcom_nandc.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c index 07d8750313fd..0e4be6814fc7 100644 --- a/drivers/mtd/nand/raw/qcom_nandc.c +++ b/drivers/mtd/nand/raw/qcom_nandc.c @@ -10,7 +10,6 @@ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. */ - #include <linux/clk.h> #include <linux/slab.h> #include <linux/bitops.h> @@ -2957,10 +2956,6 @@ static int qcom_nandc_probe(struct platform_device *pdev) if (!nandc->base_dma) return -ENXIO;
- ret = qcom_nandc_alloc(nandc); - if (ret) - goto err_nandc_alloc; - ret = clk_prepare_enable(nandc->core_clk); if (ret) goto err_core_clk; @@ -2969,6 +2964,10 @@ static int qcom_nandc_probe(struct platform_device *pdev) if (ret) goto err_aon_clk;
+ ret = qcom_nandc_alloc(nandc); + if (ret) + goto err_nandc_alloc; + ret = qcom_nandc_setup(nandc); if (ret) goto err_setup; @@ -2980,15 +2979,14 @@ static int qcom_nandc_probe(struct platform_device *pdev) return 0;
err_setup: + qcom_nandc_unalloc(nandc); +err_nandc_alloc: clk_disable_unprepare(nandc->aon_clk); err_aon_clk: clk_disable_unprepare(nandc->core_clk); err_core_clk: - qcom_nandc_unalloc(nandc); -err_nandc_alloc: dma_unmap_resource(dev, res->start, resource_size(res), DMA_BIDIRECTIONAL, 0); - return ret; }
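The rule this patch restores is that the error-unwind path (and remove()) must release resources in the reverse order of how probe() acquired them. A minimal sketch of that goto-based unwind pattern, with invented helper names standing in for the real clock and allocation calls:

#include <stdio.h>

/* stand-ins for clk_prepare_enable(), qcom_nandc_alloc(), etc. */
static int enable_core_clk(void)   { return 0; }
static int enable_aon_clk(void)    { return 0; }
static int alloc_resources(void)   { return 0; }
static void disable_core_clk(void) { puts("core clk off"); }
static void disable_aon_clk(void)  { puts("aon clk off"); }
static void free_resources(void)   { puts("resources freed"); }

static int probe(void)
{
	int ret;

	ret = enable_core_clk();
	if (ret)
		return ret;
	ret = enable_aon_clk();
	if (ret)
		goto err_aon;
	ret = alloc_resources();	/* needs the clocks already running */
	if (ret)
		goto err_alloc;
	return 0;

	/* unwind strictly in reverse order of acquisition */
err_alloc:
	disable_aon_clk();
err_aon:
	disable_core_clk();
	return ret;
}

static void device_remove(void)
{
	/* remove() mirrors probe(): newest resource is released first */
	free_resources();
	disable_aon_clk();
	disable_core_clk();
}

int main(void)
{
	int ret = probe();

	if (!ret)
		device_remove();
	return ret;
}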
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.231 commit f63c9fa36bd7548fd07792127057cccb68a7e274
--------------------------------
commit 5740d068909676d4bdb5c9c00c37a83df7728909 upstream.
We have been living dangerously, at the mercy of malicious users, abusing TC_ACT_REPEAT, as shown by this syzbot report [1].
Add an arbitrary limit (32) to the number of times an action can return TC_ACT_REPEAT.
v2: switch the limit to 32 instead of 10. Use net_warn_ratelimited() instead of pr_err_once().
[1] (C repro available on demand)
rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0 (t=10502 jiffies g=5305 q=190) rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=0 timer-softirq=3527 rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack:29344 pid: 14 ppid: 2 flags:0x00004000 Call Trace: <TASK> context_switch kernel/sched/core.c:4986 [inline] __schedule+0xab2/0x4db0 kernel/sched/core.c:6295 schedule+0xd2/0x260 kernel/sched/core.c:6368 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline] RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline] RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline] RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508 Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84 RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206 RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000 RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80 FS: 00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:85 [inline] do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115 spin_lock_bh include/linux/spinlock.h:354 [inline] sch_tree_lock include/net/sch_generic.h:610 [inline] sch_tree_lock include/net/sch_generic.h:605 [inline] prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211 prio_init+0x5c/0x80 net/sched/sch_prio.c:244 qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] 
sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7ee98aae99 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0 R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4 </TASK> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs NMI backtrace for cpu 1 CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: mld mld_ifc_work Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:604 [inline] check_cpu_stall kernel/rcu/tree_stall.h:688 [inline] rcu_pending kernel/rcu/tree.c:3919 [inline] rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617 update_process_times+0x16d/0x200 kernel/time/timer.c:1785 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286 Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246 RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0 R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8 tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256 tcf_action_exec net/sched/act_api.c:1049 [inline] tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026 tcf_exts_exec include/net/pkt_cls.h:326 [inline] route4_classify+0xef0/0x1400 net/sched/cls_route.c:179 __tcf_classify net/sched/cls_api.c:1549 [inline] tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615 prio_classify net/sched/sch_prio.c:42 [inline] prio_enqueue+0x3a7/0x790 
net/sched/sch_prio.c:75 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668 __dev_xmit_skb net/core/dev.c:3756 [inline] __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081 neigh_hh_output include/net/neighbour.h:533 [inline] neigh_output include/net/neighbour.h:547 [inline] ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:306 [inline] __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288 ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0x196/0x310 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:451 [inline] ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126 iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82 geneve_xmit_skb drivers/net/geneve.c:966 [inline] geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one net/core/dev.c:3473 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489 __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116 neigh_hh_output include/net/neighbour.h:533 [inline] neigh_output include/net/neighbour.h:547 [inline] ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826 mld_send_cr net/ipv6/mcast.c:2127 [inline] mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> ---------------- Code disassembly (best guess): 0: 48 89 eb mov %rbp,%rbx 3: c6 45 01 01 movb $0x1,0x1(%rbp) 7: 41 bc 00 80 00 00 mov $0x8000,%r12d d: 48 c1 e9 03 shr $0x3,%rcx 11: 83 e3 07 and $0x7,%ebx 14: 41 be 01 00 00 00 mov $0x1,%r14d 1a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 21: fc ff df 24: 4c 8d 2c 01 lea (%rcx,%rax,1),%r13 28: eb 0c jmp 0x36 * 2a: f3 90 pause <-- trapping instruction 2c: 41 83 ec 01 sub $0x1,%r12d 30: 0f 84 72 04 00 00 je 0x4a8 36: 41 0f b6 45 00 movzbl 0x0(%r13),%eax 3b: 38 d8 cmp %bl,%al 3d: 7f 08 jg 0x47 3f: 84 .byte 0x84
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jiri Pirko jiri@resnulli.us Reported-by: syzbot syzkaller@googlegroups.com Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/sched/act_api.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c index f2c4bfc79663..26851ca5566a 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -609,15 +609,24 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions, restart_act_graph: for (i = 0; i < nr_actions; i++) { const struct tc_action *a = actions[i]; + int repeat_ttl;
if (jmp_prgcnt > 0) { jmp_prgcnt -= 1; continue; } + + repeat_ttl = 32; repeat: ret = a->ops->act(skb, a, res); - if (ret == TC_ACT_REPEAT) - goto repeat; /* we need a ttl - JHS */ + + if (unlikely(ret == TC_ACT_REPEAT)) { + if (--repeat_ttl != 0) + goto repeat; + /* suspicious opcode, stop pipeline */ + net_warn_ratelimited("TC_ACT_REPEAT abuse ?\n"); + return TC_ACT_OK; + }
if (TC_ACT_EXT_CMP(ret, TC_ACT_JUMP)) { jmp_prgcnt = ret & TCA_ACT_MAX_PRIO_MASK;
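The defensive idea is to bound any loop whose termination depends on an untrusted component. Below is a tiny plain-C sketch of such a retry budget, with invented names and the same arbitrary budget of 32:

#include <stdio.h>

#define ACT_REPEAT 1	/* "run me again", akin to TC_ACT_REPEAT */
#define ACT_OK     0

/* a misbehaving action that asks to be repeated forever */
static int misbehaving_action(void)
{
	return ACT_REPEAT;
}

static int run_action(void)
{
	int repeat_ttl = 32;
	int ret;

repeat:
	ret = misbehaving_action();
	if (ret == ACT_REPEAT) {
		if (--repeat_ttl != 0)
			goto repeat;
		/* budget exhausted: warn and fail safe instead of spinning */
		fprintf(stderr, "ACT_REPEAT abuse?\n");
		return ACT_OK;
	}
	return ret;
}

int main(void)
{
	printf("final verdict: %d\n", run_action());
	return 0;
}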
From: Jiasheng Jiang jiasheng@iscas.ac.cn
stable inclusion from linux-4.19.231 commit 783d70c94e513ca715643c00c6c275d9eb3b1a9e
--------------------------------
commit 2d21543efe332cd8c8f212fb7d365bc8b0690bfa upstream.
Because dma_supported() can fail, dma_set_mask_and_coherent() may return an error. Check its return value and propagate the error if it fails.
Fixes: dc312349e875 ("dmaengine: rcar-dmac: Widen DMA mask to 40 bits") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20220106030939.2644320-1-jiasheng@iscas.ac.cn Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/dma/sh/rcar-dmac.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c index 80ff95f75199..29c51762336d 100644 --- a/drivers/dma/sh/rcar-dmac.c +++ b/drivers/dma/sh/rcar-dmac.c @@ -1817,7 +1817,9 @@ static int rcar_dmac_probe(struct platform_device *pdev) platform_set_drvdata(pdev, dmac); dmac->dev->dma_parms = &dmac->parms; dma_set_max_seg_size(dmac->dev, RCAR_DMATCR_MASK); - dma_set_mask_and_coherent(dmac->dev, DMA_BIT_MASK(40)); + ret = dma_set_mask_and_coherent(dmac->dev, DMA_BIT_MASK(40)); + if (ret) + return ret;
ret = rcar_dmac_parse_of(&pdev->dev, dmac); if (ret < 0)
From: JaeSang Yoo js.yoo.5b@gmail.com
stable inclusion from linux-4.19.231 commit 8f8c9e71e192823e4d76fdc53b4391642291bafc
--------------------------------
[ Upstream commit 3203ce39ac0b2a57a84382ec184c7d4a0bede175 ]
The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk", which is itself another kernel parameter. If the "tp_printk" setup function is called first, it also matches "tp_printk_stop_on_boot" and keeps it from being set.
This is similar to other kernel parameter issues, such as: Commit 745a600cf1a6 ("um: console: Ignore console= option") or init/do_mounts.c:45 (setup function of "ro" kernel param)
Fix it by checking for a "_" right after "tp_printk"; if one is present, do not process the parameter.
Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com
Signed-off-by: JaeSang Yoo jsyoo5b@gmail.com [ Fixed up change log and added space after if condition ] Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- kernel/trace/trace.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 119dd5fd5840..feed30698280 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -232,6 +232,10 @@ __setup("trace_clock=", set_trace_boot_clock);
static int __init set_tracepoint_printk(char *str) { + /* Ignore the "tp_printk_stop_on_boot" param */ + if (*str == '_') + return 0; + if ((strcmp(str, "=0") != 0 && strcmp(str, "=off") != 0)) tracepoint_printk = 1; return 1;
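The collision exists because early-parameter matching compares only the registered name as a prefix of the command-line argument. A simplified userspace model of that matching and of the underscore guard follows; the handler name and the driver code around it are invented, only the guard itself mirrors the patch.

#include <stdio.h>
#include <string.h>

static int tp_printk;

/* model of set_tracepoint_printk(): receives whatever follows "tp_printk" */
static int setup_tp_printk(const char *rest)
{
	/* the fix: refuse "tp_printk_stop_on_boot" and any other "tp_printk_*" */
	if (*rest == '_')
		return 0;
	if (strcmp(rest, "=0") != 0 && strcmp(rest, "=off") != 0)
		tp_printk = 1;
	return 1;
}

int main(void)
{
	const char *arg = "tp_printk_stop_on_boot=1";
	const char *prefix = "tp_printk";

	/* prefix match, the way the early parameter parser selects a handler */
	if (strncmp(arg, prefix, strlen(prefix)) == 0)
		setup_tp_printk(arg + strlen(prefix));

	printf("tp_printk = %d (stays 0 thanks to the '_' check)\n", tp_printk);
	return 0;
}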
From: Zhang Qiao zhangqiao22@huawei.com
stable inclusion from linux-4.19.232 commit 4eec5fe1c680a6c47a9bc0cde00960a4eb663342
--------------------------------
commit 05c7b7a92cc87ff8d7fde189d0fade250697573c upstream.
As previously discussed (https://lkml.org/lkml/2022/1/20/51), cpuset_attach() is affected by a similar cpu hotplug race, as in the following scenario:
cpuset_attach()                              cpu hotplug
---------------------------                  ----------------------
down_write(cpuset_rwsem)
guarantee_online_cpus() // (load cpus_attach)
                                             sched_cpu_deactivate
                                               set_cpu_active()
                                               // will change cpu_active_mask
set_cpus_allowed_ptr(cpus_attach)
  __set_cpus_allowed_ptr_locked()
  // (if the intersection of cpus_attach and
  // cpu_active_mask is empty, will return -EINVAL)
up_write(cpuset_rwsem)
To avoid races such as described above, protect cpuset_attach() call with cpu_hotplug_lock.
Fixes: be367d099270 ("cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time") Cc: stable@vger.kernel.org # v2.6.32+ Reported-by: Zhao Gongyi zhaogongyi@huawei.com Signed-off-by: Zhang Qiao zhangqiao22@huawei.com Acked-by: Waiman Long longman@redhat.com Reviewed-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- kernel/cgroup/cpuset.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index feb91177247c..27dd33566df8 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1530,6 +1530,7 @@ static void cpuset_attach(struct cgroup_taskset *tset) cgroup_taskset_first(tset, &css); cs = css_cs(css);
+ cpus_read_lock(); mutex_lock(&cpuset_mutex);
/* prepare for attach */ @@ -1585,6 +1586,7 @@ static void cpuset_attach(struct cgroup_taskset *tset) wake_up(&cpuset_attach_wq);
mutex_unlock(&cpuset_mutex); + cpus_read_unlock(); }
/* The various types of files and directories in a cpuset file system */
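The shape of the fix is a lock-ordering one: take the lock that freezes the data you are about to sample (cpu_hotplug_lock) before taking your own lock and using the sampled value. Below is a rough pthread model of that idea; all names are invented and this is not the kernel locking API.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t cpuset_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int cpu_active_mask = 0xf;	/* cpus 0-3 online */

/* hotplug side: changes the mask under the write lock */
static void deactivate_cpu(int cpu)
{
	pthread_rwlock_wrlock(&hotplug_lock);
	cpu_active_mask &= ~(1u << cpu);
	pthread_rwlock_unlock(&hotplug_lock);
}

/* attach side: the read lock keeps the mask stable between sampling it
 * and acting on it, which is the window the patch closes */
static void attach(void)
{
	pthread_rwlock_rdlock(&hotplug_lock);
	pthread_mutex_lock(&cpuset_lock);

	unsigned int cpus_attach = cpu_active_mask;	/* guarantee_online_cpus() */
	printf("attaching with mask 0x%x\n", cpus_attach);
	/* ... applying cpus_attach cannot race with deactivate_cpu() here ... */

	pthread_mutex_unlock(&cpuset_lock);
	pthread_rwlock_unlock(&hotplug_lock);
}

int main(void)
{
	attach();
	deactivate_cpu(3);
	attach();
	return 0;
}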
From: Eric Dumazet edumazet@google.com
stable inclusion from linux-4.19.232 commit 1f4ae0f158dafa74133108bfa07b8053eb2a7898
--------------------------------
commit ef527f968ae05c6717c39f49c8709a7e2c19183a upstream.
Whenever one of these functions pulls all data from an skb in a frag_list, use consume_skb() instead of kfree_skb() to avoid polluting drop monitoring.
Fixes: 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function") Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20220220154052.1308469-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/core/skbuff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e1daab49b0eb..c623c129d0ab 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1977,7 +1977,7 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta) /* Free pulled out fragments. */ while ((list = skb_shinfo(skb)->frag_list) != insp) { skb_shinfo(skb)->frag_list = list->next; - kfree_skb(list); + consume_skb(list); } /* And insert new clone at head. */ if (clone) { @@ -5482,7 +5482,7 @@ static int pskb_carve_frag_list(struct sk_buff *skb, /* Free pulled out fragments. */ while ((list = shinfo->frag_list) != insp) { shinfo->frag_list = list->next; - kfree_skb(list); + consume_skb(list); } /* And insert new clone at head. */ if (clone) {
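The distinction between the two helpers is purely about instrumentation: both free the buffer, but only one of them is reported as a packet drop. A toy model with invented names:

#include <stdio.h>
#include <stdlib.h>

static unsigned long drop_events;	/* what drop monitoring would report */

/* freed because something went wrong: counted as a drop */
static void drop_buf(void *buf)
{
	drop_events++;
	free(buf);
}

/* freed because its data was fully used: intentionally not counted */
static void consume_buf(void *buf)
{
	free(buf);
}

int main(void)
{
	/* pulling all data out of a fragment is normal operation, so the patch
	 * moves that path from the "drop" flavour to the "consume" flavour */
	consume_buf(malloc(64));
	drop_buf(malloc(64));
	printf("drops reported: %lu\n", drop_events);
	return 0;
}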
From: Tao Liu thomas.liu@ucloud.cn
stable inclusion from linux-4.19.232 commit e9ffbe63f6f32f526a461756309b61c395168d73
--------------------------------
commit cc20cced0598d9a5ff91ae4ab147b3b5e99ee819 upstream.
We encountered a tcp drop issue in our cloud environment. A packet GROed on the host is forwarded to a VM virtio_net nic with net_failover enabled. The VM acts as an IPVS LB with ipip encapsulation. The full path looks like: host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat -> ipip encap -> net_failover tx -> virtio_net tx
When net_failover transmits an ipip pkt (gso_type = 0x0103, which means SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), no GSO is performed because the device supports TSO and GSO_IPXIP4, but network_header points to the inner ip header.
Call Trace:
 tcp4_gso_segment        ------> return NULL
 inet_gso_segment        ------> inner iph, network_header points to
 ipip_gso_segment
 inet_gso_segment        ------> outer iph
 skb_mac_gso_segment
Afterwards virtio_net transmits the pkt; only the inner ip header is modified while the outer one remains unchanged. The pkt will be dropped in the remote host.
Call Trace:
 inet_gso_segment        ------> inner iph, outer iph is skipped
 skb_mac_gso_segment
 __skb_gso_segment
 validate_xmit_skb
 validate_xmit_skb_list
 sch_direct_xmit
 __qdisc_run
 __dev_queue_xmit        ------> virtio_net
 dev_hard_start_xmit
 __dev_queue_xmit        ------> net_failover
 ip_finish_output2
 ip_output
 iptunnel_xmit
 ip_tunnel_xmit
 ipip_tunnel_xmit        ------> ipip
 dev_hard_start_xmit
 __dev_queue_xmit
 ip_finish_output2
 ip_output
 ip_forward
 ip_rcv
 __netif_receive_skb_one_core
 netif_receive_skb_internal
 napi_gro_receive
 receive_buf
 virtnet_poll
 net_rx_action
The root cause of this issue lies in the rare combination of SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option. SKB_GSO_DODGY is set by the external virtio_net. We need to reset the network header when callbacks.gso_segment() returns NULL.
This patch also includes ipv6_gso_segment(), considering SIT, etc.
Fixes: cb32f511a70b ("ipip: add GSO/TSO support") Signed-off-by: Tao Liu thomas.liu@ucloud.cn Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- net/ipv4/af_inet.c | 5 ++++- net/ipv6/ip6_offload.c | 2 ++ 2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 84ce4f86bc19..86268f3f9bb9 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1338,8 +1338,11 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb, }
ops = rcu_dereference(inet_offloads[proto]); - if (likely(ops && ops->callbacks.gso_segment)) + if (likely(ops && ops->callbacks.gso_segment)) { segs = ops->callbacks.gso_segment(skb, features); + if (!segs) + skb->network_header = skb_mac_header(skb) + nhoff - skb->head; + }
if (IS_ERR_OR_NULL(segs)) goto out; diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index c7e495f12011..6c47cd0ef240 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -98,6 +98,8 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, if (likely(ops && ops->callbacks.gso_segment)) { skb_reset_transport_header(skb); segs = ops->callbacks.gso_segment(skb, features); + if (!segs) + skb->network_header = skb_mac_header(skb) + nhoff - skb->head; }
if (IS_ERR_OR_NULL(segs))
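The assignment added by the patch is plain offset arithmetic on the skb: network_header is stored as an offset from skb->head, and nhoff was saved earlier relative to the MAC header, so "mac header + nhoff" recomputes where the outer network header sits. A standalone sketch of that arithmetic with a simplified stand-in for sk_buff (field names reduced for illustration):

#include <stdio.h>

/* simplified model: header positions are offsets from 'head' */
struct fake_skb {
	unsigned char *head;
	unsigned int mac_header;	/* offset of the outer MAC header */
	unsigned int network_header;	/* offset of the current network header */
};

static unsigned char *mac_header_ptr(const struct fake_skb *skb)
{
	return skb->head + skb->mac_header;	/* like skb_mac_header() */
}

int main(void)
{
	unsigned char buf[256];
	struct fake_skb skb = { .head = buf, .mac_header = 0, .network_header = 14 };

	/* saved on entry, relative to the MAC header */
	unsigned int nhoff = skb.network_header - skb.mac_header;

	/* inner processing left network_header pointing at the inner IP header */
	skb.network_header = 14 + 20;

	/* gso_segment() returned NULL: restore the outer header, mirroring
	 * skb->network_header = skb_mac_header(skb) + nhoff - skb->head */
	skb.network_header = (unsigned int)(mac_header_ptr(&skb) + nhoff - skb.head);

	printf("network_header restored to offset %u\n", skb.network_header);
	return 0;
}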
From: "daniel.starke@siemens.com" daniel.starke@siemens.com
stable inclusion from linux-4.19.232 commit 337e49675ce55c23a50b92aae889ec5d910d6dc7
--------------------------------
commit e3b7468f082d106459e86e8dc6fb9bdd65553433 upstream.
Trying to open a DLCI by sending a SABM frame may fail with a timeout. The link is closed on the initiator side without informing the responder about this event. The responder assumes the link is open after sending a UA frame to answer the SABM frame. The link gets stuck in a half open state.
This patch fixes this by initiating the proper link termination procedure after link setup timeout instead of silently closing it down.
Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke daniel.starke@siemens.com Link: https://lore.kernel.org/r/20220218073123.2121-3-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/tty/n_gsm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 7de54c78a7b9..3e9c8a5b6445 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -1486,7 +1486,7 @@ static void gsm_dlci_t1(struct timer_list *t) dlci->mode = DLCI_MODE_ADM; gsm_dlci_open(dlci); } else { - gsm_dlci_close(dlci); + gsm_dlci_begin_close(dlci); /* prevent half open link */ }
break;
From: Miaohe Lin linmiaohe@huawei.com
stable inclusion from linux-4.19.232 commit a6fcced73d15ab57cd97813999edf6f19d3032f8
--------------------------------
commit c94afc46cae7ad41b2ad6a99368147879f4b0e56 upstream.
memblock.{reserved,memory}.regions may be allocated using kmalloc() in memblock_double_array(). Use kfree() to release these kmalloced regions indicated by memblock_{reserved,memory}_in_slab.
Signed-off-by: Miaohe Lin linmiaohe@huawei.com Fixes: 3010f876500f ("mm: discard memblock data later") Signed-off-by: Mike Rapoport rppt@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- mm/memblock.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c index 80c6975ace6f..daa2c42a02d9 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -330,14 +330,20 @@ void __init memblock_discard(void) addr = __pa(memblock.reserved.regions); size = PAGE_ALIGN(sizeof(struct memblock_region) * memblock.reserved.max); - __memblock_free_late(addr, size); + if (memblock_reserved_in_slab) + kfree(memblock.reserved.regions); + else + __memblock_free_late(addr, size); }
if (memblock.memory.regions != memblock_memory_init_regions) { addr = __pa(memblock.memory.regions); size = PAGE_ALIGN(sizeof(struct memblock_region) * memblock.memory.max); - __memblock_free_late(addr, size); + if (memblock_memory_in_slab) + kfree(memblock.memory.regions); + else + __memblock_free_late(addr, size); } } #endif
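The fix encodes a general rule: memory must be handed back to the allocator it actually came from, so the code has to remember which allocator that was. Below is a compact userspace analogue, with invented names for the flag and the two release paths:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char static_regions[64];		/* analogue of the initial static array */
static char *regions = static_regions;
static int regions_in_heap;		/* analogue of memblock_*_in_slab */

static void release_early(char *p)
{
	(void)p;	/* stands in for __memblock_free_late() in this model */
}

/* grow the array, possibly migrating from static storage to the heap */
static int double_array(size_t new_size)
{
	char *bigger = malloc(new_size);

	if (!bigger)
		return -1;
	memcpy(bigger, regions, sizeof(static_regions));
	if (regions_in_heap)
		free(regions);
	regions = bigger;
	regions_in_heap = 1;	/* remember where the current array lives */
	return 0;
}

static void discard(void)
{
	if (regions == static_regions)
		return;			/* still the static array: nothing to free */
	if (regions_in_heap)
		free(regions);		/* heap allocation: the kfree() case */
	else
		release_early(regions);	/* early allocation: the free-late case */
}

int main(void)
{
	if (double_array(128) == 0)
		discard();
	puts("released through the matching allocator");
	return 0;
}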
From: Linus Torvalds torvalds@linux-foundation.org
stable inclusion from linux-4.19.232 commit 400c2f361c25bc092d0636cfa32d0549a181e653
--------------------------------
commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream.
Commit 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it") fixed a race with getting a reference to a file just as it was being closed. It was a fairly minimal patch, and I didn't think re-checking the file pointer lookup would be a measurable overhead, since it was all right there and cached.
But I was wrong, as pointed out by the kernel test robot.
The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed quite noticeably. Admittedly it seems to be a very artificial test: doing "poll()" system calls on regular files in a very tight loop in multiple threads.
That means that basically all the time is spent just looking up file descriptors without ever doing anything useful with them (not that doing 'poll()' on a regular file is useful to begin with). And as a result the extra "re-check fd" cost sticks out like a sore thumb.
Happily, the regression is fixable by just writing the code to look up the fd to be better and clearer. There's still a cost to verify the file pointer, but now it's basically in the noise even for that benchmark that does nothing else - and the code is more understandable and has better comments too.
[ Side note: this patch is also a classic case of one that looks very messy with the default greedy Myers diff - it's much more legible with either the patience or histogram diff algorithm ]
Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/ Reported-by: kernel test robot oliver.sang@intel.com Tested-by: Carel Si beibei.si@intel.com Cc: Jann Horn jannh@google.com Cc: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Baokun Li libaokun1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- fs/file.c | 73 +++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 57 insertions(+), 16 deletions(-)
diff --git a/fs/file.c b/fs/file.c index 20a44d0ac400..eb10b7634d76 100644 --- a/fs/file.c +++ b/fs/file.c @@ -762,28 +762,69 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); }
-static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs) +static inline struct file *__fget_files_rcu(struct files_struct *files, + unsigned int fd, fmode_t mask, unsigned int refs) { - struct files_struct *files = current->files; - struct file *file; + for (;;) { + struct file *file; + struct fdtable *fdt = rcu_dereference_raw(files->fdt); + struct file __rcu **fdentry;
- rcu_read_lock(); -loop: - file = fcheck_files(files, fd); - if (file) { - /* File object ref couldn't be taken. - * dup2() atomicity guarantee is the reason - * we loop to catch the new file (or NULL pointer) + if (unlikely(fd >= fdt->max_fds)) + return NULL; + + fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds); + file = rcu_dereference_raw(*fdentry); + if (unlikely(!file)) + return NULL; + + if (unlikely(file->f_mode & mask)) + return NULL; + + /* + * Ok, we have a file pointer. However, because we do + * this all locklessly under RCU, we may be racing with + * that file being closed. + * + * Such a race can take two forms: + * + * (a) the file ref already went down to zero, + * and get_file_rcu_many() fails. Just try + * again: + */ + if (unlikely(!get_file_rcu_many(file, refs))) + continue; + + /* + * (b) the file table entry has changed under us. + * Note that we don't need to re-check the 'fdt->fd' + * pointer having changed, because it always goes + * hand-in-hand with 'fdt'. + * + * If so, we need to put our refs and try again. */ - if (file->f_mode & mask) - file = NULL; - else if (!get_file_rcu_many(file, refs)) - goto loop; - else if (__fcheck_files(files, fd) != file) { + if (unlikely(rcu_dereference_raw(files->fdt) != fdt) || + unlikely(rcu_dereference_raw(*fdentry) != file)) { fput_many(file, refs); - goto loop; + continue; } + + /* + * Ok, we have a ref to the file, and checked that it + * still exists. + */ + return file; } +} + + +static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs) +{ + struct files_struct *files = current->files; + struct file *file; + + rcu_read_lock(); + file = __fget_files_rcu(files, fd, mask, refs); rcu_read_unlock();
return file;
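Stripped of the fd-table details, the new loop is the generic lockless "load pointer, take a reference, re-validate" idiom. A self-contained C11 model of that idiom follows; the object names are invented, and plain atomics stand in for RCU, which in the kernel is what makes it safe to touch the object at all between the load and the refcount attempt.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
	atomic_int refcount;
};

/* a published slot that another thread may retarget or clear at any time */
static _Atomic(struct obj *) slot;

/* take a reference only if the object is still live (refcount > 0),
 * similar in spirit to get_file_rcu_many() */
static bool get_ref(struct obj *o)
{
	int c = atomic_load(&o->refcount);

	while (c > 0)
		if (atomic_compare_exchange_weak(&o->refcount, &c, c + 1))
			return true;
	return false;
}

static void put_ref(struct obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
}

static struct obj *lookup(void)
{
	for (;;) {
		struct obj *o = atomic_load(&slot);

		if (!o)
			return NULL;		/* empty slot */
		if (!get_ref(o))
			continue;		/* object dying: retry the lookup */
		if (atomic_load(&slot) != o) {	/* slot changed underneath us */
			put_ref(o);
			continue;
		}
		return o;			/* ref held and slot re-validated */
	}
}

int main(void)
{
	struct obj file = { .refcount = 1 };

	atomic_store(&slot, &file);
	printf("lookup %s\n", lookup() ? "succeeded" : "failed");
	return 0;
}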
From: "daniel.starke@siemens.com" daniel.starke@siemens.com
stable inclusion from linux-4.19.232 commit 28ca082153794cf5c98e7bb93d7f30f8ba46bec4
--------------------------------
commit 737b0ef3be6b319d6c1fd64193d1603311969326 upstream.
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010. See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.a... The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to the newer 27.010 here. Chapter 5.4.6.3.7 describes the encoding of the control signal octet used by the MSC (modem status command). The same encoding is also used in convergence layer type 2 as described in chapter 5.5.2. Tables 7 and 24 both require the DV (data valid) bit to be set to 1 for outgoing control signal octets sent by the DTE (data terminal equipment), i.e. for the initiator side. Currently, the DV bit is only set if CD (carrier detect) is on, regardless of the side.
This patch fixes this behavior by setting the DV bit on the initiator side unconditionally.
Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke daniel.starke@siemens.com Link: https://lore.kernel.org/r/20220218073123.2121-1-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- drivers/tty/n_gsm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 3e9c8a5b6445..3f08ad7e629a 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -428,7 +428,7 @@ static u8 gsm_encode_modem(const struct gsm_dlci *dlci) modembits |= MDM_RTR; if (dlci->modem_tx & TIOCM_RI) modembits |= MDM_IC; - if (dlci->modem_tx & TIOCM_CD) + if (dlci->modem_tx & TIOCM_CD || dlci->gsm->initiator) modembits |= MDM_DV; return modembits; }
From: Igor Pylypiv ipylypiv@google.com
stable inclusion from linux-4.19.231 commit a0c66ac8b72f816d5631fde0ca0b39af602dce48
--------------------------------
[ Upstream commit 67d6212afda218d564890d1674bab28e8612170f ]
This reverts commit 774a1221e862b343388347bac9b318767336b20b.
We need to finish all async code before the module init sequence is done. In the reverted commit the PF_USED_ASYNC flag was added to mark a thread that called async_schedule(). Then the PF_USED_ASYNC flag was used to determine whether or not async_synchronize_full() needs to be invoked. This works when the modprobe thread itself calls async_schedule(), but it does not work if the module dispatches its init code to a worker thread which then calls async_schedule().
For example, PCI driver probing is invoked from a worker thread based on a node where device is attached:
	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);
We end up in a situation where a worker thread gets the PF_USED_ASYNC flag set instead of the modprobe thread. As a result, async_synchronize_full() is not invoked and modprobe completes without waiting for the async code to finish.
The issue was discovered while loading the pm80xx driver: (scsi_mod.scan=async)
modprobe pm80xx                          worker
...
do_init_module()
...
  pci_call_probe()
    work_on_cpu(local_pci_probe)
                                         local_pci_probe()
                                           pm8001_pci_probe()
                                             scsi_scan_host()
                                               async_schedule()
                                               worker->flags |= PF_USED_ASYNC;
...
< return from worker >
...
if (current->flags & PF_USED_ASYNC) <--- false
  async_synchronize_full();
Commit 21c3c5d28007 ("block: don't request module during elevator init") fixed the deadlock issue which the reverted commit 774a1221e862 ("module, async: async_synchronize_full() on module init iff async is used") tried to fix.
Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous request_module() from async workers") synchronous module loading from async is not allowed.
Given that the original deadlock issue is fixed and it is no longer allowed to call synchronous request_module() from async we can remove PF_USED_ASYNC flag to make module init consistently invoke async_synchronize_full() unless async module probe is requested.
Signed-off-by: Igor Pylypiv ipylypiv@google.com Reviewed-by: Changyuan Lyu changyuanl@google.com Reviewed-by: Luis Chamberlain mcgrof@kernel.org Acked-by: Tejun Heo tj@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- include/linux/sched.h | 1 - kernel/async.c | 3 --- kernel/module.c | 25 +++++-------------------- 3 files changed, 5 insertions(+), 24 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index 0202fe007318..ec7418ff12d4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1432,7 +1432,6 @@ extern struct pid *cad_pid; #define PF_MEMALLOC 0x00000800 /* Allocating memory */ #define PF_NPROC_EXCEEDED 0x00001000 /* set_user() noticed that RLIMIT_NPROC was exceeded */ #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ -#define PF_USED_ASYNC 0x00004000 /* Used async_schedule*(), used by module init */ #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ #define PF_FROZEN 0x00010000 /* Frozen for system suspend */ #define PF_KSWAPD 0x00020000 /* I am kswapd */ diff --git a/kernel/async.c b/kernel/async.c index a893d6170944..4bf1b00a28d8 100644 --- a/kernel/async.c +++ b/kernel/async.c @@ -191,9 +191,6 @@ static async_cookie_t __async_schedule(async_func_t func, void *data, struct asy atomic_inc(&entry_count); spin_unlock_irqrestore(&async_lock, flags);
- /* mark that this task has queued an async job, used by module init */ - current->flags |= PF_USED_ASYNC; - /* schedule for execution */ queue_work(system_unbound_wq, &entry->work);
diff --git a/kernel/module.c b/kernel/module.c index 38c583d54def..591c176ac6a2 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3493,12 +3493,6 @@ static noinline int do_init_module(struct module *mod) } freeinit->module_init = mod->init_layout.base;
- /* - * We want to find out whether @mod uses async during init. Clear - * PF_USED_ASYNC. async_schedule*() will set it. - */ - current->flags &= ~PF_USED_ASYNC; - do_mod_ctors(mod); /* Start the module */ if (mod->init != NULL) @@ -3524,22 +3518,13 @@ static noinline int do_init_module(struct module *mod)
/* * We need to finish all async code before the module init sequence - * is done. This has potential to deadlock. For example, a newly - * detected block device can trigger request_module() of the - * default iosched from async probing task. Once userland helper - * reaches here, async_synchronize_full() will wait on the async - * task waiting on request_module() and deadlock. - * - * This deadlock is avoided by perfomring async_synchronize_full() - * iff module init queued any async jobs. This isn't a full - * solution as it will deadlock the same if module loading from - * async jobs nests more than once; however, due to the various - * constraints, this hack seems to be the best option for now. - * Please refer to the following thread for details. + * is done. This has potential to deadlock if synchronous module + * loading is requested from async (which is not allowed!). * - * http://thread.gmane.org/gmane.linux.kernel/1420814 + * See commit 0fdff3ec6d87 ("async, kmod: warn on synchronous + * request_module() from async workers") for more details. */ - if (!mod->async_probe_requested && (current->flags & PF_USED_ASYNC)) + if (!mod->async_probe_requested) async_synchronize_full();
ftrace_free_mem(mod, mod->init_layout.base, mod->init_layout.base +
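The failure mode described above is easy to reproduce outside the kernel: state recorded per thread is invisible to the thread that later checks it once the work has been handed to a helper thread. Below is a small pthread sketch of that, with invented names; thread-local storage plays the role of the per-task flag.

#include <pthread.h>
#include <stdio.h>

/* analogue of the PF_USED_ASYNC bit in task->flags: strictly per-thread */
static _Thread_local int used_async;

static void async_schedule_model(void)
{
	used_async = 1;		/* set on whichever thread happens to call this */
}

/* analogue of work_on_cpu(local_pci_probe, ...): probing runs on a worker */
static void *worker_probe(void *arg)
{
	(void)arg;
	async_schedule_model();	/* e.g. scsi_scan_host() -> async_schedule() */
	return NULL;
}

int main(void)
{
	pthread_t worker;

	pthread_create(&worker, NULL, worker_probe, NULL);
	pthread_join(&worker, NULL);

	/* the "modprobe" thread checks its own flag and misses the async work */
	if (used_async)
		puts("would wait for async code (never reached here)");
	else
		puts("flag not set on this thread: async work would not be waited for");
	return 0;
}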