*** BLURB HERE ***
Chen Zhongjin (2): profiling: fix shift too large makes kernel panic kprobes: Forbid probing on trampoline and BPF code areas
Dan Carpenter (1): kfifo: fix kfifo_to_user() return type
David Howells (1): vfs: Check the truncate maximum size in inode_newsize_ok()
Duoming Zhou (1): mtd: sm_ftl: Fix deadlock caused by cancel_work_sync in sm_release
Eric Dumazet (1): tcp: fix over estimation in sk_forced_mem_schedule()
Eric Whitney (1): ext4: fix extent status tree race in writeback error recovery path
Hector Martin (1): locking/atomic: Make test_and_*_bit() ordered on failure
Ilpo Järvinen (1): serial: 8250_dw: Store LSR into lsr_saved_flags in dw8250_tx_wait_empty()
Jason A. Donenfeld (1): fs: check FMODE_LSEEK to control internal pipe splicing
Johan Hovold (1): x86/pmem: Fix platform-device leak in error path
Kiselev, Oleg (1): ext4: avoid resizing to a partial cluster size
Kuniyuki Iwashima (36): ip: Fix data-races around sysctl_ip_fwd_use_pmtu. ip: Fix data-races around sysctl_ip_nonlocal_bind. ip: Fix a data-race around sysctl_fwmark_reflect. tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept. tcp: Fix data-races around sysctl_tcp_mtu_probing. tcp: Fix a data-race around sysctl_tcp_probe_threshold. tcp: Fix a data-race around sysctl_tcp_probe_interval. igmp: Fix data-races around sysctl_igmp_llm_reports. igmp: Fix a data-race around sysctl_igmp_max_memberships. tcp: Fix data-races around sysctl_tcp_reordering. tcp: Fix data-races around some timeout sysctl knobs. tcp: Fix a data-race around sysctl_tcp_notsent_lowat. tcp: Fix a data-race around sysctl_tcp_tw_reuse. tcp: Fix data-races around sysctl_tcp_fastopen. tcp: Fix a data-race around sysctl_tcp_early_retrans. tcp: Fix data-races around sysctl_tcp_recovery. tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts. tcp: Fix data-races around sysctl_tcp_slow_start_after_idle. tcp: Fix a data-race around sysctl_tcp_retrans_collapse. tcp: Fix a data-race around sysctl_tcp_stdurg. tcp: Fix a data-race around sysctl_tcp_rfc1337. tcp: Fix data-races around sysctl_tcp_max_reordering. tcp: Fix data-races around sysctl_tcp_dsack. tcp: Fix a data-race around sysctl_tcp_app_win. tcp: Fix a data-race around sysctl_tcp_adv_win_scale. tcp: Fix a data-race around sysctl_tcp_frto. tcp: Fix a data-race around sysctl_tcp_nometrics_save. tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit. net: ping6: Fix memleak in ipv6_renew_options(). igmp: Fix data-races around sysctl_igmp_qrv. tcp: Fix a data-race around sysctl_tcp_min_tso_segs. tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen. tcp: Fix a data-race around sysctl_tcp_autocorking. tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit. tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns. tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
Lukas Czerner (1): ext4: make sure ext4_append() always allocates new block
Matthias May (1): geneve: do not use RT_TOS for IPv6 flowlabel
Miaohe Lin (1): mm/mmap.c: fix missing call to vm_unacct_memory in mmap_region
Nicolas Saenz Julienne (1): nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt()
Theodore Ts'o (1): ext4: update s_overhead_clusters in the superblock during an on-line resize
Trond Myklebust (2): NFSv4/pnfs: Fix a use-after-free bug in open SUNRPC: Reinitialise the backchannel request buffers before reuse
Uwe Kleine-König (1): mtd: st_spi_fsm: Add a clk_disable_unprepare() in .probe()'s error path
Vincent Mailhol (1): can: error: specify the values of data[5..7] of CAN error frames
Wang Cheng (1): mm/mempolicy: fix uninit-value in mpol_rebind_policy()
Xiu Jianfeng (1): selinux: Add boundary check in put_entry()
Yang Xu (1): fs: Add missing umask strip in vfs_tmpfile
Yang Yingliang (1): bus: hisi_lpc: fix missing platform_device_put() in hisi_lpc_acpi_probe()
Yonglong Li (1): tcp: make retransmitted SKB fit into the send window
Zhang Xianwei (1): NFSv4.1: RECLAIM_COMPLETE must handle EACCES
huhai (1): ACPI: LPSS: Fix missing check in register_device_clock()
Documentation/atomic_bitops.txt | 2 +- arch/x86/kernel/pmem.c | 7 ++++- drivers/acpi/acpi_lpss.c | 3 ++ drivers/bus/hisi_lpc.c | 10 +++--- drivers/mtd/devices/st_spi_fsm.c | 8 +++-- drivers/mtd/sm_ftl.c | 2 +- drivers/net/geneve.c | 3 +- drivers/tty/serial/8250/8250_dw.c | 3 ++ fs/attr.c | 2 ++ fs/ext4/inode.c | 7 +++++ fs/ext4/namei.c | 16 ++++++++++ fs/ext4/resize.c | 11 +++++++ fs/namei.c | 2 ++ fs/nfs/nfs4proc.c | 14 ++++++--- fs/splice.c | 10 +++--- include/asm-generic/bitops/atomic.h | 6 ---- include/linux/kfifo.h | 2 +- include/net/inet_sock.h | 5 +-- include/net/ip.h | 4 +-- include/net/tcp.h | 11 ++++--- include/uapi/linux/can/error.h | 5 ++- kernel/kprobes.c | 3 +- kernel/profile.c | 7 +++++ kernel/sched/rt.c | 15 +++++---- mm/mempolicy.c | 2 +- mm/mmap.c | 1 - net/ipv4/af_inet.c | 2 +- net/ipv4/igmp.c | 47 +++++++++++++++++------------ net/ipv4/route.c | 2 +- net/ipv4/tcp.c | 12 +++++--- net/ipv4/tcp_fastopen.c | 4 +-- net/ipv4/tcp_input.c | 39 ++++++++++++++---------- net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/tcp_metrics.c | 5 +-- net/ipv4/tcp_minisocks.c | 2 +- net/ipv4/tcp_output.c | 46 +++++++++++++++++----------- net/ipv4/tcp_recovery.c | 6 ++-- net/ipv4/tcp_timer.c | 14 ++++----- net/ipv6/ping.c | 6 ++++ net/sctp/protocol.c | 2 +- net/sunrpc/backchannel_rqst.c | 14 +++++++++ security/selinux/ss/policydb.h | 2 ++ 42 files changed, 240 insertions(+), 126 deletions(-)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit eb15262128b793e4b1d1c4514d3e6d19c3959764 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 60c158dc7b1f0558f6cadd5b50d0386da0000d50 ]
While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: f87c10a8aa1e ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/ip.h | 2 +- net/ipv4/route.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h index 621eeb9f142e..e4f34bf49b86 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -401,7 +401,7 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, struct net *net = dev_net(dst->dev); unsigned int mtu;
- if (net->ipv4.sysctl_ip_fwd_use_pmtu || + if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) || ip_mtu_locked(dst) || !forwarding) return dst_mtu(dst); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index b79e67d79a97..68f67bf99ed7 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1415,7 +1415,7 @@ u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr) struct net_device *dev = nh->nh_dev; u32 mtu = 0;
- if (dev_net(dev)->ipv4.sysctl_ip_fwd_use_pmtu || + if (READ_ONCE(dev_net(dev)->ipv4.sysctl_ip_fwd_use_pmtu) || fi->fib_metrics->metrics[RTAX_LOCK - 1] & (1 << RTAX_MTU)) mtu = fi->fib_mtu;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit cddc141d932d31eee1038bf6977c47c4575e61ae category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 289d3b21fb0bfc94c4e98f10635bba1824e5f83c ]
While reading sysctl_ip_nonlocal_bind, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/inet_sock.h | 2 +- net/sctp/protocol.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index d5f309875338..ff37a835a853 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -379,7 +379,7 @@ static inline bool inet_get_convert_csum(struct sock *sk) static inline bool inet_can_nonlocal_bind(struct net *net, struct inet_sock *inet) { - return net->ipv4.sysctl_ip_nonlocal_bind || + return READ_ONCE(net->ipv4.sysctl_ip_nonlocal_bind) || inet->freebind || inet->transparent; }
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 7207a9769f1a..8db8209c5b61 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -373,7 +373,7 @@ static int sctp_v4_available(union sctp_addr *addr, struct sctp_sock *sp) if (addr->v4.sin_addr.s_addr != htonl(INADDR_ANY) && ret != RTN_LOCAL && !sp->inet.freebind && - !net->ipv4.sysctl_ip_nonlocal_bind) + !READ_ONCE(net->ipv4.sysctl_ip_nonlocal_bind)) return 0;
if (ipv6_only_sock(sctp_opt2sk(sp)))
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 9096edcf4854289f92252e086cf6e498c7f8c21d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 85d0b4dbd74b95cc492b1f4e34497d3f894f5d9a ]
While reading sysctl_fwmark_reflect, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/ip.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/ip.h b/include/net/ip.h index e4f34bf49b86..a711bafa62ce 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -340,7 +340,7 @@ void ipfrag_init(void); void ip_static_sysctl_init(void);
#define IP4_REPLY_MARK(net, mark) \ - ((net)->ipv4.sysctl_fwmark_reflect ? (mark) : 0) + (READ_ONCE((net)->ipv4.sysctl_fwmark_reflect) ? (mark) : 0)
static inline bool ip_is_fragment(const struct iphdr *iph) {
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit abf70de2ec026ae8d7da4e79bec61888a880e00b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 1a0008f9df59451d0a17806c1ee1a19857032fa8 ]
While reading sysctl_tcp_fwmark_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 84f39b08d786 ("net: support marking accepting TCP sockets") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/inet_sock.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index ff37a835a853..7fc0df2b904d 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -112,7 +112,8 @@ static inline struct inet_request_sock *inet_rsk(const struct request_sock *sk)
static inline u32 inet_request_mark(const struct sock *sk, struct sk_buff *skb) { - if (!sk->sk_mark && sock_net(sk)->ipv4.sysctl_tcp_fwmark_accept) + if (!sk->sk_mark && + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fwmark_accept)) return skb->mark;
return sk->sk_mark;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 7e8fc428a7f680f1c4994a40e52d7f95a9a93038 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit f47d00e077e7d61baf69e46dde3210c886360207 ]
While reading sysctl_tcp_mtu_probing, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 5d424d5a674f ("[TCP]: MTU probing") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 2 +- net/ipv4/tcp_timer.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index c9d02047852d..3dff60bcda40 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1576,7 +1576,7 @@ void tcp_mtup_init(struct sock *sk) struct inet_connection_sock *icsk = inet_csk(sk); struct net *net = sock_net(sk);
- icsk->icsk_mtup.enabled = net->ipv4.sysctl_tcp_mtu_probing > 1; + icsk->icsk_mtup.enabled = READ_ONCE(net->ipv4.sysctl_tcp_mtu_probing) > 1; icsk->icsk_mtup.search_high = tp->rx_opt.mss_clamp + sizeof(struct tcphdr) + icsk->icsk_af_ops->net_header_len; icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, net->ipv4.sysctl_tcp_base_mss); diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index d071ed6b8b9a..5d761333ffc4 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -144,7 +144,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) int mss;
/* Black hole detection */ - if (!net->ipv4.sysctl_tcp_mtu_probing) + if (!READ_ONCE(net->ipv4.sysctl_tcp_mtu_probing)) return;
if (!icsk->icsk_mtup.enabled) {
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 96900fa61777402eb5056269d8000aace33a8b6c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 92c0aa4175474483d6cf373314343d4e624e882a ]
While reading sysctl_tcp_probe_threshold, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 6b58e0a5f32d ("ipv4: Use binary search to choose tcp PMTU probe_size") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 3dff60bcda40..d9b2b39f6cae 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2165,7 +2165,7 @@ static int tcp_mtu_probe(struct sock *sk) * probing process by not resetting search range to its orignal. */ if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high) || - interval < net->ipv4.sysctl_tcp_probe_threshold) { + interval < READ_ONCE(net->ipv4.sysctl_tcp_probe_threshold)) { /* Check whether enough time has elaplased for * another round of probing. */
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit b3798d3519eda9c409bb0815b0102f27ec42468d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 2a85388f1d94a9f8b5a529118a2c5eaa0520d85c ]
While reading sysctl_tcp_probe_interval, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 05cbc0db03e8 ("ipv4: Create probe timer for tcp PMTU as per RFC4821") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d9b2b39f6cae..a677b1d14d29 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2083,7 +2083,7 @@ static inline void tcp_mtu_check_reprobe(struct sock *sk) u32 interval; s32 delta;
- interval = net->ipv4.sysctl_tcp_probe_interval; + interval = READ_ONCE(net->ipv4.sysctl_tcp_probe_interval); delta = tcp_jiffies32 - icsk->icsk_mtup.probe_timestamp; if (unlikely(delta >= interval * HZ)) { int mss = tcp_current_mss(sk);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit ed876e99ccf417b8bd7fd8408ba5e8b008e46cc8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit f6da2267e71106474fbc0943dc24928b9cb79119 ]
While reading sysctl_igmp_llm_reports, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
This test can be packed into a helper, so such changes will be in the follow-up series after net is merged into net-next.
if (ipv4_is_local_multicast(pmc->multiaddr) && !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))
Fixes: df2cf4a78e48 ("IGMP: Inhibit reports for local multicast groups") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/igmp.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index cc25348e70c0..48dd4bf96926 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -471,7 +471,8 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ip_mc_list *pmc,
if (pmc->multiaddr == IGMP_ALL_HOSTS) return skb; - if (ipv4_is_local_multicast(pmc->multiaddr) && !net->ipv4.sysctl_igmp_llm_reports) + if (ipv4_is_local_multicast(pmc->multiaddr) && + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) return skb;
mtu = READ_ONCE(dev->mtu); @@ -597,7 +598,7 @@ static int igmpv3_send_report(struct in_device *in_dev, struct ip_mc_list *pmc) if (pmc->multiaddr == IGMP_ALL_HOSTS) continue; if (ipv4_is_local_multicast(pmc->multiaddr) && - !net->ipv4.sysctl_igmp_llm_reports) + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) continue; spin_lock_bh(&pmc->lock); if (pmc->sfcount[MCAST_EXCLUDE]) @@ -740,7 +741,8 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc, if (type == IGMPV3_HOST_MEMBERSHIP_REPORT) return igmpv3_send_report(in_dev, pmc);
- if (ipv4_is_local_multicast(group) && !net->ipv4.sysctl_igmp_llm_reports) + if (ipv4_is_local_multicast(group) && + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) return 0;
if (type == IGMP_HOST_LEAVE_MESSAGE) @@ -924,7 +926,8 @@ static bool igmp_heard_report(struct in_device *in_dev, __be32 group)
if (group == IGMP_ALL_HOSTS) return false; - if (ipv4_is_local_multicast(group) && !net->ipv4.sysctl_igmp_llm_reports) + if (ipv4_is_local_multicast(group) && + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) return false;
rcu_read_lock(); @@ -1049,7 +1052,7 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb, if (im->multiaddr == IGMP_ALL_HOSTS) continue; if (ipv4_is_local_multicast(im->multiaddr) && - !net->ipv4.sysctl_igmp_llm_reports) + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) continue; spin_lock_bh(&im->lock); if (im->tm_running) @@ -1299,7 +1302,8 @@ static void igmp_group_dropped(struct ip_mc_list *im) #ifdef CONFIG_IP_MULTICAST if (im->multiaddr == IGMP_ALL_HOSTS) return; - if (ipv4_is_local_multicast(im->multiaddr) && !net->ipv4.sysctl_igmp_llm_reports) + if (ipv4_is_local_multicast(im->multiaddr) && + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) return;
reporter = im->reporter; @@ -1336,7 +1340,8 @@ static void igmp_group_added(struct ip_mc_list *im) #ifdef CONFIG_IP_MULTICAST if (im->multiaddr == IGMP_ALL_HOSTS) return; - if (ipv4_is_local_multicast(im->multiaddr) && !net->ipv4.sysctl_igmp_llm_reports) + if (ipv4_is_local_multicast(im->multiaddr) && + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) return;
if (in_dev->dead) @@ -1657,7 +1662,7 @@ static void ip_mc_rejoin_groups(struct in_device *in_dev) if (im->multiaddr == IGMP_ALL_HOSTS) continue; if (ipv4_is_local_multicast(im->multiaddr) && - !net->ipv4.sysctl_igmp_llm_reports) + !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) continue;
/* a failover is happening and switches
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 97e228a4efe3a751fb9c9a147f6039d5e287824a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 6305d821e3b9b5379d348528e5b5faf316383bc2 ]
While reading sysctl_igmp_max_memberships, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/igmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 48dd4bf96926..670402e7b49b 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -2212,7 +2212,7 @@ static int __ip_mc_join_group(struct sock *sk, struct ip_mreqn *imr, count++; } err = -ENOBUFS; - if (count >= net->ipv4.sysctl_igmp_max_memberships) + if (count >= READ_ONCE(net->ipv4.sysctl_igmp_max_memberships)) goto done; iml = sock_kmalloc(sk, sizeof(*iml), GFP_KERNEL); if (!iml)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 59ce170d33dd8c3c95049765ffc311584585e366 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 46778cd16e6a5ad1b2e3a91f6c057c907379418e ]
While reading sysctl_tcp_reordering, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_input.c | 10 +++++++--- net/ipv4/tcp_metrics.c | 3 ++- 3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 6d955ee09d1c..e22c5a9b29a7 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -440,7 +440,7 @@ void tcp_init_sock(struct sock *sk) tp->snd_cwnd_clamp = ~0; tp->mss_cache = TCP_MSS_DEFAULT;
- tp->reordering = sock_net(sk)->ipv4.sysctl_tcp_reordering; + tp->reordering = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reordering); tcp_assign_congestion_control(sk);
tp->tsoffset = 0; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index e30533a84f62..2dcc98c4950c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1986,6 +1986,7 @@ void tcp_enter_loss(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct net *net = sock_net(sk); bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery; + u8 reordering;
tcp_timeout_mark_lost(sk);
@@ -2006,10 +2007,12 @@ void tcp_enter_loss(struct sock *sk) /* Timeout in disordered state after receiving substantial DUPACKs * suggests that the degree of reordering is over-estimated. */ + reordering = READ_ONCE(net->ipv4.sysctl_tcp_reordering); if (icsk->icsk_ca_state <= TCP_CA_Disorder && - tp->sacked_out >= net->ipv4.sysctl_tcp_reordering) + tp->sacked_out >= reordering) tp->reordering = min_t(unsigned int, tp->reordering, - net->ipv4.sysctl_tcp_reordering); + reordering); + tcp_set_ca_state(sk, TCP_CA_Loss); tp->high_seq = tp->snd_nxt; tcp_ecn_queue_cwr(tp); @@ -3306,7 +3309,8 @@ static inline bool tcp_may_raise_cwnd(const struct sock *sk, const int flag) * new SACK or ECE mark may first advance cwnd here and later reduce * cwnd in tcp_fastretrans_alert() based on more states. */ - if (tcp_sk(sk)->reordering > sock_net(sk)->ipv4.sysctl_tcp_reordering) + if (tcp_sk(sk)->reordering > + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reordering)) return flag & FLAG_FORWARD_PROGRESS;
return flag & FLAG_DATA_ACKED; diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 03b51cdcc731..61843c6d7a47 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -425,7 +425,8 @@ void tcp_update_metrics(struct sock *sk) if (!tcp_metric_locked(tm, TCP_METRIC_REORDERING)) { val = tcp_metric_get(tm, TCP_METRIC_REORDERING); if (val < tp->reordering && - tp->reordering != net->ipv4.sysctl_tcp_reordering) + tp->reordering != + READ_ONCE(net->ipv4.sysctl_tcp_reordering)) tcp_metric_set(tm, TCP_METRIC_REORDERING, tp->reordering); }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit f197442a0ea7fc0fa3103379d0be710664c76ba4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 39e24435a776e9de5c6dd188836cf2523547804b ]
While reading these sysctl knobs, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers.
- tcp_retries1 - tcp_retries2 - tcp_orphan_retries - tcp_fin_timeout
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/tcp.h | 3 ++- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_output.c | 2 +- net/ipv4/tcp_timer.c | 10 +++++----- 4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index e34bec322f2e..970f4a70e03c 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1432,7 +1432,8 @@ static inline u32 keepalive_time_elapsed(const struct tcp_sock *tp)
static inline int tcp_fin_time(const struct sock *sk) { - int fin_timeout = tcp_sk(sk)->linger2 ? : sock_net(sk)->ipv4.sysctl_tcp_fin_timeout; + int fin_timeout = tcp_sk(sk)->linger2 ? : + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fin_timeout); const int rto = inet_csk(sk)->icsk_rto;
if (fin_timeout < (rto << 2) - (rto >> 1)) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e22c5a9b29a7..5df8efdbdaa9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3391,7 +3391,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level, case TCP_LINGER2: val = tp->linger2; if (val >= 0) - val = (val ? : net->ipv4.sysctl_tcp_fin_timeout) / HZ; + val = (val ? : READ_ONCE(net->ipv4.sysctl_tcp_fin_timeout)) / HZ; break; case TCP_DEFER_ACCEPT: val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept, diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index a677b1d14d29..7c7ee314531b 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3829,7 +3829,7 @@ void tcp_send_probe0(struct sock *sk) }
if (err <= 0) { - if (icsk->icsk_backoff < net->ipv4.sysctl_tcp_retries2) + if (icsk->icsk_backoff < READ_ONCE(net->ipv4.sysctl_tcp_retries2)) icsk->icsk_backoff++; icsk->icsk_probes_out++; probe_max = TCP_RTO_MAX; diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 5d761333ffc4..e905fff09fea 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -124,7 +124,7 @@ static int tcp_out_of_resources(struct sock *sk, bool do_reset) */ static int tcp_orphan_retries(struct sock *sk, bool alive) { - int retries = sock_net(sk)->ipv4.sysctl_tcp_orphan_retries; /* May be zero. */ + int retries = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_orphan_retries); /* May be zero. */
/* We know from an ICMP that something is wrong. */ if (sk->sk_err_soft && !alive) @@ -226,7 +226,7 @@ static int tcp_write_timeout(struct sock *sk) retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries; expired = icsk->icsk_retransmits >= retry_until; } else { - if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) { + if (retransmits_timed_out(sk, READ_ONCE(net->ipv4.sysctl_tcp_retries1), 0)) { /* Black hole detection */ tcp_mtu_probing(icsk, sk);
@@ -235,7 +235,7 @@ static int tcp_write_timeout(struct sock *sk) sk_rethink_txhash(sk); }
- retry_until = net->ipv4.sysctl_tcp_retries2; + retry_until = READ_ONCE(net->ipv4.sysctl_tcp_retries2); if (sock_flag(sk, SOCK_DEAD)) { const bool alive = icsk->icsk_rto < TCP_RTO_MAX;
@@ -362,7 +362,7 @@ static void tcp_probe_timer(struct sock *sk) (s32)(tcp_time_stamp(tp) - start_ts) > icsk->icsk_user_timeout) goto abort;
- max_probes = sock_net(sk)->ipv4.sysctl_tcp_retries2; + max_probes = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_retries2); if (sock_flag(sk, SOCK_DEAD)) { const bool alive = inet_csk_rto_backoff(icsk, TCP_RTO_MAX) < TCP_RTO_MAX;
@@ -556,7 +556,7 @@ void tcp_retransmit_timer(struct sock *sk) } inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, tcp_clamp_rto_to_user_timeout(sk), TCP_RTO_MAX); - if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1 + 1, 0)) + if (retransmits_timed_out(sk, READ_ONCE(net->ipv4.sysctl_tcp_retries1) + 1, 0)) __sk_dst_reset(sk);
out:;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 0f75343584ee474303e17efe0610bdd170af1d13 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 55be873695ed8912eb77ff46d1d1cadf028bd0f3 ]
While reading sysctl_tcp_notsent_lowat, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index 970f4a70e03c..d633f98b40f9 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1878,7 +1878,7 @@ void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr); static inline u32 tcp_notsent_lowat(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); - return tp->notsent_lowat ?: net->ipv4.sysctl_tcp_notsent_lowat; + return tp->notsent_lowat ?: READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat); }
static inline bool tcp_stream_memory_free(const struct sock *sk)
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 5b38c82dfeada3a1c020c9a962f0082fc063d665 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit cbfc6495586a3f09f6f07d9fb3c7cafe807e3c55 ]
While reading sysctl_tcp_tw_reuse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_ipv4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index d57766f32c1e..3d7ed60b456a 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -110,10 +110,10 @@ static u32 tcp_v4_init_ts_off(const struct net *net, const struct sk_buff *skb)
int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp) { + int reuse = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_tw_reuse); const struct inet_timewait_sock *tw = inet_twsk(sktw); const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw); struct tcp_sock *tp = tcp_sk(sk); - int reuse = sock_net(sk)->ipv4.sysctl_tcp_tw_reuse;
if (reuse == 2) { /* Still does not detect *everything* that goes through
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 03da610696a32578fc4f986479341ce9d430df08 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 5a54213318c43f4009ae158347aa6016e3b9b55a ]
While reading sysctl_tcp_fastopen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 2100c8d2d9db ("net-tcp: Fast Open base") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Yuchung Cheng ycheng@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/af_inet.c | 2 +- net/ipv4/tcp.c | 6 ++++-- net/ipv4/tcp_fastopen.c | 4 ++-- 3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 2612f5343e58..ba716480f2f1 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -218,7 +218,7 @@ int inet_listen(struct socket *sock, int backlog) * because the socket was in TCP_LISTEN state previously but * was shutdown() rather than close(). */ - tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen; + tcp_fastopen = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fastopen); if ((tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) && (tcp_fastopen & TFO_SERVER_ENABLE) && !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) { diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5df8efdbdaa9..ec6015f1c213 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1165,7 +1165,8 @@ static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, struct sockaddr *uaddr = msg->msg_name; int err, flags;
- if (!(sock_net(sk)->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) || + if (!(READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fastopen) & + TFO_CLIENT_ENABLE) || (uaddr && msg->msg_namelen >= sizeof(uaddr->sa_family) && uaddr->sa_family == AF_UNSPEC)) return -EOPNOTSUPP; @@ -3061,7 +3062,8 @@ static int do_tcp_setsockopt(struct sock *sk, int level, case TCP_FASTOPEN_CONNECT: if (val > 1 || val < 0) { err = -EINVAL; - } else if (net->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) { + } else if (READ_ONCE(net->ipv4.sysctl_tcp_fastopen) & + TFO_CLIENT_ENABLE) { if (sk->sk_state == TCP_CLOSE) tp->fastopen_connect = val; else diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 2ab371f55525..54cc3b9e443c 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -313,7 +313,7 @@ static bool tcp_fastopen_no_cookie(const struct sock *sk, const struct dst_entry *dst, int flag) { - return (sock_net(sk)->ipv4.sysctl_tcp_fastopen & flag) || + return (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fastopen) & flag) || tcp_sk(sk)->fastopen_no_cookie || (dst && dst_metric(dst, RTAX_FASTOPEN_NO_COOKIE)); } @@ -328,7 +328,7 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb, const struct dst_entry *dst) { bool syn_data = TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq + 1; - int tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen; + int tcp_fastopen = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fastopen); struct tcp_fastopen_cookie valid_foc = { .len = -1 }; struct sock *child;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 488d3ad98ef7cddce7054193dbae6b4349c6807d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 52e65865deb6a36718a463030500f16530eaab74 ]
While reading sysctl_tcp_early_retrans, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: eed530b6c676 ("tcp: early retransmit") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 7c7ee314531b..45ee2dbc2340 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2507,7 +2507,7 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto) if (tp->fastopen_rsk) return false;
- early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans; + early_retrans = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_early_retrans); /* Schedule a loss probe in 2*RTT for SACK capable connections * not in loss recovery, that are either limited by cwnd or application. */
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit c7a492db1f7c37c758a66915908677bd8bc5d368 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit e7d2ef837e14a971a05f60ea08c47f3fed1a36e4 ]
While reading sysctl_tcp_recovery, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 3 ++- net/ipv4/tcp_recovery.c | 6 ++++-- 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2dcc98c4950c..f07c2175b225 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1942,7 +1942,8 @@ static inline void tcp_init_undo(struct tcp_sock *tp)
static bool tcp_is_rack(const struct sock *sk) { - return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION; + return READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_recovery) & + TCP_RACK_LOSS_DETECTION; }
/* If we detect SACK reneging, forget all SACK information diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index 0d96decba13d..61969bb9395c 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -33,7 +33,8 @@ static u32 tcp_rack_reo_wnd(const struct sock *sk) return 0;
if (tp->sacked_out >= tp->reordering && - !(sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_NO_DUPTHRESH)) + !(READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_recovery) & + TCP_RACK_NO_DUPTHRESH)) return 0; }
@@ -203,7 +204,8 @@ void tcp_rack_update_reo_wnd(struct sock *sk, struct rate_sample *rs) { struct tcp_sock *tp = tcp_sk(sk);
- if (sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_STATIC_REO_WND || + if ((READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_recovery) & + TCP_RACK_STATIC_REO_WND) || !rs->prior_delivered) return;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit f4b0295be9a3c4260de4585fac4062e602a88ac7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 7c6f2a86ca590d5187a073d987e9599985fb1c7c ]
While reading sysctl_tcp_thin_linear_timeouts, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index e905fff09fea..84069db0423a 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -545,7 +545,7 @@ void tcp_retransmit_timer(struct sock *sk) * linear-timeout retransmissions into a black hole */ if (sk->sk_state == TCP_ESTABLISHED && - (tp->thin_lto || net->ipv4.sysctl_tcp_thin_linear_timeouts) && + (tp->thin_lto || READ_ONCE(net->ipv4.sysctl_tcp_thin_linear_timeouts)) && tcp_stream_is_thin(tp) && icsk->icsk_retransmits <= TCP_THIN_LINEAR_RETRIES) { icsk->icsk_backoff = 0;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 3b26e11b07a09b31247688bec61e2925d4a571b6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 4845b5713ab18a1bb6e31d1fbb4d600240b8b691 ]
While reading sysctl_tcp_slow_start_after_idle, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 35089bb203f4 ("[TCP]: Add tcp_slow_start_after_idle sysctl.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/tcp.h | 4 ++-- net/ipv4/tcp_output.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index d633f98b40f9..5a841786988d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1341,8 +1341,8 @@ static inline void tcp_slow_start_after_idle_check(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); s32 delta;
- if (!sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle || tp->packets_out || - ca_ops->cong_control) + if (!READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle) || + tp->packets_out || ca_ops->cong_control) return; delta = tcp_jiffies32 - tp->lsndtime; if (delta > inet_csk(sk)->icsk_rto) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 45ee2dbc2340..3ab8c8c72beb 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1711,7 +1711,7 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited) if (tp->packets_out > tp->snd_cwnd_used) tp->snd_cwnd_used = tp->packets_out;
- if (sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle && + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle) && (s32)(tcp_jiffies32 - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto && !ca_ops->cong_control) tcp_cwnd_application_limited(sk);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit d4a90eb32681c79a140ff2600b52fce291cf4e91 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 1a63cb91f0c2fcdeced6d6edee8d1d886583d139 ]
While reading sysctl_tcp_retrans_collapse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 3ab8c8c72beb..d6d979c127fe 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2869,7 +2869,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to, struct sk_buff *skb = to, *tmp; bool first = true;
- if (!sock_net(sk)->ipv4.sysctl_tcp_retrans_collapse) + if (!READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_retrans_collapse)) return; if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN) return;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit e570964c9f36fa386db95cc776d2e1dc16e69496 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 4e08ed41cb1194009fc1a916a59ce3ed4afd77cd ]
While reading sysctl_tcp_stdurg, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f07c2175b225..bd9386584346 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5327,7 +5327,7 @@ static void tcp_check_urg(struct sock *sk, const struct tcphdr *th) struct tcp_sock *tp = tcp_sk(sk); u32 ptr = ntohs(th->urg_ptr);
- if (ptr && !sock_net(sk)->ipv4.sysctl_tcp_stdurg) + if (ptr && !READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_stdurg)) ptr--; ptr += ntohl(th->seq);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit f7b3610619ec123f36b14bacad540d59a5a76526 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 0b484c91911e758e53656d570de58c2ed81ec6f2 ]
While reading sysctl_tcp_rfc1337, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_minisocks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index d71c01a9c1e6..e2dbc9ab892a 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -179,7 +179,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, * Oh well... nobody has a sufficient solution to this * protocol bug yet. */ - if (twsk_net(tw)->ipv4.sysctl_tcp_rfc1337 == 0) { + if (!READ_ONCE(twsk_net(tw)->ipv4.sysctl_tcp_rfc1337)) { kill: inet_twsk_deschedule_put(tw); return TCP_TW_SUCCESS;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.254 commit 5e38cee24f19d19280c68f1ac8bf6790d607f60a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit a11e5b3e7a59fde1a90b0eaeaa82320495cf8cae ]
While reading sysctl_tcp_max_reordering, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: dca145ffaa8d ("tcp: allow for bigger reordering level") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index bd9386584346..bfdac3ec5444 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -893,7 +893,7 @@ static void tcp_check_sack_reordering(struct sock *sk, const u32 low_seq, tp->undo_marker ? tp->undo_retrans : 0); #endif tp->reordering = min_t(u32, (metric + mss - 1) / mss, - sock_net(sk)->ipv4.sysctl_tcp_max_reordering); + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_max_reordering)); }
/* This exciting event is worth to be remembered. 8) */ @@ -1878,7 +1878,7 @@ static void tcp_check_reno_reordering(struct sock *sk, const int addend) return;
tp->reordering = min_t(u32, tp->packets_out + addend, - sock_net(sk)->ipv4.sysctl_tcp_max_reordering); + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_max_reordering)); tp->reord_seen++; NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRENOREORDER); }
From: Wang Cheng wanngchenng@gmail.com
stable inclusion from stable-v4.19.254 commit 13d51565cec1aa432a6ab363edc2bbc53c6f49cb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 018160ad314d75b1409129b2247b614a9f35894c upstream.
mpol_set_nodemask()(mm/mempolicy.c) does not set up nodemask when pol->mode is MPOL_LOCAL. Check pol->mode before access pol->w.cpuset_mems_allowed in mpol_rebind_policy()(mm/mempolicy.c).
BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:352 [inline] BUG: KMSAN: uninit-value in mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368 mpol_rebind_policy mm/mempolicy.c:352 [inline] mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368 cpuset_change_task_nodemask kernel/cgroup/cpuset.c:1711 [inline] cpuset_attach+0x787/0x15e0 kernel/cgroup/cpuset.c:2278 cgroup_migrate_execute+0x1023/0x1d20 kernel/cgroup/cgroup.c:2515 cgroup_migrate kernel/cgroup/cgroup.c:2771 [inline] cgroup_attach_task+0x540/0x8b0 kernel/cgroup/cgroup.c:2804 __cgroup1_procs_write+0x5cc/0x7a0 kernel/cgroup/cgroup-v1.c:520 cgroup1_tasks_write+0x94/0xb0 kernel/cgroup/cgroup-v1.c:539 cgroup_file_write+0x4c2/0x9e0 kernel/cgroup/cgroup.c:3852 kernfs_fop_write_iter+0x66a/0x9f0 fs/kernfs/file.c:296 call_write_iter include/linux/fs.h:2162 [inline] new_sync_write fs/read_write.c:503 [inline] vfs_write+0x1318/0x2030 fs/read_write.c:590 ksys_write+0x28b/0x510 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was created at: slab_post_alloc_hook mm/slab.h:524 [inline] slab_alloc_node mm/slub.c:3251 [inline] slab_alloc mm/slub.c:3259 [inline] kmem_cache_alloc+0x902/0x11c0 mm/slub.c:3264 mpol_new mm/mempolicy.c:293 [inline] do_set_mempolicy+0x421/0xb70 mm/mempolicy.c:853 kernel_set_mempolicy mm/mempolicy.c:1504 [inline] __do_sys_set_mempolicy mm/mempolicy.c:1510 [inline] __se_sys_set_mempolicy+0x44c/0xb60 mm/mempolicy.c:1507 __x64_sys_set_mempolicy+0xd8/0x110 mm/mempolicy.c:1507 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae
KMSAN: uninit-value in mpol_rebind_task (2) https://syzkaller.appspot.com/bug?id=d6eb90f952c2a5de9ea718a1b873c55cb13b59d...
This patch seems to fix below bug too. KMSAN: uninit-value in mpol_rebind_mm (2) https://syzkaller.appspot.com/bug?id=f2fecd0d7013f54ec4162f60743a2b28df40926...
The uninit-value is pol->w.cpuset_mems_allowed in mpol_rebind_policy(). When syzkaller reproducer runs to the beginning of mpol_new(),
mpol_new() mm/mempolicy.c do_mbind() mm/mempolicy.c kernel_mbind() mm/mempolicy.c
`mode` is 1(MPOL_PREFERRED), nodes_empty(*nodes) is `true` and `flags` is 0. Then
mode = MPOL_LOCAL; ... policy->mode = mode; policy->flags = flags;
will be executed. So in mpol_set_nodemask(),
mpol_set_nodemask() mm/mempolicy.c do_mbind() kernel_mbind()
pol->mode is 4 (MPOL_LOCAL), that `nodemask` in `pol` is not initialized, which will be accessed in mpol_rebind_policy().
Link: https://lkml.kernel.org/r/20220512123428.fq3wofedp6oiotd4@ppc.localdomain Signed-off-by: Wang Cheng wanngchenng@gmail.com Reported-by: syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com Tested-by: syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com Cc: David Rientjes rientjes@google.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- mm/mempolicy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c index f06ee5a5558a..42f71113698d 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -387,7 +387,7 @@ static void mpol_rebind_preferred(struct mempolicy *pol, */ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask) { - if (!pol) + if (!pol || pol->mode == MPOL_LOCAL) return; if (!mpol_store_user_nodemask(pol) && !(pol->flags & MPOL_F_LOCAL) && nodes_equal(pol->w.cpuset_mems_allowed, *newmask))
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 62acabe02a2b1a65be563e1a8825f80dcb059aed category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 58ebb1c8b35a8ef38cd6927431e0fa7b173a632d upstream.
While reading sysctl_tcp_dsack, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index bfdac3ec5444..2f50056e7221 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4232,7 +4232,7 @@ static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq) { struct tcp_sock *tp = tcp_sk(sk);
- if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) { int mib_idx;
if (before(seq, tp->rcv_nxt)) @@ -4267,7 +4267,7 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb) NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOST); tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS);
- if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) { u32 end_seq = TCP_SKB_CB(skb)->end_seq;
if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 4b4d0f7d204eca946881d7233b131cec0b00e889 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 02ca527ac5581cf56749db9fd03d854e842253dd upstream.
While reading sysctl_tcp_app_win, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2f50056e7221..ddf00a2394fc 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -432,7 +432,7 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb) */ void tcp_init_buffer_space(struct sock *sk) { - int tcp_app_win = sock_net(sk)->ipv4.sysctl_tcp_app_win; + int tcp_app_win = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_app_win); struct tcp_sock *tp = tcp_sk(sk); int maxwin;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 3e39da4423c6feaab42e70cbeded4ec63f2b9ff3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 36eeee75ef0157e42fb6593dcc65daab289b559e upstream.
While reading sysctl_tcp_adv_win_scale, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index 5a841786988d..e2c09ab2e1f8 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1357,7 +1357,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space,
static inline int tcp_win_from_space(const struct sock *sk, int space) { - int tcp_adv_win_scale = sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale; + int tcp_adv_win_scale = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale);
return tcp_adv_win_scale <= 0 ? (space>>(-tcp_adv_win_scale)) :
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 79ea5b5190d5ba107c31f0f6490dac2c90fc9913 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 706c6202a3589f290e1ef9be0584a8f4a3cc0507 upstream.
While reading sysctl_tcp_frto, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ddf00a2394fc..c2644fe52a77 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2022,7 +2022,7 @@ void tcp_enter_loss(struct sock *sk) * loss recovery is underway except recurring timeout(s) on * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing */ - tp->frto = net->ipv4.sysctl_tcp_frto && + tp->frto = READ_ONCE(net->ipv4.sysctl_tcp_frto) && (new_recovery || icsk->icsk_retransmits) && !inet_csk(sk)->icsk_mtup.probe_size; }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 5e08afbdfba71b1ada34d379b84688c84cbdcab9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 8499a2454d9e8a55ce616ede9f9580f36fd5b0f3 upstream.
While reading sysctl_tcp_nometrics_save, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_metrics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 61843c6d7a47..4960e2b6bd7f 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -329,7 +329,7 @@ void tcp_update_metrics(struct sock *sk) int m;
sk_dst_confirm(sk); - if (net->ipv4.sysctl_tcp_nometrics_save || !dst) + if (READ_ONCE(net->ipv4.sysctl_tcp_nometrics_save) || !dst) return;
rcu_read_lock();
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 3bf8415f605526a7eea9f3f7d7e30d8867b2cd1e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit db3815a2fa691da145cfbe834584f31ad75df9ff upstream.
While reading sysctl_tcp_challenge_ack_limit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c2644fe52a77..c018969bf6b9 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3471,7 +3471,7 @@ static void tcp_send_challenge_ack(struct sock *sk, const struct sk_buff *skb) /* Then check host-wide RFC 5961 rate limit. */ now = jiffies / HZ; if (now != challenge_timestamp) { - u32 ack_limit = net->ipv4.sysctl_tcp_challenge_ack_limit; + u32 ack_limit = READ_ONCE(net->ipv4.sysctl_tcp_challenge_ack_limit); u32 half = (ack_limit + 1) >> 1;
challenge_timestamp = now;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 7cbb889748931d6552c8a8f0fe7f8cbb2edfc1c1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit e27326009a3d247b831eda38878c777f6f4eb3d1 upstream.
When we close ping6 sockets, some resources are left unfreed because pingv6_prot is missing sk->sk_prot->destroy(). As reported by syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr; char data[24] = {0}; int fd;
hdr = (struct ipv6_sr_hdr *)data; hdr->hdrlen = 2; hdr->type = IPV6_SRCRT_TYPE_4;
fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP); setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24); close(fd);
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of net.ipv4.ping_group_range. The default value is [1, 0] so that no GID meets the condition (1 <= GID <= 0). Thus, the local DoS does not succeed until we change the default value. However, at least Ubuntu/Fedora/RHEL loosen it.
$ cat /usr/lib/sysctl.d/50-default.conf ... -net.ipv4.ping_group_range = 0 2147483647
Also, there could be another path reported with these options, and some of them require CAP_NET_RAW.
setsockopt IPV6_ADDRFORM (inet6_sk(sk)->pktoptions) IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu) IPV6_HOPOPTS (inet6_sk(sk)->opt) IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt) IPV6_RTHDR (inet6_sk(sk)->opt) IPV6_DSTOPTS (inet6_sk(sk)->opt) IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
getsockopt IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
For the record, I left a different splat with syzbot's one.
unreferenced object 0xffff888006270c60 (size 96): comm "repro2", pid 231, jiffies 4294696626 (age 13.118s) hex dump (first 32 bytes): 01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554) [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715) [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024) [<000000007096a025>] __sys_setsockopt (net/socket.c:2254) [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262) [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com Reported-by: Ayushman Dutta ayudutta@amazon.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: David Ahern dsahern@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/ping.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 5c9be8594483..23ae01715e7b 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -27,6 +27,11 @@ #include <linux/proc_fs.h> #include <net/ping.h>
+static void ping_v6_destroy(struct sock *sk) +{ + inet6_destroy_sock(sk); +} + /* Compatibility glue so we can support IPv6 when it's compiled as a module */ static int dummy_ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) @@ -170,6 +175,7 @@ struct proto pingv6_prot = { .owner = THIS_MODULE, .init = ping_init_sock, .close = ping_close, + .destroy = ping_v6_destroy, .connect = ip6_datagram_connect_v6_only, .disconnect = __udp_disconnect, .setsockopt = ipv6_setsockopt,
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 9eeb3a7702998bdccbfcc37997b5dd9215b9a7f7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 8ebcc62c738f68688ee7c6fec2efe5bc6d3d7e60 ]
While reading sysctl_igmp_qrv, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
This test can be packed into a helper, so such changes will be in the follow-up series after net is merged into net-next.
qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv);
Fixes: a9fe8e29945d ("ipv4: implement igmp_qrv sysctl to tune igmp robustness variable") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/igmp.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 670402e7b49b..0e95260bd099 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -831,7 +831,7 @@ static void igmp_ifc_event(struct in_device *in_dev) struct net *net = dev_net(in_dev->dev); if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) return; - WRITE_ONCE(in_dev->mr_ifc_count, in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv); + WRITE_ONCE(in_dev->mr_ifc_count, in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv)); igmp_ifc_start_timer(in_dev, 1); }
@@ -1013,7 +1013,7 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb, * received value was zero, use the default or statically * configured value. */ - in_dev->mr_qrv = ih3->qrv ?: net->ipv4.sysctl_igmp_qrv; + in_dev->mr_qrv = ih3->qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); in_dev->mr_qi = IGMPV3_QQIC(ih3->qqic)*HZ ?: IGMP_QUERY_INTERVAL;
/* RFC3376, 8.3. Query Response Interval: @@ -1192,7 +1192,7 @@ static void igmpv3_add_delrec(struct in_device *in_dev, struct ip_mc_list *im) pmc->interface = im->interface; in_dev_hold(in_dev); pmc->multiaddr = im->multiaddr; - pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + pmc->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); pmc->sfmode = im->sfmode; if (pmc->sfmode == MCAST_INCLUDE) { struct ip_sf_list *psf; @@ -1243,9 +1243,11 @@ static void igmpv3_del_delrec(struct in_device *in_dev, struct ip_mc_list *im) swap(im->tomb, pmc->tomb); swap(im->sources, pmc->sources); for (psf = im->sources; psf; psf = psf->sf_next) - psf->sf_crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + psf->sf_crcount = in_dev->mr_qrv ?: + READ_ONCE(net->ipv4.sysctl_igmp_qrv); } else { - im->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + im->crcount = in_dev->mr_qrv ?: + READ_ONCE(net->ipv4.sysctl_igmp_qrv); } in_dev_put(pmc->interface); kfree_pmc(pmc); @@ -1347,7 +1349,7 @@ static void igmp_group_added(struct ip_mc_list *im) if (in_dev->dead) return;
- im->unsolicit_count = net->ipv4.sysctl_igmp_qrv; + im->unsolicit_count = READ_ONCE(net->ipv4.sysctl_igmp_qrv); if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) { spin_lock_bh(&im->lock); igmp_start_timer(im, IGMP_INITIAL_REPORT_DELAY); @@ -1361,7 +1363,7 @@ static void igmp_group_added(struct ip_mc_list *im) * IN() to IN(A). */ if (im->sfmode == MCAST_EXCLUDE) - im->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + im->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv);
igmp_ifc_event(in_dev); #endif @@ -1769,7 +1771,7 @@ static void ip_mc_reset(struct in_device *in_dev)
in_dev->mr_qi = IGMP_QUERY_INTERVAL; in_dev->mr_qri = IGMP_QUERY_RESPONSE_INTERVAL; - in_dev->mr_qrv = net->ipv4.sysctl_igmp_qrv; + in_dev->mr_qrv = READ_ONCE(net->ipv4.sysctl_igmp_qrv); } #else static void ip_mc_reset(struct in_device *in_dev) @@ -1903,7 +1905,7 @@ static int ip_mc_del1_src(struct ip_mc_list *pmc, int sfmode, #ifdef CONFIG_IP_MULTICAST if (psf->sf_oldin && !IGMP_V1_SEEN(in_dev) && !IGMP_V2_SEEN(in_dev)) { - psf->sf_crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + psf->sf_crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); psf->sf_next = pmc->tomb; pmc->tomb = psf; rv = 1; @@ -1967,7 +1969,7 @@ static int ip_mc_del_src(struct in_device *in_dev, __be32 *pmca, int sfmode, /* filter mode change */ pmc->sfmode = MCAST_INCLUDE; #ifdef CONFIG_IP_MULTICAST - pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + pmc->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); WRITE_ONCE(in_dev->mr_ifc_count, pmc->crcount); for (psf = pmc->sources; psf; psf = psf->sf_next) psf->sf_crcount = 0; @@ -2146,7 +2148,7 @@ static int ip_mc_add_src(struct in_device *in_dev, __be32 *pmca, int sfmode, #ifdef CONFIG_IP_MULTICAST /* else no filters; keep old mode for reports */
- pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + pmc->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); WRITE_ONCE(in_dev->mr_ifc_count, pmc->crcount); for (psf = pmc->sources; psf; psf = psf->sf_next) psf->sf_crcount = 0;
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 15085721749ad8d3063a54252d974a24a9751c15 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit e0bb4ab9dfddd872622239f49fb2bd403b70853b ]
While reading sysctl_tcp_min_tso_segs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d6d979c127fe..b61d0d8e4374 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1798,7 +1798,7 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
min_tso = ca_ops->min_tso_segs ? ca_ops->min_tso_segs(sk) : - sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs; + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
tso_segs = tcp_tso_autosize(sk, mss_now, min_tso); return min_t(u32, tso_segs, sk->sk_gso_max_segs);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 3a73df9db9297ada3e58776810c8e52332f3f1a0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 1330ffacd05fc9ac4159d19286ce119e22450ed2 ]
While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c018969bf6b9..1cdfe11ec1f2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2906,7 +2906,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag) { - u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ; + u32 wlen = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen) * HZ; struct tcp_sock *tp = tcp_sk(sk);
if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) {
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 5e3d32bb5884cd0c992542f7a887e26d4118b415 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 85225e6f0a76e6745bc841c9f25169c509b573d8 ]
While reading sysctl_tcp_autocorking, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ec6015f1c213..325a1dafbd90 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -711,7 +711,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb, int size_goal) { return skb->len < size_goal && - sock_net(sk)->ipv4.sysctl_tcp_autocorking && + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_autocorking) && !tcp_rtx_queue_empty(sk) && refcount_read(&sk->sk_wmem_alloc) > skb->truesize; }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 69eab4c78d7a366d75de3d07e0a4a9e7c81291c1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 2afdbe7b8de84c28e219073a6661080e1b3ded48 ]
While reading sysctl_tcp_invalid_ratelimit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 1cdfe11ec1f2..02127a6fa2e6 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3423,7 +3423,8 @@ static bool __tcp_oow_rate_limited(struct net *net, int mib_idx, if (*last_oow_ack_time) { s32 elapsed = (s32)(tcp_jiffies32 - *last_oow_ack_time);
- if (0 <= elapsed && elapsed < net->ipv4.sysctl_tcp_invalid_ratelimit) { + if (0 <= elapsed && + elapsed < READ_ONCE(net->ipv4.sysctl_tcp_invalid_ratelimit)) { NET_INC_STATS(net, mib_idx); return true; /* rate-limited: don't send yet! */ }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 32917d384b89836e593aa8ef795508257fd7d102 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 4866b2b0f7672b6d760c4b8ece6fb56f965dcc8a ]
While reading sysctl_tcp_comp_sack_delay_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 02127a6fa2e6..01278fdd981f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5297,7 +5297,8 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) if (tp->srtt_us && tp->srtt_us < rtt) rtt = tp->srtt_us;
- delay = min_t(unsigned long, sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns, + delay = min_t(unsigned long, + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns), rtt * (NSEC_PER_USEC >> 3)/20); sock_hold(sk); hrtimer_start(&tp->compressed_ack_timer, ns_to_ktime(delay),
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.255 commit 37ef2ea54885ffebc45fae73f04352bab0ffb367 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 79f55473bfc8ac51bd6572929a679eeb4da22251 ]
While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 01278fdd981f..921e09c7d716 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5274,7 +5274,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) }
if (!tcp_is_sack(tp) || - tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr) + tp->compressed_ack >= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)) goto send_now;
if (tp->compressed_ack_rcv_nxt != tp->rcv_nxt) {
From: David Howells dhowells@redhat.com
stable inclusion from stable-v4.19.256 commit 76a633ff35b6434f9c93b87d5dcbc655bb52246a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit e2ebff9c57fe4eb104ce4768f6ebcccf76bef849 upstream.
If something manages to set the maximum file size to MAX_OFFSET+1, this can cause the xfs and ext4 filesystems at least to become corrupt.
Ordinarily, the kernel protects against userspace trying this by checking the value early in the truncate() and ftruncate() system calls calls - but there are at least two places that this check is bypassed:
(1) Cachefiles will round up the EOF of the backing file to DIO block size so as to allow DIO on the final block - but this might push the offset negative. It then calls notify_change(), but this inadvertently bypasses the checking. This can be triggered if someone puts an 8EiB-1 file on a server for someone else to try and access by, say, nfs.
(2) ksmbd doesn't check the value it is given in set_end_of_file_info() and then calls vfs_truncate() directly - which also bypasses the check.
In both cases, it is potentially possible for a network filesystem to cause a disk filesystem to be corrupted: cachefiles in the client's cache filesystem; ksmbd in the server's filesystem.
nfsd is okay as it checks the value, but we can then remove this check too.
Fix this by adding a check to inode_newsize_ok(), as called from setattr_prepare(), thereby catching the issue as filesystems set up to perform the truncate with minimal opportunity for bypassing the new check.
Fixes: 1f08c925e7a3 ("cachefiles: Implement backing file wrangling") Fixes: f44158485826 ("cifsd: add file operations") Signed-off-by: David Howells dhowells@redhat.com Reported-by: Jeff Layton jlayton@kernel.org Tested-by: Jeff Layton jlayton@kernel.org Reviewed-by: Namjae Jeon linkinjeon@kernel.org Cc: stable@kernel.org Acked-by: Alexander Viro viro@zeniv.linux.org.uk cc: Steve French sfrench@samba.org cc: Hyunchul Lee hyc.lee@gmail.com cc: Chuck Lever chuck.lever@oracle.com cc: Dave Wysochanski dwysocha@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/attr.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/attr.c b/fs/attr.c index d22e8187477f..4d2541c1e68c 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -134,6 +134,8 @@ EXPORT_SYMBOL(setattr_prepare); */ int inode_newsize_ok(const struct inode *inode, loff_t offset) { + if (offset < 0) + return -EINVAL; if (inode->i_size < offset) { unsigned long limit;
From: Yang Xu xuyang2018.jy@fujitsu.com
stable inclusion from stable-v4.19.256 commit a7abfbc07dedb69840a81f40f21696390436fa0e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit ac6800e279a22b28f4fc21439843025a0d5bf03e upstream.
All creation paths except for O_TMPFILE handle umask in the vfs directly if the filesystem doesn't support or enable POSIX ACLs. If the filesystem does then umask handling is deferred until posix_acl_create(). Because, O_TMPFILE misses umask handling in the vfs it will not honor umask settings. Fix this by adding the missing umask handling.
Link: https://lore.kernel.org/r/1657779088-2242-2-git-send-email-xuyang2018.jy@fuj... Fixes: 60545d0d4610 ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...") Cc: stable@vger.kernel.org # 4.19+ Reported-by: Christian Brauner (Microsoft) brauner@kernel.org Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-and-Tested-by: Jeff Layton jlayton@kernel.org Acked-by: Christian Brauner (Microsoft) brauner@kernel.org Signed-off-by: Yang Xu xuyang2018.jy@fujitsu.com Signed-off-by: Christian Brauner (Microsoft) brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/namei.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/namei.c b/fs/namei.c index eeb2c064dd46..fb87793e9adf 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3460,6 +3460,8 @@ struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag) child = d_alloc(dentry, &slash_name); if (unlikely(!child)) goto out_err; + if (!IS_POSIXACL(dir)) + mode &= ~current_umask(); error = dir->i_op->tmpfile(dir, child, mode); if (error) goto out_err;
From: huhai huhai@kylinos.cn
stable inclusion from stable-v4.19.256 commit 7566bc6556185f14a9b8c42940b7ebd6b2d4aad6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit b4f1f61ed5928b1128e60e38d0dffa16966f06dc ]
register_device_clock() misses a check for platform_device_register_simple(). Add a check to fix it.
Signed-off-by: huhai huhai@kylinos.cn Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/acpi/acpi_lpss.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 30ccd94f87d2..b9d407135f94 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -401,6 +401,9 @@ static int register_device_clock(struct acpi_device *adev, if (!lpss_clk_dev) lpt_register_clock_device();
+ if (IS_ERR(lpss_clk_dev)) + return PTR_ERR(lpss_clk_dev); + clk_data = platform_get_drvdata(lpss_clk_dev); if (!clk_data) return -ENODEV;
From: Xiu Jianfeng xiujianfeng@huawei.com
stable inclusion from stable-v4.19.256 commit 477722f31ad73aa779154d1d7e00825538389f76 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 15ec76fb29be31df2bccb30fc09875274cba2776 ]
Just like next_entry(), boundary check is necessary to prevent memory out-of-bound access.
Signed-off-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- security/selinux/ss/policydb.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h index 215f8f30ac5a..2a479785ebd4 100644 --- a/security/selinux/ss/policydb.h +++ b/security/selinux/ss/policydb.h @@ -360,6 +360,8 @@ static inline int put_entry(const void *buf, size_t bytes, int num, struct polic { size_t len = bytes * num;
+ if (len > fp->len) + return -EINVAL; memcpy(fp->data, buf, len); fp->data += len; fp->len -= len;
From: Johan Hovold johan@kernel.org
stable inclusion from stable-v4.19.256 commit e2b45e82e453c01ffe97308aad81ac44c4bbd019 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 229e73d46994f15314f58b2d39bf952111d89193 ]
Make sure to free the platform device in the unlikely event that registration fails.
Fixes: 7a67832c7e44 ("libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option") Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Borislav Petkov bp@suse.de Link: https://lore.kernel.org/r/20220620140723.9810-1-johan@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- arch/x86/kernel/pmem.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c index 6b07faaa1579..23154d24b117 100644 --- a/arch/x86/kernel/pmem.c +++ b/arch/x86/kernel/pmem.c @@ -27,6 +27,11 @@ static __init int register_e820_pmem(void) * simply here to trigger the module to load on demand. */ pdev = platform_device_alloc("e820_pmem", -1); - return platform_device_add(pdev); + + rc = platform_device_add(pdev); + if (rc) + platform_device_put(pdev); + + return rc; } device_initcall(register_e820_pmem);
From: Yang Yingliang yangyingliang@huawei.com
stable inclusion from stable-v4.19.256 commit f2a1ab0222865d7d6779136f256425d02e62aad0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 54872fea6a5ac967ec2272aea525d1438ac6735a ]
In error case in hisi_lpc_acpi_probe() after calling platform_device_add(), hisi_lpc_acpi_remove() can't release the failed 'pdev', so it will be leak, call platform_device_put() to fix this problem. I'v constructed this error case and tested this patch on D05 board.
Fixes: 99c0228d6ff1 ("HISI LPC: Re-Add ACPI child enumeration support") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Acked-by: John Garry john.garry@huawei.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/bus/hisi_lpc.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/bus/hisi_lpc.c b/drivers/bus/hisi_lpc.c index 20c957185af2..75c61c528a1c 100644 --- a/drivers/bus/hisi_lpc.c +++ b/drivers/bus/hisi_lpc.c @@ -481,13 +481,13 @@ static int hisi_lpc_acpi_probe(struct device *hostdev) { struct acpi_device *adev = ACPI_COMPANION(hostdev); struct acpi_device *child; + struct platform_device *pdev; int ret;
/* Only consider the children of the host */ list_for_each_entry(child, &adev->children, node) { const char *hid = acpi_device_hid(child); const struct hisi_lpc_acpi_cell *cell; - struct platform_device *pdev; const struct resource *res; bool found = false; int num_res; @@ -549,22 +549,24 @@ static int hisi_lpc_acpi_probe(struct device *hostdev)
ret = platform_device_add_resources(pdev, res, num_res); if (ret) - goto fail; + goto fail_put_device;
ret = platform_device_add_data(pdev, cell->pdata, cell->pdata_size); if (ret) - goto fail; + goto fail_put_device;
ret = platform_device_add(pdev); if (ret) - goto fail; + goto fail_put_device;
acpi_device_set_enumerated(child); }
return 0;
+fail_put_device: + platform_device_put(pdev); fail: hisi_lpc_acpi_remove(hostdev); return ret;
From: Nicolas Saenz Julienne nsaenzju@redhat.com
stable inclusion from stable-v4.19.256 commit 952146183df68a7fb29c3c29d56ffdc941fbdc39 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 5c66d1b9b30f737fcef85a0b75bfe0590e16b62a ]
dequeue_task_rt() only decrements 'rt_rq->rt_nr_running' after having called sched_update_tick_dependency() preventing it from re-enabling the tick on systems that no longer have pending SCHED_RT tasks but have multiple runnable SCHED_OTHER tasks:
dequeue_task_rt() dequeue_rt_entity() dequeue_rt_stack() dequeue_top_rt_rq() sub_nr_running() // decrements rq->nr_running sched_update_tick_dependency() sched_can_stop_tick() // checks rq->rt.rt_nr_running, ... __dequeue_rt_entity() dec_rt_tasks() // decrements rq->rt.rt_nr_running ...
Every other scheduler class performs the operation in the opposite order, and sched_update_tick_dependency() expects the values to be updated as such. So avoid the misbehaviour by inverting the order in which the above operations are performed in the RT scheduler.
Fixes: 76d92ac305f2 ("sched: Migrate sched to use new tick dependency mask model") Signed-off-by: Nicolas Saenz Julienne nsaenzju@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Valentin Schneider vschneid@redhat.com Reviewed-by: Phil Auld pauld@redhat.com Link: https://lore.kernel.org/r/20220628092259.330171-1-nsaenzju@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/sched/rt.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 301ba04d9130..a500bd78199b 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -434,7 +434,7 @@ static inline void rt_queue_push_tasks(struct rq *rq) #endif /* CONFIG_SMP */
static void enqueue_top_rt_rq(struct rt_rq *rt_rq); -static void dequeue_top_rt_rq(struct rt_rq *rt_rq); +static void dequeue_top_rt_rq(struct rt_rq *rt_rq, unsigned int count);
static inline int on_rt_rq(struct sched_rt_entity *rt_se) { @@ -516,7 +516,7 @@ static void sched_rt_rq_dequeue(struct rt_rq *rt_rq) rt_se = rt_rq->tg->rt_se[cpu];
if (!rt_se) { - dequeue_top_rt_rq(rt_rq); + dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running); /* Kick cpufreq (see the comment in kernel/sched/sched.h). */ cpufreq_update_util(rq_of_rt_rq(rt_rq), 0); } @@ -602,7 +602,7 @@ static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq) { - dequeue_top_rt_rq(rt_rq); + dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running); }
static inline int rt_rq_throttled(struct rt_rq *rt_rq) @@ -1000,7 +1000,7 @@ static void update_curr_rt(struct rq *rq) }
static void -dequeue_top_rt_rq(struct rt_rq *rt_rq) +dequeue_top_rt_rq(struct rt_rq *rt_rq, unsigned int count) { struct rq *rq = rq_of_rt_rq(rt_rq);
@@ -1011,7 +1011,7 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq)
BUG_ON(!rq->nr_running);
- sub_nr_running(rq, rt_rq->rt_nr_running); + sub_nr_running(rq, count); rt_rq->rt_queued = 0;
} @@ -1290,18 +1290,21 @@ static void __dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag static void dequeue_rt_stack(struct sched_rt_entity *rt_se, unsigned int flags) { struct sched_rt_entity *back = NULL; + unsigned int rt_nr_running;
for_each_sched_rt_entity(rt_se) { rt_se->back = back; back = rt_se; }
- dequeue_top_rt_rq(rt_rq_of_se(back)); + rt_nr_running = rt_rq_of_se(back)->rt_nr_running;
for (rt_se = back; rt_se; rt_se = rt_se->back) { if (on_rt_rq(rt_se)) __dequeue_rt_entity(rt_se, flags); } + + dequeue_top_rt_rq(rt_rq_of_se(back), rt_nr_running); }
static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
From: Yonglong Li liyonglong@chinatelecom.cn
stable inclusion from stable-v4.19.256 commit 05c6ca8d701189cb0d0520fe59b2b4b25e44d9d4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 536a6c8e05f95e3d1118c40ae8b3022ee2d05d52 ]
current code of __tcp_retransmit_skb only check TCP_SKB_CB(skb)->seq in send window, and TCP_SKB_CB(skb)->seq_end maybe out of send window. If receiver has shrunk his window, and skb is out of new window, it should retransmit a smaller portion of the payload.
test packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8> +.05 < S. 0:0(0) ack 1 win 6000 <mss 1000,nop,nop,sackOK> +0 > . 1:1(0) ack 1
+0 write(3, ..., 10000) = 10000
+0 > . 1:2001(2000) ack 1 win 65535 +0 > . 2001:4001(2000) ack 1 win 65535 +0 > . 4001:6001(2000) ack 1 win 65535
+.05 < . 1:1(0) ack 4001 win 1001
and tcpdump show: 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 1:2001, ack 1, win 65535, length 2000 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 2001:4001, ack 1, win 65535, length 2000 192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000 192.0.2.1.8080 > 192.168.226.67.55: Flags [.], ack 4001, win 1001, length 0 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000 192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000
when cient retract window to 1001, send window is [4001,5002], but TLP send 5001-6001 packet which is out of send window.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yonglong Li liyonglong@chinatelecom.cn Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/1657532838-20200-1-git-send-email-liyonglong@china... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b61d0d8e4374..e1760b125b09 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2909,7 +2909,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) struct tcp_sock *tp = tcp_sk(sk); unsigned int cur_mss; int diff, len, err; - + int avail_wnd;
/* Inconclusive MTU probe */ if (icsk->icsk_mtup.probe_size) @@ -2939,17 +2939,25 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) return -EHOSTUNREACH; /* Routing failure or similar. */
cur_mss = tcp_current_mss(sk); + avail_wnd = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
/* If receiver has shrunk his window, and skb is out of * new window, do not retransmit it. The exception is the * case, when window is shrunk to zero. In this case - * our retransmit serves as a zero window probe. + * our retransmit of one segment serves as a zero window probe. */ - if (!before(TCP_SKB_CB(skb)->seq, tcp_wnd_end(tp)) && - TCP_SKB_CB(skb)->seq != tp->snd_una) - return -EAGAIN; + if (avail_wnd <= 0) { + if (TCP_SKB_CB(skb)->seq != tp->snd_una) + return -EAGAIN; + avail_wnd = cur_mss; + }
len = cur_mss * segs; + if (len > avail_wnd) { + len = rounddown(avail_wnd, cur_mss); + if (!len) + len = avail_wnd; + } if (skb->len > len) { if (tcp_fragment(sk, TCP_FRAG_IN_RTX_QUEUE, skb, len, cur_mss, GFP_ATOMIC)) @@ -2963,8 +2971,9 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) diff -= tcp_skb_pcount(skb); if (diff) tcp_adjust_pcount(sk, skb, diff); - if (skb->len < cur_mss) - tcp_retrans_try_collapse(sk, skb, cur_mss); + avail_wnd = min_t(int, avail_wnd, cur_mss); + if (skb->len < avail_wnd) + tcp_retrans_try_collapse(sk, skb, avail_wnd); }
/* RFC3168, section 6.1.1.1. ECN fallback */
From: "Jason A. Donenfeld" Jason@zx2c4.com
stable inclusion from stable-v4.19.256 commit ab37175dd3593b9098f2242e370a7b1af4c35368 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 97ef77c52b789ec1411d360ed99dca1efe4b2c81 ]
The original direct splicing mechanism from Jens required the input to be a regular file because it was avoiding the special socket case. It also recognized blkdevs as being close enough to a regular file. But it forgot about chardevs, which behave the same way and work fine here.
This is an okayish heuristic, but it doesn't totally work. For example, a few chardevs should be spliceable here. And a few regular files shouldn't. This patch fixes this by instead checking whether FMODE_LSEEK is set, which represents decently enough what we need rewinding for when splicing to internal pipes.
Fixes: b92ce5589374 ("[PATCH] splice: add direct fd <-> fd splicing support") Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/splice.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c index 230367f3df41..d71716f94e3c 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -899,17 +899,15 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, { struct pipe_inode_info *pipe; long ret, bytes; - umode_t i_mode; size_t len; int i, flags, more;
/* - * We require the input being a regular file, as we don't want to - * randomly drop data for eg socket -> socket splicing. Use the - * piped splicing for that! + * We require the input to be seekable, as we don't want to randomly + * drop data for eg socket -> socket splicing. Use the piped splicing + * for that! */ - i_mode = file_inode(in)->i_mode; - if (unlikely(!S_ISREG(i_mode) && !S_ISBLK(i_mode))) + if (unlikely(!(in->f_mode & FMODE_LSEEK))) return -EINVAL;
/*
From: Vincent Mailhol mailhol.vincent@wanadoo.fr
stable inclusion from stable-v4.19.256 commit e5ed6be01b2846c55a3e729a3d0a2d661853a606 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit e70a3263a7eed768d5f947b8f2aff8d2a79c9d97 ]
Currently, data[5..7] of struct can_frame, when used as a CAN error frame, are defined as being "controller specific". Device specific behaviours are problematic because it prevents someone from writing code which is portable between devices.
As a matter of fact, data[5] is never used, data[6] is always used to report TX error counter and data[7] is always used to report RX error counter. can-utils also relies on this.
This patch updates the comment in the uapi header to specify that data[5] is reserved (and thus should not be used) and that data[6..7] are used for error counters.
Fixes: 0d66548a10cb ("[CAN]: Add PF_CAN core module") Link: https://lore.kernel.org/all/20220719143550.3681-11-mailhol.vincent@wanadoo.f... Signed-off-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/uapi/linux/can/error.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/can/error.h b/include/uapi/linux/can/error.h index bfc4b5d22a5e..383f3d508a53 100644 --- a/include/uapi/linux/can/error.h +++ b/include/uapi/linux/can/error.h @@ -120,6 +120,9 @@ #define CAN_ERR_TRX_CANL_SHORT_TO_GND 0x70 /* 0111 0000 */ #define CAN_ERR_TRX_CANL_SHORT_TO_CANH 0x80 /* 1000 0000 */
-/* controller specific additional information / data[5..7] */ +/* data[5] is reserved (do not use) */ + +/* TX error counter / data[6] */ +/* RX error counter / data[7] */
#endif /* _UAPI_CAN_ERROR_H */
From: Duoming Zhou duoming@zju.edu.cn
stable inclusion from stable-v4.19.256 commit 867fe9100c1e87b4e2410176433afca6bc3e79cc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit a61528d997619a518ee8c51cf0ef0513021afaff ]
There is a deadlock between sm_release and sm_cache_flush_work which is a work item. The cancel_work_sync in sm_release will not return until sm_cache_flush_work is finished. If we hold mutex_lock and use cancel_work_sync to wait the work item to finish, the work item also requires mutex_lock. As a result, the sm_release will be blocked forever. The race condition is shown below:
(Thread 1) | (Thread 2) sm_release | mutex_lock(&ftl->mutex) | sm_cache_flush_work | mutex_lock(&ftl->mutex) cancel_work_sync | ...
This patch moves del_timer_sync and cancel_work_sync out of mutex_lock in order to mitigate deadlock.
Fixes: 7d17c02a01a1 ("mtd: Add new SmartMedia/xD FTL") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20220524044841.10517-1-duoming@zju.edu.cn Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/mtd/sm_ftl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/sm_ftl.c b/drivers/mtd/sm_ftl.c index f3bd86e13603..e57f7ba054bc 100644 --- a/drivers/mtd/sm_ftl.c +++ b/drivers/mtd/sm_ftl.c @@ -1091,9 +1091,9 @@ static void sm_release(struct mtd_blktrans_dev *dev) { struct sm_ftl *ftl = dev->priv;
- mutex_lock(&ftl->mutex); del_timer_sync(&ftl->timer); cancel_work_sync(&ftl->flush_work); + mutex_lock(&ftl->mutex); sm_cache_flush(ftl); mutex_unlock(&ftl->mutex); }
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
stable inclusion from stable-v4.19.256 commit 9ae20c4af2af4cb26292c8029fa990a8a8123c8c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 28607b426c3d050714f250d0faeb99d2e9106e90 ]
For all but one error path clk_disable_unprepare() is already there. Add it to the one location where it's missing.
Fixes: 481815a6193b ("mtd: st_spi_fsm: Handle clk_prepare_enable/clk_disable_unprepare.") Fixes: 69d5af8d016c ("mtd: st_spi_fsm: Obtain and use EMI clock") Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20220607152458.232847-2-u.kleine-koenig@pe... Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/mtd/devices/st_spi_fsm.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/devices/st_spi_fsm.c b/drivers/mtd/devices/st_spi_fsm.c index 55d4a77f3b7f..533096c88ae1 100644 --- a/drivers/mtd/devices/st_spi_fsm.c +++ b/drivers/mtd/devices/st_spi_fsm.c @@ -2120,10 +2120,12 @@ static int stfsm_probe(struct platform_device *pdev) (long long)fsm->mtd.size, (long long)(fsm->mtd.size >> 20), fsm->mtd.erasesize, (fsm->mtd.erasesize >> 10));
- return mtd_device_register(&fsm->mtd, NULL, 0); - + ret = mtd_device_register(&fsm->mtd, NULL, 0); + if (ret) { err_clk_unprepare: - clk_disable_unprepare(fsm->clk); + clk_disable_unprepare(fsm->clk); + } + return ret; }
From: Miaohe Lin linmiaohe@huawei.com
stable inclusion from stable-v4.19.256 commit f84c69bbb3ccc544f0023c4dbbdf922c09b5d80b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 7f82f922319ede486540e8746769865b9508d2c2 ]
Since the beginning, charged is set to 0 to avoid calling vm_unacct_memory twice because vm_unacct_memory will be called by above unmap_region. But since commit 4f74d2c8e827 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces"), unmap_region doesn't call vm_unacct_memory anymore. So charged shouldn't be set to 0 now otherwise the calling to paired vm_unacct_memory will be missed and leads to imbalanced account.
Link: https://lkml.kernel.org/r/20220618082027.43391-1-linmiaohe@huawei.com Fixes: 4f74d2c8e827 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- mm/mmap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c index ce3fba7c31cb..a2699bc10f7b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2105,7 +2105,6 @@ static unsigned long __mmap_region(struct mm_struct *mm, struct file *file,
/* Undo any partial mapping done by a device driver. */ unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end); - charged = 0; if (vm_flags & VM_SHARED) mapping_unmap_writable(file->f_mapping); allow_write_and_free_vma:
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
stable inclusion from stable-v4.19.256 commit c3ca8a752a422ba1f9580f2e1d41ad650252d8ac category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit af14f3007e2dca0d112f10f6717ba43093f74e81 ]
Make sure LSR flags are preserved in dw8250_tx_wait_empty(). This function is called from a low-level out function and therefore cannot call serial_lsr_in() as it would lead to infinite recursion.
It is borderline if the flags need to be saved here at all since this code relates to writing LCR register which usually implies no important characters should be arriving.
Fixes: 914eaf935ec7 ("serial: 8250_dw: Allow TX FIFO to drain before writing to UART_LCR") Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20220608095431.18376-7-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/serial/8250/8250_dw.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c index c73d0eddd9b8..cc9d1f416db8 100644 --- a/drivers/tty/serial/8250/8250_dw.c +++ b/drivers/tty/serial/8250/8250_dw.c @@ -140,12 +140,15 @@ static void dw8250_check_lcr(struct uart_port *p, int value) /* Returns once the transmitter is empty or we run out of retries */ static void dw8250_tx_wait_empty(struct uart_port *p) { + struct uart_8250_port *up = up_to_u8250p(p); unsigned int tries = 20000; unsigned int delay_threshold = tries - 1000; unsigned int lsr;
while (tries--) { lsr = readb (p->membase + (UART_LSR << p->regshift)); + up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS; + if (lsr & UART_LSR_TEMT) break;
From: Chen Zhongjin chenzhongjin@huawei.com
stable inclusion from stable-v4.19.256 commit 8eb2fd54d106ca38ec452f1315cba3a56f4a8581 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 0fe6ee8f123a4dfb529a5aff07536bb481f34043 ]
2d186afd04d6 ("profiling: fix shift-out-of-bounds bugs") limits shift value by [0, BITS_PER_LONG -1], which means [0, 63].
However, syzbot found that the max shift value should be the bit number of (_etext - _stext). If shift is outside of this, the "buffer_bytes" will be zero and will cause kzalloc(0). Then the kernel panics due to dereferencing the returned pointer 16.
This can be easily reproduced by passing a large number like 60 to enable profiling and then run readprofile.
LOGS: BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 6148067 P4D 6148067 PUD 6142067 PMD 0 PREEMPT SMP CPU: 4 PID: 184 Comm: readprofile Not tainted 5.18.0+ #162 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_profile+0x104/0x220 RSP: 0018:ffffc900006fbe80 EFLAGS: 00000202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888006150000 RSI: 0000000000000001 RDI: ffffffff82aba4a0 RBP: 000000000188bb60 R08: 0000000000000010 R09: ffff888006151000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82aba4a0 R13: 0000000000000000 R14: ffffc900006fbf08 R15: 0000000000020c30 FS: 000000000188a8c0(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000006144000 CR4: 00000000000006e0 Call Trace: <TASK> proc_reg_read+0x56/0x70 vfs_read+0x9a/0x1b0 ksys_read+0xa1/0xe0 ? fpregs_assert_state_consistent+0x1e/0x40 do_syscall_64+0x3a/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x4d4b4e RSP: 002b:00007ffebb668d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 000000000188a8a0 RCX: 00000000004d4b4e RDX: 0000000000000400 RSI: 000000000188bb60 RDI: 0000000000000003 RBP: 0000000000000003 R08: 000000000000006e R09: 0000000000000000 R10: 0000000000000041 R11: 0000000000000246 R12: 000000000188bb60 R13: 0000000000000400 R14: 0000000000000000 R15: 000000000188bb60 </TASK> Modules linked in: CR2: 0000000000000010 Killed ---[ end trace 0000000000000000 ]---
Check prof_len in profile_init() to prevent it be zero.
Link: https://lkml.kernel.org/r/20220531012854.229439-1-chenzhongjin@huawei.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/profile.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/kernel/profile.c b/kernel/profile.c index efa58f63dc1b..7fc621404230 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -108,6 +108,13 @@ int __ref profile_init(void)
/* only text is profiled */ prof_len = (_etext - _stext) >> prof_shift; + + if (!prof_len) { + pr_warn("profiling shift: %u too large\n", prof_shift); + prof_on = 0; + return -EINVAL; + } + buffer_bytes = prof_len*sizeof(atomic_t);
if (!alloc_cpumask_var(&prof_cpu_mask, GFP_KERNEL))
From: Dan Carpenter dan.carpenter@oracle.com
stable inclusion from stable-v4.19.256 commit 2cff35bb4573cbdbf1e0a405539dc9680004abe8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 045ed31e23aea840648c290dbde04797064960db ]
The kfifo_to_user() macro is supposed to return zero for success or negative error codes. Unfortunately, there is a signedness bug so it returns unsigned int. This only affects callers which try to save the result in ssize_t and as far as I can see the only place which does that is line6_hwdep_read().
TL;DR: s/_uint/_int/.
Link: https://lkml.kernel.org/r/YrVL3OJVLlNhIMFs@kili Fixes: 144ecf310eb5 ("kfifo: fix kfifo_alloc() to return a signed int value") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Cc: Stefani Seibold stefani@seibold.net Cc: Randy Dunlap randy.dunlap@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/linux/kfifo.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h index 89fc8dc7bf38..ab9ff74818a4 100644 --- a/include/linux/kfifo.h +++ b/include/linux/kfifo.h @@ -629,7 +629,7 @@ __kfifo_uint_must_check_helper( \ * writer, you don't need extra locking to use these macro. */ #define kfifo_to_user(fifo, to, len, copied) \ -__kfifo_uint_must_check_helper( \ +__kfifo_int_must_check_helper( \ ({ \ typeof((fifo) + 1) __tmp = (fifo); \ void __user *__to = (to); \
From: Chen Zhongjin chenzhongjin@huawei.com
stable inclusion from stable-v4.19.256 commit 1c836bad43f3e2ff71cc397a6e6ccb4e7bd116f8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 28f6c37a2910f565b4f5960df52b2eccae28c891 ]
kernel_text_address() treats ftrace_trampoline, kprobe_insn_slot and bpf_text_address as valid kprobe addresses - which is not ideal.
These text areas are removable and changeable without any notification to kprobes, and probing on them can trigger unexpected behavior:
https://lkml.org/lkml/2022/7/26/1148
Considering that jump_label and static_call text are already forbiden to probe, kernel_text_address() should be replaced with core_kernel_text() and is_module_text_address() to check other text areas which are unsafe to kprobe.
[ mingo: Rewrote the changelog. ]
Fixes: 5b485629ba0d ("kprobes, extable: Identify kprobes trampolines as kernel text area") Fixes: 74451e66d516 ("bpf: make jited programs visible in traces") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Link: https://lore.kernel.org/r/20220801033719.228248-1-chenzhongjin@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/kprobes.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 05fef05263de..2be52a2d24ad 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1568,7 +1568,8 @@ static int check_kprobe_address_safe(struct kprobe *p, preempt_disable();
/* Ensure it is not in reserved area nor out of text */ - if (!kernel_text_address((unsigned long) p->addr) || + if (!(core_kernel_text((unsigned long) p->addr) || + is_module_text_address((unsigned long) p->addr)) || within_kprobe_blacklist((unsigned long) p->addr) || jump_label_text_reserved(p->addr, p->addr) || find_bug((unsigned long)p->addr)) {
From: Lukas Czerner lczerner@redhat.com
stable inclusion from stable-v4.19.256 commit 4816177c9f154594c8a841ed95b883eb07e4d6ce category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit b8a04fe77ef1360fbf73c80fddbdfeaa9407ed1b upstream.
ext4_append() must always allocate a new block, otherwise we run the risk of overwriting existing directory block corrupting the directory tree in the process resulting in all manner of problems later on.
Add a sanity check to see if the logical block is already allocated and error out if it is.
Cc: stable@kernel.org Signed-off-by: Lukas Czerner lczerner@redhat.com Reviewed-by: Andreas Dilger adilger@dilger.ca Link: https://lore.kernel.org/r/20220704142721.157985-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/ext4/namei.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 134fef1f6edc..b2d4fb82e8f4 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -53,6 +53,7 @@ static struct buffer_head *ext4_append(handle_t *handle, struct inode *inode, ext4_lblk_t *block) { + struct ext4_map_blocks map; struct buffer_head *bh; int err;
@@ -62,6 +63,21 @@ static struct buffer_head *ext4_append(handle_t *handle, return ERR_PTR(-ENOSPC);
*block = inode->i_size >> inode->i_sb->s_blocksize_bits; + map.m_lblk = *block; + map.m_len = 1; + + /* + * We're appending new directory block. Make sure the block is not + * allocated yet, otherwise we will end up corrupting the + * directory. + */ + err = ext4_map_blocks(NULL, inode, &map, 0); + if (err < 0) + return ERR_PTR(err); + if (err) { + EXT4_ERROR_INODE(inode, "Logical block already allocated"); + return ERR_PTR(-EFSCORRUPTED); + }
bh = ext4_bread(handle, inode, *block, EXT4_GET_BLOCKS_CREATE); if (IS_ERR(bh))
From: Theodore Ts'o tytso@mit.edu
stable inclusion from stable-v4.19.256 commit 38ac2511a56a74dfc6e0143bae58ef5fc4e318ea category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit de394a86658ffe4e89e5328fd4993abfe41b7435 upstream.
When doing an online resize, the on-disk superblock on-disk wasn't updated. This means that when the file system is unmounted and remounted, and the on-disk overhead value is non-zero, this would result in the results of statfs(2) to be incorrect.
This was partially fixed by Commits 10b01ee92df5 ("ext4: fix overhead calculation to account for the reserved gdt blocks"), 85d825dbf489 ("ext4: force overhead calculation if the s_overhead_cluster makes no sense"), and eb7054212eac ("ext4: update the cached overhead value in the superblock").
However, since it was too expensive to forcibly recalculate the overhead for bigalloc file systems at every mount, this didn't fix the problem for bigalloc file systems. This commit should address the problem when resizing file systems with the bigalloc feature enabled.
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Reviewed-by: Andreas Dilger adilger@dilger.ca Link: https://lore.kernel.org/r/20220629040026.112371-1-tytso@mit.edu Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/ext4/resize.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 0f0bf9796efd..104093f82704 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1469,6 +1469,7 @@ static void ext4_update_super(struct super_block *sb, * Update the fs overhead information */ ext4_calculate_overhead(sb); + es->s_overhead_clusters = cpu_to_le32(sbi->s_overhead);
if (test_opt(sb, DEBUG)) printk(KERN_DEBUG "EXT4-fs: added group %u:"
From: Eric Whitney enwlinux@gmail.com
stable inclusion from stable-v4.19.256 commit adf404e9d73dcae4f28fff9f7bcd857cfee2b7de category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 7f0d8e1d607c1a4fa9a27362a108921d82230874 upstream.
A race can occur in the unlikely event ext4 is unable to allocate a physical cluster for a delayed allocation in a bigalloc file system during writeback. Failure to allocate a cluster forces error recovery that includes a call to mpage_release_unused_pages(). That function removes any corresponding delayed allocated blocks from the extent status tree. If a new delayed write is in progress on the same cluster simultaneously, resulting in the addition of an new extent containing one or more blocks in that cluster to the extent status tree, delayed block accounting can be thrown off if that delayed write then encounters a similar cluster allocation failure during future writeback.
Write lock the i_data_sem in mpage_release_unused_pages() to fix this problem. Ext4's block/cluster accounting code for bigalloc relies on i_data_sem for mutual exclusion, as is found in the delayed write path, and the locking in mpage_release_unused_pages() is missing.
Cc: stable@kernel.org Reported-by: Ye Bin yebin10@huawei.com Signed-off-by: Eric Whitney enwlinux@gmail.com Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/ext4/inode.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9a4125a5da57..ce88148cdc4a 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1729,7 +1729,14 @@ static void mpage_release_unused_pages(struct mpage_da_data *mpd, ext4_lblk_t start, last; start = index << (PAGE_SHIFT - inode->i_blkbits); last = end << (PAGE_SHIFT - inode->i_blkbits); + + /* + * avoid racing with extent status tree scans made by + * ext4_insert_delayed_block() + */ + down_write(&EXT4_I(inode)->i_data_sem); ext4_es_remove_extent(inode, start, last - start + 1); + up_write(&EXT4_I(inode)->i_data_sem); }
pagevec_init(&pvec);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.256 commit e4a7b589cf19d20d587bc720ffbc55bad8ef26c6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit c4ee118561a0f74442439b7b5b486db1ac1ddfeb upstream.
sk_forced_mem_schedule() has a bug similar to ones fixed in commit 7c80b038d23e ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors")
While this bug has little chance to trigger in old kernels, we need to fix it before the following patch.
Fixes: d83769a580f1 ("tcp: fix possible deadlock in tcp_send_fin()") Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Reviewed-by: Shakeel Butt shakeelb@google.com Reviewed-by: Wei Wang weiwan@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_output.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index e1760b125b09..24e1f79e73d2 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3132,11 +3132,12 @@ void tcp_xmit_retransmit_queue(struct sock *sk) */ void sk_forced_mem_schedule(struct sock *sk, int size) { - int amt; + int delta, amt;
- if (size <= sk->sk_forward_alloc) + delta = size - sk->sk_forward_alloc; + if (delta <= 0) return; - amt = sk_mem_pages(size); + amt = sk_mem_pages(delta); sk->sk_forward_alloc += amt * SK_MEM_QUANTUM; sk_memory_allocated_add(sk, amt);
From: Zhang Xianwei zhang.xianwei8@zte.com.cn
stable inclusion from stable-v4.19.256 commit a0b3a9b0e6832be88b426b242aa3320c3817e2fb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit e35a5e782f67ed76a65ad0f23a484444a95f000f upstream.
A client should be able to handle getting an EACCES error while doing a mount operation to reclaim state due to NFS4CLNT_RECLAIM_REBOOT being set. If the server returns RPC_AUTH_BADCRED because authentication failed when we execute "exportfs -au", then RECLAIM_COMPLETE will go a wrong way. After mount succeeds, all OPEN call will fail due to an NFS4ERR_GRACE error being returned. This patch is to fix it by resending a RPC request.
Signed-off-by: Zhang Xianwei zhang.xianwei8@zte.com.cn Signed-off-by: Yi Wang wang.yi59@zte.com.cn Fixes: aa5190d0ed7d ("NFSv4: Kill nfs4_async_handle_error() abuses by NFSv4.1") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/nfs/nfs4proc.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index fa90d83197e3..ed38f5a793d0 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -8823,6 +8823,9 @@ static int nfs41_reclaim_complete_handle_errors(struct rpc_task *task, struct nf rpc_delay(task, NFS4_POLL_RETRY_MAX); /* fall through */ case -NFS4ERR_RETRY_UNCACHED_REP: + case -EACCES: + dprintk("%s: failed to reclaim complete error %d for server %s, retrying\n", + __func__, task->tk_status, clp->cl_hostname); return -EAGAIN; case -NFS4ERR_BADSESSION: case -NFS4ERR_DEADSESSION:
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from stable-v4.19.256 commit 0fffb46ff3d5ed4668aca96441ec7a25b793bd6f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 2135e5d56278ffdb1c2e6d325dc6b87f669b9dac upstream.
If someone cancels the open RPC call, then we must not try to free either the open slot or the layoutget operation arguments, since they are likely still in use by the hung RPC call.
Fixes: 6949493884fe ("NFSv4: Don't hold the layoutget locks across multiple RPC calls") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/nfs/nfs4proc.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index ed38f5a793d0..ef9e80928f2a 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2964,12 +2964,13 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, }
out: - if (opendata->lgp) { - nfs4_lgopen_release(opendata->lgp); - opendata->lgp = NULL; - } - if (!opendata->cancelled) + if (!opendata->cancelled) { + if (opendata->lgp) { + nfs4_lgopen_release(opendata->lgp); + opendata->lgp = NULL; + } nfs4_sequence_free_slot(&opendata->o_res.seq_res); + } return ret; }
From: Trond Myklebust trond.myklebust@hammerspace.com
stable inclusion from stable-v4.19.256 commit 1429b2fa358c6fbd5e037edac59e20fa02aa783e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 6622e3a73112fc336c1c2c582428fb5ef18e456a upstream.
When we're reusing the backchannel requests instead of freeing them, then we should reinitialise any values of the send/receive xdr_bufs so that they reflect the available space.
Fixes: 0d2a970d0ae5 ("SUNRPC: Fix a backchannel race") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/sunrpc/backchannel_rqst.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c index 3c15a99b9700..e41427e1740d 100644 --- a/net/sunrpc/backchannel_rqst.c +++ b/net/sunrpc/backchannel_rqst.c @@ -69,6 +69,17 @@ static void xprt_free_allocation(struct rpc_rqst *req) kfree(req); }
+static void xprt_bc_reinit_xdr_buf(struct xdr_buf *buf) +{ + buf->head[0].iov_len = PAGE_SIZE; + buf->tail[0].iov_len = 0; + buf->pages = NULL; + buf->page_len = 0; + buf->flags = 0; + buf->len = 0; + buf->buflen = PAGE_SIZE; +} + static int xprt_alloc_xdr_buf(struct xdr_buf *buf, gfp_t gfp_flags) { struct page *page; @@ -291,6 +302,9 @@ void xprt_free_bc_rqst(struct rpc_rqst *req) */ spin_lock_bh(&xprt->bc_pa_lock); if (xprt_need_to_requeue(xprt)) { + xprt_bc_reinit_xdr_buf(&req->rq_snd_buf); + xprt_bc_reinit_xdr_buf(&req->rq_rcv_buf); + req->rq_rcv_buf.len = PAGE_SIZE; list_add_tail(&req->rq_bc_pa_list, &xprt->bc_pa_list); xprt->bc_alloc_count++; req = NULL;
From: Matthias May matthias.may@westermo.com
stable inclusion from stable-v4.19.256 commit 381a9935cca88cf66e3098d733cec9e640247ade category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit ca2bb69514a8bc7f83914122f0d596371352416c upstream.
According to Guillaume Nault RT_TOS should never be used for IPv6.
Quote: RT_TOS() is an old macro used to interprete IPv4 TOS as described in the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4 code, although, given the current state of the code, most of the existing calls have no consequence.
But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS" field to be interpreted the RFC 1349 way. There's no historical compatibility to worry about.
Fixes: 3a56f86f1be6 ("geneve: handle ipv6 priority like ipv4 tos") Acked-by: Guillaume Nault gnault@redhat.com Signed-off-by: Matthias May matthias.may@westermo.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/net/geneve.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index e3b15daeedfe..a83a7e6fb9ca 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -799,8 +799,7 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb, use_cache = false; }
- fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio), - info->key.label); + fl6->flowlabel = ip6_make_flowinfo(prio, info->key.label); dst_cache = (struct dst_cache *)&info->dst_cache; if (use_cache) { dst = dst_cache_get_ip6(dst_cache, &fl6->saddr);
From: Hector Martin marcan@marcan.st
stable inclusion from stable-v4.19.256 commit 4fffd776f56cc82280fa8c6036ff2051a511bcf8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
commit 415d832497098030241605c52ea83d4e2cfa7879 upstream.
These operations are documented as always ordered in include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer type use cases where one side needs to ensure a flag is left pending after some shared data was updated rely on this ordering, even in the failure case.
This is the case with the workqueue code, which currently suffers from a reproducible ordering violation on Apple M1 platforms (which are notoriously out-of-order) that ends up causing the TTY layer to fail to deliver data to userspace properly under the right conditions. This change fixes that bug.
Change the documentation to restrict the "no order on failure" story to the _lock() variant (for which it makes sense), and remove the early-exit from the generic implementation, which is what causes the missing barrier semantics in that case. Without this, the remaining atomic op is fully ordered (including on ARM64 LSE, as of recent versions of the architecture spec).
Suggested-by: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs") Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()") Signed-off-by: Hector Martin marcan@marcan.st Acked-by: Will Deacon will@kernel.org Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- Documentation/atomic_bitops.txt | 2 +- include/asm-generic/bitops/atomic.h | 6 ------ 2 files changed, 1 insertion(+), 7 deletions(-)
diff --git a/Documentation/atomic_bitops.txt b/Documentation/atomic_bitops.txt index be70b32c95d9..bc3fac8e1db3 100644 --- a/Documentation/atomic_bitops.txt +++ b/Documentation/atomic_bitops.txt @@ -59,7 +59,7 @@ Like with atomic_t, the rule of thumb is: - RMW operations that have a return value are fully ordered.
- RMW operations that are conditional are unordered on FAILURE, - otherwise the above rules apply. In the case of test_and_{}_bit() operations, + otherwise the above rules apply. In the case of test_and_set_bit_lock(), if the bit in memory is unchanged by the operation then it is deemed to have failed.
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h index dd90c9792909..18399e3759a2 100644 --- a/include/asm-generic/bitops/atomic.h +++ b/include/asm-generic/bitops/atomic.h @@ -35,9 +35,6 @@ static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p) unsigned long mask = BIT_MASK(nr);
p += BIT_WORD(nr); - if (READ_ONCE(*p) & mask) - return 1; - old = atomic_long_fetch_or(mask, (atomic_long_t *)p); return !!(old & mask); } @@ -48,9 +45,6 @@ static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p) unsigned long mask = BIT_MASK(nr);
p += BIT_WORD(nr); - if (!(READ_ONCE(*p) & mask)) - return 0; - old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p); return !!(old & mask); }
From: "Kiselev, Oleg" okiselev@amazon.com
stable inclusion from stable-v4.19.256 commit 7bdfb01fc5f6b3696728aeb527c50386e0ee09a1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ CVE: NA
--------------------------------
[ Upstream commit 69cb8e9d8cd97cdf5e293b26d70a9dee3e35e6bd ]
This patch avoids an attempt to resize the filesystem to an unaligned cluster boundary. An online resize to a size that is not integral to cluster size results in the last iteration attempting to grow the fs by a negative amount, which trips a BUG_ON and leaves the fs with a corrupted in-memory superblock.
Signed-off-by: Oleg Kiselev okiselev@amazon.com Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/ext4/resize.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 104093f82704..bc870561e394 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1973,6 +1973,16 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count) } brelse(bh);
+ /* + * For bigalloc, trim the requested size to the nearest cluster + * boundary to avoid creating an unusable filesystem. We do this + * silently, instead of returning an error, to avoid breaking + * callers that blindly resize the filesystem to the full size of + * the underlying block device. + */ + if (ext4_has_feature_bigalloc(sb)) + n_blocks_count &= ~((1 << EXT4_CLUSTER_BITS(sb)) - 1); + retry: o_blocks_count = ext4_blocks_count(es);