From: Lu Wei luwei32@huawei.com
maillist inclusion category: bugfix bugzilla: 187792, https://gitee.com/openeuler/kernel/issues/I5ZG7O
Reference: https://www.spinics.net/lists/netdev/msg856902.html
--------------------------------
If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code of TCPOPT_SACK_PERM is called to enable sack after data is sent and dupacks are received , it will trigger a warning in function tcp_verify_left_out() as follows:
============================================ WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 tcp_timeout_mark_lost+0x154/0x160 tcp_enter_loss+0x2b/0x290 tcp_retransmit_timer+0x50b/0x640 tcp_write_timer_handler+0x1c8/0x340 tcp_write_timer+0xe5/0x140 call_timer_fn+0x3a/0x1b0 __run_timers.part.0+0x1bf/0x2d0 run_timer_softirq+0x43/0xb0 __do_softirq+0xfd/0x373 __irq_exit_rcu+0xf6/0x140
The warning is caused in the following steps: 1. a socket named socketA is created 2. socketA enters repair mode without build a connection 3. socketA calls connect() and its state is changed to TCP_ESTABLISHED directly 4. socketA leaves repair mode 5. socketA calls sendmsg() to send data, packets_out and sack_outs(dup ack receives) increase 6. socketA enters repair mode again 7. socketA calls setsockopt with TCPOPT_SACK_PERM to enable sack 8. retransmit timer expires, it calls tcp_timeout_mark_lost(), lost_out increases 9. sack_outs + lost_out > packets_out triggers since lost_out and sack_outs increase repeatly
In function tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step7 not happen and the warning will not be triggered. As suggested by Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already sent.
socket-tcp tests in CRIU has been tested as follows: $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \ --ignore-taint
socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 094cd93e50c2..6f71b6cfc1b2 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2916,7 +2916,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, case TCP_REPAIR_OPTIONS: if (!tp->repair) err = -EINVAL; - else if (sk->sk_state == TCP_ESTABLISHED) + else if (sk->sk_state == TCP_ESTABLISHED && !tp->bytes_sent) err = tcp_repair_options_est(sk, (struct tcp_repair_opt __user *)optval, optlen);
From: Lu Wei luwei32@huawei.com
mainline inclusion from mainline-v6.1-rc3 commit ec791d8149ff60c40ad2074af3b92a39c916a03f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5XG7S CVE: NA
--------------------------------
The type of sk_rcvbuf and sk_sndbuf in struct sock is int, and in tcp_add_backlog(), the variable limit is caculated by adding sk_rcvbuf, sk_sndbuf and 64 * 1024, it may exceed the max value of int and overflow. This patch reduces the limit budget by halving the sndbuf to solve this issue since ACK packets are much smaller than the payload.
Fixes: c9c3321257e1 ("tcp: add tcp_add_backlog()") Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Acked-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net
conflict: net/ipv4/tcp_ipv4.c
Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_ipv4.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 3d7ed60b456a..a30808cab791 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1630,7 +1630,8 @@ int tcp_v4_early_demux(struct sk_buff *skb)
bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) { - u32 limit = READ_ONCE(sk->sk_rcvbuf) + READ_ONCE(sk->sk_sndbuf); + u32 limit = (u32)READ_ONCE(sk->sk_rcvbuf) + + (u32)(READ_ONCE(sk->sk_sndbuf) >> 1); struct skb_shared_info *shinfo; const struct tcphdr *th; struct tcphdr *thtail;
From: Ziyang Xuan william.xuanziyang@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5Z86E
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=36...
--------------------------------
Recently, we got two syzkaller problems because of oversize packet when napi frags enabled.
One of the problems is because the first seg size of the iov_iter from user space is very big, it is 2147479538 which is bigger than the threshold value for bail out early in __alloc_pages(). And skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN flag. Thus we got a warning as following:
======================================================== WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 ... Call trace: __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 __alloc_pages_node include/linux/gfp.h:550 [inline] alloc_pages_node include/linux/gfp.h:564 [inline] kmalloc_large_node+0x94/0x350 mm/slub.c:4038 __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 __skb_grow include/linux/skbuff.h:2779 [inline] tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
The other problem is because odd IPv6 packets without NEXTHDR_NONE extension header and have big packet length, it is 2127925 which is bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in ipv6_gro_receive(), network_header offset and transport_header offset are all bigger than U16_MAX. That would trigger skb->network_header and skb->transport_header overflow error, because they are all '__u16' type. Eventually, it would affect the value for __skb_push(skb, value), and make it be a big value. After __skb_push() in ipv6_gro_receive(), skb->data would less than skb->head, an out of bounds memory bug occurred. That would trigger the problem as following:
================================================================== BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260 ... Call trace: dump_backtrace+0xd8/0x130 show_stack+0x1c/0x50 dump_stack_lvl+0x64/0x7c print_address_description.constprop.0+0xbc/0x2e8 print_report+0x100/0x1e4 kasan_report+0x80/0x120 __asan_load8+0x78/0xa0 eth_type_trans+0x100/0x260 napi_gro_frags+0x164/0x550 tun_get_user+0xda4/0x1270 tun_chr_write_iter+0x74/0x130 do_iter_readv_writev+0x130/0x1ec do_iter_write+0xbc/0x1e0 vfs_writev+0x13c/0x26c
To fix the problems, restrict the packet size less than (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved skb space in napi_alloc_skb() because transport_header is an offset from skb->head. Add len check in tun_napi_alloc_frags() simply.
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei... Signed-off-by: Jakub Kicinski kuba@kernel.org Conflicts: drivers/net/tun.c Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a772ceff9c1f..01de856c9774 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1465,7 +1465,8 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, int err; int i;
- if (it->nr_segs > MAX_SKB_FRAGS + 1) + if (it->nr_segs > MAX_SKB_FRAGS + 1 || + len > (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN)) return ERR_PTR(-ENOMEM);
local_bh_disable();