From: John Fastabend <john.fastabend@gmail.com>

mainline inclusion
from mainline-v6.0-rc1
commit 697fb80a53642be624f5121b6ca9d66769c180e0
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6BRTY
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
syzbot reproduced the bug ...
BUG: sleeping function called from invalid context at kernel/workqueue.c:3010
... with the following stack trace fragment ...
 start_flush_work kernel/workqueue.c:3010 [inline]
 __flush_work+0x109/0xb10 kernel/workqueue.c:3074
 __cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3162
 sk_psock_stop+0x4cb/0x630 net/core/skmsg.c:802
 sock_map_destroy+0x333/0x760 net/core/sock_map.c:1581
 inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1130
 __tcp_close+0xd5b/0x12b0 net/ipv4/tcp.c:2897
 tcp_close+0x29/0xc0 net/ipv4/tcp.c:2909
... introduced by d8616ee2affc. Do a quick trace of the code path and the bug is obvious:
   inet_csk_destroy_sock(sk)
     sk_prot->destroy(sk);      <--- sock_map_destroy
       sk_psock_stop(, true);   <--- true so cancel workqueue
         cancel_work_sync()     <--- splat, because *_bh_disable()
We cannot call cancel_work_sync() from inside the destroy path. So mark the sk_psock_stop call to skip this cancel_work_sync(). This avoids the BUG, but means we may run sk_psock_backlog after or during the destroy op. We zapped the ingress_skb queue in sk_psock_stop (safe to do with local_bh_disable), so it's empty and the sk_psock_backlog work item will not find any pkts to process there. However, because we are not going to wait for it or clear its ->state, it's possible it kicks off or is already running. This should be 'safe' up until psock drops its refcnt to psock->sk. The sock_put() that drops this reference is only done at psock destroy time from sk_psock_destroy(). That is done through a workqueue when sk_psock_drop() is called and the psock refcnt reaches 0. And importantly, sk_psock_destroy() does a cancel_work_sync(). So the trivial fix works.
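For context, a simplified sketch of sk_psock_stop() as it looks around this kernel version (net/core/skmsg.c); this is illustrative, not the verbatim source. Only the wait==true branch can sleep, which is exactly what the fix skips from the destroy path:

   /* Sketch of sk_psock_stop(): zapping ingress_skb only needs the
    * bh-disabling ingress_lock, so that part is fine from the destroy
    * path; cancel_work_sync() is the piece that may sleep.
    */
   void sk_psock_stop(struct sk_psock *psock, bool wait)
   {
           spin_lock_bh(&psock->ingress_lock);
           sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
           sk_psock_cork_free(psock);
           __sk_psock_zap_ingress(psock);          /* empties ingress_skb */
           spin_unlock_bh(&psock->ingress_lock);

           if (wait)                               /* now false from sock_map_destroy() */
                   cancel_work_sync(&psock->work); /* may sleep; still performed later by
                                                    * sk_psock_destroy() once the psock
                                                    * refcnt drops to zero */
   }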
I've had hit-or-miss luck reproducing this; I caught it once or twice with the provided reproducer when running with many runners. However, syzkaller is very good at reproducing, so I'm relying on syzkaller to verify the fix.
Fixes: d8616ee2affc ("bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues")
Reported-by: syzbot+140186ceba0c496183bc@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/bpf/20220628035803.317876-1-john.fastabend@gmail.com
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Wang Yufen <wangyufen@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 net/core/sock_map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 85df06298c98..50fb43905c87 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1577,7 +1577,7 @@ void sock_map_destroy(struct sock *sk)
 	saved_destroy = psock->saved_destroy;
 	sock_map_remove_links(sk, psock);
 	rcu_read_unlock();
-	sk_psock_stop(psock, true);
+	sk_psock_stop(psock, false);
 	sk_psock_put(sk, psock);
 	saved_destroy(sk);
 }