[PATCH OLK-6.6 0/3] Fix netns reference count issue

Kuniyuki Iwashima (2): Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free" Revert "smb: client: fix TCP timers deadlock after rmmod" Wang Zhaolong (1): smb: client: fix netns refcount leak after net_passive changes fs/smb/client/connect.c | 31 ++++++------------------------- 1 file changed, 6 insertions(+), 25 deletions(-) -- 2.34.3

From: Kuniyuki Iwashima <kuniyu@amazon.com> mainline inclusion from mainline-v6.15-rc3 commit c707193a17128fae2802d10cbad7239cc57f0c95 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICO0HP CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This reverts commit 4e7f1644f2ac6d01dc584f6301c3b1d5aac4eaef. The commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") is not only a bogus fix for LOCKDEP null-ptr-deref but also introduces a real issue, TCP sockets leak, which will be explained in detail in the next revert. Also, CNA assigned CVE-2024-54680 to it but is rejecting it. [0] Thus, we are reverting the commit and its follow-up commit 4e7f1644f2ac ("smb: client: Fix netns refcount imbalance causing leaks and use-after-free"). Link: https://lore.kernel.org/all/2025040248-tummy-smilingly-4240@gregkh/ #[0] Fixes: 4e7f1644f2ac ("smb: client: Fix netns refcount imbalance causing leaks and use-after-free") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Conflicts: fs/smb/client/connect.c [Conflicts with mainline commit 7d14dd683b1b ("cifs: Allow to disable or force initialization of NetBIOS session").] Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> --- fs/smb/client/connect.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index af92c40877e4..e33ceea2e6fd 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -314,11 +314,10 @@ cifs_abort_connection(struct TCP_Server_Info *server) kernel_sock_shutdown(server->ssocket, SHUT_WR); cifs_dbg(FYI, "Post shutdown state: 0x%x Flags: 0x%lx\n", server->ssocket->state, server->ssocket->flags); sock_release(server->ssocket); server->ssocket = NULL; - put_net(cifs_net_ns(server)); } server->sequence_number = 0; server->session_estab = false; kfree_sensitive(server->session_key.response); server->session_key.response = NULL; @@ -3146,16 +3145,12 @@ generic_ip_connect(struct TCP_Server_Info *server) } /* * Grab netns reference for the socket. * - * This reference will be released in several situations: - * - In the failure path before the cifsd thread is started. - * - In the all place where server->socket is released, it is - * also set to NULL. - * - Ultimately in clean_demultiplex_info(), during the final - * teardown. + * It'll be released here, on error, or in clean_demultiplex_info() upon server + * teardown. */ get_net(net); /* BB other socket options to set KEEPALIVE, NODELAY? */ cifs_dbg(FYI, "Socket created\n"); @@ -3167,12 +3162,14 @@ generic_ip_connect(struct TCP_Server_Info *server) else cifs_reclassify_socket4(socket); } rc = bind_socket(server); - if (rc < 0) + if (rc < 0) { + put_net(cifs_net_ns(server)); return rc; + } /* * Eventually check for other socket options to change from * the default. sock_setsockopt not used because it expects * user space buffer @@ -3214,10 +3211,13 @@ generic_ip_connect(struct TCP_Server_Info *server) } trace_smb3_connect_done(server->hostname, server->conn_id, &server->dstaddr); if (sport == htons(RFC1001_PORT)) rc = ip_rfc1001_connect(server); + if (rc < 0) + put_net(cifs_net_ns(server)); + return rc; } static int ip_connect(struct TCP_Server_Info *server) -- 2.34.3

From: Kuniyuki Iwashima <kuniyu@amazon.com> mainline inclusion from mainline-v6.15-rc3 commit 95d2b9f693ff2a1180a23d7d59acc0c4e72f4c41 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICO0HP CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801. Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") is intended to fix a null-ptr-deref in LOCKDEP, which is mentioned as CVE-2024-54680, but is actually did not fix anything; The issue can be reproduced on top of it. [0] Also, it reverted the change by commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") and introduced a real issue by reviving the kernel TCP socket. When a reconnect happens for a CIFS connection, the socket state transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync() in tcp_close() stops all timers for the socket. If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1 forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans. Usually, FIN can be retransmitted by the peer, but if the peer aborts the connection, the issue comes into reality. I warned about this privately by pointing out the exact report [1], but the bogus fix was finally merged. So, we should not stop the timers to finally kill the connection on our side in that case, meaning we must not use a kernel socket for TCP whose sk->sk_net_refcnt is 0. The kernel socket does not have a reference to its netns to make it possible to tear down netns without cleaning up every resource in it. For example, tunnel devices use a UDP socket internally, but we can destroy netns without removing such devices and let it complete during exit. Otherwise, netns would be leaked when the last application died. However, this is problematic for TCP sockets because TCP has timers to close the connection gracefully even after the socket is close()d. The lifetime of the socket and its netns is different from the lifetime of the underlying connection. If the socket user does not maintain the netns lifetime, the timer could be fired after the socket is close()d and its netns is freed up, resulting in use-after-free. Actually, we have seen so many similar issues and converted such sockets to have a reference to netns. That's why I converted the CIFS client socket to have a reference to netns (sk->sk_net_refcnt == 1), which is somehow mentioned as out-of-scope of CIFS and technically wrong in e9f2517a3e18, but **is in-scope and right fix**. Regarding the LOCKDEP issue, we can prevent the module unload by bumping the module refcount when switching the LOCKDDEP key in sock_lock_init_class_and_name(). [2] For a while, let's revert the bogus fix. Note that now we can use sk_net_refcnt_upgrade() for the socket conversion, but I'll do so later separately to make backport easy. Link: https://lore.kernel.org/all/20250402020807.28583-1-kuniyu@amazon.com/ #[0] Link: https://lore.kernel.org/netdev/c08bd5378da647a2a4c16698125d180a@huawei.com/ #[1] Link: https://lore.kernel.org/lkml/20250402005841.19846-1-kuniyu@amazon.com/ #[2] Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> [Contextual conflict.] Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> --- fs/smb/client/connect.c | 36 ++++++++++-------------------------- 1 file changed, 10 insertions(+), 26 deletions(-) diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index e33ceea2e6fd..fb168ec49864 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -1001,17 +1001,13 @@ clean_demultiplex_info(struct TCP_Server_Info *server) wake_up_all(&server->request_q); /* give those requests time to exit */ msleep(125); if (cifs_rdma_enabled(server)) smbd_destroy(server); - if (server->ssocket) { sock_release(server->ssocket); server->ssocket = NULL; - - /* Release netns reference for the socket. */ - put_net(cifs_net_ns(server)); } if (!list_empty(&server->pending_mid_q)) { struct list_head dispose_list; struct mid_q_entry *mid_entry; @@ -1056,11 +1052,10 @@ clean_demultiplex_info(struct TCP_Server_Info *server) * If threads still have not exited they are probably never * coming home not much else we can do but free the memory. */ } - /* Release netns reference for this server. */ put_net(cifs_net_ns(server)); kfree(server->leaf_fullpath); kfree(server->hostname); kfree(server); @@ -1728,12 +1723,10 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx, if (ctx->nosharesock) tcp_ses->nosharesock = true; tcp_ses->ops = ctx->ops; tcp_ses->vals = ctx->vals; - - /* Grab netns reference for this server. */ cifs_set_net_ns(tcp_ses, get_net(current->nsproxy->net_ns)); tcp_ses->conn_id = atomic_inc_return(&tcpSesNextId); tcp_ses->noblockcnt = ctx->rootfs; tcp_ses->noblocksnd = ctx->noblocksnd || ctx->rootfs; @@ -1861,23 +1854,20 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx, return tcp_ses; out_err_crypto_release: cifs_crypto_secmech_release(tcp_ses); - /* Release netns reference for this server. */ put_net(cifs_net_ns(tcp_ses)); out_err: if (tcp_ses) { if (SERVER_IS_CHAN(tcp_ses)) cifs_put_tcp_session(tcp_ses->primary_server, false); kfree(tcp_ses->hostname); kfree(tcp_ses->leaf_fullpath); - if (tcp_ses->ssocket) { + if (tcp_ses->ssocket) sock_release(tcp_ses->ssocket); - put_net(cifs_net_ns(tcp_ses)); - } kfree(tcp_ses); } return ERR_PTR(rc); } @@ -3135,24 +3125,24 @@ generic_ip_connect(struct TCP_Server_Info *server) if (server->ssocket) { socket = server->ssocket; } else { struct net *net = cifs_net_ns(server); + struct sock *sk; - rc = sock_create_kern(net, sfamily, SOCK_STREAM, IPPROTO_TCP, &server->ssocket); + rc = __sock_create(net, sfamily, SOCK_STREAM, + IPPROTO_TCP, &server->ssocket, 1); if (rc < 0) { cifs_server_dbg(VFS, "Error %d creating socket\n", rc); return rc; } - /* - * Grab netns reference for the socket. - * - * It'll be released here, on error, or in clean_demultiplex_info() upon server - * teardown. - */ - get_net(net); + sk = server->ssocket->sk; + __netns_tracker_free(net, &sk->ns_tracker, false); + sk->sk_net_refcnt = 1; + get_net_track(net, &sk->ns_tracker, GFP_KERNEL); + sock_inuse_add(net, 1); /* BB other socket options to set KEEPALIVE, NODELAY? */ cifs_dbg(FYI, "Socket created\n"); socket = server->ssocket; socket->sk->sk_allocation = GFP_NOFS; @@ -3162,14 +3152,12 @@ generic_ip_connect(struct TCP_Server_Info *server) else cifs_reclassify_socket4(socket); } rc = bind_socket(server); - if (rc < 0) { - put_net(cifs_net_ns(server)); + if (rc < 0) return rc; - } /* * Eventually check for other socket options to change from * the default. sock_setsockopt not used because it expects * user space buffer @@ -3202,22 +3190,18 @@ generic_ip_connect(struct TCP_Server_Info *server) if (server->noblockcnt && rc == -EINPROGRESS) rc = 0; if (rc < 0) { cifs_dbg(FYI, "Error %d connecting to server\n", rc); trace_smb3_connect_err(server->hostname, server->conn_id, &server->dstaddr, rc); - put_net(cifs_net_ns(server)); sock_release(socket); server->ssocket = NULL; return rc; } trace_smb3_connect_done(server->hostname, server->conn_id, &server->dstaddr); if (sport == htons(RFC1001_PORT)) rc = ip_rfc1001_connect(server); - if (rc < 0) - put_net(cifs_net_ns(server)); - return rc; } static int ip_connect(struct TCP_Server_Info *server) -- 2.34.3

maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICO0HP CVE: NA Reference: https://patchwork.kernel.org/project/cifs-client/patch/20250717132926.901902... ------------------------------- After commit 5c70eb5c593d ("net: better track kernel sockets lifetime"), kernel sockets now use net_passive reference counting. However, commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") restored the manual socket refcount manipulation without adapting to this new mechanism, causing a memory leak. The issue can be reproduced by[1]: 1. Creating a network namespace 2. Mounting and Unmounting CIFS within the namespace 3. Deleting the namespace Some memory leaks may appear after a period of time following step 3. unreferenced object 0xffff9951419f6b00 (size 256): comm "ip", pid 447, jiffies 4294692389 (age 14.730s) hex dump (first 32 bytes): 1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 80 77 c2 44 51 99 ff ff .........w.DQ... backtrace: __kmem_cache_alloc_node+0x30e/0x3d0 __kmalloc+0x52/0x120 net_alloc_generic+0x1d/0x30 copy_net_ns+0x86/0x200 create_new_namespaces+0x117/0x300 unshare_nsproxy_namespaces+0x60/0xa0 ksys_unshare+0x148/0x360 __x64_sys_unshare+0x12/0x20 do_syscall_64+0x59/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 ... unreferenced object 0xffff9951442e7500 (size 32): comm "mount.cifs", pid 475, jiffies 4294693782 (age 13.343s) hex dump (first 32 bytes): 40 c5 38 46 51 99 ff ff 18 01 96 42 51 99 ff ff @.8FQ......BQ... 01 00 00 00 6f 00 c5 07 6f 00 d8 07 00 00 00 00 ....o...o....... backtrace: __kmem_cache_alloc_node+0x30e/0x3d0 kmalloc_trace+0x2a/0x90 ref_tracker_alloc+0x8e/0x1d0 sk_alloc+0x18c/0x1c0 inet_create+0xf1/0x370 __sock_create+0xd7/0x1e0 generic_ip_connect+0x1d4/0x5a0 [cifs] cifs_get_tcp_session+0x5d0/0x8a0 [cifs] cifs_mount_get_session+0x47/0x1b0 [cifs] dfs_mount_share+0xfa/0xa10 [cifs] cifs_mount+0x68/0x2b0 [cifs] cifs_smb3_do_mount+0x10b/0x760 [cifs] smb3_get_tree+0x112/0x2e0 [cifs] vfs_get_tree+0x29/0xf0 path_mount+0x2d4/0xa00 __se_sys_mount+0x165/0x1d0 Root cause: When creating kernel sockets, sk_alloc() calls net_passive_inc() for sockets with sk_net_refcnt=0. The CIFS code manually converts kernel sockets to user sockets by setting sk_net_refcnt=1, but doesn't call the corresponding net_passive_dec(). This creates an imbalance in the net_passive counter, which prevents the network namespace from being destroyed when its last user reference is dropped. As a result, the entire namespace and all its associated resources remain allocated. Timeline of patches leading to this issue: - commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") in v6.12 fixed the original netns UAF by manually managing socket refcounts - commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") in v6.13 attempted to use kernel sockets but introduced TCP timer issues - commit 5c70eb5c593d ("net: better track kernel sockets lifetime") in v6.14-rc5 introduced the net_passive mechanism with sk_net_refcnt_upgrade() for proper socket conversion - commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") in v6.15-rc3 reverted to manual refcount management without adapting to the new net_passive changes Fix this by using sk_net_refcnt_upgrade() which properly handles the net_passive counter when converting kernel sockets to user sockets. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220343 [1] Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") Cc: stable@vger.kernel.org Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> --- fs/smb/client/connect.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index fb168ec49864..c6730cca9a22 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -3127,22 +3127,19 @@ generic_ip_connect(struct TCP_Server_Info *server) socket = server->ssocket; } else { struct net *net = cifs_net_ns(server); struct sock *sk; - rc = __sock_create(net, sfamily, SOCK_STREAM, - IPPROTO_TCP, &server->ssocket, 1); + rc = sock_create_kern(net, sfamily, SOCK_STREAM, + IPPROTO_TCP, &server->ssocket); if (rc < 0) { cifs_server_dbg(VFS, "Error %d creating socket\n", rc); return rc; } sk = server->ssocket->sk; - __netns_tracker_free(net, &sk->ns_tracker, false); - sk->sk_net_refcnt = 1; - get_net_track(net, &sk->ns_tracker, GFP_KERNEL); - sock_inuse_add(net, 1); + sk_net_refcnt_upgrade(sk); /* BB other socket options to set KEEPALIVE, NODELAY? */ cifs_dbg(FYI, "Socket created\n"); socket = server->ssocket; socket->sk->sk_allocation = GFP_NOFS; -- 2.34.3

反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/17244 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/HT6... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/17244 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/HT6...
participants (2)
-
patchwork bot
-
Wang Zhaolong