[PATCH OLK-6.6 0/4] some optimization patches for inet_diag dump

some optimization patches for inet_diag dump Eric Dumazet (4): inet_diag: add module pointer to "struct inet_diag_handler" inet_diag: allow concurrent operations sock_diag: remove sock_diag_mutex inet_diag: skip over empty buckets include/linux/inet_diag.h | 1 + net/core/sock_diag.c | 4 -- net/dccp/diag.c | 1 + net/ipv4/inet_diag.c | 84 ++++++++++++++++++++------------------- net/ipv4/raw_diag.c | 1 + net/ipv4/tcp_diag.c | 1 + net/ipv4/udp_diag.c | 2 + net/mptcp/mptcp_diag.c | 1 + net/sctp/diag.c | 1 + 9 files changed, 52 insertions(+), 44 deletions(-) -- 2.34.1

反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/16903 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/5DE... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/16903 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/5DE...

From: Eric Dumazet <edumazet@google.com> mainline inclusion from mainline-v6.9-rc1 commit db5914695a84a7b128ec2e4e9272e6e8091753e1 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/ICITYB CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Following patch is going to use RCU instead of inet_diag_table_mutex acquisition. This patch is a preparation, no change of behavior yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Liu Jian <liujian56@huawei.com> --- include/linux/inet_diag.h | 1 + net/dccp/diag.c | 1 + net/ipv4/raw_diag.c | 1 + net/ipv4/tcp_diag.c | 1 + net/ipv4/udp_diag.c | 2 ++ net/mptcp/mptcp_diag.c | 1 + net/sctp/diag.c | 1 + 7 files changed, 8 insertions(+) diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h index 84abb30a3fbb..a9033696b0aa 100644 --- a/include/linux/inet_diag.h +++ b/include/linux/inet_diag.h @@ -8,6 +8,7 @@ struct inet_hashinfo; struct inet_diag_handler { + struct module *owner; void (*dump)(struct sk_buff *skb, struct netlink_callback *cb, const struct inet_diag_req_v2 *r); diff --git a/net/dccp/diag.c b/net/dccp/diag.c index 8a82c5a2c5a8..f5019d95c3ae 100644 --- a/net/dccp/diag.c +++ b/net/dccp/diag.c @@ -58,6 +58,7 @@ static int dccp_diag_dump_one(struct netlink_callback *cb, } static const struct inet_diag_handler dccp_diag_handler = { + .owner = THIS_MODULE, .dump = dccp_diag_dump, .dump_one = dccp_diag_dump_one, .idiag_get_info = dccp_diag_get_info, diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c index fe2140c8375c..cc793bd8de25 100644 --- a/net/ipv4/raw_diag.c +++ b/net/ipv4/raw_diag.c @@ -213,6 +213,7 @@ static int raw_diag_destroy(struct sk_buff *in_skb, #endif static const struct inet_diag_handler raw_diag_handler = { + .owner = THIS_MODULE, .dump = raw_diag_dump, .dump_one = raw_diag_dump_one, .idiag_get_info = raw_diag_get_info, diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c index 4cbe4b44425a..f428ecf9120f 100644 --- a/net/ipv4/tcp_diag.c +++ b/net/ipv4/tcp_diag.c @@ -222,6 +222,7 @@ static int tcp_diag_destroy(struct sk_buff *in_skb, #endif static const struct inet_diag_handler tcp_diag_handler = { + .owner = THIS_MODULE, .dump = tcp_diag_dump, .dump_one = tcp_diag_dump_one, .idiag_get_info = tcp_diag_get_info, diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c index dc41a22ee80e..38cb3a28e4ed 100644 --- a/net/ipv4/udp_diag.c +++ b/net/ipv4/udp_diag.c @@ -237,6 +237,7 @@ static int udplite_diag_destroy(struct sk_buff *in_skb, #endif static const struct inet_diag_handler udp_diag_handler = { + .owner = THIS_MODULE, .dump = udp_diag_dump, .dump_one = udp_diag_dump_one, .idiag_get_info = udp_diag_get_info, @@ -260,6 +261,7 @@ static int udplite_diag_dump_one(struct netlink_callback *cb, } static const struct inet_diag_handler udplite_diag_handler = { + .owner = THIS_MODULE, .dump = udplite_diag_dump, .dump_one = udplite_diag_dump_one, .idiag_get_info = udp_diag_get_info, diff --git a/net/mptcp/mptcp_diag.c b/net/mptcp/mptcp_diag.c index 740782f10a60..0566dd793810 100644 --- a/net/mptcp/mptcp_diag.c +++ b/net/mptcp/mptcp_diag.c @@ -224,6 +224,7 @@ static void mptcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r, } static const struct inet_diag_handler mptcp_diag_handler = { + .owner = THIS_MODULE, .dump = mptcp_diag_dump, .dump_one = mptcp_diag_dump_one, .idiag_get_info = mptcp_diag_get_info, diff --git a/net/sctp/diag.c b/net/sctp/diag.c index eb05131ff1dd..23359e522273 100644 --- a/net/sctp/diag.c +++ b/net/sctp/diag.c @@ -507,6 +507,7 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, } static const struct inet_diag_handler sctp_diag_handler = { + .owner = THIS_MODULE, .dump = sctp_diag_dump, .dump_one = sctp_diag_dump_one, .idiag_get_info = sctp_diag_get_info, -- 2.34.1

From: Eric Dumazet <edumazet@google.com> mainline inclusion from mainline-v6.9-rc1 commit 223f55196bbdb182a9b8de6108a0834b5e5e832e category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/ICITYB CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- inet_diag_lock_handler() current implementation uses a mutex to protect inet_diag_table[] array against concurrent changes. This makes inet_diag dump serialized, thus less scalable than legacy /proc files. It is time to switch to full RCU protection. As a bonus, if a target is statically linked instead of being modular, inet_diag_lock_handler() & inet_diag_unlock_handler() reduce to reads only. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Liu Jian <liujian56@huawei.com> --- net/ipv4/inet_diag.c | 80 ++++++++++++++++++++++---------------------- 1 file changed, 40 insertions(+), 40 deletions(-) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 8383b01746e7..93fea0185725 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -32,7 +32,7 @@ #include <linux/inet_diag.h> #include <linux/sock_diag.h> -static const struct inet_diag_handler **inet_diag_table; +static const struct inet_diag_handler __rcu **inet_diag_table; struct inet_diag_entry { const __be32 *saddr; @@ -48,28 +48,28 @@ struct inet_diag_entry { #endif }; -static DEFINE_MUTEX(inet_diag_table_mutex); - static const struct inet_diag_handler *inet_diag_lock_handler(int proto) { - if (proto < 0 || proto >= IPPROTO_MAX) { - mutex_lock(&inet_diag_table_mutex); - return ERR_PTR(-ENOENT); - } + const struct inet_diag_handler *handler; + + if (proto < 0 || proto >= IPPROTO_MAX) + return NULL; if (!READ_ONCE(inet_diag_table[proto])) sock_load_diag_module(AF_INET, proto); - mutex_lock(&inet_diag_table_mutex); - if (!inet_diag_table[proto]) - return ERR_PTR(-ENOENT); + rcu_read_lock(); + handler = rcu_dereference(inet_diag_table[proto]); + if (handler && !try_module_get(handler->owner)) + handler = NULL; + rcu_read_unlock(); - return inet_diag_table[proto]; + return handler; } static void inet_diag_unlock_handler(const struct inet_diag_handler *handler) { - mutex_unlock(&inet_diag_table_mutex); + module_put(handler->owner); } void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk) @@ -104,9 +104,12 @@ static size_t inet_sk_attr_size(struct sock *sk, const struct inet_diag_handler *handler; size_t aux = 0; - handler = inet_diag_table[req->sdiag_protocol]; + rcu_read_lock(); + handler = rcu_dereference(inet_diag_table[req->sdiag_protocol]); + DEBUG_NET_WARN_ON_ONCE(!handler); if (handler && handler->idiag_get_aux_size) aux = handler->idiag_get_aux_size(sk, net_admin); + rcu_read_unlock(); return nla_total_size(sizeof(struct tcp_info)) + nla_total_size(sizeof(struct inet_diag_msg)) @@ -244,10 +247,16 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, struct nlmsghdr *nlh; struct nlattr *attr; void *info = NULL; + int protocol; cb_data = cb->data; - handler = inet_diag_table[inet_diag_get_protocol(req, cb_data)]; - BUG_ON(!handler); + protocol = inet_diag_get_protocol(req, cb_data); + + /* inet_diag_lock_handler() made sure inet_diag_table[] is stable. */ + handler = rcu_dereference_protected(inet_diag_table[protocol], 1); + DEBUG_NET_WARN_ON_ONCE(!handler); + if (!handler) + return -ENXIO; nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, cb->nlh->nlmsg_type, sizeof(*r), nlmsg_flags); @@ -605,9 +614,10 @@ static int inet_diag_cmd_exact(int cmd, struct sk_buff *in_skb, protocol = inet_diag_get_protocol(req, &dump_data); handler = inet_diag_lock_handler(protocol); - if (IS_ERR(handler)) { - err = PTR_ERR(handler); - } else if (cmd == SOCK_DIAG_BY_FAMILY) { + if (!handler) + return -ENOENT; + + if (cmd == SOCK_DIAG_BY_FAMILY) { struct netlink_callback cb = { .nlh = nlh, .skb = in_skb, @@ -1175,12 +1185,12 @@ static int __inet_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, again: prev_min_dump_alloc = cb->min_dump_alloc; handler = inet_diag_lock_handler(protocol); - if (!IS_ERR(handler)) + if (handler) { handler->dump(skb, cb, r); - else - err = PTR_ERR(handler); - inet_diag_unlock_handler(handler); - + inet_diag_unlock_handler(handler); + } else { + err = -ENOENT; + } /* The skb is not large enough to fit one sk info and * inet_sk_diag_fill() has requested for a larger skb. */ @@ -1375,10 +1385,9 @@ int inet_diag_handler_get_info(struct sk_buff *skb, struct sock *sk) } handler = inet_diag_lock_handler(sk->sk_protocol); - if (IS_ERR(handler)) { - inet_diag_unlock_handler(handler); + if (!handler) { nlmsg_cancel(skb, nlh); - return PTR_ERR(handler); + return -ENOENT; } attr = handler->idiag_info_size @@ -1415,20 +1424,12 @@ static const struct sock_diag_handler inet6_diag_handler = { int inet_diag_register(const struct inet_diag_handler *h) { const __u16 type = h->idiag_type; - int err = -EINVAL; if (type >= IPPROTO_MAX) - goto out; + return -EINVAL; - mutex_lock(&inet_diag_table_mutex); - err = -EEXIST; - if (!inet_diag_table[type]) { - WRITE_ONCE(inet_diag_table[type], h); - err = 0; - } - mutex_unlock(&inet_diag_table_mutex); -out: - return err; + return !cmpxchg((const struct inet_diag_handler **)&inet_diag_table[type], + NULL, h) ? 0 : -EEXIST; } EXPORT_SYMBOL_GPL(inet_diag_register); @@ -1439,9 +1440,8 @@ void inet_diag_unregister(const struct inet_diag_handler *h) if (type >= IPPROTO_MAX) return; - mutex_lock(&inet_diag_table_mutex); - WRITE_ONCE(inet_diag_table[type], NULL); - mutex_unlock(&inet_diag_table_mutex); + xchg((const struct inet_diag_handler **)&inet_diag_table[type], + NULL); } EXPORT_SYMBOL_GPL(inet_diag_unregister); -- 2.34.1

From: Eric Dumazet <edumazet@google.com> mainline inclusion from mainline-v6.9-rc1 commit f44e64990beb41167bd7c313d90bcf7e290c3582 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/ICITYB CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- sock_diag_rcv() is still serializing its operations using a mutex, for no good reason. This came with commit 0a9c73014415 ("[INET_DIAG]: Fix oops in netlink_rcv_skb"), but the root cause has been fixed with commit cd40b7d3983c ("[NET]: make netlink user -> kernel interface synchronious") Remove this mutex to let multiple threads run concurrently. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Liu Jian <liujian56@huawei.com> --- net/core/sock_diag.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c index 70007fc578a1..a08eed9b9142 100644 --- a/net/core/sock_diag.c +++ b/net/core/sock_diag.c @@ -290,13 +290,9 @@ static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, } } -static DEFINE_MUTEX(sock_diag_mutex); - static void sock_diag_rcv(struct sk_buff *skb) { - mutex_lock(&sock_diag_mutex); netlink_rcv_skb(skb, &sock_diag_rcv_msg); - mutex_unlock(&sock_diag_mutex); } static int sock_diag_bind(struct net *net, int group) -- 2.34.1

From: Eric Dumazet <edumazet@google.com> mainline inclusion from mainline-v6.9-rc1 commit 622a08e8de9f9ad576fd56e3a2e97a84370a8100 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/ICITYB CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- After the removal of inet_diag_table_mutex, sock_diag_table_mutex and sock_diag_mutex, I was able so see spinlock contention from inet_diag_dump_icsk() when running 100 parallel invocations. It is time to skip over empty buckets. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Conflicts: net/ipv4/inet_diag.c [Caused by we did not backport 91051f003948.] Signed-off-by: Liu Jian <liujian56@huawei.com> --- net/ipv4/inet_diag.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 93fea0185725..012ef8efdf9b 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -1045,6 +1045,10 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb, num = 0; ilb = &hashinfo->lhash2[i]; + if (hlist_nulls_empty(&ilb->nulls_head)) { + s_num = 0; + continue; + } spin_lock(&ilb->lock); sk_nulls_for_each(sk, node, &ilb->nulls_head) { struct inet_sock *inet = inet_sk(sk); -- 2.34.1
participants (2)
-
Liu Jian
-
patchwork bot