From: Lu Wei <luwei32@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8KBYH
--------------------------------
Commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()") allocates even ports for connect() first while leaving odd ports for bind() and this works well in busy servers.
But this strategy causes severe performance degradation in busy clients. When a client has used more than half of the local ports set in /proc/sys/net/ipv4/ip_local_port_range and tries to connect to a server again, the connect time increases rapidly, since connect() traverses all the even ports even though they are exhausted.
So this patch provides another strategy by introducing a system option: local_port_allocation. On a busy client, users should set it to 1 to use sequential allocation; in other situations it should be set to 0. Its default value is 0.
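The difference between the two strategies can be illustrated with a small user-space model of the scan loop. This is only a sketch, not the kernel code: scan_ports(), the used[] array and the probes counter are hypothetical names introduced here, the bind-bucket and reserved-port checks are reduced to a single array lookup, and the range size is assumed to be even.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the port scan in __inet_hash_connect().
 * used[p - low] is true when port p is considered busy; the real kernel
 * checks bind buckets and reserved ports instead of a flat array.
 * Returns the first free port found, or -1, and counts the ports examined.
 */
static int scan_ports(const bool *used, int low, int high,
                      unsigned int offset, int span_size, int *probes)
{
    int remaining = high - low;   /* size of the range [low, high) */
    int i, port;

    assert(remaining > 0 && offset < (unsigned int)remaining);
    assert(span_size == 1 || span_size == 2);

    *probes = 0;
    offset &= ~1U;                /* always start on an even offset */
other_parity_scan:
    port = low + (int)offset;
    for (i = 0; i < remaining; i += span_size, port += span_size) {
        if (port >= high)
            port -= remaining;    /* wrap around to the start of the range */
        (*probes)++;
        if (!used[port - low])
            return port;
    }

    offset++;
    /* Only the even/odd strategy (span_size == 2) needs a second pass. */
    if ((offset & 1) && remaining > 1 && span_size == 2)
        goto other_parity_scan;

    return -1;
}
```

In this model, with a 1000-port range in which every even port is busy and every odd port is free, a scan from offset 0 examines 501 ports with span_size 2 but only 2 ports with span_size 1, which mirrors the slowdown described above.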
Signed-off-by: Lu Wei <luwei32@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Conflicts:
	include/net/tcp.h
	net/ipv4/inet_hashtables.c
	net/ipv4/sysctl_net_ipv4.c
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
 include/net/tcp.h          |  1 +
 net/ipv4/inet_hashtables.c | 11 ++++++++---
 net/ipv4/sysctl_net_ipv4.c |  8 ++++++++
 3 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0239e815edf7..e9d387fffe22 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -258,6 +258,7 @@ DECLARE_PER_CPU(int, tcp_memory_per_cpu_fw_alloc);

 extern struct percpu_counter tcp_sockets_allocated;
 extern unsigned long tcp_memory_pressure;
+extern int sysctl_local_port_allocation;

 /* optimized version of sk_under_memory_pressure() for TCP sockets */
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 598c1b114d2c..919f0f869118 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -1011,7 +1011,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	struct inet_bind_bucket *tb;
 	bool tb_created = false;
 	u32 remaining, offset;
-	int ret, i, low, high;
+	int ret, i, low, high, span_size;
 	int l3mdev;
 	u32 index;

@@ -1021,6 +1021,11 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		local_bh_enable();
 		return ret;
 	}
+	/* local_port_allocation 0 means even and odd port allocation strategy
+	 * will be applied, so span size is 2; otherwise sequential allocation
+	 * will be used and span size is 1. Default value is 0.
+	 */
+	span_size = sysctl_local_port_allocation ? 1 : 2;

 	l3mdev = inet_sk_bound_l3mdev(sk);

@@ -1043,7 +1048,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	offset &= ~1U;
 other_parity_scan:
 	port = low + offset;
-	for (i = 0; i < remaining; i += 2, port += 2) {
+	for (i = 0; i < remaining; i += span_size, port += span_size) {
 		if (unlikely(port >= high))
 			port -= remaining;
 		if (inet_is_local_reserved_port(net, port))
@@ -1084,7 +1089,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	}

 	offset++;
-	if ((offset & 1) && remaining > 1)
+	if ((offset & 1) && remaining > 1 && span_size == 2)
 		goto other_parity_scan;

 	return -EADDRNOTAVAIL;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6ac890b4073f..b17eb28a9690 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -39,6 +39,7 @@ static unsigned long ip_ping_group_range_min[] = { 0, 0 };
 static unsigned long ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 static u32 u32_max_div_HZ = UINT_MAX / HZ;
 static int one_day_secs = 24 * 3600;
+int sysctl_local_port_allocation;
 static u32 fib_multipath_hash_fields_all_mask __maybe_unused =
 	FIB_MULTIPATH_HASH_FIELD_ALL_MASK;
 static unsigned int tcp_child_ehash_entries_max = 16 * 1024 * 1024;
@@ -579,6 +580,13 @@ static struct ctl_table ipv4_table[] = {
 		.extra1		= &sysctl_fib_sync_mem_min,
 		.extra2		= &sysctl_fib_sync_mem_max,
 	},
+	{
+		.procname	= "local_port_allocation",
+		.data		= &sysctl_local_port_allocation,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{ }
 };
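
For reference, a minimal sketch of how a user-space tool might toggle the new option. It assumes the entry appears as /proc/sys/net/ipv4/local_port_allocation (inferred from the procname added to ipv4_table above) and that the caller has permission to write it; write_int_to_file() and read_int_from_file() are hypothetical helper names, not part of this patch.

```c
#include <stdio.h>

/* Assumed location of the new sysctl on a kernel carrying this patch. */
#define LOCAL_PORT_ALLOCATION_PATH "/proc/sys/net/ipv4/local_port_allocation"

/* Write a decimal integer followed by a newline; returns 0 on success. */
static int write_int_to_file(const char *path, int value)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fprintf(f, "%d\n", value) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Read a decimal integer back; returns 0 on success. */
static int read_int_from_file(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

On a busy client, calling write_int_to_file(LOCAL_PORT_ALLOCATION_PATH, 1) would select sequential allocation, and writing 0 would restore the default even/odd strategy.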
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/3058
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/7...