issue: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
jiazhenyuan (4): tcp: address problems caused by EDT misshaps tcp: always set retrans_stamp on recovery tcp: create a helper to model exponential backoff tcp: adjust rto_base in retransmits_timed_out()
net/ipv4/tcp_input.c | 16 +++++++---- net/ipv4/tcp_output.c | 9 +++--- net/ipv4/tcp_timer.c | 65 ++++++++++++++++++++----------------------- 3 files changed, 44 insertions(+), 46 deletions(-)
From: Eric Dumazet edumazet@google.com
mainline inclusion from mainline-v5.3.0 commit 9efdda4e3abed13f0903b7b6e4d4c2102019440a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue CVE: NA ------------------------------------------------------------
When a qdisc setup including pacing FQ is dismantled and recreated, some TCP packets are sent earlier than instructed by TCP stack.
TCP can be fooled when ACK comes back, because the following operation can return a negative value.
tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
Some paths in TCP stack were not dealing properly with this, this patch addresses four of them.
Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Jiazhenyuan jiazhenyuan@uniontech.com --- net/ipv4/tcp_input.c | 16 ++++++++++------ net/ipv4/tcp_timer.c | 10 ++++++---- 2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a058fb936c2a..82d86cef1d10 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -581,10 +581,12 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk, u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr; u32 delta_us;
- if (!delta) - delta = 1; - delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ); - tcp_rcv_rtt_update(tp, delta_us, 0); + if (likely(delta < INT_MAX / (USEC_PER_SEC / TCP_TS_HZ))) { + if (!delta) + delta = 1; + delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ); + tcp_rcv_rtt_update(tp, delta_us, 0); + } } }
@@ -2934,9 +2936,11 @@ static bool tcp_ack_update_rtt(struct sock *sk, const int flag, if (seq_rtt_us < 0 && tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr && flag & FLAG_ACKED) { u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr; - u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
- seq_rtt_us = ca_rtt_us = delta_us; + if (likely(delta < INT_MAX / (USEC_PER_SEC / TCP_TS_HZ))) { + seq_rtt_us = delta * (USEC_PER_SEC / TCP_TS_HZ); + ca_rtt_us = seq_rtt_us; + } } rs->rtt_us = ca_rtt_us; /* RTT of last (S)ACKed packet (or -1) */ if (seq_rtt_us < 0) diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 681882a40968..7c7f7e6cd955 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -40,15 +40,17 @@ static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); u32 elapsed, start_ts; + s32 remaining;
start_ts = tcp_retransmit_stamp(sk); if (!icsk->icsk_user_timeout || !start_ts) return icsk->icsk_rto; elapsed = tcp_time_stamp(tcp_sk(sk)) - start_ts; - if (elapsed >= icsk->icsk_user_timeout) + remaining = icsk->icsk_user_timeout - elapsed; + if (remaining <= 0) return 1; /* user timeout has passed; fire ASAP */ - else - return min_t(u32, icsk->icsk_rto, msecs_to_jiffies(icsk->icsk_user_timeout - elapsed)); + + return min_t(u32, icsk->icsk_rto, msecs_to_jiffies(remaining)); }
/** @@ -210,7 +212,7 @@ static bool retransmits_timed_out(struct sock *sk, (boundary - linear_backoff_thresh) * TCP_RTO_MAX; timeout = jiffies_to_msecs(timeout); } - return (tcp_time_stamp(tcp_sk(sk)) - start_ts) >= timeout; + return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0; }
/* A write timeout has occurred. Process the after effects. */
From: Yuchung Cheng ycheng@google.com
mainline inclusion from mainline-v5.3.0 commit 7ae189759cc48cf8b54beebff566e9fd2d4e7d7c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue CVE: NA -------------------------------------------------------------------------
Previously TCP socket's retrans_stamp is not set if the retransmission has failed to send. As a result if a socket is experiencing local issues to retransmit packets, determining when to abort a socket is complicated w/o knowning the starting time of the recovery since retrans_stamp may remain zero.
This complication causes sub-optimal behavior that TCP may use the latest, instead of the first, retransmission time to compute the elapsed time of a stalling connection due to local issues. Then TCP may disrecard TCP retries settings and keep retrying until it finally succeed: not a good idea when the local host is already strained.
The simple fix is to always timestamp the start of a recovery. It's worth noting that retrans_stamp is also used to compare echo timestamp values to detect spurious recovery. This patch does not break that because retrans_stamp is still later than when the original packet was sent.
Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Neal Cardwell ncardwell@google.com Reviewed-by: Soheil Hassas Yeganeh soheil@google.com Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Jiazhenyuan jiazhenyuan@uniontech.com --- net/ipv4/tcp_output.c | 9 ++++----- net/ipv4/tcp_timer.c | 23 +++-------------------- 2 files changed, 7 insertions(+), 25 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 97b9d671a83c..3bdc80ad2f46 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3011,13 +3011,12 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) #endif TCP_SKB_CB(skb)->sacked |= TCPCB_RETRANS; tp->retrans_out += tcp_skb_pcount(skb); - - /* Save stamp of the first retransmit. */ - if (!tp->retrans_stamp) - tp->retrans_stamp = tcp_skb_timestamp(skb); - }
+ /* Save stamp of the first retransmit. */ + if (!tp->retrans_stamp) + tp->retrans_stamp = tcp_skb_timestamp(skb); + if (tp->undo_retrans < 0) tp->undo_retrans = 0; tp->undo_retrans += tcp_skb_pcount(skb); diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 7c7f7e6cd955..9a904bf8decc 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -22,28 +22,14 @@ #include <linux/gfp.h> #include <net/tcp.h>
-static u32 tcp_retransmit_stamp(const struct sock *sk) -{ - u32 start_ts = tcp_sk(sk)->retrans_stamp; - - if (unlikely(!start_ts)) { - struct sk_buff *head = tcp_rtx_queue_head(sk); - - if (!head) - return 0; - start_ts = tcp_skb_timestamp(head); - } - return start_ts; -} - static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); u32 elapsed, start_ts; s32 remaining;
- start_ts = tcp_retransmit_stamp(sk); - if (!icsk->icsk_user_timeout || !start_ts) + start_ts = tcp_sk(sk)->retrans_stamp; + if (!icsk->icsk_user_timeout) return icsk->icsk_rto; elapsed = tcp_time_stamp(tcp_sk(sk)) - start_ts; remaining = icsk->icsk_user_timeout - elapsed; @@ -198,10 +184,7 @@ static bool retransmits_timed_out(struct sock *sk, if (!inet_csk(sk)->icsk_retransmits) return false;
- start_ts = tcp_retransmit_stamp(sk); - if (!start_ts) - return false; - + start_ts = tcp_sk(sk)->retrans_stamp; if (likely(timeout == 0)) { linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
From: Yuchung Cheng ycheng@google.com
mainline inclusion from mainline-v5.3.0 commit 01a523b071618abbc634d1958229fe3bd2dfa5fa category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue CVE: NA ------------------------------------------------------------
Create a helper to model TCP exponential backoff for the next patch. This is pure refactor w no behavior change.
Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Neal Cardwell ncardwell@google.com Reviewed-by: Soheil Hassas Yeganeh soheil@google.com Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Jiazhenyuan jiazhenyuan@uniontech.com --- net/ipv4/tcp_timer.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 9a904bf8decc..2c62e9e67e64 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -160,7 +160,20 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); }
- +static unsigned int tcp_model_timeout(struct sock *sk, + unsigned int boundary, + unsigned int rto_base) +{ + unsigned int linear_backoff_thresh, timeout; + + linear_backoff_thresh = ilog2(TCP_RTO_MAX / rto_base); + if (boundary <= linear_backoff_thresh) + timeout = ((2 << boundary) - 1) * rto_base; + else + timeout = ((2 << linear_backoff_thresh) - 1) * rto_base + + (boundary - linear_backoff_thresh) * TCP_RTO_MAX; + return jiffies_to_msecs(timeout); +} /** * retransmits_timed_out() - returns true if this connection has timed out * @sk: The current socket @@ -178,23 +191,15 @@ static bool retransmits_timed_out(struct sock *sk, unsigned int boundary, unsigned int timeout) { - const unsigned int rto_base = TCP_RTO_MIN; - unsigned int linear_backoff_thresh, start_ts; + unsigned int start_ts;
if (!inet_csk(sk)->icsk_retransmits) return false;
start_ts = tcp_sk(sk)->retrans_stamp; - if (likely(timeout == 0)) { - linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base); - - if (boundary <= linear_backoff_thresh) - timeout = ((2 << boundary) - 1) * rto_base; - else - timeout = ((2 << linear_backoff_thresh) - 1) * rto_base + - (boundary - linear_backoff_thresh) * TCP_RTO_MAX; - timeout = jiffies_to_msecs(timeout); - } + if (likely(timeout == 0)) + timeout = tcp_model_timeout(sk, boundary, TCP_RTO_MIN); + return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0; }
From: Eric Dumazet edumazet@google.com
mainline inclusion from mainline-v5.3.0 commit 3256a2d6ab1f71f9a1bd2d7f6f18eb8108c48d17 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue CVE: NA ------------------------------------------------------------------
The cited commit exposed an old retransmits_timed_out() bug which assumed it could call tcp_model_timeout() with TCP_RTO_MIN as rto_base for all states.
But flows in SYN_SENT or SYN_RECV state uses a different RTO base (1 sec instead of 200 ms, unless BPF choses another value)
This caused a reduction of SYN retransmits from 6 to 4 with the default /proc/sys/net/ipv4/tcp_syn_retries value.
Fixes: a41e8a88b06e ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Yuchung Cheng ycheng@google.com Cc: Marek Majkowski marek@cloudflare.com Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Jiazhenyuan jiazhenyuan@uniontech.com --- net/ipv4/tcp_timer.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 2c62e9e67e64..36cca4612e23 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -197,8 +197,13 @@ static bool retransmits_timed_out(struct sock *sk, return false;
start_ts = tcp_sk(sk)->retrans_stamp; - if (likely(timeout == 0)) - timeout = tcp_model_timeout(sk, boundary, TCP_RTO_MIN); + if (likely(timeout == 0)) { + unsigned int rto_base = TCP_RTO_MIN; + + if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) + rto_base = tcp_timeout_init(sk); + timeout = tcp_model_timeout(sk, boundary, rto_base); + }
return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0; }
您好,您的补丁已经提交 CI 进行检查。
谢谢
---- 成坚(gatieme)
在 2021/10/8 17:01, jiazhenyuan 写道:
issue: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
jiazhenyuan (4): tcp: address problems caused by EDT misshaps tcp: always set retrans_stamp on recovery tcp: create a helper to model exponential backoff tcp: adjust rto_base in retransmits_timed_out()
net/ipv4/tcp_input.c | 16 +++++++---- net/ipv4/tcp_output.c | 9 +++--- net/ipv4/tcp_timer.c | 65 ++++++++++++++++++++----------------------- 3 files changed, 44 insertions(+), 46 deletions(-)
您好,冒昧打扰。
如有疑问和补充信息,欢迎在 issue 上进行更新。
https://gitee.com/openeuler/kernel/issues/I4AFRJ
谢谢。
---- 成坚
在 2021/10/8 17:01, jiazhenyuan 写道:
issue:https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
jiazhenyuan (4): tcp: address problems caused by EDT misshaps tcp: always set retrans_stamp on recovery tcp: create a helper to model exponential backoff tcp: adjust rto_base in retransmits_timed_out()
net/ipv4/tcp_input.c | 16 +++++++---- net/ipv4/tcp_output.c | 9 +++--- net/ipv4/tcp_timer.c | 65 ++++++++++++++++++++----------------------- 3 files changed, 44 insertions(+), 46 deletions(-)