From: Jiazhenyuan <jiazhenyuan@uniontech.com>
issue: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
jiazhenyuan (6):
  tcp: switch tcp and sch_fq to new earliest departure time model
  net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows
  tcp: address problems caused by EDT misshaps
  tcp: always set retrans_stamp on recovery
  tcp: create a helper to model exponential backoff
  tcp: adjust rto_base in retransmits_timed_out()
 net/ipv4/tcp_bbr.c    |  7 +++--
 net/ipv4/tcp_input.c  | 17 +++++++-----
 net/ipv4/tcp_output.c | 29 ++++++++++++++------
 net/ipv4/tcp_timer.c  | 64 ++++++++++++++++++++-----------------------
 net/sched/sch_fq.c    | 46 ++++++++++++++++++-------------
 5 files changed, 92 insertions(+), 71 deletions(-)
From: Eric Dumazet <edumazet@google.com>
mainline inclusion
from mainline-v5.3.0
commit ab408b6dc7449c0f791e9e5f8de72fa7428584f2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
CVE: NA

------------------------------------------------------------
TCP keeps track of tcp_wstamp_ns by itself, meaning sch_fq no longer has to do it.
Thanks to this model, TCP can get more accurate RTT samples, since pacing no longer inflates them.
This has the nice effect of removing some delays caused by the FQ quantum mechanism, which inflated max/P99 latencies.
Also, we might relax TCP Small Queues' tight limits in the future, since this new model allows TCP to build bigger batches: sch_fq (or a device with earliest departure time offload) ensures these packets will be delivered on time.
Note that other protocols are not converted (they probably never will be), so sch_fq still has support for SO_MAX_PACING_RATE.
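For readers unfamiliar with the earliest departure time (EDT) model, here is a minimal userspace sketch of the division of labor this patch creates (all names are invented for illustration; this is not the kernel code): TCP advances its own departure clock and stamps each skb, while the qdisc merely enforces that stamp.

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* Sender side: TCP advances its own departure clock by len/rate and
 * stamps the packet with the earliest time it may leave.
 */
static uint64_t tcp_stamp_departure(uint64_t *wstamp_ns, uint32_t len,
				    uint64_t pacing_rate_Bps)
{
	uint64_t tstamp = *wstamp_ns;

	*wstamp_ns += len * NSEC_PER_SEC / pacing_rate_Bps;
	return tstamp;
}

/* Qdisc side: fq no longer computes the pacing delay itself; it just
 * throttles the flow until max(skb stamp, flow's own schedule).
 */
static int fq_may_send(uint64_t now_ns, uint64_t skb_tstamp_ns,
		       uint64_t flow_time_next_packet_ns)
{
	uint64_t t = skb_tstamp_ns > flow_time_next_packet_ns ?
		     skb_tstamp_ns : flow_time_next_packet_ns;

	return now_ns >= t;
}

int main(void)
{
	uint64_t wstamp = 0, rate = 125000000; /* 1 Gbit/s in bytes/sec */

	for (int i = 0; i < 3; i++) {
		uint64_t t = tcp_stamp_departure(&wstamp, 1500, rate);

		printf("pkt %d departs at +%llu ns, sendable at t=0? %d\n",
		       i, (unsigned long long)t, fq_may_send(0, t, 0));
	}
	return 0;
}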
Tested:
A test showing the FQ pacing quantum artifact for low-rate flows: it adds unexpected throttles for RPC flows, inflating max and P99 latencies.
The parameters chosen here show what typically happens when a TCP flow has a reduced pacing rate (this can be caused by a reduced cwnd after a few losses, and/or an RTT above a few ms).
MIBS="MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY,P99_LATENCY,STDDEV_LATENCY" Before : $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds 19,82.78,5279,3825,482.02
After :

$ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
20,49.94,128,63,3.18
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com>
---
 net/ipv4/tcp_bbr.c    |  7 ++++---
 net/ipv4/tcp_output.c | 20 +++++++++++++++++---
 net/sched/sch_fq.c    | 21 +++++++++++----------
 3 files changed, 32 insertions(+), 16 deletions(-)
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index b70c9365e131..f9d8ec577616 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -136,6 +136,9 @@ static const u32 bbr_probe_rtt_mode_ms = 200;
 /* Skip TSO below the following bandwidth (bits/sec): */
 static const int bbr_min_tso_rate = 1200000;
 
+/* Pace at ~1% below estimated bw, on average, to reduce queue at bottleneck. */
+static const int bbr_pacing_margin_percent = 1;
+
 /* We use a high_gain value of 2/ln(2) because it's the smallest pacing gain
  * that will allow a smoothly increasing pacing rate that will double each RTT
  * and send the same number of packets per RTT that an un-paced, slow-starting
@@ -235,12 +238,10 @@ static u64 bbr_rate_bytes_per_sec(struct sock *sk, u64 rate, int gain)
 {
 	unsigned int mss = tcp_sk(sk)->mss_cache;
 
-	if (!tcp_needs_internal_pacing(sk))
-		mss = tcp_mss_to_mtu(sk, mss);
 	rate *= mss;
 	rate *= gain;
 	rate >>= BBR_SCALE;
-	rate *= USEC_PER_SEC;
+	rate *= USEC_PER_SEC / 100 * (100 - bbr_pacing_margin_percent);
 	return rate >> BW_SCALE;
 }
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 97b9d671a83c..e6a939a3d936 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1038,9 +1038,23 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 	sock_hold(sk);
 }
 
-static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
+static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+
 	skb->skb_mstamp = tp->tcp_mstamp;
+	if (sk->sk_pacing_status != SK_PACING_NONE) {
+		u32 rate = sk->sk_pacing_rate;
+
+		/* Original sch_fq does not pace first 10 MSS
+		 * Note that tp->data_segs_out overflows after 2^32 packets,
+		 * this is a minor annoyance.
+		 */
+		if (rate != ~0U && rate && tp->data_segs_out >= 10) {
+			tp->tcp_mstamp += div_u64((u64)skb->len * NSEC_PER_SEC, rate);
+			/* TODO: update internal pacing here */
+		}
+	}
 	list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
 }
 
@@ -1205,7 +1219,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		err = net_xmit_eval(err);
 	}
 	if (!err && oskb) {
-		tcp_update_skb_after_send(tp, oskb);
+		tcp_update_skb_after_send(sk, oskb);
 		tcp_rate_skb_sent(sk, oskb);
 	}
 	return err;
@@ -2387,7 +2401,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
 			/* "skb_mstamp" is used as a start point for the retransmit timer */
-			tcp_update_skb_after_send(tp, skb);
+			tcp_update_skb_after_send(sk, skb);
 			goto repair; /* Skip network transmission */
 		}
 
@@ -2978,7 +2992,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 		}
 		tcp_skb_tsorted_restore(skb);
 
 		if (!err) {
-			tcp_update_skb_after_send(tp, skb);
+			tcp_update_skb_after_send(sk, skb);
 			tcp_rate_skb_sent(sk, skb);
 		}
 	} else {
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index ba60a8dd5bb3..41cb947917fc 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -491,11 +491,16 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 	}
 	skb = f->head;
-	if (unlikely(skb && now < f->time_next_packet &&
-		     !skb_is_tcp_pure_ack(skb))) {
-		head->first = f->next;
-		fq_flow_set_throttled(q, f);
-		goto begin;
+	if (skb && !skb_is_tcp_pure_ack(skb)) {
+		u64 time_next_packet = max_t(u64, ktime_to_ns(skb->tstamp),
+					     f->time_next_packet);
+
+		if (now < time_next_packet) {
+			head->first = f->next;
+			f->time_next_packet = time_next_packet;
+			fq_flow_set_throttled(q, f);
+			goto begin;
+		}
 	}
 
 	skb = fq_dequeue_head(sch, f);
@@ -513,11 +518,7 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 	prefetch(&skb->end);
 	f->credit -= qdisc_pkt_len(skb);
 
-	if (!q->rate_enable)
-		goto out;
-
-	/* Do not pace locally generated ack packets */
-	if (skb_is_tcp_pure_ack(skb))
+	if (ktime_to_ns(skb->tstamp) || !q->rate_enable)
 		goto out;
 
 	rate = q->flow_max_rate;
From: Eric Dumazet <edumazet@google.com>
mainline inclusion
from mainline-v5.3.0
commit 08e14fe429a07475ee9f29a283945d602e4a6d92
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
CVE: NA

------------------------------------------------------------
When the EDT conversion happened, fq lost the ability to enforce a maxrate for all flows. It kept it only for non-EDT flows.
This commit restores the functionality.
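Before the fix, the dequeue path bailed out of rate accounting whenever skb->tstamp was set, so a configured maxrate was never applied to EDT flows. Here is a hedged standalone model of the corrected decision (invented names, not the kernel source):

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* Toy model of the dequeue path: an EDT-stamped skb skips the
 * per-socket pacing-rate and quantum logic, but a configured flow
 * maxrate must still push the flow's schedule forward. Before the
 * fix, an EDT stamp skipped the maxrate accounting too.
 */
static uint64_t next_departure(uint64_t now_ns, uint32_t plen,
			       int has_edt_stamp, int fixed,
			       uint64_t sk_pacing_rate_Bps,
			       uint64_t flow_max_rate_Bps /* 0 = none */)
{
	uint64_t rate = flow_max_rate_Bps;

	if (has_edt_stamp && !fixed)
		return now_ns;			/* bug: maxrate ignored */
	if (!has_edt_stamp && sk_pacing_rate_Bps &&
	    (!rate || sk_pacing_rate_Bps < rate))
		rate = sk_pacing_rate_Bps;	/* non-EDT flows only */
	if (!rate)
		return now_ns;
	return now_ns + plen * NSEC_PER_SEC / rate;
}

int main(void)
{
	/* 1500-byte packet, 500 Mbit/s maxrate (62.5 MB/s) */
	printf("broken: +%llu ns\n", (unsigned long long)
	       next_departure(0, 1500, 1, 0, 0, 62500000));
	printf("fixed:  +%llu ns\n", (unsigned long long)
	       next_departure(0, 1500, 1, 1, 0, 62500000));
	return 0;
}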
Tested:
tc qd replace dev eth0 root fq maxrate 500Mbit
netperf -P0 -H host -- -O THROUGHPUT
489.75
Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com>
---
 net/sched/sch_fq.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 41cb947917fc..64d7f11b0240 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -516,22 +516,29 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 		goto begin;
 	}
 	prefetch(&skb->end);
-	f->credit -= qdisc_pkt_len(skb);
+	plen = qdisc_pkt_len(skb);
+	f->credit -= plen;
 
-	if (ktime_to_ns(skb->tstamp) || !q->rate_enable)
+	if (!q->rate_enable)
 		goto out;
 
 	rate = q->flow_max_rate;
-	if (skb->sk)
-		rate = min(skb->sk->sk_pacing_rate, rate);
 
-	if (rate <= q->low_rate_threshold) {
-		f->credit = 0;
-		plen = qdisc_pkt_len(skb);
-	} else {
-		plen = max(qdisc_pkt_len(skb), q->quantum);
-		if (f->credit > 0)
-			goto out;
+	/* If EDT time was provided for this skb, we need to
+	 * update f->time_next_packet only if this qdisc enforces
+	 * a flow max rate.
+	 */
+	if (!skb->tstamp) {
+		if (skb->sk)
+			rate = min(skb->sk->sk_pacing_rate, rate);
+
+		if (rate <= q->low_rate_threshold) {
+			f->credit = 0;
+		} else {
+			plen = max(plen, q->quantum);
+			if (f->credit > 0)
+				goto out;
+		}
 	}
 	if (rate != ~0U) {
 		u64 len = (u64)plen * NSEC_PER_SEC;
From: Eric Dumazet <edumazet@google.com>
mainline inclusion
from mainline-v5.3.0
commit 9efdda4e3abed13f0903b7b6e4d4c2102019440a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
CVE: NA

------------------------------------------------------------
When a qdisc setup including pacing FQ is dismantled and recreated, some TCP packets are sent earlier than instructed by the TCP stack.
TCP can be fooled when the ACK comes back, because the following operation can return a negative value:
tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
Some paths in the TCP stack were not dealing properly with this; this patch addresses four of them.
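To see the failure mode concretely, here is a standalone C demo (not kernel code) of the unsigned wraparound and of the INT_MAX-based guard the patch adds:

#include <stdint.h>
#include <limits.h>
#include <stdio.h>

#define USEC_PER_SEC 1000000
#define TCP_TS_HZ    1000	/* TCP timestamps tick in ms */

int main(void)
{
	uint32_t now = 1000;	/* tcp_time_stamp(tp) when the ACK arrives */
	uint32_t tsecr = 1005;	/* echoed stamp from a packet sent "late" */
	uint32_t delta = now - tsecr;	/* wraps to 4294967291 */

	printf("raw delta = %u\n", delta);
	if (delta < INT_MAX / (USEC_PER_SEC / TCP_TS_HZ))
		printf("delta_us = %u\n", delta * (USEC_PER_SEC / TCP_TS_HZ));
	else
		printf("rejected as a bogus (negative) RTT sample\n");
	return 0;
}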
Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com>
---
 net/ipv4/tcp_input.c | 17 ++++++++++-------
 net/ipv4/tcp_timer.c | 10 ++++++----
 2 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a058fb936c2a..993a975575f6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -581,10 +581,12 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 		u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
 		u32 delta_us;
 
-		if (!delta)
-			delta = 1;
-		delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
-		tcp_rcv_rtt_update(tp, delta_us, 0);
+		if (likely(delta < INT_MAX / (USEC_PER_SEC / TCP_TS_HZ))) {
+			if (!delta)
+				delta = 1;
+			delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
+			tcp_rcv_rtt_update(tp, delta_us, 0);
+		}
 	}
 }
 
@@ -2934,9 +2936,10 @@ static bool tcp_ack_update_rtt(struct sock *sk, const int flag,
 	if (seq_rtt_us < 0 && tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
 	    flag & FLAG_ACKED) {
 		u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
-		u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
-
-		seq_rtt_us = ca_rtt_us = delta_us;
+		if (likely(delta < INT_MAX / (USEC_PER_SEC / TCP_TS_HZ))) {
+			seq_rtt_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
+			ca_rtt_us = seq_rtt_us;
+		}
 	}
 	rs->rtt_us = ca_rtt_us; /* RTT of last (S)ACKed packet (or -1) */
 	if (seq_rtt_us < 0)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 681882a40968..7c7f7e6cd955 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -40,15 +40,17 @@ static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 elapsed, start_ts;
+	s32 remaining;
 
 	start_ts = tcp_retransmit_stamp(sk);
 	if (!icsk->icsk_user_timeout || !start_ts)
 		return icsk->icsk_rto;
 	elapsed = tcp_time_stamp(tcp_sk(sk)) - start_ts;
-	if (elapsed >= icsk->icsk_user_timeout)
+	remaining = icsk->icsk_user_timeout - elapsed;
+	if (remaining <= 0)
 		return 1; /* user timeout has passed; fire ASAP */
-	else
-		return min_t(u32, icsk->icsk_rto, msecs_to_jiffies(icsk->icsk_user_timeout - elapsed));
+
+	return min_t(u32, icsk->icsk_rto, msecs_to_jiffies(remaining));
 }
 
 /**
@@ -210,7 +212,7 @@ static bool retransmits_timed_out(struct sock *sk,
 			(boundary - linear_backoff_thresh) * TCP_RTO_MAX;
 		timeout = jiffies_to_msecs(timeout);
 	}
-	return (tcp_time_stamp(tcp_sk(sk)) - start_ts) >= timeout;
+	return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0;
 }
 
 /* A write timeout has occurred. Process the after effects. */
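The retransmits_timed_out() change above uses the classic wrap-safe idiom: subtract in unsigned arithmetic, then test the sign of the result as a signed value. A standalone demo (not kernel code):

#include <stdint.h>
#include <stdio.h>

/* Returns true if "now" is at or past start_ts + timeout, even when
 * the u32 timestamps have wrapped around zero in between.
 */
static int timed_out(uint32_t now, uint32_t start_ts, uint32_t timeout)
{
	return (int32_t)(now - start_ts - timeout) >= 0;
}

int main(void)
{
	/* start just before the u32 wrap, "now" just after it */
	uint32_t start = 0xFFFFFF00u, now = 0x00000200u;

	/* a naive "now - start >= timeout" would still work here, but a
	 * naive "now >= start + timeout" would not: now < start numerically,
	 * yet the true elapsed time is 0x300 ticks
	 */
	printf("timed_out = %d\n", timed_out(now, start, 0x100)); /* 1 */
	printf("timed_out = %d\n", timed_out(now, start, 0x400)); /* 0 */
	return 0;
}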
From: Yuchung Cheng <ycheng@google.com>
mainline inclusion
from mainline-v5.3.0
commit 7ae189759cc48cf8b54beebff566e9fd2d4e7d7c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
CVE: NA

------------------------------------------------------------
Previously, a TCP socket's retrans_stamp was not set if the retransmission failed to send. As a result, if a socket is experiencing local issues retransmitting packets, determining when to abort the socket is complicated without knowing the starting time of the recovery, since retrans_stamp may remain zero.
This complication causes sub-optimal behavior: TCP may use the latest, instead of the first, retransmission time to compute the elapsed time of a connection stalling due to local issues. TCP may then disregard the TCP retries settings and keep retrying until it finally succeeds: not a good idea when the local host is already strained.
The simple fix is to always timestamp the start of a recovery. It's worth noting that retrans_stamp is also used to compare echo timestamp values to detect spurious recovery. This patch does not break that, because retrans_stamp is still later than when the original packet was sent.
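The last point matters because TCP's undo logic compares the echoed timestamp against retrans_stamp; roughly (a simplified sketch of the idea, not the exact kernel helper):

#include <stdint.h>
#include <stdio.h>

/* A recovery was spurious if the peer echoes a timestamp older than
 * the first retransmission, proving the original transmission was
 * the one that got ACKed.
 */
static int recovery_was_spurious(uint32_t rcv_tsecr, uint32_t retrans_stamp)
{
	return retrans_stamp &&
	       (int32_t)(rcv_tsecr - retrans_stamp) < 0;
}

int main(void)
{
	uint32_t retrans_stamp = 1000;

	printf("%d\n", recovery_was_spurious(990, retrans_stamp));  /* 1 */
	printf("%d\n", recovery_was_spurious(1010, retrans_stamp)); /* 0 */
	return 0;
}

Stamping even a failed retransmission attempt keeps this check valid, since retrans_stamp is still later than the original transmit time.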
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com>
---
 net/ipv4/tcp_output.c |  9 ++++-----
 net/ipv4/tcp_timer.c  | 23 +++--------------------
 2 files changed, 7 insertions(+), 25 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e6a939a3d936..35624786f639 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3025,13 +3025,12 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 #endif
 		TCP_SKB_CB(skb)->sacked |= TCPCB_RETRANS;
 		tp->retrans_out += tcp_skb_pcount(skb);
-
-		/* Save stamp of the first retransmit. */
-		if (!tp->retrans_stamp)
-			tp->retrans_stamp = tcp_skb_timestamp(skb);
-
 	}
 
+	/* Save stamp of the first (attempted) retransmit. */
+	if (!tp->retrans_stamp)
+		tp->retrans_stamp = tcp_skb_timestamp(skb);
+
 	if (tp->undo_retrans < 0)
 		tp->undo_retrans = 0;
 	tp->undo_retrans += tcp_skb_pcount(skb);
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 7c7f7e6cd955..9a904bf8decc 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -22,28 +22,14 @@
 #include <linux/gfp.h>
 #include <net/tcp.h>
 
-static u32 tcp_retransmit_stamp(const struct sock *sk)
-{
-	u32 start_ts = tcp_sk(sk)->retrans_stamp;
-
-	if (unlikely(!start_ts)) {
-		struct sk_buff *head = tcp_rtx_queue_head(sk);
-
-		if (!head)
-			return 0;
-		start_ts = tcp_skb_timestamp(head);
-	}
-	return start_ts;
-}
-
 static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 elapsed, start_ts;
 	s32 remaining;
 
-	start_ts = tcp_retransmit_stamp(sk);
-	if (!icsk->icsk_user_timeout || !start_ts)
+	start_ts = tcp_sk(sk)->retrans_stamp;
+	if (!icsk->icsk_user_timeout)
 		return icsk->icsk_rto;
 	elapsed = tcp_time_stamp(tcp_sk(sk)) - start_ts;
 	remaining = icsk->icsk_user_timeout - elapsed;
@@ -198,10 +184,7 @@ static bool retransmits_timed_out(struct sock *sk,
 	if (!inet_csk(sk)->icsk_retransmits)
 		return false;
 
-	start_ts = tcp_retransmit_stamp(sk);
-	if (!start_ts)
-		return false;
-
+	start_ts = tcp_sk(sk)->retrans_stamp;
 	if (likely(timeout == 0)) {
 		linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
From: Yuchung Cheng <ycheng@google.com>
mainline inclusion
from mainline-v5.3.0
commit 01a523b071618abbc634d1958229fe3bd2dfa5fa
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
CVE: NA

------------------------------------------------------------
Create a helper to model TCP exponential backoff for the next patch. This is a pure refactor with no behavior change.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com>
---
 net/ipv4/tcp_timer.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 9a904bf8decc..c9e63bd68488 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -160,6 +160,20 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
 	tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
 }
 
+static unsigned int tcp_model_timeout(struct sock *sk,
+				      unsigned int boundary,
+				      unsigned int rto_base)
+{
+	unsigned int linear_backoff_thresh, timeout;
+
+	linear_backoff_thresh = ilog2(TCP_RTO_MAX / rto_base);
+	if (boundary <= linear_backoff_thresh)
+		timeout = ((2 << boundary) - 1) * rto_base;
+	else
+		timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
+			(boundary - linear_backoff_thresh) * TCP_RTO_MAX;
+	return jiffies_to_msecs(timeout);
+}
 /**
  * retransmits_timed_out() - returns true if this connection has timed out
@@ -178,23 +192,15 @@ static bool retransmits_timed_out(struct sock *sk,
 				  unsigned int boundary,
 				  unsigned int timeout)
 {
-	const unsigned int rto_base = TCP_RTO_MIN;
-	unsigned int linear_backoff_thresh, start_ts;
+	unsigned int start_ts;
 
 	if (!inet_csk(sk)->icsk_retransmits)
 		return false;
 
 	start_ts = tcp_sk(sk)->retrans_stamp;
-	if (likely(timeout == 0)) {
-		linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
-
-		if (boundary <= linear_backoff_thresh)
-			timeout = ((2 << boundary) - 1) * rto_base;
-		else
-			timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
-				(boundary - linear_backoff_thresh) * TCP_RTO_MAX;
-		timeout = jiffies_to_msecs(timeout);
-	}
+	if (likely(timeout == 0))
+		timeout = tcp_model_timeout(sk, boundary, TCP_RTO_MIN);
+
 	return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0;
 }
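To make the backoff model concrete, here is a standalone re-derivation of the same math (a userspace sketch assuming HZ=1000, not the kernel function itself), reproducing the familiar ~924.6 s budget for tcp_retries2 = 15 and previewing the SYN numbers relevant to the next patch:

#include <stdio.h>

#define HZ          1000		/* assume 1000 jiffies per second */
#define TCP_RTO_MIN (HZ / 5)		/* 200 ms */
#define TCP_RTO_MAX (120 * HZ)		/* 120 s  */

static unsigned int ilog2_u(unsigned int v)
{
	unsigned int l = 0;

	while (v >>= 1)
		l++;
	return l;
}

/* Same math as tcp_model_timeout(): doubling RTOs, i.e.
 * rto_base * (2^(boundary+1) - 1), until the doubling hits
 * TCP_RTO_MAX, then linear in TCP_RTO_MAX.
 */
static unsigned int model_timeout_msecs(unsigned int boundary,
					unsigned int rto_base)
{
	unsigned int thresh = ilog2_u(TCP_RTO_MAX / rto_base);
	unsigned int timeout;

	if (boundary <= thresh)
		timeout = ((2 << boundary) - 1) * rto_base;
	else
		timeout = ((2 << thresh) - 1) * rto_base +
			  (boundary - thresh) * TCP_RTO_MAX;
	return timeout * 1000 / HZ;	/* jiffies_to_msecs for HZ=1000 */
}

int main(void)
{
	/* tcp_retries2 = 15, 200 ms base: 924600 ms (~15.4 min) */
	printf("retries2=15:     %u ms\n", model_timeout_msecs(15, TCP_RTO_MIN));
	/* tcp_syn_retries = 6 with the correct 1 s SYN base: 127000 ms */
	printf("syn, 1s base:    %u ms\n", model_timeout_msecs(6, HZ));
	/* the bug fixed by the next patch: 200 ms base gives 25400 ms */
	printf("syn, 200ms base: %u ms\n", model_timeout_msecs(6, TCP_RTO_MIN));
	return 0;
}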
From: Eric Dumazet <edumazet@google.com>
mainline inclusion
from mainline-v5.3.0
commit 3256a2d6ab1f71f9a1bd2d7f6f18eb8108c48d17
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue
CVE: NA

------------------------------------------------------------
The cited commit exposed an old retransmits_timed_out() bug which assumed it could call tcp_model_timeout() with TCP_RTO_MIN as rto_base for all states.
But flows in SYN_SENT or SYN_RECV state use a different RTO base (1 second instead of 200 ms, unless BPF chooses another value).
This caused a reduction of SYN retransmits from 6 to 4 with the default /proc/sys/net/ipv4/tcp_syn_retries value.
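Roughly, with the default tcp_syn_retries = 6: the intended budget with the 1 s SYN RTO base is (2^7 - 1) * 1 s = 127 s, but computing it with TCP_RTO_MIN gives only 127 * 200 ms = 25.4 s. Since SYN retransmissions go out after 1 + 2 + 4 + 8 + 16 = 31 s of elapsed time, the connection is aborted after the 4th SYN retransmit instead of the 6th.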
Fixes: a41e8a88b06e ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com>
---
 net/ipv4/tcp_timer.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c9e63bd68488..657b95d8b256 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -198,8 +198,13 @@ static bool retransmits_timed_out(struct sock *sk,
 		return false;
 
 	start_ts = tcp_sk(sk)->retrans_stamp;
-	if (likely(timeout == 0))
-		timeout = tcp_model_timeout(sk, boundary, TCP_RTO_MIN);
+	if (likely(timeout == 0)) {
+		unsigned int rto_base = TCP_RTO_MIN;
+
+		if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
+			rto_base = tcp_timeout_init(sk);
+		timeout = tcp_model_timeout(sk, boundary, rto_base);
+	}
 
 	return (s32)(tcp_time_stamp(tcp_sk(sk)) - start_ts - timeout) >= 0;
 }
Hello, and thank you for contributing to openEuler.

This patch set has been submitted to CI for checking.

Thanks.

--
成坚 (gatieme)

On 2021/10/22 18:51, jiazhenyuan wrote: