Fix CVE-2024-41007
Eric Dumazet (3): tcp: refactor tcp_retransmit_timer() tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() tcp: avoid too many retransmit packets
Menglong Dong (1): net: tcp: fix unexcepted socket die when snd_wnd is 0
Neal Cardwell (1): tcp: fix incorrect undo caused by DSACK of TLP retransmit
net/ipv4/tcp_input.c | 11 ++++++++++- net/ipv4/tcp_timer.c | 45 ++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 51 insertions(+), 5 deletions(-)
From: Neal Cardwell ncardwell@google.com
stable inclusion from stable-v4.19.318 commit 83f5eb01c4beb9741bc1600bcd8b6e94a1774abe category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
[ Upstream commit 0ec986ed7bab6801faed1440e8839dcc710331ff ]
Loss recovery undo_retrans bookkeeping had a long-standing bug where a DSACK from a spurious TLP retransmit packet could cause an erroneous undo of a fast recovery or RTO recovery that repaired a single really-lost packet (in a sequence range outside that of the TLP retransmit). Basically, because the loss recovery state machine didn't account for the fact that it sent a TLP retransmit, the DSACK for the TLP retransmit could erroneously be implicitly be interpreted as corresponding to the normal fast recovery or RTO recovery retransmit that plugged a real hole, thus resulting in an improper undo.
For example, consider the following buggy scenario where there is a real packet loss but the congestion control response is improperly undone because of this bug:
+ send packets P1, P2, P3, P4 + P1 is really lost + send TLP retransmit of P4 + receive SACK for original P2, P3, P4 + enter fast recovery, fast-retransmit P1, increment undo_retrans to 1 + receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!) + receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
The fix: when we initialize undo machinery in tcp_init_undo(), if there is a TLP retransmit in flight, then increment tp->undo_retrans so that we make sure that we receive a DSACK corresponding to the TLP retransmit, as well as DSACKs for all later normal retransmits, before triggering a loss recovery undo. Note that we also have to move the line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO we remember the tp->tlp_high_seq value until tcp_init_undo() and clear it only afterward.
Also note that the bug dates back to the original 2013 TLP implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)").
However, this patch will only compile and work correctly with kernels that have tp->tlp_retrans, which was added only in v5.8 in 2020 in commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight"). So we associate this fix with that later commit.
Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight") Signed-off-by: Neal Cardwell ncardwell@google.com Reviewed-by: Eric Dumazet edumazet@google.com Cc: Yuchung Cheng ycheng@google.com Cc: Kevin Yang yyd@google.com Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_input.c | 11 ++++++++++- net/ipv4/tcp_timer.c | 2 -- 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4e32f6f1808c..f41df46a54df 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1935,8 +1935,16 @@ void tcp_clear_retrans(struct tcp_sock *tp) static inline void tcp_init_undo(struct tcp_sock *tp) { tp->undo_marker = tp->snd_una; + /* Retransmission still in flight may cause DSACKs later. */ - tp->undo_retrans = tp->retrans_out ? : -1; + /* First, account for regular retransmits in flight: */ + tp->undo_retrans = tp->retrans_out; + /* Next, account for TLP retransmits in flight: */ + if (tp->tlp_high_seq && tp->tlp_retrans) + tp->undo_retrans++; + /* Finally, avoid 0, because undo_retrans==0 means "can undo now": */ + if (!tp->undo_retrans) + tp->undo_retrans = -1; }
static bool tcp_is_rack(const struct sock *sk) @@ -2015,6 +2023,7 @@ void tcp_enter_loss(struct sock *sk)
tcp_set_ca_state(sk, TCP_CA_Loss); tp->high_seq = tp->snd_nxt; + tp->tlp_high_seq = 0; tcp_ecn_queue_cwr(tp);
/* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index d8d28ba169b4..cebbac092f32 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -441,8 +441,6 @@ void tcp_retransmit_timer(struct sock *sk) if (!tp->packets_out || WARN_ON_ONCE(tcp_rtx_queue_empty(sk))) return;
- tp->tlp_high_seq = 0; - if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) && !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) { /* Receiver dastardly shrinks window. Our retransmits
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.318 commit e5a1f7427f97bde4bda73c02106c3bde87696f8f category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit 0d580fbd2db084a5c96ee9c00492236a279d5e0f upstream.
It appears linux-4.14 stable needs a backport of commit 88f8598d0a30 ("tcp: exit if nothing to retransmit on RTO timeout")
Since tcp_rtx_queue_empty() is not in pre 4.15 kernels, let's refactor tcp_retransmit_timer() to only use tcp_rtx_queue_head()
I will provide to stable teams the squashed patches.
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Acked-by: Soheil Hassas Yeganeh soheil@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_timer.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index cebbac092f32..8ddad8facec7 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -428,6 +428,7 @@ void tcp_retransmit_timer(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct net *net = sock_net(sk); struct inet_connection_sock *icsk = inet_csk(sk); + struct sk_buff *skb;
if (tp->fastopen_rsk) { WARN_ON_ONCE(sk->sk_state != TCP_SYN_RECV && @@ -438,7 +439,12 @@ void tcp_retransmit_timer(struct sock *sk) */ return; } - if (!tp->packets_out || WARN_ON_ONCE(tcp_rtx_queue_empty(sk))) + + if (!tp->packets_out) + return; + + skb = tcp_rtx_queue_head(sk); + if (WARN_ON_ONCE(!skb)) return;
if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) && @@ -470,7 +476,7 @@ void tcp_retransmit_timer(struct sock *sk) goto out; } tcp_enter_loss(sk); - tcp_retransmit_skb(sk, tcp_rtx_queue_head(sk), 1); + tcp_retransmit_skb(sk, skb, 1); __sk_dst_reset(sk); goto out_reset_timer; }
From: Menglong Dong imagedong@tencent.com
stable inclusion from stable-v4.19.318 commit faa0a1fc2a0bb510b2381a5c7aa5b46e9a83d64a category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit e89688e3e97868451a5d05b38a9d2633d6785cd4 upstream.
In tcp_retransmit_timer(), a window shrunk connection will be regarded as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not right all the time.
The retransmits will become zero-window probes in tcp_retransmit_timer() if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to TCP_RTO_MAX sooner or later.
However, the timer can be delayed and be triggered after 122877ms, not TCP_RTO_MAX, as I tested.
Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true once the RTO come up to TCP_RTO_MAX, and the socket will die.
Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout', which is exact the timestamp of the timeout.
However, "tp->rcv_tstamp" can restart from idle, then tp->rcv_tstamp could already be a long time (minutes or hours) in the past even on the first RTO. So we double check the timeout with the duration of the retransmission.
Meanwhile, making "2 * TCP_RTO_MAX" as the timeout to avoid the socket dying too soon.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAM... Signed-off-by: Menglong Dong imagedong@tencent.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_timer.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 8ddad8facec7..c59485fd8746 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -411,6 +411,22 @@ static void tcp_fastopen_synack_timer(struct sock *sk) TCP_TIMEOUT_INIT << req->num_timeout, TCP_RTO_MAX); }
+static bool tcp_rtx_probe0_timed_out(const struct sock *sk, + const struct sk_buff *skb) +{ + const struct tcp_sock *tp = tcp_sk(sk); + const int timeout = TCP_RTO_MAX * 2; + u32 rcv_delta, rtx_delta; + + rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; + if (rcv_delta <= timeout) + return false; + + rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - + (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); + + return rtx_delta > timeout; +}
/** * tcp_retransmit_timer() - The TCP retransmit timeout handler @@ -471,7 +487,7 @@ void tcp_retransmit_timer(struct sock *sk) tp->snd_una, tp->snd_nxt); } #endif - if (tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX) { + if (tcp_rtx_probe0_timed_out(sk, skb)) { tcp_write_err(sk); goto out; }
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.318 commit 0fe6516462392ffe355a45a1ada8d264a783430f category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit 36534d3c54537bf098224a32dc31397793d4594d upstream.
Due to timer wheel implementation, a timer will usually fire after its schedule.
For instance, for HZ=1000, a timeout between 512ms and 4s has a granularity of 64ms. For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after inet_csk(sk)->icsk_timeout whenever the timer interrupt finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Menglong Dong imagedong@tencent.com Acked-by: Neal Cardwell ncardwell@google.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_timer.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index c59485fd8746..12f0cbd0f8cc 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -416,8 +416,13 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk, { const struct tcp_sock *tp = tcp_sk(sk); const int timeout = TCP_RTO_MAX * 2; - u32 rcv_delta, rtx_delta; + u32 rtx_delta; + s32 rcv_delta;
+ /* Note: timer interrupt might have been delayed by at least one jiffy, + * and tp->rcv_tstamp might very well have been written recently. + * rcv_delta can thus be negative. + */ rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false;
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.318 commit 7bb7670f92bfbd05fc41a8f9a8f358b7ffed65f4 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit 97a9063518f198ec0adb2ecb89789de342bb8283 upstream.
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account.
Before blamed commit, the socket would not timeout after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits.
Also worth noting that before commit e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4.
Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Neal Cardwell ncardwell@google.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Reviewed-by: Jon Maxwell jmaxwell37@gmail.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_timer.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 12f0cbd0f8cc..3b168faf032c 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -414,22 +414,34 @@ static void tcp_fastopen_synack_timer(struct sock *sk) static bool tcp_rtx_probe0_timed_out(const struct sock *sk, const struct sk_buff *skb) { + const struct inet_connection_sock *icsk = inet_csk(sk); + u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout); const struct tcp_sock *tp = tcp_sk(sk); - const int timeout = TCP_RTO_MAX * 2; + int timeout = TCP_RTO_MAX * 2; u32 rtx_delta; s32 rcv_delta;
+ rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - + (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); + + if (user_timeout) { + /* If user application specified a TCP_USER_TIMEOUT, + * it does not want win 0 packets to 'reset the timer' + * while retransmits are not making progress. + */ + if (rtx_delta > user_timeout) + return true; + timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout)); + } + /* Note: timer interrupt might have been delayed by at least one jiffy, * and tp->rcv_tstamp might very well have been written recently. * rcv_delta can thus be negative. */ - rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; + rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false;
- rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - - (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); - return rtx_delta > timeout; }
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/10240 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/I...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/10240 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/I...