Fix CVE-2024-41007
Eric Dumazet (2): tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() tcp: avoid too many retransmit packets
Menglong Dong (1): net: tcp: fix unexcepted socket die when snd_wnd is 0
net/ipv4/tcp_timer.c | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-)
From: Menglong Dong imagedong@tencent.com
stable inclusion from stable-v5.10.195 commit 88bc7122dba23441c819eb1be2428ba47bfad394 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
[ Upstream commit e89688e3e97868451a5d05b38a9d2633d6785cd4 ]
In tcp_retransmit_timer(), a window shrunk connection will be regarded as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not right all the time.
The retransmits will become zero-window probes in tcp_retransmit_timer() if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to TCP_RTO_MAX sooner or later.
However, the timer can be delayed and be triggered after 122877ms, not TCP_RTO_MAX, as I tested.
Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true once the RTO come up to TCP_RTO_MAX, and the socket will die.
Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout', which is exact the timestamp of the timeout.
However, "tp->rcv_tstamp" can restart from idle, then tp->rcv_tstamp could already be a long time (minutes or hours) in the past even on the first RTO. So we double check the timeout with the duration of the retransmission.
Meanwhile, making "2 * TCP_RTO_MAX" as the timeout to avoid the socket dying too soon.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAM... Signed-off-by: Menglong Dong imagedong@tencent.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_timer.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 888683f2ff3e..7a83ba4e2063 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -433,6 +433,22 @@ static void tcp_fastopen_synack_timer(struct sock *sk, struct request_sock *req) TCP_TIMEOUT_INIT << req->num_timeout, TCP_RTO_MAX); }
+static bool tcp_rtx_probe0_timed_out(const struct sock *sk, + const struct sk_buff *skb) +{ + const struct tcp_sock *tp = tcp_sk(sk); + const int timeout = TCP_RTO_MAX * 2; + u32 rcv_delta, rtx_delta; + + rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; + if (rcv_delta <= timeout) + return false; + + rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - + (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); + + return rtx_delta > timeout; +}
/** * tcp_retransmit_timer() - The TCP retransmit timeout handler @@ -498,7 +514,7 @@ void tcp_retransmit_timer(struct sock *sk) tp->snd_una, tp->snd_nxt); } #endif - if (tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX) { + if (tcp_rtx_probe0_timed_out(sk, skb)) { tcp_write_err(sk); goto out; }
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v5.10.222 commit 24b9fafe34641a6dcf8df36f5e0de17d9cdccce8 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit 36534d3c54537bf098224a32dc31397793d4594d upstream.
Due to timer wheel implementation, a timer will usually fire after its schedule.
For instance, for HZ=1000, a timeout between 512ms and 4s has a granularity of 64ms. For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after inet_csk(sk)->icsk_timeout whenever the timer interrupt finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Menglong Dong imagedong@tencent.com Acked-by: Neal Cardwell ncardwell@google.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_timer.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 7a83ba4e2063..da1793a2d3c5 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -438,8 +438,13 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk, { const struct tcp_sock *tp = tcp_sk(sk); const int timeout = TCP_RTO_MAX * 2; - u32 rcv_delta, rtx_delta; + u32 rtx_delta; + s32 rcv_delta;
+ /* Note: timer interrupt might have been delayed by at least one jiffy, + * and tp->rcv_tstamp might very well have been written recently. + * rcv_delta can thus be negative. + */ rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false;
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v5.10.222 commit 5d7e64d70a11d988553a08239c810a658e841982 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAD4B4 CVE: CVE-2024-41007
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit 97a9063518f198ec0adb2ecb89789de342bb8283 upstream.
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account.
Before blamed commit, the socket would not timeout after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits.
Also worth noting that before commit e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4.
Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Neal Cardwell ncardwell@google.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Reviewed-by: Jon Maxwell jmaxwell37@gmail.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com --- net/ipv4/tcp_timer.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index da1793a2d3c5..ae9d52722620 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -436,22 +436,34 @@ static void tcp_fastopen_synack_timer(struct sock *sk, struct request_sock *req) static bool tcp_rtx_probe0_timed_out(const struct sock *sk, const struct sk_buff *skb) { + const struct inet_connection_sock *icsk = inet_csk(sk); + u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout); const struct tcp_sock *tp = tcp_sk(sk); - const int timeout = TCP_RTO_MAX * 2; + int timeout = TCP_RTO_MAX * 2; u32 rtx_delta; s32 rcv_delta;
+ rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - + (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); + + if (user_timeout) { + /* If user application specified a TCP_USER_TIMEOUT, + * it does not want win 0 packets to 'reset the timer' + * while retransmits are not making progress. + */ + if (rtx_delta > user_timeout) + return true; + timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout)); + } + /* Note: timer interrupt might have been delayed by at least one jiffy, * and tp->rcv_tstamp might very well have been written recently. * rcv_delta can thus be negative. */ - rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; + rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false;
- rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - - (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); - return rtx_delta > timeout; }
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/10234 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/K...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/10234 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/K...