Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com --- Also, the sk->sk_tx_skb_cache may be both changed by allocation and freeing side, I assume there may be some implicit protection here too, such as the NAPI protection for rx? --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e8b48df..226ddef 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -866,7 +866,7 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp, { struct sk_buff *skb;
- if (likely(!size)) { + if (static_branch_unlikely(&tcp_tx_skb_cache_key) && likely(!size)) { skb = sk->sk_tx_skb_cache; if (skb) { skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
On Wed, 2021-09-01 at 18:39 +0800, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Note that MPTCP is currently exploiting sk->sk_tx_skb_cache. If we get this patch goes in as-is, it will break mptcp.
One possible solution would be to let mptcp usage enable sk-
sk_tx_skb_cache, but that has relevant side effects on plain TCP.
Another options would be re-work once again the mptcp xmit path to avoid using sk->sk_tx_skb_cache.
Cheers,
Paolo
On Wed, Sep 1, 2021 at 3:52 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2021-09-01 at 18:39 +0800, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Note that MPTCP is currently exploiting sk->sk_tx_skb_cache. If we get this patch goes in as-is, it will break mptcp.
One possible solution would be to let mptcp usage enable sk-
sk_tx_skb_cache, but that has relevant side effects on plain TCP.
Another options would be re-work once again the mptcp xmit path to avoid using sk->sk_tx_skb_cache.
Hmmm, I actually wrote a revert of this feature but forgot to submit it last year.
commit c36cfbd791f62c0f7c6b32132af59dfdbe6be21b (HEAD -> listener_scale4) Author: Eric Dumazet edumazet@google.com Date: Wed May 20 06:38:38 2020 -0700
tcp: remove sk_{tr}x_skb_cache
This reverts the following patches :
2e05fcae83c41eb2df10558338dc600dc783af47 ("tcp: fix compile error if !CONFIG_SYSCTL") 4f661542a40217713f2cee0bb6678fbb30d9d367 ("tcp: fix zerocopy and notsent_lowat issues") 472c2e07eef045145bc1493cc94a01c87140780a ("tcp: add one skb cache for tx") 8b27dae5a2e89a61c46c6dbc76c040c0e6d0ed4c ("tcp: add one skb cache for rx")
Having a cache of one skb (in each direction) per TCP socket is fragile, since it can cause a significant increase of memory needs, and not good enough for high speed flows anyway where more than one skb is needed.
We want instead to add a generic infrastructure, with more flexible per-cpu caches, for alien NUMA nodes.
Signed-off-by: Eric Dumazet edumazet@google.com
I will update this commit to also remove the part in MPTCP.
Let's remove this feature and replace it with something less costly.
On Wed, Sep 1, 2021 at 8:06 AM Eric Dumazet edumazet@google.com wrote:
On Wed, Sep 1, 2021 at 3:52 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2021-09-01 at 18:39 +0800, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Note that MPTCP is currently exploiting sk->sk_tx_skb_cache. If we get this patch goes in as-is, it will break mptcp.
One possible solution would be to let mptcp usage enable sk-
sk_tx_skb_cache, but that has relevant side effects on plain TCP.
Another options would be re-work once again the mptcp xmit path to avoid using sk->sk_tx_skb_cache.
Hmmm, I actually wrote a revert of this feature but forgot to submit it last year.
commit c36cfbd791f62c0f7c6b32132af59dfdbe6be21b (HEAD -> listener_scale4) Author: Eric Dumazet edumazet@google.com Date: Wed May 20 06:38:38 2020 -0700
tcp: remove sk_{tr}x_skb_cache This reverts the following patches : 2e05fcae83c41eb2df10558338dc600dc783af47 ("tcp: fix compile error
if !CONFIG_SYSCTL") 4f661542a40217713f2cee0bb6678fbb30d9d367 ("tcp: fix zerocopy and notsent_lowat issues") 472c2e07eef045145bc1493cc94a01c87140780a ("tcp: add one skb cache for tx") 8b27dae5a2e89a61c46c6dbc76c040c0e6d0ed4c ("tcp: add one skb cache for rx")
Having a cache of one skb (in each direction) per TCP socket is fragile, since it can cause a significant increase of memory needs, and not good enough for high speed flows anyway where more than one skb is needed. We want instead to add a generic infrastructure, with more flexible per-cpu caches, for alien NUMA nodes. Signed-off-by: Eric Dumazet <edumazet@google.com>
I will update this commit to also remove the part in MPTCP.
Let's remove this feature and replace it with something less costly.
Paolo, can you work on MPTP side, so that my revert can be then applied ?
Thanks !
On Wed, 2021-09-01 at 08:16 -0700, Eric Dumazet wrote:
On Wed, Sep 1, 2021 at 8:06 AM Eric Dumazet edumazet@google.com wrote:
On Wed, Sep 1, 2021 at 3:52 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2021-09-01 at 18:39 +0800, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Note that MPTCP is currently exploiting sk->sk_tx_skb_cache. If we get this patch goes in as-is, it will break mptcp.
One possible solution would be to let mptcp usage enable sk-
sk_tx_skb_cache, but that has relevant side effects on plain TCP.
Another options would be re-work once again the mptcp xmit path to avoid using sk->sk_tx_skb_cache.
Hmmm, I actually wrote a revert of this feature but forgot to submit it last year.
commit c36cfbd791f62c0f7c6b32132af59dfdbe6be21b (HEAD -> listener_scale4) Author: Eric Dumazet edumazet@google.com Date: Wed May 20 06:38:38 2020 -0700
tcp: remove sk_{tr}x_skb_cache This reverts the following patches : 2e05fcae83c41eb2df10558338dc600dc783af47 ("tcp: fix compile error
if !CONFIG_SYSCTL") 4f661542a40217713f2cee0bb6678fbb30d9d367 ("tcp: fix zerocopy and notsent_lowat issues") 472c2e07eef045145bc1493cc94a01c87140780a ("tcp: add one skb cache for tx") 8b27dae5a2e89a61c46c6dbc76c040c0e6d0ed4c ("tcp: add one skb cache for rx")
Having a cache of one skb (in each direction) per TCP socket is fragile, since it can cause a significant increase of memory needs, and not good enough for high speed flows anyway where more than one skb is needed. We want instead to add a generic infrastructure, with more flexible per-cpu caches, for alien NUMA nodes. Signed-off-by: Eric Dumazet <edumazet@google.com>
I will update this commit to also remove the part in MPTCP.
Let's remove this feature and replace it with something less costly.
Paolo, can you work on MPTP side, so that my revert can be then applied ?
You are way too fast, I was still replying to your previous email, asking if I could help :)
I'll a look ASAP. Please, allow for some latency: I'm way slower!
Cheers,
Paolo
On Wed, Sep 1, 2021 at 8:25 AM Paolo Abeni pabeni@redhat.com wrote:
You are way too fast, I was still replying to your previous email, asking if I could help :)
All I did was to resurrect this patch (rebase got no conflict) and send it :)
I'll a look ASAP. Please, allow for some latency: I'm way slower!
Thanks Paolo !
On Wed, 2021-09-01 at 17:25 +0200, Paolo Abeni wrote:
On Wed, 2021-09-01 at 08:16 -0700, Eric Dumazet wrote:
On Wed, Sep 1, 2021 at 8:06 AM Eric Dumazet edumazet@google.com wrote:
On Wed, Sep 1, 2021 at 3:52 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2021-09-01 at 18:39 +0800, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Note that MPTCP is currently exploiting sk->sk_tx_skb_cache. If we get this patch goes in as-is, it will break mptcp.
One possible solution would be to let mptcp usage enable sk-
sk_tx_skb_cache, but that has relevant side effects on plain TCP.
Another options would be re-work once again the mptcp xmit path to avoid using sk->sk_tx_skb_cache.
Hmmm, I actually wrote a revert of this feature but forgot to submit it last year.
commit c36cfbd791f62c0f7c6b32132af59dfdbe6be21b (HEAD -> listener_scale4) Author: Eric Dumazet edumazet@google.com Date: Wed May 20 06:38:38 2020 -0700
tcp: remove sk_{tr}x_skb_cache This reverts the following patches : 2e05fcae83c41eb2df10558338dc600dc783af47 ("tcp: fix compile error
if !CONFIG_SYSCTL") 4f661542a40217713f2cee0bb6678fbb30d9d367 ("tcp: fix zerocopy and notsent_lowat issues") 472c2e07eef045145bc1493cc94a01c87140780a ("tcp: add one skb cache for tx") 8b27dae5a2e89a61c46c6dbc76c040c0e6d0ed4c ("tcp: add one skb cache for rx")
Having a cache of one skb (in each direction) per TCP socket is fragile, since it can cause a significant increase of memory needs, and not good enough for high speed flows anyway where more than one skb is needed. We want instead to add a generic infrastructure, with more flexible per-cpu caches, for alien NUMA nodes. Signed-off-by: Eric Dumazet <edumazet@google.com>
I will update this commit to also remove the part in MPTCP.
Let's remove this feature and replace it with something less costly.
Paolo, can you work on MPTP side, so that my revert can be then applied ?
You are way too fast, I was still replying to your previous email, asking if I could help :)
I'll a look ASAP. Please, allow for some latency: I'm way slower!
I think the easiest way and the one with less code duplication will require accessing the tcp_mark_push() and skb_entail() helpers from the MPTCP code, making them not static and exposing them e.g. in net/tcp.h. Would that be acceptable or should I look for other options?
Thanks!
Paolo
On Wed, Sep 1, 2021 at 9:01 AM Paolo Abeni pabeni@redhat.com wrote:
I think the easiest way and the one with less code duplication will require accessing the tcp_mark_push() and skb_entail() helpers from the MPTCP code, making them not static and exposing them e.g. in net/tcp.h. Would that be acceptable or should I look for other options?
I think this is fine, really.
On 2021/9/1 18:39, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Also, the sk->sk_tx_skb_cache may be both changed by allocation and freeing side, I assume there may be some implicit protection here too, such as the NAPI protection for rx?
Hi, Eric Is there any implicit protection for sk->sk_tx_skb_cache? As my understanding, sk_stream_alloc_skb() seems to be protected by lock_sock(), and the sk_wmem_free_skb() seems to be mostly happening in NAPI polling for TCP(when ack packet is received) without lock_sock(), so it seems there is no protection here?
On Wed, Sep 1, 2021 at 5:47 PM Yunsheng Lin linyunsheng@huawei.com wrote:
On 2021/9/1 18:39, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Also, the sk->sk_tx_skb_cache may be both changed by allocation and freeing side, I assume there may be some implicit protection here too, such as the NAPI protection for rx?
Hi, Eric Is there any implicit protection for sk->sk_tx_skb_cache? As my understanding, sk_stream_alloc_skb() seems to be protected by lock_sock(), and the sk_wmem_free_skb() seems to be mostly happening in NAPI polling for TCP(when ack packet is received) without lock_sock(), so it seems there is no protection here?
Please look again. This is protected by socket lock of course. Otherwise sk_mem_uncharge() would be very broken, sk->sk_forward_alloc is not an atomic field.
TCP stack has no direct relation with NAPI. It can run over loopback interface, no NAPI there.
On 2021/9/2 9:13, Eric Dumazet wrote:
On Wed, Sep 1, 2021 at 5:47 PM Yunsheng Lin linyunsheng@huawei.com wrote:
On 2021/9/1 18:39, Yunsheng Lin wrote:
Since tcp_tx_skb_cache is disabled by default in: commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to avoid possible branch-misses.
Signed-off-by: Yunsheng Lin linyunsheng@huawei.com
Also, the sk->sk_tx_skb_cache may be both changed by allocation and freeing side, I assume there may be some implicit protection here too, such as the NAPI protection for rx?
Hi, Eric Is there any implicit protection for sk->sk_tx_skb_cache? As my understanding, sk_stream_alloc_skb() seems to be protected by lock_sock(), and the sk_wmem_free_skb() seems to be mostly happening in NAPI polling for TCP(when ack packet is received) without lock_sock(), so it seems there is no protection here?
Please look again. This is protected by socket lock of course. Otherwise sk_mem_uncharge() would be very broken, sk->sk_forward_alloc is not an atomic field.
Thanks for clarifying. I have been looking for a point to implement the socket'pp_alloc_cache for page pool, and sk_wmem_free_skb() seems like the place to avoid the scalablity problem of ptr_ring in page pool.
The protection for sk_wmem_free_skb() is in tcp_v4_rcv(), right? https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_ipv4.c#L2081
TCP stack has no direct relation with NAPI. It can run over loopback interface, no NAPI there. .