On 2021/9/18 1:15, Eric Dumazet wrote:
On Wed, Sep 15, 2021 at 7:05 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
As mentioned before, Tx recycling is based on a page_pool instance per socket, which it shares with rx.
Anyway, based on feedback from edumazet and dsahern, I am still trying to find out whether the page pool is meaningful for tx.
It is not for the generic Linux TCP stack, but perhaps for benchmarks.
I am not sure I understand what the above means. Did you mean that tx recycling only benefits benchmark tools such as iperf/netperf, but not real use cases?
Unless you dedicate one TX/RX pair per TCP socket?
Do you mean a TX/RX netdev queue pair, or a TX/RX pair for the recycling pool?
As for the netdev queue pair, I am not dedicating one TX/RX netdev queue pair per TCP socket.
As for the recycling pool, my initial thinking is that each NAPI/socket context has a 'struct pp_alloc_cache', which provides a lockless, last-in-first-out mini pool specific to that context, and that a central locked pool based on 'struct ptr_ring' is shared by all the NAPI/socket mini pools. When a context's mini pool is empty or full, it refills some pages from the central pool or flushes some pages to it.
I am not yet sure whether the locked central pool is needed at all, or whether the page pool's 'struct ptr_ring' is the right choice for it.
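To make the two-level idea concrete, here is a minimal userspace sketch in C. It assumes a single owner per mini pool and uses a spinlock around a plain array ring as a stand-in for 'struct ptr_ring'. All names (struct mini_cache, struct central_pool, cache_alloc(), cache_free(), CACHE_BATCH, etc.) are hypothetical; this is not the page_pool implementation. Pages move in batches, so the lock is taken once per refill or flush rather than once per page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define CACHE_SIZE   8   /* capacity of each per-context LIFO */
#define CACHE_BATCH  4   /* pages moved per refill or flush */
#define RING_SIZE   64   /* capacity of the shared central pool */

/* Shared central pool: a spinlock-protected FIFO ring (stand-in for ptr_ring). */
struct central_pool {
    atomic_int lock;
    void *slots[RING_SIZE];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of entries in the ring */
};

/* Per-NAPI/socket mini pool: lockless because only its owner touches it. */
struct mini_cache {
    void *pages[CACHE_SIZE];
    size_t count;
    struct central_pool *central;
};

static void central_init(struct central_pool *cp)
{
    atomic_init(&cp->lock, 0);
    cp->head = 0;
    cp->count = 0;
}

static void central_lock(struct central_pool *cp)
{
    while (atomic_exchange_explicit(&cp->lock, 1, memory_order_acquire))
        ; /* spin until the previous holder releases the lock */
}

static void central_unlock(struct central_pool *cp)
{
    atomic_store_explicit(&cp->lock, 0, memory_order_release);
}

/* Take up to n pages from the central pool under one lock acquisition. */
static size_t central_get(struct central_pool *cp, void **out, size_t n)
{
    size_t got = 0;

    central_lock(cp);
    while (got < n && cp->count > 0) {
        out[got++] = cp->slots[cp->head];
        cp->head = (cp->head + 1) % RING_SIZE;
        cp->count--;
    }
    central_unlock(cp);
    return got;
}

/* Give up to n pages back to the central pool under one lock acquisition. */
static size_t central_put(struct central_pool *cp, void **in, size_t n)
{
    size_t put = 0;

    central_lock(cp);
    while (put < n && cp->count < RING_SIZE) {
        cp->slots[(cp->head + cp->count) % RING_SIZE] = in[put++];
        cp->count++;
    }
    central_unlock(cp);
    return put;
}

/* Allocate: pop from the local LIFO, refilling a batch when it is empty. */
static void *cache_alloc(struct mini_cache *c)
{
    if (c->count == 0)
        c->count = central_get(c->central, c->pages, CACHE_BATCH);
    return c->count ? c->pages[--c->count] : NULL;
}

/* Free: push onto the local LIFO, flushing the oldest batch when it is full.
 * Returns -1 if both pools are full (the caller would then release the page). */
static int cache_free(struct mini_cache *c, void *page)
{
    if (c->count == CACHE_SIZE) {
        size_t moved = central_put(c->central, c->pages, CACHE_BATCH);

        if (moved == 0)
            return -1;
        memmove(c->pages, c->pages + moved, (c->count - moved) * sizeof(c->pages[0]));
        c->count -= moved;
    }
    c->pages[c->count++] = page;
    return 0;
}
```

The hot path (cache_alloc()/cache_free() hitting the local LIFO) never touches the lock; only refills and flushes do. Flushing the oldest entries keeps the most recently freed, likely cache-warm, pages local. Whether the central pool pays for itself is exactly the open question above.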
Most high-performance TCP flows use zerocopy. I am really not sure why we would need to 'optimize' a path that wastes CPU cycles doing user->kernel copies anyway, at the cost of insane complexity.
As I understand it, zerocopy mainly helps with big packets and the non-IOMMU case.
As for complexity, I am not yet convinced that it is that complex, since it mostly reuses existing infrastructure to support tx recycling.
The point is that most skbs are freed in NAPI or socket context, so we may be able to use that to batch the allocation/freeing of skbs/page_frags, or to reuse skbs/page_frags/DMA mappings, avoiding (IO/CPU) TLB misses, cache misses, and the overhead of spinlocks and DMA mapping.
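The mapping-reuse and batch-freeing part can be illustrated with another hypothetical userspace sketch: each NAPI/socket context keeps a small lockless cache of buffers together with their DMA addresses, so a reused buffer skips both allocation and mapping, and a full cache releases several buffers at once. fake_dma_map()/fake_dma_unmap() are stand-ins that only count calls; they are not the kernel DMA API, and RECYCLE_CAP/RELEASE_BATCH are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RECYCLE_CAP   4  /* hypothetical per-context cache size */
#define RELEASE_BATCH 2  /* hypothetical number of buffers released when full */

/* Stand-ins for DMA map/unmap: they only count calls, nothing is mapped. */
static unsigned long map_calls, unmap_calls;

static uint64_t fake_dma_map(void *va)
{
    map_calls++;
    return (uint64_t)(uintptr_t)va;
}

static void fake_dma_unmap(uint64_t dma)
{
    (void)dma;
    unmap_calls++;
}

struct mapped_buf {
    void *va;
    uint64_t dma;   /* stays valid while the buffer sits in the cache */
};

/* One per NAPI/socket context; only the owner touches it, so no lock. */
struct recycle_ctx {
    struct mapped_buf cache[RECYCLE_CAP];
    size_t count;
    size_t buf_size;
};

/* Get a mapped buffer, reusing a cached one (and its mapping) if possible. */
static int ctx_get(struct recycle_ctx *ctx, struct mapped_buf *out)
{
    if (ctx->count > 0) {
        *out = ctx->cache[--ctx->count];   /* fast path: no alloc, no map */
        return 0;
    }
    out->va = malloc(ctx->buf_size);
    if (!out->va)
        return -1;
    out->dma = fake_dma_map(out->va);
    return 0;
}

/* Return a buffer; a full cache first releases its oldest entries in one batch. */
static void ctx_put(struct recycle_ctx *ctx, struct mapped_buf b)
{
    if (ctx->count == RECYCLE_CAP) {
        for (size_t i = 0; i < RELEASE_BATCH; i++) {
            fake_dma_unmap(ctx->cache[i].dma);
            free(ctx->cache[i].va);
        }
        memmove(ctx->cache, ctx->cache + RELEASE_BATCH,
                (ctx->count - RELEASE_BATCH) * sizeof(ctx->cache[0]));
        ctx->count -= RELEASE_BATCH;
    }
    ctx->cache[ctx->count++] = b;
}

/* Tear down: unmap and free everything still cached. */
static void ctx_drain(struct recycle_ctx *ctx)
{
    while (ctx->count > 0) {
        struct mapped_buf *b = &ctx->cache[--ctx->count];
        fake_dma_unmap(b->dma);
        free(b->va);
    }
}
```

In a steady get/put loop the buffer is mapped once and then reused, so 100 get/put rounds on one context result in a single fake_dma_map() call.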