[PATCH openEuler-5.10 36/59] mvpp2: recycle buffers

11 Nov 2021

From: Matteo Croce <mcroce@microsoft.com>

mainline inclusion
from mainline-v5.14-rc1
commit 133637fcfab2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

----------------------------------------------------------------------

Use the new recycling API for page_pool.
In a drop rate test, the packet rate is almost doubled,
from 1110 Kpps to 2128 Kpps.

perf top on a stock system shows:

Overhead  Shared Object     Symbol
  34.88%  [kernel]          [k] page_pool_release_page
   8.06%  [kernel]          [k] free_unref_page
   6.42%  [mvpp2]           [k] mvpp2_rx
   6.07%  [kernel]          [k] eth_type_trans
   5.18%  [kernel]          [k] __netif_receive_skb_core
   4.95%  [kernel]          [k] build_skb
   4.88%  [kernel]          [k] kmem_cache_free
   3.97%  [kernel]          [k] kmem_cache_alloc
   3.45%  [kernel]          [k] dev_gro_receive
   2.73%  [kernel]          [k] page_frag_free
   2.07%  [kernel]          [k] __alloc_pages_bulk
   1.99%  [kernel]          [k] arch_local_irq_save
   1.84%  [kernel]          [k] skb_release_data
   1.20%  [kernel]          [k] netif_receive_skb_list_internal

With packet rate stable at 1100 Kpps:

tx: 0 bps 0 pps rx: 532.7 Mbps 1110 Kpps
tx: 0 bps 0 pps rx: 532.6 Mbps 1110 Kpps
tx: 0 bps 0 pps rx: 532.4 Mbps 1109 Kpps
tx: 0 bps 0 pps rx: 532.1 Mbps 1109 Kpps
tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps
tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps

And this is the same output with recycling enabled:

Overhead  Shared Object     Symbol
  12.91%  [kernel]          [k] eth_type_trans
  12.54%  [mvpp2]           [k] mvpp2_rx
   9.67%  [kernel]          [k] build_skb
   9.63%  [kernel]          [k] __netif_receive_skb_core
   8.44%  [kernel]          [k] page_pool_put_page
   8.07%  [kernel]          [k] kmem_cache_free
   7.79%  [kernel]          [k] kmem_cache_alloc
   6.86%  [kernel]          [k] dev_gro_receive
   3.19%  [kernel]          [k] skb_release_data
   2.41%  [kernel]          [k] netif_receive_skb_list_internal
   2.18%  [kernel]          [k] page_pool_refill_alloc_cache
   1.76%  [kernel]          [k] napi_gro_receive
   1.61%  [kernel]          [k] kfree_skb
   1.20%  [kernel]          [k] dma_sync_single_for_device
   1.16%  [mvpp2]           [k] mvpp2_poll
   1.12%  [mvpp2]           [k] mvpp2_read

With packet rate above 2100 Kpps:

tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1021 Mbps 2127 Kpps
tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1022 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1022 Mbps 2129 Kpps

The major performance increase is explained by the fact that the most CPU
consuming functions (page_pool_release_page, page_frag_free and
free_unref_page) are no longer called on a per packet basis.

The test was done by sending to the macchiatobin 64 byte ethernet frames
with an invalid ethertype, so the packets are dropped early in the RX path.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Yongxin Li <liyongxin1@huawei.com>
Signed-off-by: Junxin Chen <chenjunxin1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index bafdf287c478..e4360b1803bf 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3639,7 +3639,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
 		}
 
 		if (pp)
-			page_pool_release_page(pp, virt_to_page(data));
+			skb_mark_for_recycle(skb, virt_to_page(data), pp);
 		else
 			dma_unmap_single_attrs(dev->dev.parent, dma_addr,
 					       bm_pool->buf_size, DMA_FROM_DEVICE,
-- 
2.20.1