From: moyufeng
Sent: Wednesday, June 2, 2021 1:13 PM
To: tanhuazhong <tanhuazhong@huawei.com>; shenjian (K) <shenjian15@huawei.com>; lipeng (Y) <lipeng321@huawei.com>; Zhuangyuzeng (Yisen) <yisen.zhuang@huawei.com>; linyunsheng <linyunsheng@huawei.com>; zhangjiaran <zhangjiaran@huawei.com>; huangguangbin (A) <huangguangbin2@huawei.com>; chenhao (DY) <chenhao288@hisilicon.com>; moyufeng <moyufeng@huawei.com>; Salil Mehta <salil.mehta@huawei.com>
Subject: [PATCH net-next 7/7] {topost} net: hns3: use bounce buffer when rx page can not be reused
From: Yunsheng Lin <linyunsheng@huawei.com>
Currently the rx page is reused to receive future packets when the stack releases the previous skb quickly. If the old page can not be reused, a new page is allocated and mapped, which consumes a lot of cpu when the IOMMU is in strict mode, especially when the application and irq/NAPI happen to run on the same cpu.
So allocate a new frag and memcpy the data into it to avoid the costly IOMMU unmapping/mapping operation, and add "frag_alloc_err" and "frag_alloc" stats to the "ethtool -S ethX" output.
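For readers of this thread, here is a minimal sketch of the bounce-buffer (copybreak) idea described above. It is not the actual hns3 patch; the helper name rx_bounce_copy, its calling convention, and the assumption that the caller has already checked the length against rx_copybreak are illustrative only.

/* Sketch of the copybreak path: copy a small packet into a frag taken
 * from the per-CPU page_frag cache instead of unmapping the DMA page
 * and mapping a fresh one, which is costly with the IOMMU in strict
 * mode. Illustrative only, not the real hns3 code.
 */
#include <linux/skbuff.h>
#include <linux/mm.h>

static int rx_bounce_copy(struct sk_buff *skb, const void *pkt_va,
			  unsigned int len)
{
	unsigned int truesize = ALIGN(len, SMP_CACHE_BYTES);
	void *frag;

	frag = napi_alloc_frag(truesize);
	if (unlikely(!frag))
		return -ENOMEM;	/* caller falls back to the normal page
				 * path and bumps frag_alloc_err */

	memcpy(frag, pkt_va, len);
	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
			virt_to_page(frag), offset_in_page(frag),
			len, truesize);
	return 0;		/* caller bumps the frag_alloc counter */
}

The driver would only take such a path when the received length is at or below rx_copybreak and the old page can not be reused; the DMA-mapped rx page then stays mapped and is handed straight back to the hardware.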
The throughput improves by more than 50% when running a single thread of iperf over TCP with the IOMMU in strict mode and iperf sharing the same cpu with irq/NAPI (rx_copybreak = 2048 and mtu = 1500).
Performance gains are quite good!
A few questions:
How have we ensured this will work efficiently in real-world workloads and that there are no repercussions?
Also, is there any impact on end-to-end *latency* or *jitter* with this approach compared to without it?
Also, have you checked why the MLX5 driver removed this copybreak concept for small packets while MLX4 did have it, and why other recent drivers don't have it?
Hope I have not missed this anywhere, but what are the default values for both {rx,tx}_copybreak?
Thanks