On 2021/3/23 14:37, Ahmad Fatoum wrote:
Hi,
On 22.03.21 10:09, Yunsheng Lin wrote:
Currently pfifo_fast has both the TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK flags set, but queue discipline bypass does not work for a lockless qdisc because the skb is always enqueued to the qdisc even when the qdisc is empty, see __dev_xmit_skb().
This patch calls sch_direct_xmit() to transmit the skb directly to the driver for an empty lockless qdisc too, which avoids the enqueue and dequeue operations. qdisc->empty is set to false whenever an skb is enqueued, see pfifo_fast_enqueue(), and is set to true when skb dequeuing returns NULL, see pfifo_fast_dequeue().
There is a data race between enqueue/dequeue and the setting of qdisc->empty, so qdisc->empty is only used as a hint. We therefore need to call sch_may_need_requeuing() to check whether the queue is really empty and whether there is a requeued skb, which has higher priority than the current skb.
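The decision described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the patch itself: struct toy_qdisc, toy_xmit() and the other toy_* names are hypothetical, the re-check in toy_may_need_requeuing() merely stands in for what sch_may_need_requeuing() is described as doing, and the model is single-threaded, so it shows the control flow but not the race.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QLEN 8

/* Toy stand-in for a lockless qdisc; NOT the kernel's struct Qdisc. */
struct toy_qdisc {
    int    fifo[TOY_QLEN]; /* pending packet ids, oldest first */
    size_t len;            /* number of packets currently queued */
    bool   empty;          /* hint only; may be stale under concurrency */
    bool   has_requeued;   /* a previously requeued packet is waiting */
};

enum toy_path { TOY_BYPASS, TOY_ENQUEUE };

/* Enqueue always clears the hint (a full toy queue silently drops). */
static void toy_enqueue(struct toy_qdisc *q, int pkt)
{
    if (q->len < TOY_QLEN)
        q->fifo[q->len++] = pkt;
    q->empty = false;
}

/* Dequeue sets the hint only when it finds nothing; returns -1 then. */
static int toy_dequeue(struct toy_qdisc *q)
{
    if (q->len == 0) {
        q->empty = true;
        return -1;
    }
    int pkt = q->fifo[0];
    for (size_t i = 1; i < q->len; i++)
        q->fifo[i - 1] = q->fifo[i];
    q->len--;
    return pkt;
}

/* Hypothetical re-check: is anything queued or requeued after all? */
static bool toy_may_need_requeuing(const struct toy_qdisc *q)
{
    return q->has_requeued || q->len != 0;
}

/* Bypass only if the hint says empty AND the re-check agrees. */
static enum toy_path toy_xmit(struct toy_qdisc *q, int pkt)
{
    if (q->empty && !toy_may_need_requeuing(q))
        return TOY_BYPASS;   /* would go straight to the driver */
    toy_enqueue(q, pkt);     /* keep ordering behind older packets */
    return TOY_ENQUEUE;
}
```

In this model a packet takes the direct path only when both the hint and the re-check report an empty queue; a pending requeued packet forces the new packet through the queue so that it cannot overtake the older one.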
The performance of the ip_forward test increases by about 10% with this patch.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Hi Vladimir and Ahmad,
Please give this patch a test to see if there are any out-of-order packets; it has removed the priv->lock added in RFC v2.
Overnight test (10h, 64 mil frames) didn't see any out-of-order frames between 2 FlexCANs on a dual core machine:
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
No performance measurements taken.
Thanks for the testing. I have done the performance measurement.
L3 forward testing improves from 1.09Mpps to 1.21Mpps, still about 10% improvement.
pktgen + dummy netdev:
threads  without+this_patch  with+this_patch  delta
1        2.56Mpps            3.11Mpps         +21%
2        3.76Mpps            4.31Mpps         +14%
4        5.51Mpps            5.53Mpps         +0.3%
8        2.81Mpps            2.72Mpps         -3%
16       2.24Mpps            2.22Mpps         -0.8%
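For anyone re-checking the delta column, a minimal sketch of the calculation, assuming the delta is the relative change from the "without" figure; the helper names delta_percent() and print_delta_row() are hypothetical. Computed this way, the quoted deltas appear to be truncated rather than rounded (the 2-thread row comes out at roughly 14.6%).

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
static double delta_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print one row in the same column order as the table above. */
static void print_delta_row(int threads, double before, double after)
{
    printf("%-8d %.2fMpps  %.2fMpps  %+.1f%%\n",
           threads, before, after, delta_percent(before, after));
}
```

Calling print_delta_row(1, 2.56, 3.11) prints the 1-thread row with a delta of about +21.5%.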