On 2021/3/24 9:49, Cong Wang wrote:
On Sun, Mar 21, 2021 at 5:55 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
On 2021/3/20 2:15, Cong Wang wrote:
On Thu, Mar 18, 2021 at 12:33 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
On 2021/3/17 21:45, Jason A. Donenfeld wrote:
On 3/17/21, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
Cong Wang <xiyou.wangcong@gmail.com> writes:
On Mon, Mar 15, 2021 at 2:07 PM Jakub Kicinski <kuba@kernel.org> wrote:

I thought pfifo was supposed to be "lockless" and this change re-introduces a lock between producer and consumer, no?

It has never been truly lockless; it uses two spinlocks in the ring buffer implementation, and it introduced a q->seqlock recently. With this patch we now have priv->lock, 4 locks in total. So our "lockless" qdisc ends up having more locks than others. ;) I don't think we are going in the right direction...
Just a thought, have you guys considered adopting the lockless MPSC ring buffer recently introduced into Wireguard in commit:
8b5553ace83c ("wireguard: queueing: get rid of per-peer ring buffers")
Jason indicated he was willing to work on generalising it into a reusable library if there was a use case for it. I haven't quite thought through the details of whether this would be such a use case, but figured I'd at least mention it :)
That offer definitely still stands. Generalization sounds like a lot of fun.
Keep in mind though that it's an eventually consistent queue, not an immediately consistent one, so that might not match all use cases. It works with wg because we always trigger the reader thread anew when it finishes, but that doesn't apply to everyone's queueing setup.
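For the unfamiliar, the "eventually consistent" part comes from the queue being in the style of Dmitry Vyukov's intrusive MPSC queue: a producer publishes a node in two steps, so the consumer can briefly see the queue as empty while a push is mid-flight. A minimal userspace sketch of that style (illustrative names, not wireguard's actual code):

#include <stdatomic.h>
#include <stddef.h>

struct node {
        _Atomic(struct node *) next;
};

struct mpsc_queue {
        _Atomic(struct node *) head;    /* producers exchange this */
        struct node *tail;              /* consumer-private */
        struct node stub;               /* dummy node, never handed out */
};

static void mpsc_init(struct mpsc_queue *q)
{
        atomic_init(&q->stub.next, NULL);
        atomic_init(&q->head, &q->stub);
        q->tail = &q->stub;
}

/* Lock-free for any number of concurrent producers. */
static void mpsc_push(struct mpsc_queue *q, struct node *n)
{
        atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
        struct node *prev = atomic_exchange_explicit(&q->head, n,
                                                     memory_order_acq_rel);
        /* Between the exchange above and the store below the queue is
         * momentarily inconsistent: n is in head but not linked yet. */
        atomic_store_explicit(&prev->next, n, memory_order_release);
}

/* Single consumer only. May return NULL while a push is mid-flight,
 * even though the queue is not logically empty. */
static struct node *mpsc_pop(struct mpsc_queue *q)
{
        struct node *tail = q->tail;
        struct node *next = atomic_load_explicit(&tail->next,
                                                 memory_order_acquire);

        if (tail == &q->stub) {         /* skip over the dummy node */
                if (!next)
                        return NULL;    /* truly empty */
                q->tail = next;
                tail = next;
                next = atomic_load_explicit(&tail->next,
                                            memory_order_acquire);
        }
        if (next) {
                q->tail = next;
                return tail;
        }
        /* tail looks last; if head disagrees, a push is in flight. */
        if (tail != atomic_load_explicit(&q->head, memory_order_acquire))
                return NULL;
        /* re-insert the dummy so the last real node can be unlinked */
        mpsc_push(q, &q->stub);
        next = atomic_load_explicit(&tail->next, memory_order_acquire);
        if (!next)
                return NULL;
        q->tail = next;
        return tail;
}

That NULL-while-non-empty window is exactly why wg re-triggers the reader when it finishes instead of trusting a single empty read.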
Thanks for mentioning this.
"multi-producer, single-consumer" seems to match the lockless qdisc's paradigm too, for now concurrent enqueuing/dequeuing to the pfifo_fast's queues() is not allowed, it is protected by producer_lock or consumer_lock.
So it would be good to have lockless concurrent enqueuing, while dequeuing stays protected by qdisc_lock() or q->seqlock; that fits the "multi-producer, single-consumer" paradigm. A rough sketch of what that could look like is below.
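This is hypothetical: mpsc_ring_produce() and band2ring() are placeholders for a generalized version of Jason's queue, not existing kernel APIs; the surrounding logic just mirrors today's pfifo_fast_enqueue().

/* Hypothetical pfifo_fast enqueue over a lockless MPSC ring. */
static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
                              struct sk_buff **to_free)
{
        int band = prio2band[skb->priority & TC_PRIO_MAX];
        struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
        struct mpsc_ring *q = band2ring(priv, band);    /* placeholder */

        /* any CPU may produce concurrently; no producer_lock taken */
        if (unlikely(mpsc_ring_produce(q, skb)))        /* placeholder */
                return qdisc_drop(skb, qdisc, to_free);

        qdisc_update_stats_at_enqueue(qdisc, qdisc_pkt_len(skb));
        return NET_XMIT_SUCCESS;
}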
I don't think so. Usually we have one queue for each CPU, so we can expect each CPU to have a lockless qdisc assigned, but we cannot assume this in the code, so we still have to deal with multiple CPUs sharing a lockless qdisc; and we usually enqueue and dequeue in process context, which means we could have multiple producers and multiple consumers.
For a lockless qdisc, dequeuing always happens between qdisc_run_begin() and qdisc_run_end(), so multiple consumers are serialized against each other by q->seqlock.
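(Paraphrasing the TCQ_F_NOLOCK path of qdisc_run_begin() from include/net/sch_generic.h, simplified:

static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
        if (qdisc->flags & TCQ_F_NOLOCK)
                return spin_trylock(&qdisc->seqlock);
        /* non-NOLOCK qdiscs serialize on the qdisc->running seqcount;
         * that path is elided here */
        return true;
}

Only the CPU that wins q->seqlock enters __qdisc_run(), so the dequeue side is effectively single-consumer.)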
So are you saying you will never go lockless for the lockless qdisc? I thought you really wanted to go lockless with Jason's proposal of MPMC ring buffer code.
I think we have different definitions of "lockless qdisc".
In my understanding, dequeuing happens between qdisc_run_begin() and qdisc_run_end(), so it is always protected by q->seqlock for a lockless qdisc currently. By "lockless qdisc" I have never meant lockless dequeuing, and I am not proposing lockless dequeuing now.
Currently, "lockless qdisc" for pfifo_fast only means there is no lock for protection between dequeuing and enqueuing, i.e. __qdisc_run() can be dequeuing an skb while another CPU is enqueuing one.
But enqueuing is protected by producer_lock in skb_array_produce(), so only one CPU can enqueue at a time; I am proposing to use Jason's design to let multiple CPUs enqueue concurrently without taking any lock.
For enqueuing, multiple producers are serialized by producer_lock; see pfifo_fast_enqueue() -> skb_array_produce() -> ptr_ring_produce().
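(Paraphrased from include/linux/ptr_ring.h and include/linux/skb_array.h:

static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
{
        int ret;

        /* today's producers serialize here; this is the lock the
         * MPSC proposal would remove */
        spin_lock(&r->producer_lock);
        ret = __ptr_ring_produce(r, ptr);
        spin_unlock(&r->producer_lock);

        return ret;
}

static inline int skb_array_produce(struct skb_array *a, struct sk_buff *skb)
{
        return ptr_ring_produce(&a->ring, skb);
}
)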
I think you seriously misunderstand how we classify MPMC or MPSC: it is not about how we lock them, it is about whether we truly have a single consumer or multiple consumers, regardless of the locks used, because the goal is to go lockless.
I think I am relying only on MPSC (multi-producer & single-consumer), as explained above.
I am not sure whether a lockless MPSC can work in process context, but even if not, enqueuing is also protected by rcu_read_lock_bh(), which provides some kind of atomicity, so producer_lock can be removed when a lockless MPSC is used.
Not sure I even understand what you are saying here. Jason's code only disables preemption with a busy-wait; I can't see why it cannot be used in process context.
I am saying that q->enqueue() is protected by rcu_read_lock_bh(). rcu_read_lock_bh() disables preemption for us in most configurations; otherwise it would break the netdev_xmit_more() interface too, since that relies on the CPU not being preempted while using a per-CPU variable (softnet_data.xmit.more).
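(netdev_xmit_more(), paraphrased from include/linux/netdevice.h, is a bare per-CPU read, so the xmit path must stay on one CPU between setting and reading the flag; that is what local_bh_disable() inside rcu_read_lock_bh() guarantees:

static inline bool netdev_xmit_more(void)
{
        return __this_cpu_read(softnet_data.xmit.more);
}
)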
Thanks.