On Thu, Mar 18, 2021 at 12:33 AM Yunsheng Lin linyunsheng@huawei.com wrote:
On 2021/3/17 21:45, Jason A. Donenfeld wrote:
On 3/17/21, Toke Høiland-Jørgensen toke@redhat.com wrote:
Cong Wang xiyou.wangcong@gmail.com writes:
On Mon, Mar 15, 2021 at 2:07 PM Jakub Kicinski kuba@kernel.org wrote:
I thought pfifo was supposed to be "lockless" and this change re-introduces a lock between producer and consumer, no?
It has never been truly lockless, it uses two spinlocks in the ring buffer implementation, and it introduced a q->seqlock recently, with this patch now we have priv->lock, 4 locks in total. So our "lockless" qdisc ends up having more locks than others. ;) I don't think we are going to a right direction...
Just a thought, have you guys considered adopting the lockless MSPC ring buffer recently introduced into Wireguard in commit:
8b5553ace83c ("wireguard: queueing: get rid of per-peer ring buffers")
Jason indicated he was willing to work on generalising it into a reusable library if there was a use case for it. I haven't quite though through the details of whether this would be such a use case, but figured I'd at least mention it :)
That offer definitely still stands. Generalization sounds like a lot of fun.
Keep in mind though that it's an eventually consistent queue, not an immediately consistent one, so that might not match all use cases. It works with wg because we always trigger the reader thread anew when it finishes, but that doesn't apply to everyone's queueing setup.
Thanks for mentioning this.
"multi-producer, single-consumer" seems to match the lockless qdisc's paradigm too, for now concurrent enqueuing/dequeuing to the pfifo_fast's queues() is not allowed, it is protected by producer_lock or consumer_lock.
So it would be good to has lockless concurrent enqueuing, while dequeuing can be protected by qdisc_lock() or q->seqlock, which meets the "multi-producer, single-consumer" paradigm.
I don't think so. Usually we have one queue for each CPU so we can expect each CPU has a lockless qdisc assigned, but we can not assume this in the code, so we still have to deal with multiple CPU's sharing a lockless qdisc, and we usually enqueue and dequeue in process context, so it means we could have multiple producers and multiple consumers.
On the other hand, I don't think the problems we have been fixing are the ring buffer implementation itself, they are about the high-level qdisc state transitions.
Thanks.