[Linuxarm] Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

30 May 2021

      On 2021/5/30 2:49, Jakub Kicinski wrote:
...
On Sat, 29 May 2021 15:03:09 +0800 Yunsheng Lin wrote:
...
On 2021/5/29 12:32, Jakub Kicinski wrote:
...
On Sat, 29 May 2021 09:44:57 +0800 Yunsheng Lin wrote:
...
MISSED is only set when there is lock contention, which means it
is better not to do the qdisc bypass to avoid out of order packet
problem,
Avoid as in make less likely? Nothing guarantees other thread is not
interrupted after ->enqueue and before qdisc_run_begin().
TBH I'm not sure what out-of-order situation you're referring to,
there is no ordering guarantee between separate threads trying to
transmit AFAIU.  
A thread need to do the bypass checking before doing enqueuing, so
it means MISSED is set or the trylock fails for the bypass transmiting(
which will set the MISSED after the first trylock), so the MISSED will
always be set before a thread doing a enqueuing, and we ensure MISSED
only be cleared during the protection of q->seqlock, after clearing
MISSED, we do anther round of dequeuing within the protection of
q->seqlock.
The fact that MISSED is only cleared under q->seqlock does not matter,
because setting it and ->enqueue() are not under any lock. If the thread
gets interrupted between:
if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
      qdisc_run_begin(q)) {
and ->enqueue() we can't guarantee that something else won't come in,
take q->seqlock and clear MISSED.
thread1                thread2             thread3
# holds seqlock
                       qdisc_run_begin(q)
                       set(MISSED)
pfifo_fast_dequeue
  clear(MISSED)
  # recheck the queue
qdisc_run_end()
                       ->enqueue()
                                            q->flags & TCQ_F_CAN_BYPASS..
                                            qdisc_run_begin() # true
                                            sch_direct_xmit()
                       qdisc_run_begin()
                       set(MISSED)
Or am I missing something?
Re-checking nolock_qdisc_is_empty() may or may not help.
But it doesn't really matter because there is no ordering
requirement between thread2 and thread3 here.
I were more focued on explaining that using MISSED is reliable
as sch_may_need_requeuing() checking in RFCv3 [1] to indicate a
empty qdisc, and forgot to mention the data race described in
RFCv3, which is kind of like the one described above:

"There is a data race as below:

      CPU1                                   CPU2
qdisc_run_begin(q)                            .
        .                                q->enqueue()
sch_may_need_requeuing()                      .
    return true                               .
        .                                     .
        .                                     .
    q->enqueue()                              .

When above happen, the skb enqueued by CPU1 is dequeued after the
skb enqueued by CPU2 because sch_may_need_requeuing() return true.
If there is not qdisc bypass, the CPU1 has better chance to queue
the skb quicker than CPU2.

This patch does not take care of the above data race, because I
view this as similar as below:

Even at the same time CPU1 and CPU2 write the skb to two socket
which both heading to the same qdisc, there is no guarantee that
which skb will hit the qdisc first, becuase there is a lot of
factor like interrupt/softirq/cache miss/scheduling afffecting
that."

Does above make sense? Or any idea to avoid it?

1. https://patchwork.kernel.org/project/netdevbpf/patch/1616404156-11772-1-git-...
...
...
So if a thread has taken the q->seqlock and the MISSED is not set yet,
it is allowed to send the packet directly without going through the
qdisc enqueuing and dequeuing process.
...
IOW this check is not required for correctness, right?
if a thread has taken the q->seqlock and the MISSED is not set, it means
other thread has not set MISSED after the first trylock and before the
second trylock, which means the enqueuing is not done yet.
So I assume the this check is required for correctness if I understand
your question correctly.
...
...
another good thing is that we could also do the batch
dequeuing and transmiting of packets when there is lock contention.
No doubt, but did you see the flag get set significantly often here 
to warrant the double-checking?
No, that is just my guess:)