On Sat, 29 May 2021 09:44:57 +0800 Yunsheng Lin wrote:
@@ -3852,10 +3852,32 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	qdisc_calculate_pkt_len(skb, q);
 
 	if (q->flags & TCQ_F_NOLOCK) {
+		if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
+		    qdisc_run_begin(q)) {
+			/* Retest nolock_qdisc_is_empty() within the protection
+			 * of q->seqlock to ensure qdisc is indeed empty.
+			 */
+			if (unlikely(!nolock_qdisc_is_empty(q))) {
This is just for the DRAINING case, right?
MISSED can be set at any moment, so testing MISSED here seems confusing.
MISSED is only set when there is lock contention, which means it is better not to take the qdisc bypass, to reduce the chance of out-of-order packets.
Avoid as in make less likely? Nothing guarantees the other thread is not interrupted after ->enqueue and before qdisc_run_begin().
TBH I'm not sure what out-of-order situation you're referring to; there is no ordering guarantee between separate threads trying to transmit, AFAIU.
IOW, this check is not required for correctness, right?
Another benefit is that we can also do batched dequeuing and transmitting of packets when there is lock contention.
No doubt, but did you see the flag get set often enough here to warrant the double-checking?
Is it really worth the extra code?
Currently DRAINING is only set when the netdev queue is stopped. If we used DRAINING alone to indicate that a qdisc is non-empty, we would need to set DRAINING everywhere MISSED is set; that is why I use both DRAINING and MISSED to indicate a non-empty qdisc.
+				rc = q->enqueue(skb, q, &to_free) &
+					NET_XMIT_MASK;
+				__qdisc_run(q);
+				qdisc_run_end(q);
+
+				goto no_lock_out;
+			}
+
+			qdisc_bstats_cpu_update(q, skb);
+			if (sch_direct_xmit(skb, q, dev, txq, NULL, true) &&
+			    !nolock_qdisc_is_empty(q))
+				__qdisc_run(q);
+
+			qdisc_run_end(q);
+			return NET_XMIT_SUCCESS;
+		}
+
 		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
 		if (likely(!netif_xmit_frozen_or_stopped(txq)))
 			qdisc_run(q);
 
+no_lock_out:
 		if (unlikely(to_free))
 			kfree_skb_list(to_free);
 		return rc;
@@ -164,9 +166,13 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 		skb = next;
 	}
-	if (lock)
+
+	if (lock) {
 		spin_unlock(lock);
-	__netif_schedule(q);
+		set_bit(__QDISC_STATE_MISSED, &q->state);
+	} else {
+		__netif_schedule(q);
Could we reorder qdisc_run_begin() with clear_bit(SCHED) in net_tx_action() and add SCHED to the NON_EMPTY mask?
Did you mean clearing SCHED after q->seqlock is taken?
The problem is that SCHED is also used to indicate whether a qdisc is on sd->output_queue, and the qdisc_run_begin() called by net_tx_action() cannot guarantee it will take q->seqlock (we use trylock for the lockless qdisc).
Ah, right. We'd need to do some more flag juggling in net_tx_action() to get it right.
+	}
 }
 
 static void try_bulk_dequeue_skb(struct Qdisc *q,

@@ -409,7 +415,11 @@ void __qdisc_run(struct Qdisc *q)
 	while (qdisc_restart(q, &packets)) {
 		quota -= packets;
 		if (quota <= 0) {
-			__netif_schedule(q);
+			if (q->flags & TCQ_F_NOLOCK)
+				set_bit(__QDISC_STATE_MISSED, &q->state);
+			else
+				__netif_schedule(q);
+
 			break;
 		}
 	}
@@ -680,13 +690,14 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 	if (likely(skb)) {
 		qdisc_update_stats_at_dequeue(qdisc, skb);
 	} else if (need_retry &&
-		   test_bit(__QDISC_STATE_MISSED, &qdisc->state)) {
+		   READ_ONCE(qdisc->state) & QDISC_STATE_NON_EMPTY) {
Do we really need to retry based on DRAINING being set? Or is it just a convenient way of coding things up?
Yes, it is just a convenient way of coding things up. Only MISSED needs the retry.