On Sat, 19 Jun 2021 10:30:09 +0000 Yunsheng Lin wrote:
When debugging pointed to the misordering between STATE_MISSED setting/clearing and STATE_MISSED checking, only _after_atomic() was added first, and it did not fix the misordering problem, when both _before_atomic() and _after_atomic() were added, the misordering problem disappeared.
I suppose _before_atomic() matters because the STATE_MISSED setting and the lock rechecking is only done when first check of STATE_MISSED returns false. _before_atomic() is used to make sure the first check returns correct result, if it does not return the correct result, then we may have misordering problem too.
cpu0 cpu1 clear MISSED _after_atomic() dequeue enqueue
first trylock() #false MISSED check #*true* ?
As above, even cpu1 has a _after_atomic() between clearing STATE_MISSED and dequeuing, we might stiil need a barrier to prevent cpu0 doing speculative MISSED checking before cpu1 clearing MISSED?
And the implicit load-acquire barrier contained in the first trylock() does not seems to prevent the above case too.
And there is no load-acquire barrier in pfifo_fast_dequeue() too, which possibly make the above case more likely to happen.
Ah, you're right. The test_bit() was not in the patch context, I forgot it's there... Both barriers are indeed needed.