On Wed, Jul 7, 2021 at 12:03 PM Ilias Apalodimas ilias.apalodimas@linaro.org wrote:
Hi, Alexander
Thanks for detailed reviewing.
Likewise! I'll have a look on the entire conversation in a few days...
So this isn't going to work with the current recycling logic. The expectation there is that we can safely unmap the entire page as soon as the reference count is greater than 1.
Yes, the expectation is changed to we can always recycle the page when the last user has dropped the refcnt that has given to it when the page is not pfmemalloced.
The above expectation is based on that the last user will always call page_pool_put_full_page() in order to do the recycling or do the resource cleanup(dma unmaping..etc).
As the skb_free_head() and skb_release_data() have both checked the skb->pp_recycle to call the page_pool_put_full_page() if needed, I think we are safe for most case, the one case I am not so sure above is the rx zero copy, which seems to also bump up the refcnt before mapping the page to user space, we might need to ensure rx zero copy is not the last user of the page or if it is the last user, make sure it calls page_pool_put_full_page() too.
Yes, but the skb->pp_recycle value is per skb, not per page. So my concern is that carrying around that value can be problematic as there are a number of possible cases where the pages might be unintentionally recycled. All it would take is for a packet to get cloned a few times and then somebody starts using pskb_expand_head and you would have multiple cases, possibly simultaneously, of entities trying to free the page. I just worry it opens us up to a number of possible races.
Maybe I missde something, but I thought the cloned SKBs would never trigger the recycling path, since they are protected by the atomic dataref check in skb_release_data(). What am I missing?
Are you talking about the head frag? So normally a clone wouldn't cause an issue because the head isn't changed. In the case of the head_frag we should be safe since pskb_expand_head will just kmalloc the new head and clears head_frag so it won't trigger page_pool_return_skb_page on the head_frag since the dataref just goes from 2 to 1.
The problem is that pskb_expand_head memcopies the page frags over and takes a reference on the pages. At that point you would have two skbs both pointing to the same set of pages and each one ready to call page_pool_return_skb_page on the pages at any time and possibly racing with the other.
I suspect if they both called it at roughly the same time one of them would trigger a NULL pointer dereference since they would both check pp_magic first, and then both set pp to NULL. If run on a system where dma_unmap_page_attrs takes a while it would be very likely to race since pp_magic doesn't get cleared until after the page is unmapped.