From: David Hildenbrand <david@redhat.com>
mainline inclusion
from mainline-v6.10-rc1
commit 02faa73f174c4d1e11cb9a421f9a8eac0dd881f1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAE0PK
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "mm: mapcount for large folios + page_mapcount() cleanups".
This series tracks the mapcount of large folios in a single value, so it can be read efficiently and atomically, just like the mapcount of small folios.
folio_mapcount() is then used in a couple more places, most notably to reduce false negatives in folio_likely_mapped_shared(), and many users of page_mapcount() are cleaned up (that's maybe why you got CCed on the full series, sorry sh+xtensa folks! :) ).
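To make that first point concrete, here is a minimal sketch of what reading the mapcount looks like once a large folio carries a single counter. The helper name and exact shape are illustrative assumptions, not the code the series adds; it assumes the per-folio _large_mapcount counter that the series introduces:

/*
 * Illustrative sketch only: with one counter per large folio, reading the
 * mapcount is a single atomic read, just like the _mapcount read for a
 * small folio, instead of a walk that sums the _mapcount of every subpage.
 */
static inline int folio_mapcount_sketch(struct folio *folio)
{
	if (likely(!folio_test_large(folio)))
		return atomic_read(&folio->_mapcount) + 1;
	/* single counter for the whole large folio, maintained by the rmap code */
	return atomic_read(&folio->_large_mapcount) + 1;
}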
The remaining s390x user and one KSM user of page_mapcount() are getting removed separately on the list right now. I have patches to handle the other KSM one, the khugepaged one and the kpagecount one; as they are not as "obvious", I will send them out separately in the future. Once that is all in place, I'm planning on moving page_mapcount() into fs/proc/task_mmu.c, the remaining user for the time being (and we can discuss at LSF/MM details on that :) ).
I proposed the mapcount for large folios (previously called total mapcount) originally in part of [1] and I later included it in [2] where it is a requirement. In the meantime, I changed the patch a bit, so I dropped all RB's. During the discussion of [1], Peter Xu correctly raised that this additional tracking might affect the performance when PMD->PTE remapping THPs. In the meantime, I addressed that by batching RMAP operations during fork(), unmap/zap and when PMD->PTE remapping THPs.
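As a rough illustration of what that batching means (simplified, not a quote of the actual callers in mm/memory.c; folio, page, nr and vma stand in for the state of the surrounding zap loop): where the unmap path used to issue one rmap call per PTE of a large folio, a contiguous run of nr PTEs mapping the same folio is handled with a single call, so any per-folio counter is only touched once per batch.

	/* per-PTE processing: one rmap update (and one atomic op) per page */
	for (i = 0; i < nr; i++)
		folio_remove_rmap_pte(folio, page + i, vma);

	/* batched processing: one call covering all nr pages of the folio */
	folio_remove_rmap_ptes(folio, page, nr, vma);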
Running some of my micro-benchmarks [3] (fork,munmap,cow-byte,remap) on 1 GiB of memory backed by folios with the same order, I observe the following on an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz tuned for reproducible results as much as possible:
Standard deviation is mostly < 1%, except for order-9, where it's < 2% for fork() and munmap().
(1) Small folios are not affected (< 1%) in all 4 microbenchmarks.
(2) Order-4 folios are not affected (< 1%) in all 4 microbenchmarks. A bit weird compared to the other orders ...
(3) PMD->PTE remapping of order-9 THPs is not affected (< 1%).
(4) COW-byte (COWing a single page by writing a single byte) is not affected for any order (< 1%). The page copy_fault overhead dominates everything.
(5) fork() is mostly not affected (< 1%), except order-2, where we have a slowdown of ~4%. Already for order-3 folios, we're down to a slowdown of < 1%.
(6) munmap() sees a slowdown by < 3% for some orders (order-5, order-6, order-9), but less for others (< 1% for order-4 and order-8, < 2% for order-2, order-3, order-7).
Especially the fork() and munmap() benchmarks are sensitive to each added instruction and other system noise, so I suspect some of the change and observed weirdness (order-4) is due to code layout changes and other factors, but not really due to the added atomics.
So in the common case where we can batch, the added atomics don't really make a big difference, especially in light of the recent improvements for large folios that we gained due to batching. Surprisingly, for some cases where we cannot batch (e.g., COW), the added atomics don't seem to matter, because other overhead dominates.
My fork and munmap micro-benchmarks don't cover cases where we cannot batch-process bigger parts of large folios. As this is not the common case, I'm not worrying about that right now.
Future work is batching RMAP operations during swapout and folio migration.
[1] https://lore.kernel.org/all/20230809083256.699513-1-david@redhat.com/
[2] https://lore.kernel.org/all/20231124132626.235350-1-david@redhat.com/
[3] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio...
This patch (of 18):
Commit 53277bcf126d ("mm: support page_mapcount() on page_has_type() pages") made it impossible to detect mapcount underflows by treating any negative raw mapcount value as a mapcount of 0.
We perform such underflow checks in zap_present_folio_ptes() and zap_huge_pmd(), which would currently no longer trigger.
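For reference, the check in question looks roughly like this (paraphrased from the zap path in mm/memory.c, not a verbatim quote); with every negative raw value clamped to 0 by page_mapcount(), the condition can no longer fire:

	if (unlikely(page_mapcount(page) < 0))
		print_bad_pte(vma, addr, ptent, page);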
Let's check against PAGE_MAPCOUNT_RESERVE instead by using page_type_has_type(), like page_has_type() would, so we can still catch some underflows.
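To spell out why that still works, here is a sketch of the raw _mapcount value ranges involved. The helper names are made up for illustration and are not in the tree, but the comparison mirrors what page_type_has_type() applies to the raw value:

/*
 * Illustration only: page type encodings stored in the _mapcount word sit
 * below PAGE_MAPCOUNT_RESERVE, so only that range has to be treated as
 * "mapcount 0", while a raw value in between can only come from an underflow.
 */
static inline bool raw_mapcount_is_page_type(int raw)
{
	/* same comparison page_type_has_type() performs on the raw value */
	return raw < PAGE_MAPCOUNT_RESERVE;
}

static inline bool raw_mapcount_underflowed(int raw)
{
	/* negative (below the unmapped value -1) but not a page type encoding */
	return raw < -1 && !raw_mapcount_is_page_type(raw);
}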
[david@redhat.com: make page_mapcount() slightly more efficient]
  Link: https://lkml.kernel.org/r/1af4fd61-7926-47c8-be45-833c0dbec08b@redhat.com
Link: https://lkml.kernel.org/r/20240409192301.907377-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240409192301.907377-2-david@redhat.com
Fixes: 53277bcf126d ("mm: support page_mapcount() on page_has_type() pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9b71b877c8d3..7d485ce6f94d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1228,7 +1228,7 @@ static inline int page_mapcount(struct page *page)
 	int mapcount = atomic_read(&page->_mapcount) + 1;
 
 	/* Handle page_has_type() pages */
-	if (mapcount < 0)
+	if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
 		mapcount = 0;
 	if (unlikely(PageCompound(page)))
 		mapcount += folio_entire_mapcount(page_folio(page));