From: Barry Song <v-songbaohua@oppo.com>
mainline inclusion
from mainline-v6.11-rc1
commit 15bde4abab734c687c1f81704886aba3a70c268e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAJ5MT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()", v2.
This patchset is preparatory work for mTHP swapin.
folio_add_new_anon_rmap() assumes that new anon rmaps are always exclusive. However, this assumption does not hold for cases like do_swap_page(), where a new anon folio might be added to the swapcache and is not necessarily exclusive.
The patchset extends the rmap flags to allow folio_add_new_anon_rmap() to handle both exclusive and non-exclusive new anon folios. The do_swap_page() function is updated to use this extended API with rmap flags. Consequently, all new anon folios now consistently use folio_add_new_anon_rmap(). The special case for !folio_test_anon() in __folio_add_anon_rmap() can be safely removed.
In conclusion, new anon folios always use folio_add_new_anon_rmap(), regardless of exclusivity. Old anon folios continue to use __folio_add_anon_rmap() via folio_add_anon_rmap_pmd() and folio_add_anon_rmap_ptes().
This patch (of 3):
In the case of a swap-in, a new anonymous folio is not necessarily exclusive. This patch updates the rmap flags to allow a new anonymous folio to be treated as either exclusive or non-exclusive. To maintain the existing behavior, we always use EXCLUSIVE as the default setting.
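The flag handling described above can be illustrated with a small userspace model. This is only a sketch and not the kernel code: struct model_folio, model_add_new_anon_rmap() and MODEL_MAX_PAGES are hypothetical names, the rmap_t and RMAP_* definitions are local stand-ins, and the PMD-mapped case (where only the head page is marked) is left out. The return value stands in for the new warning that fires when a non-exclusive folio is not locked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
typedef unsigned int rmap_t;
#define RMAP_NONE	((rmap_t)0)
#define RMAP_EXCLUSIVE	((rmap_t)1 << 0)

#define MODEL_MAX_PAGES	8

/* Toy folio: per-page mapcount (starts at -1) and AnonExclusive bit. */
struct model_folio {
	int nr_pages;
	bool locked;
	int mapcount[MODEL_MAX_PAGES];
	bool anon_exclusive[MODEL_MAX_PAGES];
};

/*
 * Model of adding the first anon rmap to a new, PTE-mapped folio.
 * Returns false where this model assumes the kernel would warn:
 * a non-exclusive (shared) new folio that is not locked.
 */
static bool model_add_new_anon_rmap(struct model_folio *folio, rmap_t flags)
{
	const bool exclusive = (flags & RMAP_EXCLUSIVE) != 0;
	const bool lock_rule_ok = exclusive || folio->locked;
	int i;

	assert(folio->nr_pages >= 1 && folio->nr_pages <= MODEL_MAX_PAGES);

	for (i = 0; i < folio->nr_pages; i++) {
		/* increment count (starts at -1) */
		folio->mapcount[i] = 0;
		/* the exclusive marker is now set only on request */
		if (exclusive)
			folio->anon_exclusive[i] = true;
	}
	return lock_rule_ok;
}
```

Every caller converted by this patch passes RMAP_EXCLUSIVE, which in this model corresponds to the first case only, so behavior is unchanged until a later patch in the series passes a different flag.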
[akpm@linux-foundation.org: cleanup and constifications per David and akpm]
[v-songbaohua@oppo.com: fix missing doc for flags of folio_add_new_anon_rmap()]
  Link: https://lkml.kernel.org/r/20240619210641.62542-1-21cnbao@gmail.com
[v-songbaohua@oppo.com: enhance doc for extend rmap flags arguments for folio_add_new_anon_rmap]
  Link: https://lkml.kernel.org/r/20240622030256.43775-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240617231137.80726-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240617231137.80726-2-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Tested-by: Shuai Yuan <yuanshuai@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	kernel/events/uprobes.c
	mm/khugepaged.c
	mm/memory.c
	mm/migrate_device.c
	mm/rmap.c
	mm/userswap.c
[ Context conflicts in uprobes.c, khugepaged.c, memory.c, migrate_device.c with commit 3a5a643c852a
  Context conflicts in memory.c due to missing commit f7842747d13d
  Context conflicts in rmap.c due to missing commit 05c5323b2a34
  Fix folio_add_new_anon_rmap() in userswap.c ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/rmap.h    |  2 +-
 kernel/events/uprobes.c |  2 +-
 mm/huge_memory.c        |  2 +-
 mm/khugepaged.c         |  2 +-
 mm/memory.c             | 10 +++++-----
 mm/migrate_device.c     |  2 +-
 mm/rmap.c               | 25 ++++++++++++++++---------
 mm/swapfile.c           |  2 +-
 mm/userfaultfd.c        |  2 +-
 mm/userswap.c           |  6 +++---
 10 files changed, 31 insertions(+), 24 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c4092c494cd1e..c422af4855cfc 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -236,7 +236,7 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
 void folio_add_anon_rmap_pmd(struct folio *, struct page *,
 		struct vm_area_struct *, unsigned long address, rmap_t flags);
 void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
-		unsigned long address);
+		unsigned long address, rmap_t flags);
 void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
 		struct vm_area_struct *);
 #define folio_add_file_rmap_pte(folio, page, vma) \
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 88a6ad10dff06..8986c452ff079 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -182,7 +182,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	if (new_page) {
 		folio_get(new_folio);
 		add_reliable_folio_counter(new_folio, mm, folio_nr_pages(new_folio));
-		folio_add_new_anon_rmap(new_folio, vma, addr);
+		folio_add_new_anon_rmap(new_folio, vma, addr, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(new_folio, vma);
 	} else
 		/* no new page, just dec_mm_counter for old_page */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fec3ee2c020b0..af6a5c840e276 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1187,7 +1187,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,

 		entry = mk_huge_pmd(page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-		folio_add_new_anon_rmap(folio, vma, haddr);
+		folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, vma);
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
 		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 13c5935e3a410..8006b13304de4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1258,7 +1258,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	spin_lock(pmd_ptl);
 	BUG_ON(!pmd_none(*pmd));
 	add_reliable_page_counter(hpage, vma->vm_mm, HPAGE_PMD_NR);
-	folio_add_new_anon_rmap(folio, vma, address);
+	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
 	folio_add_lru_vma(folio, vma);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
diff --git a/mm/memory.c b/mm/memory.c
index 72f575909f7fe..5c0b6d08b68f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -923,7 +923,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	*prealloc = NULL;
 	copy_user_highpage(&new_folio->page, page, addr, src_vma);
 	__folio_mark_uptodate(new_folio);
-	folio_add_new_anon_rmap(new_folio, dst_vma, addr);
+	folio_add_new_anon_rmap(new_folio, dst_vma, addr, RMAP_EXCLUSIVE);
 	folio_add_lru_vma(new_folio, dst_vma);
 	rss[MM_ANONPAGES]++;

@@ -3363,7 +3363,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		 * some TLBs while the old PTE remains in others.
 		 */
 		ptep_clear_flush(vma, vmf->address, vmf->pte);
-		folio_add_new_anon_rmap(new_folio, vma, vmf->address);
+		folio_add_new_anon_rmap(new_folio, vma, vmf->address, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(new_folio, vma);
 		/*
 		 * We call the notify macro here because, when using secondary
@@ -4293,7 +4293,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)

 	/* ksm created a completely new copy */
 	if (unlikely(folio != swapcache && swapcache)) {
-		folio_add_new_anon_rmap(folio, vma, address);
+		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, vma);
 	} else {
 		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
@@ -4549,7 +4549,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
 #endif
 	add_reliable_folio_counter(folio, vma->vm_mm, nr_pages);
-	folio_add_new_anon_rmap(folio, vma, addr);
+	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
 	folio_add_lru_vma(folio, vma);
 setpte:
 	if (vmf_orig_pte_uffd_wp(vmf))
@@ -4749,7 +4749,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
 	add_reliable_folio_counter(folio, vma->vm_mm, nr);
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		VM_BUG_ON_FOLIO(nr != 1, folio);
-		folio_add_new_anon_rmap(folio, vma, addr);
+		folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, vma);
 	} else {
 		folio_add_file_rmap_ptes(folio, page, nr, vma);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 5c9400931b74d..6998768a72973 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -657,7 +657,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,

 	inc_mm_counter(mm, MM_ANONPAGES);
 	add_reliable_page_counter(page, mm, 1);
-	folio_add_new_anon_rmap(folio, vma, addr);
+	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
 	if (!folio_is_zone_device(folio))
 		folio_add_lru_vma(folio, vma);
 	folio_get(folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index b1baaacd9f595..32ac6796113a5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1386,30 +1386,35 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
  * @folio:	The folio to add the mapping to.
  * @vma:	the vm area in which the mapping is added
  * @address:	the user virtual address mapped
+ * @flags:	The rmap flags
  *
  * Like folio_add_anon_rmap_*() but must only be called on *new* folios.
  * This means the inc-and-test can be bypassed.
- * The folio does not have to be locked.
+ * The folio doesn't necessarily need to be locked while it's exclusive
+ * unless two threads map it concurrently. However, the folio must be
+ * locked if it's shared.
  *
- * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
- * is new, it's assumed to be mapped exclusively by a single process.
+ * If the folio is pmd-mappable, it is accounted as a THP.
  */
 void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
-		unsigned long address)
+		unsigned long address, rmap_t flags)
 {
-	int nr = folio_nr_pages(folio);
+	const int nr = folio_nr_pages(folio);
+	const bool exclusive = flags & RMAP_EXCLUSIVE;
 	int nr_pmdmapped = 0;

 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
+	VM_WARN_ON_FOLIO(!exclusive && !folio_test_locked(folio), folio);
 	VM_BUG_ON_VMA(address < vma->vm_start ||
 			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
 	__folio_set_swapbacked(folio);
-	__folio_set_anon(folio, vma, address, true);
+	__folio_set_anon(folio, vma, address, exclusive);

 	if (likely(!folio_test_large(folio))) {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_mapcount, 0);
-		SetPageAnonExclusive(&folio->page);
+		if (exclusive)
+			SetPageAnonExclusive(&folio->page);
 	} else if (!folio_test_pmd_mappable(folio)) {
 		int i;

@@ -1418,7 +1423,8 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,

 			/* increment count (starts at -1) */
 			atomic_set(&page->_mapcount, 0);
-			SetPageAnonExclusive(page);
+			if (exclusive)
+				SetPageAnonExclusive(page);
 		}

 		atomic_set(&folio->_nr_pages_mapped, nr);
@@ -1426,7 +1432,8 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
 		atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED);
-		SetPageAnonExclusive(&folio->page);
+		if (exclusive)
+			SetPageAnonExclusive(&folio->page);
 		nr_pmdmapped = nr;
 	}

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 10c8670044880..1b2b3bea06c8d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1995,7 +1995,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,

 		folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
 	} else { /* ksm created a completely new copy */
-		folio_add_new_anon_rmap(folio, vma, addr);
+		folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, vma);
 	}
 	new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 588b2c4262f19..4ab24c56f6601 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -117,7 +117,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
 			folio_add_lru(folio);
 		folio_add_file_rmap_pte(folio, page, dst_vma);
 	} else {
-		folio_add_new_anon_rmap(folio, dst_vma, dst_addr);
+		folio_add_new_anon_rmap(folio, dst_vma, dst_addr, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, dst_vma);
 	}

diff --git a/mm/userswap.c b/mm/userswap.c
index 4951a3f665829..22e3f147ce5f9 100644
--- a/mm/userswap.c
+++ b/mm/userswap.c
@@ -194,7 +194,7 @@ static unsigned long vm_insert_anon_page(struct vm_area_struct *vma,

 	inc_mm_counter(mm, MM_ANONPAGES);
 	add_reliable_page_counter(page, mm, 1);
-	folio_add_new_anon_rmap(page_folio(page), vma, addr);
+	folio_add_new_anon_rmap(page_folio(page), vma, addr, RMAP_EXCLUSIVE);
 	dst_pte = mk_pte(page, vma->vm_page_prot);
 	if (vma->vm_flags & VM_WRITE)
 		dst_pte = pte_mkwrite_novma(pte_mkdirty(dst_pte));
@@ -220,7 +220,7 @@ static void uswap_map_anon_page(struct mm_struct *mm,
 	set_pte_at(mm, addr, pte, old_pte);
 	inc_mm_counter(mm, MM_ANONPAGES);
 	add_reliable_page_counter(page, mm, 1);
-	folio_add_new_anon_rmap(page_folio(page), vma, addr);
+	folio_add_new_anon_rmap(page_folio(page), vma, addr, RMAP_EXCLUSIVE);
 	pte_unmap_unlock(pte, ptl);
 }

@@ -535,7 +535,7 @@ int mfill_atomic_pte_nocopy(struct mm_struct *mm, pmd_t *dst_pmd,

 	inc_mm_counter(mm, MM_ANONPAGES);
 	add_reliable_page_counter(page, mm, 1);
-	folio_add_new_anon_rmap(page_folio(page), dst_vma, dst_addr);
+	folio_add_new_anon_rmap(page_folio(page), dst_vma, dst_addr, RMAP_EXCLUSIVE);
 	set_pte_at(mm, dst_addr, pte, dst_pte);

 	/* No need to invalidate - it was non-present before */