The rmap interface overhaul speeds up folio operations.
David Hildenbrand (43):
  mm/rmap: rename hugepage_add* to hugetlb_add*
  mm/rmap: introduce and use hugetlb_remove_rmap()
  mm/rmap: introduce and use hugetlb_add_file_rmap()
  mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
  mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
  mm/rmap: add hugetlb sanity checks for anon rmap handling
  mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()
  mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
  mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
  mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/rmap: remove page_add_file_rmap()
  mm/rmap: factor out adding folio mappings into __folio_add_rmap()
  mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
  mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
  mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/rmap: remove page_add_anon_rmap()
  mm/rmap: remove RMAP_COMPOUND
  mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
  kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
  mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
  Documentation: stop referring to page_remove_rmap()
  mm/rmap: remove page_remove_rmap()
  mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]()
  mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pmd()
  mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
  mm/rmap: remove page_try_dup_anon_rmap()
  mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()
  mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
  mm: remove one last reference to page_add_*_rmap()
  mm/rmap: silence VM_WARN_ON_FOLIO() in __folio_rmap_sanity_checks()
  mm/huge_memory: fix folio_set_dirty() vs. folio_mark_dirty()
  mm/memory: fix folio_set_dirty() vs. folio_mark_dirty() in zap_pte_range()

Kefeng Wang (1):
  mm: userswap: page_remove_rmap() -> folio_remove_rmap_pte()

Vishal Moola (Oracle) (5):
  mm/khugepaged: convert __collapse_huge_page_isolate() to use folios
  mm/khugepaged: convert hpage_collapse_scan_pmd() to use folios
  mm/khugepaged: convert is_refcount_suitable() to use folios
  mm/khugepaged: convert alloc_charge_hpage() to use folios
  mm/khugepaged: convert collapse_pte_mapped_thp() to use folios

 Documentation/mm/transhuge.rst       |   4 +-
 Documentation/mm/unevictable-lru.rst |   4 +-
 include/linux/memcontrol.h           |  14 -
 include/linux/mm.h                   |   6 +-
 include/linux/rmap.h                 | 404 ++++++++++++++++++++-----
 kernel/events/uprobes.c              |   2 +-
 mm/filemap.c                         |  10 +-
 mm/gup.c                             |   2 +-
 mm/huge_memory.c                     |  85 +++---
 mm/hugetlb.c                         |  21 +-
 mm/internal.h                        |  14 +-
 mm/khugepaged.c                      | 154 +++++-----
 mm/ksm.c                             |  15 +-
 mm/memory-failure.c                  |   4 +-
 mm/memory.c                          |  60 ++--
 mm/migrate.c                         |  12 +-
 mm/migrate_device.c                  |  41 +--
 mm/mmu_gather.c                      |   2 +-
 mm/rmap.c                            | 433 ++++++++++++++++-----------
 mm/swapfile.c                        |   2 +-
 mm/userfaultfd.c                     |   2 +-
 mm/userswap.c                        |   2 +-
 22 files changed, 815 insertions(+), 478 deletions(-)
From: "Vishal Moola (Oracle)" vishal.moola@gmail.com
mainline inclusion
from mainline-v6.7-rc1
commit 8dd1e896735f6e5abf66525dfd39bbd7b8c0c6d6
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
Patch series "Some khugepaged folio conversions", v3.
This patchset converts a number of functions to use folios. This cleans up some khugepaged code and removes a large number of hidden compound_head() calls.
This patch (of 5):
Replaces 11 calls to compound_head() with 1, and removes 1348 bytes of kernel text.
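Distilled from the hunks below, the shape of the conversion is to resolve the folio once with page_folio() and then stay on folio helpers; the fragment here is only an illustration of that pattern, not a complete or compilable function:

	struct folio *folio = page_folio(page);	/* the one remaining compound_head() lookup */

	VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);

	if (!folio_trylock(folio)) {		/* was trylock_page(page) */
		result = SCAN_PAGE_LOCK;
		goto out;
	}
	if (!folio_isolate_lru(folio)) {	/* was isolate_lru_page(page) */
		folio_unlock(folio);
		result = SCAN_DEL_PAGE_LRU;
		goto out;
	}
	node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio),
			    folio_nr_pages(folio));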
Link: https://lkml.kernel.org/r/20231020183331.10770-1-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20231020183331.10770-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 8dd1e896735f6e5abf66525dfd39bbd7b8c0c6d6) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/khugepaged.c | 45 +++++++++++++++++++++++---------------------- 1 file changed, 23 insertions(+), 22 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b0a7af8f81b3..08096337ae8e 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -550,6 +550,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, struct list_head *compound_pagelist) { struct page *page = NULL; + struct folio *folio = NULL; pte_t *_pte; int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0; bool writable = false; @@ -584,7 +585,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, goto out; }
- VM_BUG_ON_PAGE(!PageAnon(page), page); + folio = page_folio(page); + VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
if (page_mapcount(page) > 1) { ++shared; @@ -596,16 +598,15 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, } }
- if (PageCompound(page)) { - struct page *p; - page = compound_head(page); + if (folio_test_large(folio)) { + struct folio *f;
/* * Check if we have dealt with the compound page * already */ - list_for_each_entry(p, compound_pagelist, lru) { - if (page == p) + list_for_each_entry(f, compound_pagelist, lru) { + if (folio == f) goto next; } } @@ -616,7 +617,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, * is needed to serialize against split_huge_page * when invoked from the VM. */ - if (!trylock_page(page)) { + if (!folio_trylock(folio)) { result = SCAN_PAGE_LOCK; goto out; } @@ -632,8 +633,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, * but not from this process. The other process cannot write to * the page, only trigger CoW. */ - if (!is_refcount_suitable(page)) { - unlock_page(page); + if (!is_refcount_suitable(&folio->page)) { + folio_unlock(folio); result = SCAN_PAGE_COUNT; goto out; } @@ -642,27 +643,27 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, * Isolate the page to avoid collapsing an hugepage * currently in use by the VM. */ - if (!isolate_lru_page(page)) { - unlock_page(page); + if (!folio_isolate_lru(folio)) { + folio_unlock(folio); result = SCAN_DEL_PAGE_LRU; goto out; } - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_is_file_lru(page), - compound_nr(page)); - VM_BUG_ON_PAGE(!PageLocked(page), page); - VM_BUG_ON_PAGE(PageLRU(page), page); + node_stat_mod_folio(folio, + NR_ISOLATED_ANON + folio_is_file_lru(folio), + folio_nr_pages(folio)); + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
- if (PageCompound(page)) - list_add_tail(&page->lru, compound_pagelist); + if (folio_test_large(folio)) + list_add_tail(&folio->lru, compound_pagelist); next: /* * If collapse was initiated by khugepaged, check that there is * enough young pte to justify collapsing the page */ if (cc->is_khugepaged && - (pte_young(pteval) || page_is_young(page) || - PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + (pte_young(pteval) || folio_test_young(folio) || + folio_test_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, address))) referenced++;
@@ -676,13 +677,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; - trace_mm_collapse_huge_page_isolate(page, none_or_zero, + trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero, referenced, writable, result); return result; } out: release_pte_pages(pte, _pte, compound_pagelist); - trace_mm_collapse_huge_page_isolate(page, none_or_zero, + trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero, referenced, writable, result); return result; }
From: "Vishal Moola (Oracle)" vishal.moola@gmail.com
mainline inclusion
from mainline-v6.7-rc1
commit 5c07ebb372d66423e508ecfb8e00324f8797f072
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
Replaces 5 calls to compound_head(), and removes 1385 bytes of kernel text.
Link: https://lkml.kernel.org/r/20231020183331.10770-3-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Reviewed-by: Rik van Riel riel@surriel.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 5c07ebb372d66423e508ecfb8e00324f8797f072) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/khugepaged.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 08096337ae8e..0e456ca09690 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1262,6 +1262,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, int result = SCAN_FAIL, referenced = 0; int none_or_zero = 0, shared = 0; struct page *page = NULL; + struct folio *folio = NULL; unsigned long _address; spinlock_t *ptl; int node = NUMA_NO_NODE, unmapped = 0; @@ -1349,29 +1350,28 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, } }
- page = compound_head(page); - + folio = page_folio(page); /* * Record which node the original page is from and save this * information to cc->node_load[]. * Khugepaged will allocate hugepage from the node has the max * hit record. */ - node = page_to_nid(page); + node = folio_nid(folio); if (hpage_collapse_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; goto out_unmap; } cc->node_load[node]++; - if (!PageLRU(page)) { + if (!folio_test_lru(folio)) { result = SCAN_PAGE_LRU; goto out_unmap; } - if (PageLocked(page)) { + if (folio_test_locked(folio)) { result = SCAN_PAGE_LOCK; goto out_unmap; } - if (!PageAnon(page)) { + if (!folio_test_anon(folio)) { result = SCAN_PAGE_ANON; goto out_unmap; } @@ -1391,7 +1391,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, * has excessive GUP pins (i.e. 512). Anyway the same check * will be done again later the risk seems low. */ - if (!is_refcount_suitable(page)) { + if (!is_refcount_suitable(&folio->page)) { result = SCAN_PAGE_COUNT; goto out_unmap; } @@ -1401,8 +1401,8 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, * enough young pte to justify collapsing the page */ if (cc->is_khugepaged && - (pte_young(pteval) || page_is_young(page) || - PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + (pte_young(pteval) || folio_test_young(folio) || + folio_test_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, address))) referenced++;
@@ -1427,7 +1427,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, *mmap_locked = false; } out: - trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, + trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced, none_or_zero, result, unmapped); return result; }
From: "Vishal Moola (Oracle)" vishal.moola@gmail.com
mainline inclusion
from mainline-v6.7-rc1
commit dbf85c21e4aff90912b5d7755d2b25611f9191e9
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
Both callers of is_refcount_suitable() have been converted to use folios, so convert it to take in a folio. Both callers only operate on head pages of folios so mapcount/refcount conversions here are trivial.
Removes 3 calls to compound_head() and 315 bytes of kernel text.
Link: https://lkml.kernel.org/r/20231020183331.10770-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit dbf85c21e4aff90912b5d7755d2b25611f9191e9) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/khugepaged.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 0e456ca09690..06cab9f68d55 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -532,15 +532,15 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte, } }
-static bool is_refcount_suitable(struct page *page) +static bool is_refcount_suitable(struct folio *folio) { int expected_refcount;
- expected_refcount = total_mapcount(page); - if (PageSwapCache(page)) - expected_refcount += compound_nr(page); + expected_refcount = folio_mapcount(folio); + if (folio_test_swapcache(folio)) + expected_refcount += folio_nr_pages(folio);
- return page_count(page) == expected_refcount; + return folio_ref_count(folio) == expected_refcount; }
static int __collapse_huge_page_isolate(struct vm_area_struct *vma, @@ -633,7 +633,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, * but not from this process. The other process cannot write to * the page, only trigger CoW. */ - if (!is_refcount_suitable(&folio->page)) { + if (!is_refcount_suitable(folio)) { folio_unlock(folio); result = SCAN_PAGE_COUNT; goto out; @@ -1391,7 +1391,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, * has excessive GUP pins (i.e. 512). Anyway the same check * will be done again later the risk seems low. */ - if (!is_refcount_suitable(&folio->page)) { + if (!is_refcount_suitable(folio)) { result = SCAN_PAGE_COUNT; goto out_unmap; }
From: "Vishal Moola (Oracle)" vishal.moola@gmail.com
mainline inclusion
from mainline-v6.7-rc1
commit b455f39d228935f88eebcd1f7c1a6981093c6a3b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
Also remove count_memcg_page_event now that its last caller no longer uses it, and rename hpage_collapse_alloc_page() to hpage_collapse_alloc_folio().
This removes 1 call to compound_head() and helps convert khugepaged to use folios throughout.
Link: https://lkml.kernel.org/r/20231020183331.10770-5-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Reviewed-by: Rik van Riel riel@surriel.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit b455f39d228935f88eebcd1f7c1a6981093c6a3b) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/memcontrol.h | 14 -------------- mm/khugepaged.c | 17 ++++++++++------- 2 files changed, 10 insertions(+), 21 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index e86ba420159d..53adaf4e39be 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1213,15 +1213,6 @@ static inline void count_memcg_events(struct mem_cgroup *memcg, local_irq_restore(flags); }
-static inline void count_memcg_page_event(struct page *page, - enum vm_event_item idx) -{ - struct mem_cgroup *memcg = page_memcg(page); - - if (memcg) - count_memcg_events(memcg, idx, 1); -} - static inline void count_memcg_folio_events(struct folio *folio, enum vm_event_item idx, unsigned long nr) { @@ -1741,11 +1732,6 @@ static inline void __count_memcg_events(struct mem_cgroup *memcg, { }
-static inline void count_memcg_page_event(struct page *page, - int idx) -{ -} - static inline void count_memcg_folio_events(struct folio *folio, enum vm_event_item idx, unsigned long nr) { diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 06cab9f68d55..c60baa5fcb8a 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -898,16 +898,16 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc) } #endif
-static bool hpage_collapse_alloc_page(struct page **hpage, gfp_t gfp, int node, +static bool hpage_collapse_alloc_folio(struct folio **folio, gfp_t gfp, int node, nodemask_t *nmask) { - *hpage = __alloc_pages(gfp, HPAGE_PMD_ORDER, node, nmask); - if (unlikely(!*hpage)) { + *folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, nmask); + + if (unlikely(!*folio)) { count_vm_event(THP_COLLAPSE_ALLOC_FAILED); return false; }
- folio_prep_large_rmappable((struct folio *)*hpage); count_vm_event(THP_COLLAPSE_ALLOC); return true; } @@ -1077,17 +1077,20 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, if (cc->reliable) gfp |= GFP_RELIABLE;
- if (!hpage_collapse_alloc_page(hpage, gfp, node, &cc->alloc_nmask)) + if (!hpage_collapse_alloc_folio(&folio, gfp, node, &cc->alloc_nmask)) { + *hpage = NULL; return SCAN_ALLOC_HUGE_PAGE_FAIL; + }
- folio = page_folio(*hpage); if (unlikely(mem_cgroup_charge(folio, mm, gfp))) { folio_put(folio); *hpage = NULL; return SCAN_CGROUP_CHARGE_FAIL; } - count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC);
+ count_memcg_folio_events(folio, THP_COLLAPSE_ALLOC, 1); + + *hpage = folio_page(folio, 0); return SCAN_SUCCEED; }
From: "Vishal Moola (Oracle)" vishal.moola@gmail.com
mainline inclusion
from mainline-v6.7-rc1
commit 98b32d296d95d7aa0516c36b72406277412268cd
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
This removes 2 calls to compound_head() and helps convert khugepaged to use folios throughout.
Previously, if the address passed to collapse_pte_mapped_thp() corresponded to a tail page, the scan would fail immediately. Using filemap_lock_folio() we get the corresponding folio back and try to operate on the folio instead.
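Distilled from the hunks below: unlike find_lock_page(), filemap_lock_folio() signals a miss with an ERR_PTR rather than NULL, and it hands back the folio regardless of which subpage the index resolves to, so the old PageHead() bail-out disappears. A fragment showing just that pattern:

	folio = filemap_lock_folio(vma->vm_file->f_mapping,
				   linear_page_index(vma, haddr));
	if (IS_ERR(folio))			/* find_lock_page() returned NULL here */
		return SCAN_PAGE_NULL;

	if (folio_order(folio) != HPAGE_PMD_ORDER) {
		result = SCAN_PAGE_COMPOUND;
		goto drop_folio;
	}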
Link: https://lkml.kernel.org/r/20231020183331.10770-6-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Reviewed-by: Rik van Riel riel@surriel.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 98b32d296d95d7aa0516c36b72406277412268cd) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/khugepaged.c | 45 ++++++++++++++++++++------------------------- 1 file changed, 20 insertions(+), 25 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c60baa5fcb8a..37ff3679043c 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1500,7 +1500,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, bool notified = false; unsigned long haddr = addr & HPAGE_PMD_MASK; struct vm_area_struct *vma = vma_lookup(mm, haddr); - struct page *hpage; + struct folio *folio; pte_t *start_pte, *pte; pmd_t *pmd, pgt_pmd; spinlock_t *pml = NULL, *ptl; @@ -1534,19 +1534,14 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, if (userfaultfd_wp(vma)) return SCAN_PTE_UFFD_WP;
- hpage = find_lock_page(vma->vm_file->f_mapping, + folio = filemap_lock_folio(vma->vm_file->f_mapping, linear_page_index(vma, haddr)); - if (!hpage) + if (IS_ERR(folio)) return SCAN_PAGE_NULL;
- if (!PageHead(hpage)) { - result = SCAN_FAIL; - goto drop_hpage; - } - - if (compound_order(hpage) != HPAGE_PMD_ORDER) { + if (folio_order(folio) != HPAGE_PMD_ORDER) { result = SCAN_PAGE_COMPOUND; - goto drop_hpage; + goto drop_folio; }
result = find_pmd_or_thp_or_none(mm, haddr, &pmd); @@ -1560,13 +1555,13 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, */ goto maybe_install_pmd; default: - goto drop_hpage; + goto drop_folio; }
result = SCAN_FAIL; start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl); if (!start_pte) /* mmap_lock + page lock should prevent this */ - goto drop_hpage; + goto drop_folio;
/* step 1: check all mapped PTEs are to the right huge page */ for (i = 0, addr = haddr, pte = start_pte; @@ -1591,7 +1586,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, * Note that uprobe, debugger, or MAP_PRIVATE may change the * page table, but the new page will not be a subpage of hpage. */ - if (hpage + i != page) + if (folio_page(folio, i) != page) goto abort; }
@@ -1606,7 +1601,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, * page_table_lock) ptl nests inside pml. The less time we hold pml, * the better; but userfaultfd's mfill_atomic_pte() on a private VMA * inserts a valid as-if-COWed PTE without even looking up page cache. - * So page lock of hpage does not protect from it, so we must not drop + * So page lock of folio does not protect from it, so we must not drop * ptl before pgt_pmd is removed, so uffd private needs pml taken now. */ if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED)) @@ -1630,7 +1625,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, continue; /* * We dropped ptl after the first scan, to do the mmu_notifier: - * page lock stops more PTEs of the hpage being faulted in, but + * page lock stops more PTEs of the folio being faulted in, but * does not stop write faults COWing anon copies from existing * PTEs; and does not stop those being swapped out or migrated. */ @@ -1639,7 +1634,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, goto abort; } page = vm_normal_page(vma, addr, ptent); - if (hpage + i != page) + if (folio_page(folio, i) != page) goto abort;
/* @@ -1659,8 +1654,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
/* step 3: set proper refcount and mm_counters. */ if (nr_ptes) { - page_ref_sub(hpage, nr_ptes); - add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes); + folio_ref_sub(folio, nr_ptes); + add_mm_counter(mm, mm_counter_file(&folio->page), -nr_ptes); }
/* step 4: remove empty page table */ @@ -1684,14 +1679,14 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, maybe_install_pmd: /* step 5: install pmd entry */ result = install_pmd - ? set_huge_pmd(vma, haddr, pmd, hpage) + ? set_huge_pmd(vma, haddr, pmd, &folio->page) : SCAN_SUCCEED; - goto drop_hpage; + goto drop_folio; abort: if (nr_ptes) { flush_tlb_mm(mm); - page_ref_sub(hpage, nr_ptes); - add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes); + folio_ref_sub(folio, nr_ptes); + add_mm_counter(mm, mm_counter_file(&folio->page), -nr_ptes); } if (start_pte) pte_unmap_unlock(start_pte, ptl); @@ -1699,9 +1694,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, spin_unlock(pml); if (notified) mmu_notifier_invalidate_range_end(&range); -drop_hpage: - unlock_page(hpage); - put_page(hpage); +drop_folio: + folio_unlock(folio); + folio_put(folio); return result; }
From: David Hildenbrand david@redhat.com
mainline inclusion
from mainline-v6.8-rc1
commit 9d5fafd5d882446999366f673ab06edba453f862
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
Patch series "mm/rmap: interface overhaul", v2.
This series overhauls the rmap interface, to get rid of the "bool compound" / RMAP_COMPOUND parameter with the goal of making the interface less error prone, more future proof, and more natural to extend to "batching". Also, this converts the interface to always consume folio+subpage, which speeds up operations on large folios.
Further, this series adds PTE-batching variants for 4 rmap functions, whereby only folio_add_anon_rmap_ptes() is used for batching in this series when PTE-remapping a PMD-mapped THP. folio_remove_rmap_ptes(), folio_try_dup_anon_rmap_ptes() and folio_dup_file_rmap_ptes() will soon come in handy[1,2].
This series performs a lot of folio conversion along the way. Most of the added LOC in the diff are only due to documentation.
As we're moving to a pte/pmd interface where we clearly express the mapping granularity we are dealing with, we first get the remainder of hugetlb out of the way, as it is special and expected to remain special: it treats everything as a "single logical PTE" and currently only allows entire mappings.
Even if we'd ever support partial mappings, I strongly assume the interface and implementation will still differ heavily: hopefully we can avoid working on subpages/subpage mapcounts completely and only add a "count" parameter for them to enable batching.
New (extended) hugetlb interface that operates on entire folio:
 * hugetlb_add_new_anon_rmap() -> Already existed
 * hugetlb_add_anon_rmap() -> Already existed
 * hugetlb_try_dup_anon_rmap()
 * hugetlb_try_share_anon_rmap()
 * hugetlb_add_file_rmap()
 * hugetlb_remove_rmap()
New "ordinary" interface for small folios / THP:: * folio_add_new_anon_rmap() -> Already existed * folio_add_anon_rmap_[pte|ptes|pmd]() * folio_try_dup_anon_rmap_[pte|ptes|pmd]() * folio_try_share_anon_rmap_[pte|pmd]() * folio_add_file_rmap_[pte|ptes|pmd]() * folio_dup_file_rmap_[pte|ptes|pmd]() * folio_remove_rmap_[pte|ptes|pmd]()
folio_add_new_anon_rmap() will always map at the largest granularity possible (currently, a single PMD to cover a PMD-sized THP). Could be extended if ever required.
In the future, we might want "_pud" variants and eventually "_pmds" variants for batching.
I ran some simple microbenchmarks on an Intel(R) Xeon(R) Silver 4210R: measuring munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE remapping PMD-mapped THPs on 1 GiB of memory.
For small folios, there is barely a change (< 1% improvement for me).
For PTE-mapped THP:
 * PTE-remapping a PMD-mapped THP is more than 10% faster.
 * fork() is more than 4% faster.
 * MADV_DONTNEED is 2% faster
 * COW when writing only a single byte on a COW-shared PTE is 1% faster
 * munmap() barely changes (< 1%).
[1] https://lkml.kernel.org/r/20230810103332.3062143-1-ryan.roberts@arm.com
[2] https://lkml.kernel.org/r/20231204105440.61448-1-ryan.roberts@arm.com
This patch (of 40):
Let's just call it "hugetlb_".
Yes, it's all already inconsistent and confusing because we have a lot of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly be confused with transparent huge pages, and it matches "hugetlb.c" and "folio_test_hugetlb()". So let's minimize confusion in rmap code.
Link: https://lkml.kernel.org/r/20231220224504.646757-1-david@redhat.com Link: https://lkml.kernel.org/r/20231220224504.646757-2-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 9d5fafd5d882446999366f673ab06edba453f862) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 4 ++-- mm/hugetlb.c | 8 ++++---- mm/migrate.c | 4 ++-- mm/rmap.c | 8 ++++---- 4 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 3c2fc291b071..456b8b82249e 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -203,9 +203,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound);
-void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *, +void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address, rmap_t flags); -void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *, +void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address);
static inline void __page_dup_rmap(struct page *page, bool compound) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0d3648572898..4a7b1260721c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5230,7 +5230,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
__folio_mark_uptodate(new_folio); - hugepage_add_new_anon_rmap(new_folio, vma, addr); + hugetlb_add_new_anon_rmap(new_folio, vma, addr); if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old)) newpte = huge_pte_mkuffd_wp(newpte); set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz); @@ -5940,7 +5940,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, /* Break COW or unshare */ huge_ptep_clear_flush(vma, haddr, ptep); page_remove_rmap(&old_folio->page, vma, true); - hugepage_add_new_anon_rmap(new_folio, vma, haddr); + hugetlb_add_new_anon_rmap(new_folio, vma, haddr); if (huge_pte_uffd_wp(pte)) newpte = huge_pte_mkuffd_wp(newpte); set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h)); @@ -6228,7 +6228,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, goto backout;
if (anon_rmap) - hugepage_add_new_anon_rmap(folio, vma, haddr); + hugetlb_add_new_anon_rmap(folio, vma, haddr); else page_dup_file_rmap(&folio->page, true); new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE) @@ -6656,7 +6656,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, if (folio_in_pagecache) page_dup_file_rmap(&folio->page, true); else - hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr); + hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);
/* * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY diff --git a/mm/migrate.c b/mm/migrate.c index 1e88f81d2369..84f1e46355a8 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,
pte = arch_make_huge_pte(pte, shift, vma->vm_flags); if (folio_test_anon(folio)) - hugepage_add_anon_rmap(folio, vma, pvmw.address, - rmap_flags); + hugetlb_add_anon_rmap(folio, vma, pvmw.address, + rmap_flags); else page_dup_file_rmap(new, true); set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte, diff --git a/mm/rmap.c b/mm/rmap.c index 5c3d20916949..2796c07dcde1 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2602,8 +2602,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc) * * RMAP_COMPOUND is ignored. */ -void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma, - unsigned long address, rmap_t flags) +void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma, + unsigned long address, rmap_t flags) { VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
@@ -2614,8 +2614,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma, PageAnonExclusive(&folio->page), folio); }
-void hugepage_add_new_anon_rmap(struct folio *folio, - struct vm_area_struct *vma, unsigned long address) +void hugetlb_add_new_anon_rmap(struct folio *folio, + struct vm_area_struct *vma, unsigned long address) { BUG_ON(address < vma->vm_start || address >= vma->vm_end); /* increment count (starts at -1) */
From: David Hildenbrand david@redhat.com
mainline inclusion
from mainline-v6.8-rc1
commit e135826b2da0cf25305086dc9ac1e91718a148e1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface.
Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb code from page_remove_rmap(). This effectively removes one check on the small-folio path as well.
Add sanity checks that we end up with the right folios in the right functions.
Note: the only call sites that need care are the page_remove_rmap() calls that pass compound=true.
Link: https://lkml.kernel.org/r/20231220224504.646757-3-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: Hugh Dickins hughd@google.com Cc: Muchun Song muchun.song@linux.dev Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit e135826b2da0cf25305086dc9ac1e91718a148e1) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 7 +++++++ mm/hugetlb.c | 4 ++-- mm/rmap.c | 18 +++++++++--------- 3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 456b8b82249e..556366dafa8f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -208,6 +208,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *, void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address);
+static inline void hugetlb_remove_rmap(struct folio *folio) +{ + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); + + atomic_dec(&folio->_entire_mapcount); +} + static inline void __page_dup_rmap(struct page *page, bool compound) { if (compound) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4a7b1260721c..c6e9b3f7d542 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5628,7 +5628,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, make_pte_marker(PTE_MARKER_UFFD_WP), sz); hugetlb_count_sub(pages_per_huge_page(h), mm); - page_remove_rmap(page, vma, true); + hugetlb_remove_rmap(page_folio(page));
spin_unlock(ptl); tlb_remove_page_size(tlb, page, huge_page_size(h)); @@ -5939,7 +5939,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
/* Break COW or unshare */ huge_ptep_clear_flush(vma, haddr, ptep); - page_remove_rmap(&old_folio->page, vma, true); + hugetlb_remove_rmap(old_folio); hugetlb_add_new_anon_rmap(new_folio, vma, haddr); if (huge_pte_uffd_wp(pte)) newpte = huge_pte_mkuffd_wp(newpte); diff --git a/mm/rmap.c b/mm/rmap.c index 2796c07dcde1..460c589b7720 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1450,15 +1450,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, bool last; enum node_stat_item idx;
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); VM_BUG_ON_PAGE(compound && !PageHead(page), page);
- /* Hugetlb pages are not counted in NR_*MAPPED */ - if (unlikely(folio_test_hugetlb(folio))) { - /* hugetlb pages are always mapped with pmds */ - atomic_dec(&folio->_entire_mapcount); - return; - } - /* Is page being unmapped by PTE? Is this its last map to be removed? */ if (likely(!compound)) { last = atomic_add_negative(-1, &page->_mapcount); @@ -1821,7 +1815,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, add_reliable_folio_counter(folio, mm, -1); } discard: - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (unlikely(folio_test_hugetlb(folio))) + hugetlb_remove_rmap(folio); + else + page_remove_rmap(subpage, vma, false); if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio); @@ -2176,7 +2173,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ }
- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (unlikely(folio_test_hugetlb(folio))) + hugetlb_remove_rmap(folio); + else + page_remove_rmap(subpage, vma, false); if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio);
From: David Hildenbrand david@redhat.com
mainline inclusion
from mainline-v6.8-rc1
commit 44887f39945519fa8405133b1acd098fda9c9746
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface.
Right now we're using page_dup_file_rmap() in some cases where "ordinary" rmap code would have used page_add_file_rmap(). So let's introduce and use hugetlb_add_file_rmap() instead. We won't be adding a "hugetlb_dup_file_rmap()" function for the fork() case, as it would be doing the same: "dup" is just an optimization for "add".
What remains is a single page_dup_file_rmap() call in fork() code.
Add sanity checks that we end up with the right folios in the right functions.
Link: https://lkml.kernel.org/r/20231220224504.646757-4-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 44887f39945519fa8405133b1acd098fda9c9746) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 8 ++++++++ mm/hugetlb.c | 6 +++--- mm/migrate.c | 2 +- mm/rmap.c | 1 + 4 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 556366dafa8f..74a5f4d3979f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -208,6 +208,14 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *, void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address);
+static inline void hugetlb_add_file_rmap(struct folio *folio) +{ + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio); + + atomic_inc(&folio->_entire_mapcount); +} + static inline void hugetlb_remove_rmap(struct folio *folio) { VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c6e9b3f7d542..0b3d6af04079 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5353,7 +5353,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, * sleep during the process. */ if (!folio_test_anon(pte_folio)) { - page_dup_file_rmap(&pte_folio->page, true); + hugetlb_add_file_rmap(pte_folio); } else if (page_try_dup_anon_rmap(&pte_folio->page, true, src_vma)) { pte_t src_pte_old = entry; @@ -6230,7 +6230,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, if (anon_rmap) hugetlb_add_new_anon_rmap(folio, vma, haddr); else - page_dup_file_rmap(&folio->page, true); + hugetlb_add_file_rmap(folio); new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED))); /* @@ -6654,7 +6654,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, goto out_release_unlock;
if (folio_in_pagecache) - page_dup_file_rmap(&folio->page, true); + hugetlb_add_file_rmap(folio); else hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);
diff --git a/mm/migrate.c b/mm/migrate.c index 84f1e46355a8..49270a60deb5 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio, hugetlb_add_anon_rmap(folio, vma, pvmw.address, rmap_flags); else - page_dup_file_rmap(new, true); + hugetlb_add_file_rmap(folio); set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte, psize); } else diff --git a/mm/rmap.c b/mm/rmap.c index 460c589b7720..ef7979695ef3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1365,6 +1365,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page, unsigned int nr_pmdmapped = 0, first; int nr = 0;
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
/* Is page being mapped by PTE? Is this its first map to be added? */
From: David Hildenbrand david@redhat.com
mainline inclusion
from mainline-v6.8-rc1
commit ebe2e35ec0f256372c158a18de459fb60070b313
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface.
So let's introduce and use hugetlb_try_dup_anon_rmap() to make all hugetlb handling use dedicated hugetlb_* rmap functions.
Add sanity checks that we end up with the right folios in the right functions.
Note that is_device_private_page() does not apply to hugetlb.
Link: https://lkml.kernel.org/r/20231220224504.646757-5-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit ebe2e35ec0f256372c158a18de459fb60070b313) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/mm.h | 12 +++++++++--- include/linux/rmap.h | 18 ++++++++++++++++++ mm/hugetlb.c | 3 +-- 3 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index ffa00d1cd17d..3f2fe3b85dc0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1980,15 +1980,21 @@ static inline bool page_maybe_dma_pinned(struct page *page) * * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq. */ -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, - struct page *page) +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma, + struct folio *folio) { VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)) return false;
- return page_maybe_dma_pinned(page); + return folio_maybe_dma_pinned(folio); +} + +static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, + struct page *page) +{ + return folio_needs_cow_for_dma(vma, page_folio(page)); }
/** diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 74a5f4d3979f..9091895117e5 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -208,6 +208,22 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *, void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address);
+/* See page_try_dup_anon_rmap() */ +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio, + struct vm_area_struct *vma) +{ + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + + if (PageAnonExclusive(&folio->page)) { + if (unlikely(folio_needs_cow_for_dma(vma, folio))) + return -EBUSY; + ClearPageAnonExclusive(&folio->page); + } + atomic_inc(&folio->_entire_mapcount); + return 0; +} + static inline void hugetlb_add_file_rmap(struct folio *folio) { VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); @@ -225,6 +241,8 @@ static inline void hugetlb_remove_rmap(struct folio *folio)
static inline void __page_dup_rmap(struct page *page, bool compound) { + VM_WARN_ON(folio_test_hugetlb(page_folio(page))); + if (compound) { struct folio *folio = (struct folio *)page;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0b3d6af04079..3a4a06f5e836 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5354,8 +5354,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, */ if (!folio_test_anon(pte_folio)) { hugetlb_add_file_rmap(pte_folio); - } else if (page_try_dup_anon_rmap(&pte_folio->page, - true, src_vma)) { + } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) { pte_t src_pte_old = entry; struct folio *new_folio;
From: David Hildenbrand david@redhat.com
mainline inclusion
from mainline-v6.8-rc1
commit 0c2ec32bf0b2f0d7ccb98c53ee5d255d68e73595
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
hugetlb rmap handling differs quite a lot from "ordinary" rmap code. For example, hugetlb currently only supports entire mappings, and treats any mapping as mapped using a single "logical PTE". Let's move it out of the way so we can overhaul our "ordinary" rmap implementation/interface.
So let's introduce and use hugetlb_try_share_anon_rmap() to make all hugetlb handling use dedicated hugetlb_* rmap functions.
Add sanity checks that we end up with the right folios in the right functions.
Note that try_to_unmap_one() does not need care. Easy to spot because among all that nasty hugetlb special-casing in that function, we're not using set_huge_pte_at() on the anon path -- well, and that code assumes that we would want to swapout.
Link: https://lkml.kernel.org/r/20231220224504.646757-6-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 0c2ec32bf0b2f0d7ccb98c53ee5d255d68e73595) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 25 +++++++++++++++++++++++++ mm/rmap.c | 15 ++++++++++----- 2 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 9091895117e5..315671c3b227 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -224,6 +224,30 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio, return 0; }
+/* See page_try_share_anon_rmap() */ +static inline int hugetlb_try_share_anon_rmap(struct folio *folio) +{ + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio); + + /* Paired with the memory barrier in try_grab_folio(). */ + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP)) + smp_mb(); + + if (unlikely(folio_maybe_dma_pinned(folio))) + return -EBUSY; + ClearPageAnonExclusive(&folio->page); + + /* + * This is conceptually a smp_wmb() paired with the smp_rmb() in + * gup_must_unshare(). + */ + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP)) + smp_mb__after_atomic(); + return 0; +} + static inline void hugetlb_add_file_rmap(struct folio *folio) { VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); @@ -328,6 +352,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound, */ static inline int page_try_share_anon_rmap(struct page *page) { + VM_WARN_ON(folio_test_hugetlb(page_folio(page))); VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page);
/* device private pages cannot get pinned via GUP. */ diff --git a/mm/rmap.c b/mm/rmap.c index ef7979695ef3..386b760d0a46 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2126,13 +2126,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, !anon_exclusive, subpage);
/* See page_try_share_anon_rmap(): clear PTE first. */ - if (anon_exclusive && - page_try_share_anon_rmap(subpage)) { - if (folio_test_hugetlb(folio)) + if (folio_test_hugetlb(folio)) { + if (anon_exclusive && + hugetlb_try_share_anon_rmap(folio)) { set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz); - else - set_pte_at(mm, address, pvmw.pte, pteval); + ret = false; + page_vma_mapped_walk_done(&pvmw); + break; + } + } else if (anon_exclusive && + page_try_share_anon_rmap(subpage)) { + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break;
From: David Hildenbrand david@redhat.com
mainline inclusion
from mainline-v6.8-rc1
commit a4ea18641d8330a97d7d66f0ab017b690099ffce
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
Let's make sure we end up with the right folios in the right functions when adding an anon rmap, just like we already do in the other rmap functions.
Link: https://lkml.kernel.org/r/20231220224504.646757-7-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit a4ea18641d8330a97d7d66f0ab017b690099ffce) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/rmap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/mm/rmap.c b/mm/rmap.c index 386b760d0a46..b3db82943d31 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1232,6 +1232,8 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, bool compound = flags & RMAP_COMPOUND; bool first;
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + /* Is page being mapped by PTE? Is this its first map to be added? */ if (likely(!compound)) { first = atomic_inc_and_test(&page->_mapcount); @@ -1313,6 +1315,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, { int nr = folio_nr_pages(folio);
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); VM_BUG_ON_VMA(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end, vma); __folio_set_swapbacked(folio); @@ -2611,6 +2614,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc) void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address, rmap_t flags) { + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
atomic_inc(&folio->_entire_mapcount); @@ -2623,6 +2627,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma, void hugetlb_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address) { + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); + BUG_ON(address < vma->vm_start || address >= vma->vm_end); /* increment count (starts at -1) */ atomic_set(&folio->_entire_mapcount, 0);
From: David Hildenbrand david@redhat.com
mainline inclusion
from mainline-v6.8-rc1
commit 68f0320824fa59c5429cbc811e6c46e7a30ea32c
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
CVE: NA
-------------------------------------------------
Let's get rid of the compound parameter and instead define explicitly which mappings we're adding. That is more future proof, easier to read and harder to mess up.
Use an enum to express the granularity internally. Make the compiler always special-case on the granularity by using __always_inline. Replace the "compound" check by a switch-case that will be removed by the compiler completely.
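The pattern in miniature (hypothetical demo_* names; this is not the actual rmap code, which appears in the diff below): each exported helper hard-codes the granularity, and because the __always_inline worker switches on a compile-time constant, the compiler keeps only the selected arm.

enum demo_rmap_level { DEMO_LEVEL_PTE = 0, DEMO_LEVEL_PMD };

static __always_inline void __demo_add_rmap(struct folio *folio,
		struct page *page, int nr_pages, enum demo_rmap_level level)
{
	switch (level) {
	case DEMO_LEVEL_PTE:
		/* per-PTE accounting for nr_pages subpages */
		break;
	case DEMO_LEVEL_PMD:
		/* one "entire" mapping covering the whole folio */
		break;
	}
}

/* Callers state the granularity in the name; no "bool compound" to mix up. */
static void demo_add_rmap_ptes(struct folio *folio, struct page *page, int nr_pages)
{
	__demo_add_rmap(folio, page, nr_pages, DEMO_LEVEL_PTE);
}

static void demo_add_rmap_pmd(struct folio *folio, struct page *page)
{
	__demo_add_rmap(folio, page, folio_nr_pages(folio), DEMO_LEVEL_PMD);
}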
Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the folio_test_pmd_mappable() check by a config check in the caller and sanity checks. Convert the single user of folio_add_file_rmap_range().
While at it, consistently use "int" instead of "unsigned int" in rmap code when dealing with mapcounts and the number of pages.
This function design can later easily be extended to PUDs and to batch PMDs. Note that for now we don't support anything bigger than PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks will catch if that ever changes.
Next up is removing page_remove_rmap() along with its "compound" parameter and similarly converting all other rmap functions.
Link: https://lkml.kernel.org/r/20231220224504.646757-8-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 68f0320824fa59c5429cbc811e6c46e7a30ea32c) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 46 ++++++++++++++++++++++++-- mm/memory.c | 2 +- mm/rmap.c | 79 ++++++++++++++++++++++++++++---------------- 3 files changed, 95 insertions(+), 32 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 315671c3b227..44d9160e22c3 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -186,6 +186,44 @@ typedef int __bitwise rmap_t; */ #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
+/* + * Internally, we're using an enum to specify the granularity. We make the + * compiler emit specialized code for each granularity. + */ +enum rmap_level { + RMAP_LEVEL_PTE = 0, + RMAP_LEVEL_PMD, +}; + +static inline void __folio_rmap_sanity_checks(struct folio *folio, + struct page *page, int nr_pages, enum rmap_level level) +{ + /* hugetlb folios are handled separately. */ + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + VM_WARN_ON_FOLIO(folio_test_large(folio) && + !folio_test_large_rmappable(folio), folio); + + VM_WARN_ON_ONCE(nr_pages <= 0); + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio); + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio); + + switch (level) { + case RMAP_LEVEL_PTE: + break; + case RMAP_LEVEL_PMD: + /* + * We don't support folios larger than a single PMD yet. So + * when RMAP_LEVEL_PMD is set, we assume that we are creating + * a single "entire" mapping of the folio. + */ + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio); + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio); + break; + default: + VM_WARN_ON_ONCE(true); + } +} + /* * rmap interfaces called when adding or removing pte of page */ @@ -198,8 +236,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address); void page_add_file_rmap(struct page *, struct vm_area_struct *, bool compound); -void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, - struct vm_area_struct *, bool compound); +void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages, + struct vm_area_struct *); +#define folio_add_file_rmap_pte(folio, page, vma) \ + folio_add_file_rmap_ptes(folio, page, 1, vma) +void folio_add_file_rmap_pmd(struct folio *, struct page *, + struct vm_area_struct *); void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound);
diff --git a/mm/memory.c b/mm/memory.c index 13012c3294b4..bc10d5e40954 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4533,7 +4533,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio, folio_add_lru_vma(folio, vma); } else { add_mm_counter(vma->vm_mm, mm_counter_file(page), nr); - folio_add_file_rmap_range(folio, page, nr, vma, false); + folio_add_file_rmap_ptes(folio, page, nr, vma); } set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
diff --git a/mm/rmap.c b/mm/rmap.c index b3db82943d31..c4d32ad0138f 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1348,31 +1348,18 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr); }
-/** - * folio_add_file_rmap_range - add pte mapping to page range of a folio - * @folio: The folio to add the mapping to - * @page: The first page to add - * @nr_pages: The number of pages which will be mapped - * @vma: the vm area in which the mapping is added - * @compound: charge the page as compound or small page - * - * The page range of folio is defined by [first_page, first_page + nr_pages) - * - * The caller needs to hold the pte lock. - */ -void folio_add_file_rmap_range(struct folio *folio, struct page *page, - unsigned int nr_pages, struct vm_area_struct *vma, - bool compound) +static __always_inline void __folio_add_file_rmap(struct folio *folio, + struct page *page, int nr_pages, struct vm_area_struct *vma, + enum rmap_level level) { atomic_t *mapped = &folio->_nr_pages_mapped; - unsigned int nr_pmdmapped = 0, first; - int nr = 0; + int nr = 0, nr_pmdmapped = 0, first;
- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); - VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio); + __folio_rmap_sanity_checks(folio, page, nr_pages, level);
- /* Is page being mapped by PTE? Is this its first map to be added? */ - if (likely(!compound)) { + switch (level) { + case RMAP_LEVEL_PTE: do { first = atomic_inc_and_test(&page->_mapcount); if (first && folio_test_large(folio)) { @@ -1383,9 +1370,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page, if (first) nr++; } while (page++, --nr_pages > 0); - } else if (folio_test_pmd_mappable(folio)) { - /* That test is redundant: it's for safety or to optimize out */ - + break; + case RMAP_LEVEL_PMD: first = atomic_inc_and_test(&folio->_entire_mapcount); if (first) { nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped); @@ -1400,6 +1386,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page, nr = 0; } } + break; }
if (nr_pmdmapped) @@ -1413,6 +1400,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page, mlock_vma_folio(folio, vma); }
+/** + * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio + * @folio: The folio to add the mappings to + * @page: The first page to add + * @nr_pages: The number of pages that will be mapped using PTEs + * @vma: The vm area in which the mappings are added + * + * The page range of the folio is defined by [page, page + nr_pages) + * + * The caller needs to hold the page table lock. + */ +void folio_add_file_rmap_ptes(struct folio *folio, struct page *page, + int nr_pages, struct vm_area_struct *vma) +{ + __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_LEVEL_PTE); +} + +/** + * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio + * @folio: The folio to add the mapping to + * @page: The first page to add + * @vma: The vm area in which the mapping is added + * + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR) + * + * The caller needs to hold the page table lock. + */ +void folio_add_file_rmap_pmd(struct folio *folio, struct page *page, + struct vm_area_struct *vma) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_LEVEL_PMD); +#else + WARN_ON_ONCE(true); +#endif +} + /** * page_add_file_rmap - add pte mapping to a file page * @page: the page to add the mapping to @@ -1425,16 +1449,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, bool compound) { struct folio *folio = page_folio(page); - unsigned int nr_pages;
VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
if (likely(!compound)) - nr_pages = 1; + folio_add_file_rmap_pte(folio, page, vma); else - nr_pages = folio_nr_pages(folio); - - folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); + folio_add_file_rmap_pmd(folio, page, vma); }
/**
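A usage sketch of the file-rmap interface added in the patch above. The function signatures are the ones declared in this patch; folio, page, nr_pages and vma are placeholder variables, and the caller is assumed to hold the page table lock:

        /* Map nr_pages pages of a file folio, starting at page, via PTEs. */
        folio_add_file_rmap_ptes(folio, page, nr_pages, vma);

        /* A single PTE is simply the nr_pages == 1 case. */
        folio_add_file_rmap_pte(folio, page, vma);

        /*
         * Map the whole folio with one PMD ("entire" mapping); the sanity
         * checks require such a folio to span exactly HPAGE_PMD_NR pages.
         */
        folio_add_file_rmap_pmd(folio, &folio->page, vma);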
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit ef37b2ea08ace7b5fbcd569d703be1903afd12f9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it, perform some folio conversion.
Link: https://lkml.kernel.org/r/20231220224504.646757-9-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit ef37b2ea08ace7b5fbcd569d703be1903afd12f9) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/memory.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c index bc10d5e40954..5b8214c729bc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1859,12 +1859,14 @@ static int validate_page_before_insert(struct page *page) static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte, unsigned long addr, struct page *page, pgprot_t prot) { + struct folio *folio = page_folio(page); + if (!pte_none(ptep_get(pte))) return -EBUSY; /* Ok, finally just insert the thing.. */ - get_page(page); + folio_get(folio); inc_mm_counter(vma->vm_mm, mm_counter_file(page)); - page_add_file_rmap(page, vma, false); + folio_add_file_rmap_pte(folio, page, vma); set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot)); return 0; } @@ -4425,6 +4427,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)
vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) { + struct folio *folio = page_folio(page); struct vm_area_struct *vma = vmf->vma; bool write = vmf->flags & FAULT_FLAG_WRITE; unsigned long haddr = vmf->address & HPAGE_PMD_MASK; @@ -4434,8 +4437,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER)) return ret;
- page = compound_head(page); - if (compound_order(page) != HPAGE_PMD_ORDER) + if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER) return ret;
/* @@ -4444,7 +4446,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) * check. This kind of THP just can be PTE mapped. Access to * the corrupted subpage should trigger SIGBUS as expected. */ - if (unlikely(PageHasHWPoisoned(page))) + if (unlikely(folio_test_has_hwpoisoned(folio))) return ret;
/* @@ -4469,7 +4471,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR); add_reliable_page_counter(page, vma->vm_mm, HPAGE_PMD_NR); - page_add_file_rmap(page, vma, true); + folio_add_file_rmap_pmd(folio, page, vma);
/* * deposit and withdraw with pmd lock held
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 14d85a6e88a658e29d9c8d6c521e7f824f2f2c6c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert remove_migration_pmd() and while at it, perform some folio conversion.
Link: https://lkml.kernel.org/r/20231220224504.646757-10-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 14d85a6e88a658e29d9c8d6c521e7f824f2f2c6c) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/huge_memory.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 5a2131be4540..83d249b4433b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3439,6 +3439,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) { + struct folio *folio = page_folio(new); struct vm_area_struct *vma = pvmw->vma; struct mm_struct *mm = vma->vm_mm; unsigned long address = pvmw->address; @@ -3450,7 +3451,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) return;
entry = pmd_to_swp_entry(*pvmw->pmd); - get_page(new); + folio_get(folio); pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)); if (pmd_swp_soft_dirty(*pvmw->pmd)) pmde = pmd_mksoft_dirty(pmde); @@ -3461,10 +3462,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) if (!is_migration_entry_young(entry)) pmde = pmd_mkold(pmde); /* NOTE: this may contain setting soft-dirty on some archs */ - if (PageDirty(new) && is_migration_entry_dirty(entry)) + if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) pmde = pmd_mkdirty(pmde);
- if (PageAnon(new)) { + if (folio_test_anon(folio)) { rmap_t rmap_flags = RMAP_COMPOUND;
if (!is_readable_migration_entry(entry)) @@ -3472,9 +3473,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
page_add_anon_rmap(new, vma, haddr, rmap_flags); } else { - page_add_file_rmap(new, vma, true); + folio_add_file_rmap_pmd(folio, new, vma); } - VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new)); + VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new)); set_pmd_at(mm, haddr, pvmw->pmd, pmde);
/* No need to invalidate - it was non-present before */
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit c4dffb0bc237d5e3b51adf947062e65ed34ac3c3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert remove_migration_pte().
Link: https://lkml.kernel.org/r/20231220224504.646757-11-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit c4dffb0bc237d5e3b51adf947062e65ed34ac3c3) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c index 49270a60deb5..8bb34689124c 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio, page_add_anon_rmap(new, vma, pvmw.address, rmap_flags); else - page_add_file_rmap(new, vma, false); + folio_add_file_rmap_pte(folio, new, vma); set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); } if (vma->vm_flags & VM_LOCKED)
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 7123e19c3c9d1539c899ac8d919498e3393bb288 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert mfill_atomic_install_pte().
Link: https://lkml.kernel.org/r/20231220224504.646757-12-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 7123e19c3c9d1539c899ac8d919498e3393bb288) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/userfaultfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 32903238db3a..f8449f506af2 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -115,7 +115,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd, /* Usually, cache pages are already added to LRU */ if (newly_allocated) folio_add_lru(folio); - page_add_file_rmap(page, dst_vma, false); + folio_add_file_rmap_pte(folio, page, dst_vma); } else { page_add_new_anon_rmap(page, dst_vma, dst_addr); folio_add_lru_vma(folio, dst_vma);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit be6e57cfabe99a5d3b3869103c4ea0ed4a9692d4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
All users are gone, let's remove it.
Link: https://lkml.kernel.org/r/20231220224504.646757-13-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit be6e57cfabe99a5d3b3869103c4ea0ed4a9692d4) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 2 -- mm/rmap.c | 21 --------------------- 2 files changed, 23 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 44d9160e22c3..5a83f2d04085 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -234,8 +234,6 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address); void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address); -void page_add_file_rmap(struct page *, struct vm_area_struct *, - bool compound); void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages, struct vm_area_struct *); #define folio_add_file_rmap_pte(folio, page, vma) \ diff --git a/mm/rmap.c b/mm/rmap.c index c4d32ad0138f..1163cf4fa0c5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1437,27 +1437,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page, #endif }
-/** - * page_add_file_rmap - add pte mapping to a file page - * @page: the page to add the mapping to - * @vma: the vm area in which the mapping is added - * @compound: charge the page as compound or small page - * - * The caller needs to hold the pte lock. - */ -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, - bool compound) -{ - struct folio *folio = page_folio(page); - - VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); - - if (likely(!compound)) - folio_add_file_rmap_pte(folio, page, vma); - else - folio_add_file_rmap_pmd(folio, page, vma); -} - /** * page_remove_rmap - take down pte mapping from a page * @page: page to remove mapping from
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 96fd74958c558d6976bbc303dda0efa389182fab category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's factor it out to prepare for reuse as we convert page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
Make the compiler always special-case on the granularity by using __always_inline.
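Roughly, the technique looks like the simplified sketch below; it is an illustration only (it ignores the _nr_pages_mapped bookkeeping and return value of the real helper), not the kernel code. Because each exported wrapper passes the level as a compile-time constant and the worker is __always_inline, the compiler drops the dead switch branch in every wrapper:

        static __always_inline void __sketch_add_rmap(struct folio *folio,
                        struct page *page, int nr_pages, enum rmap_level level)
        {
                switch (level) {
                case RMAP_LEVEL_PTE:
                        /* Only this loop survives in the *_ptes() wrappers. */
                        do {
                                atomic_inc(&page->_mapcount);
                        } while (page++, --nr_pages > 0);
                        break;
                case RMAP_LEVEL_PMD:
                        /* Only this path survives in the *_pmd() wrappers. */
                        atomic_inc(&folio->_entire_mapcount);
                        break;
                }
        }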
Link: https://lkml.kernel.org/r/20231220224504.646757-14-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 96fd74958c558d6976bbc303dda0efa389182fab) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/rmap.c | 78 +++++++++++++++++++++++++++++++------------------------ 1 file changed, 44 insertions(+), 34 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c index 1163cf4fa0c5..dddb038c33b5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1127,6 +1127,48 @@ int folio_total_mapcount(struct folio *folio) return mapcount; }
+static __always_inline unsigned int __folio_add_rmap(struct folio *folio, + struct page *page, int nr_pages, enum rmap_level level, + int *nr_pmdmapped) +{ + atomic_t *mapped = &folio->_nr_pages_mapped; + int first, nr = 0; + + __folio_rmap_sanity_checks(folio, page, nr_pages, level); + + switch (level) { + case RMAP_LEVEL_PTE: + do { + first = atomic_inc_and_test(&page->_mapcount); + if (first && folio_test_large(folio)) { + first = atomic_inc_return_relaxed(mapped); + first = (first < COMPOUND_MAPPED); + } + + if (first) + nr++; + } while (page++, --nr_pages > 0); + break; + case RMAP_LEVEL_PMD: + first = atomic_inc_and_test(&folio->_entire_mapcount); + if (first) { + nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped); + if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) { + *nr_pmdmapped = folio_nr_pages(folio); + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); + /* Raced ahead of a remove and another add? */ + if (unlikely(nr < 0)) + nr = 0; + } else { + /* Raced ahead of a remove of COMPOUND_MAPPED */ + nr = 0; + } + } + break; + } + return nr; +} + /** * folio_move_anon_rmap - move a folio to our anon_vma * @folio: The folio to move to our anon_vma @@ -1352,43 +1394,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, enum rmap_level level) { - atomic_t *mapped = &folio->_nr_pages_mapped; - int nr = 0, nr_pmdmapped = 0, first; + int nr, nr_pmdmapped = 0;
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio); - __folio_rmap_sanity_checks(folio, page, nr_pages, level); - - switch (level) { - case RMAP_LEVEL_PTE: - do { - first = atomic_inc_and_test(&page->_mapcount); - if (first && folio_test_large(folio)) { - first = atomic_inc_return_relaxed(mapped); - first = (first < COMPOUND_MAPPED); - } - - if (first) - nr++; - } while (page++, --nr_pages > 0); - break; - case RMAP_LEVEL_PMD: - first = atomic_inc_and_test(&folio->_entire_mapcount); - if (first) { - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped); - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) { - nr_pmdmapped = folio_nr_pages(folio); - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); - /* Raced ahead of a remove and another add? */ - if (unlikely(nr < 0)) - nr = 0; - } else { - /* Raced ahead of a remove of COMPOUND_MAPPED */ - nr = 0; - } - } - break; - }
+ nr = __folio_add_rmap(folio, page, nr_pages, level, &nr_pmdmapped); if (nr_pmdmapped) __lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ? NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 8bd5130070fbf2247a97c5361427a810522ac98a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's mimic what we did with folio_add_file_rmap_*() so we can similarly replace page_add_anon_rmap() next.
Make the compiler always special-case on the granularity by using __always_inline.
For the PageAnonExclusive sanity checks, when adding a PMD mapping, we're now also checking each individual subpage covered by that PMD, instead of only the head page.
Note that the new functions ignore the RMAP_COMPOUND flag, which we will remove as soon as page_add_anon_rmap() is gone.
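A caller sketch for the new anon-rmap functions (signatures as added in this patch; the variables and addresses are placeholders; the caller holds the page table lock and, in the anon_vma case, the folio lock):

        /* One PTE of an anon folio, exclusive to this process: */
        folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_EXCLUSIVE);

        /* A batch of nr_pages PTEs, e.g. when PTE-remapping a THP: */
        folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address, RMAP_NONE);

        /* One PMD mapping covering the whole folio: */
        folio_add_anon_rmap_pmd(folio, &folio->page, vma, haddr, RMAP_NONE);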
Link: https://lkml.kernel.org/r/20231220224504.646757-15-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 8bd5130070fbf2247a97c5361427a810522ac98a) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 6 +++ mm/rmap.c | 120 +++++++++++++++++++++++++++++-------------- 2 files changed, 88 insertions(+), 38 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 5a83f2d04085..aac160e604fe 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -228,6 +228,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio, * rmap interfaces called when adding or removing pte of page */ void folio_move_anon_rmap(struct folio *, struct vm_area_struct *); +void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages, + struct vm_area_struct *, unsigned long address, rmap_t flags); +#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \ + folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags) +void folio_add_anon_rmap_pmd(struct folio *, struct page *, + struct vm_area_struct *, unsigned long address, rmap_t flags); void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, diff --git a/mm/rmap.c b/mm/rmap.c index dddb038c33b5..1ec0387e3425 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1269,40 +1269,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, unsigned long address, rmap_t flags) { struct folio *folio = page_folio(page); - atomic_t *mapped = &folio->_nr_pages_mapped; - int nr = 0, nr_pmdmapped = 0; - bool compound = flags & RMAP_COMPOUND; - bool first; - - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- /* Is page being mapped by PTE? Is this its first map to be added? */ - if (likely(!compound)) { - first = atomic_inc_and_test(&page->_mapcount); - nr = first; - if (first && folio_test_large(folio)) { - nr = atomic_inc_return_relaxed(mapped); - nr = (nr < COMPOUND_MAPPED); - } - } else if (folio_test_pmd_mappable(folio)) { - /* That test is redundant: it's for safety or to optimize out */ + if (likely(!(flags & RMAP_COMPOUND))) + folio_add_anon_rmap_pte(folio, page, vma, address, flags); + else + folio_add_anon_rmap_pmd(folio, page, vma, address, flags); +}
- first = atomic_inc_and_test(&folio->_entire_mapcount); - if (first) { - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped); - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) { - nr_pmdmapped = folio_nr_pages(folio); - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); - /* Raced ahead of a remove and another add? */ - if (unlikely(nr < 0)) - nr = 0; - } else { - /* Raced ahead of a remove of COMPOUND_MAPPED */ - nr = 0; - } - } - } +static __always_inline void __folio_add_anon_rmap(struct folio *folio, + struct page *page, int nr_pages, struct vm_area_struct *vma, + unsigned long address, rmap_t flags, enum rmap_level level) +{ + int i, nr, nr_pmdmapped = 0;
+ nr = __folio_add_rmap(folio, page, nr_pages, level, &nr_pmdmapped); if (nr_pmdmapped) __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped); if (nr) @@ -1316,18 +1296,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, * folio->index right when not given the address of the head * page. */ - VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio); + VM_WARN_ON_FOLIO(folio_test_large(folio) && + level != RMAP_LEVEL_PMD, folio); __folio_set_anon(folio, vma, address, !!(flags & RMAP_EXCLUSIVE)); } else if (likely(!folio_test_ksm(folio))) { __page_check_anon_rmap(folio, page, vma, address); } - if (flags & RMAP_EXCLUSIVE) - SetPageAnonExclusive(page); - /* While PTE-mapping a THP we have a PMD and a PTE mapping. */ - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 || - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) && - PageAnonExclusive(page), folio); + + if (flags & RMAP_EXCLUSIVE) { + switch (level) { + case RMAP_LEVEL_PTE: + for (i = 0; i < nr_pages; i++) + SetPageAnonExclusive(page + i); + break; + case RMAP_LEVEL_PMD: + SetPageAnonExclusive(page); + break; + } + } + for (i = 0; i < nr_pages; i++) { + struct page *cur_page = page + i; + + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */ + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 || + (folio_test_large(folio) && + folio_entire_mapcount(folio) > 1)) && + PageAnonExclusive(cur_page), folio); + }
/* * For large folio, only mlock it if it's fully mapped to VMA. It's @@ -1339,6 +1335,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, mlock_vma_folio(folio, vma); }
+/** + * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio + * @folio: The folio to add the mappings to + * @page: The first page to add + * @nr_pages: The number of pages which will be mapped + * @vma: The vm area in which the mappings are added + * @address: The user virtual address of the first page to map + * @flags: The rmap flags + * + * The page range of folio is defined by [first_page, first_page + nr_pages) + * + * The caller needs to hold the page table lock, and the page must be locked in + * the anon_vma case: to serialize mapping,index checking after setting, + * and to ensure that an anon folio is not being upgraded racily to a KSM folio + * (but KSM folios are never downgraded). + */ +void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page, + int nr_pages, struct vm_area_struct *vma, unsigned long address, + rmap_t flags) +{ + __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags, + RMAP_LEVEL_PTE); +} + +/** + * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio + * @folio: The folio to add the mapping to + * @page: The first page to add + * @vma: The vm area in which the mapping is added + * @address: The user virtual address of the first page to map + * @flags: The rmap flags + * + * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR) + * + * The caller needs to hold the page table lock, and the page must be locked in + * the anon_vma case: to serialize mapping,index checking after setting. + */ +void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page, + struct vm_area_struct *vma, unsigned long address, rmap_t flags) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags, + RMAP_LEVEL_PMD); +#else + WARN_ON_ONCE(true); +#endif +} + /** * folio_add_new_anon_rmap - Add mapping to a new anonymous folio. * @folio: The folio to add the mapping to.
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 91b2978a348073db0e47b380fa66c865eb25f3d8 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
While at it, use more folio operations (but only in the code branch we're touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of manually setting PageAnonExclusive.
We should never see non-anon pages on that branch: otherwise, the existing page_add_anon_rmap() call would have been flawed already.
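Condensed from the hunks below, the core of the change is replacing the per-subpage calls inside the PTE-filling loop with one batched call before it, conveying exclusivity via RMAP_EXCLUSIVE instead of setting PageAnonExclusive() on every subpage by hand:

        /* Before: for each subpage, inside the PTE-filling loop */
        if (anon_exclusive)
                SetPageAnonExclusive(page + i);
        page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);

        /* After: once, before the loop */
        if (anon_exclusive)
                rmap_flags |= RMAP_EXCLUSIVE;
        folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr, rmap_flags);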
Link: https://lkml.kernel.org/r/20231220224504.646757-16-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 91b2978a348073db0e47b380fa66c865eb25f3d8) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/huge_memory.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 83d249b4433b..1f2e9eec7cef 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2253,6 +2253,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, unsigned long haddr, bool freeze) { struct mm_struct *mm = vma->vm_mm; + struct folio *folio; struct page *page; pgtable_t pgtable; pmd_t old_pmd, _pmd; @@ -2349,16 +2350,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, uffd_wp = pmd_swp_uffd_wp(old_pmd); } else { page = pmd_page(old_pmd); + folio = page_folio(page); if (pmd_dirty(old_pmd)) { dirty = true; - SetPageDirty(page); + folio_set_dirty(folio); } write = pmd_write(old_pmd); young = pmd_young(old_pmd); soft_dirty = pmd_soft_dirty(old_pmd); uffd_wp = pmd_uffd_wp(old_pmd);
- VM_BUG_ON_PAGE(!page_count(page), page); + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
/* * Without "freeze", we'll simply split the PMD, propagating the @@ -2375,11 +2378,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, * * See page_try_share_anon_rmap(): invalidate PMD first. */ - anon_exclusive = PageAnon(page) && PageAnonExclusive(page); + anon_exclusive = PageAnonExclusive(page); if (freeze && anon_exclusive && page_try_share_anon_rmap(page)) freeze = false; - if (!freeze) - page_ref_add(page, HPAGE_PMD_NR - 1); + if (!freeze) { + rmap_t rmap_flags = RMAP_NONE; + + folio_ref_add(folio, HPAGE_PMD_NR - 1); + if (anon_exclusive) + rmap_flags |= RMAP_EXCLUSIVE; + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, + vma, haddr, rmap_flags); + } }
/* @@ -2422,8 +2432,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot)); if (write) entry = pte_mkwrite(entry, vma); - if (anon_exclusive) - SetPageAnonExclusive(page + i); if (!young) entry = pte_mkold(entry); /* NOTE: this may set soft-dirty too on some archs */ @@ -2433,7 +2441,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, entry = pte_mksoft_dirty(entry); if (uffd_wp) entry = pte_mkuffd_wp(entry); - page_add_anon_rmap(page + i, vma, addr, RMAP_NONE); } VM_BUG_ON(!pte_none(ptep_get(pte))); set_pte_at(mm, addr, pte, entry);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 395db7b190892f1ca8d31e1fc83198e2531335f6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert remove_migration_pmd(). No need to set RMAP_COMPOUND, which we will remove soon.
Link: https://lkml.kernel.org/r/20231220224504.646757-17-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 395db7b190892f1ca8d31e1fc83198e2531335f6) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/huge_memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 1f2e9eec7cef..0b433469ebc0 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3473,12 +3473,12 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) pmde = pmd_mkdirty(pmde);
if (folio_test_anon(folio)) { - rmap_t rmap_flags = RMAP_COMPOUND; + rmap_t rmap_flags = RMAP_NONE;
if (!is_readable_migration_entry(entry)) rmap_flags |= RMAP_EXCLUSIVE;
- page_add_anon_rmap(new, vma, haddr, rmap_flags); + folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags); } else { folio_add_file_rmap_pmd(folio, new, vma); }
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit a15dc4785c98f360bdca78483455e0aff30242cb category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert remove_migration_pte().
Link: https://lkml.kernel.org/r/20231220224504.646757-18-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit a15dc4785c98f360bdca78483455e0aff30242cb) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/migrate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c index 8bb34689124c..29cb76170378 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -259,8 +259,8 @@ static bool remove_migration_pte(struct folio *folio, #endif { if (folio_test_anon(folio)) - page_add_anon_rmap(new, vma, pvmw.address, - rmap_flags); + folio_add_anon_rmap_pte(folio, new, vma, + pvmw.address, rmap_flags); else folio_add_file_rmap_pte(folio, new, vma); set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 977295349eb7826c50e2841915de96eab3a502c2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert replace_page(). While at it, perform some folio conversion.
Link: https://lkml.kernel.org/r/20231220224504.646757-19-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 977295349eb7826c50e2841915de96eab3a502c2) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/ksm.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c index da8416e2debe..fc4fc1f30fe4 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1186,6 +1186,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page, static int replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage, pte_t orig_pte) { + struct folio *kfolio = page_folio(kpage); struct mm_struct *mm = vma->vm_mm; struct folio *folio; pmd_t *pmd; @@ -1225,16 +1226,17 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, goto out_mn; } VM_BUG_ON_PAGE(PageAnonExclusive(page), page); - VM_BUG_ON_PAGE(PageAnon(kpage) && PageAnonExclusive(kpage), kpage); + VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage), + kfolio);
/* * No need to check ksm_use_zero_pages here: we can only have a * zero_page here if ksm_use_zero_pages was enabled already. */ if (!is_zero_pfn(page_to_pfn(kpage))) { - get_page(kpage); + folio_get(kfolio); add_reliable_page_counter(kpage, mm, 1); - page_add_anon_rmap(kpage, vma, addr, RMAP_NONE); + folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE); newpte = mk_pte(kpage, vma->vm_page_prot); } else { /*
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit da7dc0afe243874b6ad25f5070aa728349e4e0fd category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert unuse_pte().
Link: https://lkml.kernel.org/r/20231220224504.646757-20-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit da7dc0afe243874b6ad25f5070aa728349e4e0fd) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/swapfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c index b12b1dd7e151..91f48df47952 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1896,7 +1896,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, if (pte_swp_exclusive(old_pte)) rmap_flags |= RMAP_EXCLUSIVE;
- page_add_anon_rmap(page, vma, addr, rmap_flags); + folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags); } else { /* ksm created a completely new copy */ page_add_new_anon_rmap(page, vma, addr); lru_cache_add_inactive_or_unevictable(page, vma);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit b832a354d787bfbdea5c226f0d77cc1a222d09f8 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert restore_exclusive_pte() and do_swap_page(). While at it, perform some folio conversion in restore_exclusive_pte().
Link: https://lkml.kernel.org/r/20231220224504.646757-21-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit b832a354d787bfbdea5c226f0d77cc1a222d09f8) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/memory.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c index 5b8214c729bc..54fc81a39772 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -713,6 +713,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, struct page *page, unsigned long address, pte_t *ptep) { + struct folio *folio = page_folio(page); pte_t orig_pte; pte_t pte; swp_entry_t entry; @@ -728,14 +729,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, else if (is_writable_device_exclusive_entry(entry)) pte = maybe_mkwrite(pte_mkdirty(pte), vma);
- VM_BUG_ON(pte_write(pte) && !(PageAnon(page) && PageAnonExclusive(page))); + VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && + PageAnonExclusive(page)), folio);
/* * No need to take a page reference as one was already * created when the swap entry was made. */ - if (PageAnon(page)) - page_add_anon_rmap(page, vma, address, RMAP_NONE); + if (folio_test_anon(folio)) + folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE); else /* * Currently device exclusive access only supports anonymous @@ -4085,7 +4087,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) page_add_new_anon_rmap(page, vma, vmf->address); folio_add_lru_vma(folio, vma); } else { - page_add_anon_rmap(page, vma, vmf->address, rmap_flags); + folio_add_anon_rmap_pte(folio, page, vma, vmf->address, + rmap_flags); }
VM_BUG_ON(!folio_test_anon(folio) ||
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 84f0169e6c8a613012722e0d63302f9da4a72099 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
All users are gone, remove it and all traces.
Link: https://lkml.kernel.org/r/20231220224504.646757-22-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 84f0169e6c8a613012722e0d63302f9da4a72099) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 2 -- mm/rmap.c | 31 ++++--------------------------- 2 files changed, 4 insertions(+), 29 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index aac160e604fe..afb8e8bdab37 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -234,8 +234,6 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages, folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags) void folio_add_anon_rmap_pmd(struct folio *, struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); -void page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long address, rmap_t flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address); void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, diff --git a/mm/rmap.c b/mm/rmap.c index 1ec0387e3425..0443cb9b8376 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1240,7 +1240,7 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page, * The page's anon-rmap details (mapping and index) are guaranteed to * be set up correctly at this point. * - * We have exclusion against page_add_anon_rmap because the caller + * We have exclusion against folio_add_anon_rmap_*() because the caller * always holds the page locked. * * We have exclusion against page_add_new_anon_rmap because those pages @@ -1253,29 +1253,6 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page, page); }
-/** - * page_add_anon_rmap - add pte mapping to an anonymous page - * @page: the page to add the mapping to - * @vma: the vm area in which the mapping is added - * @address: the user virtual address mapped - * @flags: the rmap flags - * - * The caller needs to hold the pte lock, and the page must be locked in - * the anon_vma case: to serialize mapping,index checking after setting, - * and to ensure that PageAnon is not being upgraded racily to PageKsm - * (but PageKsm is never downgraded to PageAnon). - */ -void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, - unsigned long address, rmap_t flags) -{ - struct folio *folio = page_folio(page); - - if (likely(!(flags & RMAP_COMPOUND))) - folio_add_anon_rmap_pte(folio, page, vma, address, flags); - else - folio_add_anon_rmap_pmd(folio, page, vma, address, flags); -} - static __always_inline void __folio_add_anon_rmap(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, unsigned long address, rmap_t flags, enum rmap_level level) @@ -1389,7 +1366,7 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page, * @vma: the vm area in which the mapping is added * @address: the user virtual address mapped * - * Like page_add_anon_rmap() but must only be called on *new* folios. + * Like folio_add_anon_rmap_*() but must only be called on *new* folios. * This means the inc-and-test can be bypassed. * The folio does not have to be locked. * @@ -1449,7 +1426,7 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio, if (nr) __lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
- /* See comments in page_add_anon_rmap() */ + /* See comments in folio_add_anon_rmap_*() */ if (!folio_test_large(folio)) mlock_vma_folio(folio, vma); } @@ -1563,7 +1540,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
/* * It would be tidy to reset folio_test_anon mapping when fully - * unmapped, but that might overwrite a racing page_add_anon_rmap + * unmapped, but that might overwrite a racing folio_add_anon_rmap_*() * which increments mapcount after us but sets mapping before us: * so leave the reset to free_pages_prepare, and remember that * it's only reliable while mapped.
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 0cae959e3abf19ba62805f6e6a8b42b6cd9ed3e3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
No longer used, let's remove it and clarify RMAP_NONE/RMAP_EXCLUSIVE a bit.
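In short (condensed from the conversions earlier in the series, e.g. remove_migration_pmd()), the PMD-vs-PTE granularity is now conveyed by the function being called rather than by a flag, so rmap_t is left with only the anon exclusivity hint:

        /* Before: granularity encoded in the flags */
        page_add_anon_rmap(new, vma, haddr, RMAP_COMPOUND | RMAP_EXCLUSIVE);

        /* After: granularity encoded in the function name */
        folio_add_anon_rmap_pmd(folio, new, vma, haddr, RMAP_EXCLUSIVE);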
Link: https://lkml.kernel.org/r/20231220224504.646757-23-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 0cae959e3abf19ba62805f6e6a8b42b6cd9ed3e3) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 12 +++--------- mm/rmap.c | 2 -- 2 files changed, 3 insertions(+), 11 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index afb8e8bdab37..9075ac964c7e 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -172,20 +172,14 @@ struct anon_vma *folio_get_anon_vma(struct folio *folio); typedef int __bitwise rmap_t;
/* - * No special request: if the page is a subpage of a compound page, it is - * mapped via a PTE. The mapped (sub)page is possibly shared between processes. + * No special request: A mapped anonymous (sub)page is possibly shared between + * processes. */ #define RMAP_NONE ((__force rmap_t)0)
-/* The (sub)page is exclusive to a single process. */ +/* The anonymous (sub)page is exclusive to a single process. */ #define RMAP_EXCLUSIVE ((__force rmap_t)BIT(0))
-/* - * The compound page is not mapped via PTEs, but instead via a single PMD and - * should be accounted accordingly. - */ -#define RMAP_COMPOUND ((__force rmap_t)BIT(1)) - /* * Internally, we're using an enum to specify the granularity. We make the * compiler emit specialized code for each granularity. diff --git a/mm/rmap.c b/mm/rmap.c index 0443cb9b8376..4ae00c8560c6 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2639,8 +2639,6 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc) * The following two functions are for anonymous (private mapped) hugepages. * Unlike common anonymous pages, anonymous hugepages have no accounting code * and no lru code, because we handle hugepages differently from common pages. - * - * RMAP_COMPOUND is ignored. */ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address, rmap_t flags)
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit b06dc281aa9901076898d4d0a7bde588f11bc204 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's mimic what we did with folio_add_file_rmap_*() and folio_add_anon_rmap_*() so we can similarly replace page_remove_rmap() next.
Make the compiler always special-case on the granularity by using __always_inline.
We're adding folio_remove_rmap_ptes() handling right away, as we want to use that soon for batching rmap operations when unmapping PTE-mapped large folios.
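A caller sketch for the removal side (signatures as added below; folio, page, nr_pages and vma are placeholders, and the caller holds the page table lock):

        /* Unmap one PTE of a (possibly large) folio: */
        folio_remove_rmap_pte(folio, page, vma);

        /* Batch-unmap nr_pages PTE-mapped pages of the same folio: */
        folio_remove_rmap_ptes(folio, page, nr_pages, vma);

        /* Tear down a PMD ("entire") mapping of the folio: */
        folio_remove_rmap_pmd(folio, &folio->page, vma);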
Link: https://lkml.kernel.org/r/20231220224504.646757-24-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit b06dc281aa9901076898d4d0a7bde588f11bc204) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 6 ++++ mm/rmap.c | 82 +++++++++++++++++++++++++++++++++++--------- 2 files changed, 72 insertions(+), 16 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 9075ac964c7e..179772439145 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -240,6 +240,12 @@ void folio_add_file_rmap_pmd(struct folio *, struct page *, struct vm_area_struct *); void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound); +void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages, + struct vm_area_struct *); +#define folio_remove_rmap_pte(folio, page, vma) \ + folio_remove_rmap_ptes(folio, page, 1, vma) +void folio_remove_rmap_pmd(struct folio *, struct page *, + struct vm_area_struct *);
void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address, rmap_t flags); diff --git a/mm/rmap.c b/mm/rmap.c index 4ae00c8560c6..d3b4259cb9b3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1480,25 +1480,37 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, bool compound) { struct folio *folio = page_folio(page); + + if (likely(!compound)) + folio_remove_rmap_pte(folio, page, vma); + else + folio_remove_rmap_pmd(folio, page, vma); +} + +static __always_inline void __folio_remove_rmap(struct folio *folio, + struct page *page, int nr_pages, struct vm_area_struct *vma, + enum rmap_level level) +{ atomic_t *mapped = &folio->_nr_pages_mapped; - int nr = 0, nr_pmdmapped = 0; - bool last; + int last, nr = 0, nr_pmdmapped = 0; enum node_stat_item idx;
- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); - VM_BUG_ON_PAGE(compound && !PageHead(page), page); - - /* Is page being unmapped by PTE? Is this its last map to be removed? */ - if (likely(!compound)) { - last = atomic_add_negative(-1, &page->_mapcount); - nr = last; - if (last && folio_test_large(folio)) { - nr = atomic_dec_return_relaxed(mapped); - nr = (nr < COMPOUND_MAPPED); - } - } else if (folio_test_pmd_mappable(folio)) { - /* That test is redundant: it's for safety or to optimize out */ + __folio_rmap_sanity_checks(folio, page, nr_pages, level); + + switch (level) { + case RMAP_LEVEL_PTE: + do { + last = atomic_add_negative(-1, &page->_mapcount); + if (last && folio_test_large(folio)) { + last = atomic_dec_return_relaxed(mapped); + last = (last < COMPOUND_MAPPED); + }
+ if (last) + nr++; + } while (page++, --nr_pages > 0); + break; + case RMAP_LEVEL_PMD: last = atomic_add_negative(-1, &folio->_entire_mapcount); if (last) { nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped); @@ -1513,6 +1525,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, nr = 0; } } + break; }
if (nr_pmdmapped) { @@ -1534,7 +1547,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, * is still mapped. */ if (folio_test_large(folio) && folio_test_anon(folio)) - if (!compound || nr < nr_pmdmapped) + if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) deferred_split_folio(folio); }
@@ -1549,6 +1562,43 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, munlock_vma_folio(folio, vma); }
+/** + * folio_remove_rmap_ptes - remove PTE mappings from a page range of a folio + * @folio: The folio to remove the mappings from + * @page: The first page to remove + * @nr_pages: The number of pages that will be removed from the mapping + * @vma: The vm area from which the mappings are removed + * + * The page range of the folio is defined by [page, page + nr_pages) + * + * The caller needs to hold the page table lock. + */ +void folio_remove_rmap_ptes(struct folio *folio, struct page *page, + int nr_pages, struct vm_area_struct *vma) +{ + __folio_remove_rmap(folio, page, nr_pages, vma, RMAP_LEVEL_PTE); +} + +/** + * folio_remove_rmap_pmd - remove a PMD mapping from a page range of a folio + * @folio: The folio to remove the mapping from + * @page: The first page to remove + * @vma: The vm area from which the mapping is removed + * + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR) + * + * The caller needs to hold the page table lock. + */ +void folio_remove_rmap_pmd(struct folio *folio, struct page *page, + struct vm_area_struct *vma) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + __folio_remove_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_LEVEL_PMD); +#else + WARN_ON_ONCE(true); +#endif +} + /* * @arg: enum ttu_flags will be passed to this argument */
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 5cc9695f06b065168f5c893c8e006b6a8a2c9c91 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert __replace_page().
Link: https://lkml.kernel.org/r/20231220224504.646757-25-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 5cc9695f06b065168f5c893c8e006b6a8a2c9c91) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- kernel/events/uprobes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index cf52555455a3..3103cd259383 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -200,7 +200,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, mk_pte(new_page, vma->vm_page_prot));
add_reliable_page_counter(old_page, mm, -1); - page_remove_rmap(old_page, vma, false); + folio_remove_rmap_pte(old_folio, old_page, vma); if (!folio_mapped(old_folio)) folio_free_swap(old_folio); page_vma_mapped_walk_done(&pvmw);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit a8e61d584eda0d5532b0bbfe3c2427d2688d3c83 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert zap_huge_pmd() and set_pmd_migration_entry(). While at it, perform some more folio conversion.
Link: https://lkml.kernel.org/r/20231220224504.646757-26-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit a8e61d584eda0d5532b0bbfe3c2427d2688d3c83) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/huge_memory.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 0b433469ebc0..a4947418480f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1874,7 +1874,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, if (pmd_present(orig_pmd)) { page = pmd_page(orig_pmd); add_reliable_page_counter(page, tlb->mm, -HPAGE_PMD_NR); - page_remove_rmap(page, vma, true); + folio_remove_rmap_pmd(page_folio(page), page, vma); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); } else if (thp_migration_supported()) { @@ -2288,13 +2288,14 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pfn_swap_entry_to_page(entry); } else { page = pmd_page(old_pmd); - if (!PageDirty(page) && pmd_dirty(old_pmd)) - set_page_dirty(page); - if (!PageReferenced(page) && pmd_young(old_pmd)) - SetPageReferenced(page); + folio = page_folio(page); + if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) + folio_set_dirty(folio); + if (!folio_test_referenced(folio) && pmd_young(old_pmd)) + folio_set_referenced(folio); add_reliable_page_counter(page, mm, -HPAGE_PMD_NR); - page_remove_rmap(page, vma, true); - put_page(page); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); } add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; @@ -2449,7 +2450,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, pte_unmap(pte - 1);
if (!pmd_migration) - page_remove_rmap(page, vma, true); + folio_remove_rmap_pmd(folio, page, vma); if (freeze) put_page(page);
@@ -3398,6 +3399,7 @@ late_initcall(split_huge_pages_debugfs); int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page) { + struct folio *folio = page_folio(page); struct vm_area_struct *vma = pvmw->vma; struct mm_struct *mm = vma->vm_mm; unsigned long address = pvmw->address; @@ -3413,14 +3415,14 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
/* See page_try_share_anon_rmap(): invalidate PMD first. */ - anon_exclusive = PageAnon(page) && PageAnonExclusive(page); + anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); if (anon_exclusive && page_try_share_anon_rmap(page)) { set_pmd_at(mm, address, pvmw->pmd, pmdval); return -EBUSY; }
if (pmd_dirty(pmdval)) - set_page_dirty(page); + folio_set_dirty(folio); if (pmd_write(pmdval)) entry = make_writable_migration_entry(page_to_pfn(page)); else if (anon_exclusive) @@ -3437,8 +3439,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, if (pmd_uffd_wp(pmdval)) pmdswp = pmd_swp_mkuffd_wp(pmdswp); set_pmd_at(mm, address, pvmw->pmd, pmdswp); - page_remove_rmap(page, vma, true); - put_page(page); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); trace_set_migration_pmd(address, pmd_val(pmdswp));
return 0;
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 35668a4321461505dcc39b56a0d97b0ba2c99668 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert __collapse_huge_page_copy_succeeded() and collapse_pte_mapped_thp(). While at it, perform some more folio conversion in __collapse_huge_page_copy_succeeded().
We can get rid of release_pte_page().
Link: https://lkml.kernel.org/r/20231220224504.646757-27-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 35668a4321461505dcc39b56a0d97b0ba2c99668) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/khugepaged.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 37ff3679043c..89b47a6e24af 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -501,11 +501,6 @@ static void release_pte_folio(struct folio *folio) folio_putback_lru(folio); }
-static void release_pte_page(struct page *page) -{ - release_pte_folio(page_folio(page)); -} - static void release_pte_pages(pte_t *pte, pte_t *_pte, struct list_head *compound_pagelist) { @@ -694,6 +689,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte, spinlock_t *ptl, struct list_head *compound_pagelist) { + struct folio *src_folio; struct page *src_page; struct page *tmp; pte_t *_pte; @@ -715,17 +711,18 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte, } } else { src_page = pte_page(pteval); - if (!PageCompound(src_page)) - release_pte_page(src_page); + src_folio = page_folio(src_page); + if (!folio_test_large(src_folio)) + release_pte_folio(src_folio); /* * ptl mostly unnecessary, but preempt has to * be disabled to update the per-cpu stats - * inside page_remove_rmap(). + * inside folio_remove_rmap_pte(). */ spin_lock(ptl); ptep_clear(vma->vm_mm, address, _pte); add_reliable_page_counter(src_page, vma->vm_mm, 1); - page_remove_rmap(src_page, vma, false); + folio_remove_rmap_pte(src_folio, src_page, vma); spin_unlock(ptl); free_page_and_swap_cache(src_page); } @@ -1643,7 +1640,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, * PTE dirty? Shmem page is already dirty; file is read-only. */ ptep_clear(mm, addr, pte); - page_remove_rmap(page, vma, false); + folio_remove_rmap_pte(folio, page, vma); add_reliable_page_counter(page, mm, -1); nr_ptes++; }
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 18e8612e56244c6db3254d435a22344856a9c55b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert replace_page().
Link: https://lkml.kernel.org/r/20231220224504.646757-28-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 18e8612e56244c6db3254d435a22344856a9c55b) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/ksm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/ksm.c b/mm/ksm.c index fc4fc1f30fe4..3d3db4111262 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1268,7 +1268,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, add_reliable_page_counter(page, mm, -1);
folio = page_folio(page); - page_remove_rmap(page, vma, false); + folio_remove_rmap_pte(folio, page, vma); if (!folio_mapped(folio)) folio_free_swap(folio); folio_put(folio);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit c46265030b0f400ef89833bb51da62676d2f855a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert zap_pte_range() and closely-related tlb_flush_rmap_batch(). While at it, perform some more folio conversion in zap_pte_range().
Link: https://lkml.kernel.org/r/20231220224504.646757-29-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit c46265030b0f400ef89833bb51da62676d2f855a) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/memory.c | 23 +++++++++++++---------- mm/mmu_gather.c | 2 +- 2 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c index 54fc81a39772..d60144b822a1 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1433,6 +1433,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, arch_enter_lazy_mmu_mode(); do { pte_t ptent = ptep_get(pte); + struct folio *folio; struct page *page;
if (pte_none(ptent)) @@ -1458,22 +1459,23 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, continue; }
+ folio = page_folio(page); delay_rmap = 0; - if (!PageAnon(page)) { + if (!folio_test_anon(folio)) { if (pte_dirty(ptent)) { - set_page_dirty(page); + folio_set_dirty(folio); if (tlb_delay_rmap(tlb)) { delay_rmap = 1; force_flush = 1; } } if (pte_young(ptent) && likely(vma_has_recency(vma))) - mark_page_accessed(page); + folio_mark_accessed(folio); } rss[mm_counter(page)]--; add_reliable_page_counter(page, mm, -1); if (!delay_rmap) { - page_remove_rmap(page, vma, false); + folio_remove_rmap_pte(folio, page, vma); if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); } @@ -1489,6 +1491,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, if (is_device_private_entry(entry) || is_device_exclusive_entry(entry)) { page = pfn_swap_entry_to_page(entry); + folio = page_folio(page); if (unlikely(!should_zap_page(details, page))) continue; /* @@ -1501,8 +1504,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, rss[mm_counter(page)]--; add_reliable_page_counter(page, mm, -1); if (is_device_private_entry(entry)) - page_remove_rmap(page, vma, false); - put_page(page); + folio_remove_rmap_pte(folio, page, vma); + folio_put(folio); } else if (!non_swap_entry(entry)) { /* Genuine swap entry, hence a private anon page */ if (!should_zap_cows(details)) @@ -3227,10 +3230,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * threads. * * The critical issue is to order this - * page_remove_rmap with the ptp_clear_flush above. - * Those stores are ordered by (if nothing else,) + * folio_remove_rmap_pte() with the ptp_clear_flush + * above. Those stores are ordered by (if nothing else,) * the barrier present in the atomic_add_negative - * in page_remove_rmap. + * in folio_remove_rmap_pte(); * * Then the TLB flush in ptep_clear_flush ensures that * no process can access the old page before the @@ -3239,7 +3242,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * mapcount is visible. So transitively, TLBs to * old page will be flushed before it can be reused. */ - page_remove_rmap(vmf->page, vma, false); + folio_remove_rmap_pte(old_folio, vmf->page, vma); }
/* Free the old page.. */ diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index 4f559f4ddd21..604ddf08affe 100644 --- a/mm/mmu_gather.c +++ b/mm/mmu_gather.c @@ -55,7 +55,7 @@ static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_
if (encoded_page_flags(enc)) { struct page *page = encoded_page_ptr(enc); - page_remove_rmap(page, vma, false); + folio_remove_rmap_pte(page_folio(page), page, vma); } } }
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 5b205c7f2684764c8a9cc3442986623d4d6e87f1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert migrate_vma_collect_pmd(). While at it, perform more folio conversion.
Link: https://lkml.kernel.org/r/20231220224504.646757-30-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 5b205c7f2684764c8a9cc3442986623d4d6e87f1) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/migrate_device.c | 39 +++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 18 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 2e5bce2f1cb9..cee7957cecb8 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -107,6 +107,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
for (; addr < end; addr += PAGE_SIZE, ptep++) { unsigned long mpfn = 0, pfn; + struct folio *folio; struct page *page; swp_entry_t entry; pte_t pte; @@ -168,41 +169,43 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, }
/* - * By getting a reference on the page we pin it and that blocks + * By getting a reference on the folio we pin it and that blocks * any kind of migration. Side effect is that it "freezes" the * pte. * - * We drop this reference after isolating the page from the lru - * for non device page (device page are not on the lru and thus + * We drop this reference after isolating the folio from the lru + * for non device folio (device folio are not on the lru and thus * can't be dropped from it). */ - get_page(page); + folio = page_folio(page); + folio_get(folio);
/* - * We rely on trylock_page() to avoid deadlock between + * We rely on folio_trylock() to avoid deadlock between * concurrent migrations where each is waiting on the others - * page lock. If we can't immediately lock the page we fail this + * folio lock. If we can't immediately lock the folio we fail this * migration as it is only best effort anyway. * - * If we can lock the page it's safe to set up a migration entry - * now. In the common case where the page is mapped once in a + * If we can lock the folio it's safe to set up a migration entry + * now. In the common case where the folio is mapped once in a * single process setting up the migration entry now is an * optimisation to avoid walking the rmap later with * try_to_migrate(). */ - if (trylock_page(page)) { + if (folio_trylock(folio)) { bool anon_exclusive; pte_t swp_pte;
flush_cache_page(vma, addr, pte_pfn(pte)); - anon_exclusive = PageAnon(page) && PageAnonExclusive(page); + anon_exclusive = folio_test_anon(folio) && + PageAnonExclusive(page); if (anon_exclusive) { pte = ptep_clear_flush(vma, addr, ptep);
if (page_try_share_anon_rmap(page)) { set_pte_at(mm, addr, ptep, pte); - unlock_page(page); - put_page(page); + folio_unlock(folio); + folio_put(folio); mpfn = 0; goto next; } @@ -214,7 +217,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
/* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pte)) - folio_mark_dirty(page_folio(page)); + folio_mark_dirty(folio);
/* Setup special migration page table entry */ if (mpfn & MIGRATE_PFN_WRITE) @@ -248,16 +251,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
/* * This is like regular unmap: we remove the rmap and - * drop page refcount. Page won't be freed, as we took - * a reference just above. + * drop the folio refcount. The folio won't be freed, as + * we took a reference just above. */ - page_remove_rmap(page, vma, false); - put_page(page); + folio_remove_rmap_pte(folio, page, vma); + folio_put(folio);
if (pte_present(pte)) unmapped++; } else { - put_page(page); + folio_put(folio); mpfn = 0; }
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit ca1a0746182c3c059573d7e4554d335cae5306dc category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert try_to_unmap_one() and try_to_migrate_one().
Link: https://lkml.kernel.org/r/20231220224504.646757-31-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit ca1a0746182c3c059573d7e4554d335cae5306dc) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/rmap.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c index d3b4259cb9b3..2372dd838dcb 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1617,7 +1617,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
/* * When racing against e.g. zap_pte_range() on another cpu, - * in between its ptep_get_and_clear_full() and page_remove_rmap(), + * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), * try_to_unmap() may return before page_mapped() has become false, * if page table locking is skipped: use TTU_SYNC to wait for that. */ @@ -1903,7 +1903,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (unlikely(folio_test_hugetlb(folio))) hugetlb_remove_rmap(folio); else - page_remove_rmap(subpage, vma, false); + folio_remove_rmap_pte(folio, subpage, vma); if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio); @@ -1971,7 +1971,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
/* * When racing against e.g. zap_pte_range() on another cpu, - * in between its ptep_get_and_clear_full() and page_remove_rmap(), + * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), * try_to_migrate() may return before page_mapped() has become false, * if page table locking is skipped: use TTU_SYNC to wait for that. */ @@ -2266,7 +2266,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, if (unlikely(folio_test_hugetlb(folio))) hugetlb_remove_rmap(folio); else - page_remove_rmap(subpage, vma, false); + folio_remove_rmap_pte(folio, subpage, vma); if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio); @@ -2405,7 +2405,7 @@ static bool page_make_device_exclusive_one(struct folio *folio, * There is a reference on the page for the swap entry which has * been removed, so shouldn't take another. */ - page_remove_rmap(subpage, vma, false); + folio_remove_rmap_pte(folio, subpage, vma); }
mmu_notifier_invalidate_range_end(&range);
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE
-------------------------------------------------
Let's convert uswap_unmap_anon_page().
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/userswap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/userswap.c b/mm/userswap.c index e76e9d7a40de..20ee53dc5eea 100644 --- a/mm/userswap.c +++ b/mm/userswap.c @@ -163,7 +163,7 @@ static int uswap_unmap_anon_page(struct mm_struct *mm,
dec_mm_counter(mm, MM_ANONPAGES); add_reliable_page_counter(page, mm, -1); - page_remove_rmap(page, vma, false); + folio_remove_rmap_pte(page_folio(page), page, vma); page->mapping = NULL;
out_release_unlock:
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 5a0033f0285e0bb29f6e4d1593d4519c91ed882a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Refer to folio_remove_rmap_*() instead.
Link: https://lkml.kernel.org/r/20231220224504.646757-32-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 5a0033f0285e0bb29f6e4d1593d4519c91ed882a) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- Documentation/mm/transhuge.rst | 2 +- Documentation/mm/unevictable-lru.rst | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst index 9a607059ea11..cf81272a6b8b 100644 --- a/Documentation/mm/transhuge.rst +++ b/Documentation/mm/transhuge.rst @@ -156,7 +156,7 @@ Partial unmap and deferred_split_folio()
Unmapping part of THP (with munmap() or other way) is not going to free memory immediately. Instead, we detect that a subpage of THP is not in use -in page_remove_rmap() and queue the THP for splitting if memory pressure +in folio_remove_rmap_*() and queue the THP for splitting if memory pressure comes. Splitting will free up unused subpages.
Splitting the page right away is not an option due to locking context in diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst index 67f1338440a5..b6a07a26b10d 100644 --- a/Documentation/mm/unevictable-lru.rst +++ b/Documentation/mm/unevictable-lru.rst @@ -486,7 +486,7 @@ munlock the pages if we're removing the last VM_LOCKED VMA that maps the pages. Before the unevictable/mlock changes, mlocking did not mark the pages in any way, so unmapping them required no processing.
-For each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls +For each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED (unless it was a PTE mapping of a part of a transparent huge page).
@@ -511,7 +511,7 @@ userspace; truncation even unmaps and deletes any private anonymous pages which had been Copied-On-Write from the file pages now being truncated.
Mlocked pages can be munlocked and deleted in this way: like with munmap(), -for each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls +for each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED (unless it was a PTE mapping of a part of a transparent huge page).
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 4d8f7418e8ba36036c8486d92d9591c368ab9b85 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
All callers are gone; let's remove it and some leftover traces.
Link: https://lkml.kernel.org/r/20231220224504.646757-33-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 4d8f7418e8ba36036c8486d92d9591c368ab9b85) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 4 +--- mm/filemap.c | 10 +++++----- mm/internal.h | 2 +- mm/memory-failure.c | 4 ++-- mm/rmap.c | 23 ++--------------------- 5 files changed, 11 insertions(+), 32 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 179772439145..4741c964866e 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -238,8 +238,6 @@ void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages, folio_add_file_rmap_ptes(folio, page, 1, vma) void folio_add_file_rmap_pmd(struct folio *, struct page *, struct vm_area_struct *); -void page_remove_rmap(struct page *, struct vm_area_struct *, - bool compound); void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages, struct vm_area_struct *); #define folio_remove_rmap_pte(folio, page, vma) \ @@ -386,7 +384,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound, * * This is similar to page_try_dup_anon_rmap(), however, not used during fork() * to duplicate a mapping, but instead to prepare for KSM or temporarily - * unmapping a page (swap, migration) via page_remove_rmap(). + * unmapping a page (swap, migration) via folio_remove_rmap_*(). * * Marking the page shared can only fail if the page may be pinned; device * private pages cannot get pinned and consequently this function cannot fail. diff --git a/mm/filemap.c b/mm/filemap.c index db830ebe0f3f..12d73aa8487d 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -113,11 +113,11 @@ * ->i_pages lock (try_to_unmap_one) * ->lruvec->lru_lock (follow_page->mark_page_accessed) * ->lruvec->lru_lock (check_pte_range->isolate_lru_page) - * ->private_lock (page_remove_rmap->set_page_dirty) - * ->i_pages lock (page_remove_rmap->set_page_dirty) - * bdi.wb->list_lock (page_remove_rmap->set_page_dirty) - * ->inode->i_lock (page_remove_rmap->set_page_dirty) - * ->memcg->move_lock (page_remove_rmap->folio_memcg_lock) + * ->private_lock (folio_remove_rmap_pte->set_page_dirty) + * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty) + * bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty) + * ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty) + * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock) * bdi.wb->list_lock (zap_pte_range->set_page_dirty) * ->inode->i_lock (zap_pte_range->set_page_dirty) * ->private_lock (zap_pte_range->block_dirty_folio) diff --git a/mm/internal.h b/mm/internal.h index eab2199d16c8..83cdb6ea72e3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -648,7 +648,7 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma) * under page table lock for the pte/pmd being added or removed. * * mlock is usually called at the end of page_add_*_rmap(), munlock at - * the end of page_remove_rmap(); but new anon folios are managed by + * the end of folio_remove_rmap_*(); but new anon folios are managed by * folio_add_lru_vma() calling mlock_new_folio(). */ void mlock_folio(struct folio *folio); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index f5ec4750feef..c9372bae3524 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2328,8 +2328,8 @@ int memory_failure(unsigned long pfn, int flags) * We use page flags to determine what action should be taken, but * the flags can be modified by the error containment action. One * example is an mlocked page, where PG_mlocked is cleared by - * page_remove_rmap() in try_to_unmap_one(). So to determine page status - * correctly, we save a copy of the page flags at this time. + * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page + * status correctly, we save a copy of the page flags at this time. */ page_flags = p->flags;
diff --git a/mm/rmap.c b/mm/rmap.c index 2372dd838dcb..efbe0708adeb 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -470,7 +470,7 @@ void __init anon_vma_init(void) /* * Getting a lock on a stable anon_vma from a page off the LRU is tricky! * - * Since there is no serialization what so ever against page_remove_rmap() + * Since there is no serialization what so ever against folio_remove_rmap_*() * the best this function can do is return a refcount increased anon_vma * that might have been relevant to this page. * @@ -487,7 +487,7 @@ void __init anon_vma_init(void) * [ something equivalent to page_mapped_in_vma() ]. * * Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from - * page_remove_rmap() that the anon_vma pointer from page->mapping is valid + * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid * if there is a mapcount, we can dereference the anon_vma after observing * those. */ @@ -1468,25 +1468,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page, #endif }
-/** - * page_remove_rmap - take down pte mapping from a page - * @page: page to remove mapping from - * @vma: the vm area from which the mapping is removed - * @compound: uncharge the page as compound or small page - * - * The caller needs to hold the pte lock. - */ -void page_remove_rmap(struct page *page, struct vm_area_struct *vma, - bool compound) -{ - struct folio *folio = page_folio(page); - - if (likely(!compound)) - folio_remove_rmap_pte(folio, page, vma); - else - folio_remove_rmap_pmd(folio, page, vma); -} - static __always_inline void __folio_remove_rmap(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, enum rmap_level level)
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit d8ef5e311d7bfde54b60ab45026f206eff31b2d2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert page_dup_file_rmap() like the other rmap functions. As there is only a single caller, convert that single caller right away and remove page_dup_file_rmap().
Add folio_dup_file_rmap_ptes() right away; we want to perform rmap batching during fork() soon.
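To illustrate the batching idea, here is a minimal user-space sketch of the PTE-level duplication loop that the hunk below adds as __folio_dup_file_rmap(); the struct definitions are stand-ins, not the kernel structures, and only the _mapcount-starts-at-minus-one convention is carried over.

#include <stdatomic.h>
#include <stdio.h>

/* stand-ins for struct page / struct folio, not the kernel structures */
struct page  { atomic_int _mapcount; };		/* starts at -1, as in the kernel */
struct folio { atomic_int _entire_mapcount; struct page pages[512]; };

/* models folio_dup_file_rmap_ptes(): one increment per PTE-mapped subpage */
static void dup_file_rmap_ptes_model(struct folio *folio, struct page *page,
				     int nr_pages)
{
	(void)folio;			/* only needed for sanity checks in the kernel */
	do {
		atomic_fetch_add(&page->_mapcount, 1);
	} while (page++, --nr_pages > 0);
}

int main(void)
{
	static struct folio f;
	int i;

	for (i = 0; i < 512; i++)
		atomic_init(&f.pages[i]._mapcount, -1);

	dup_file_rmap_ptes_model(&f, &f.pages[0], 4);	/* duplicate 4 PTE mappings */
	printf("mapcount of subpage 0: %d\n",
	       atomic_load(&f.pages[0]._mapcount));	/* prints 0, i.e. mapped once */
	return 0;
}

With that shape, folio_dup_file_rmap_pte() is simply the nr_pages == 1 case, which keeps the unbatched callers trivial.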
Link: https://lkml.kernel.org/r/20231220224504.646757-34-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit d8ef5e311d7bfde54b60ab45026f206eff31b2d2) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 59 ++++++++++++++++++++++++++++++++++++++++---- mm/memory.c | 2 +- 2 files changed, 55 insertions(+), 6 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 4741c964866e..8f369fb3ed71 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -305,6 +305,60 @@ static inline void hugetlb_remove_rmap(struct folio *folio) atomic_dec(&folio->_entire_mapcount); }
+static __always_inline void __folio_dup_file_rmap(struct folio *folio, + struct page *page, int nr_pages, enum rmap_level level) +{ + __folio_rmap_sanity_checks(folio, page, nr_pages, level); + + switch (level) { + case RMAP_LEVEL_PTE: + do { + atomic_inc(&page->_mapcount); + } while (page++, --nr_pages > 0); + break; + case RMAP_LEVEL_PMD: + atomic_inc(&folio->_entire_mapcount); + break; + } +} + +/** + * folio_dup_file_rmap_ptes - duplicate PTE mappings of a page range of a folio + * @folio: The folio to duplicate the mappings of + * @page: The first page to duplicate the mappings of + * @nr_pages: The number of pages of which the mapping will be duplicated + * + * The page range of the folio is defined by [page, page + nr_pages) + * + * The caller needs to hold the page table lock. + */ +static inline void folio_dup_file_rmap_ptes(struct folio *folio, + struct page *page, int nr_pages) +{ + __folio_dup_file_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE); +} +#define folio_dup_file_rmap_pte(folio, page) \ + folio_dup_file_rmap_ptes(folio, page, 1) + +/** + * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio + * @folio: The folio to duplicate the mapping of + * @page: The first page to duplicate the mapping of + * + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR) + * + * The caller needs to hold the page table lock. + */ +static inline void folio_dup_file_rmap_pmd(struct folio *folio, + struct page *page) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + __folio_dup_file_rmap(folio, page, HPAGE_PMD_NR, RMAP_LEVEL_PTE); +#else + WARN_ON_ONCE(true); +#endif +} + static inline void __page_dup_rmap(struct page *page, bool compound) { VM_WARN_ON(folio_test_hugetlb(page_folio(page))); @@ -319,11 +373,6 @@ static inline void __page_dup_rmap(struct page *page, bool compound) } }
-static inline void page_dup_file_rmap(struct page *page, bool compound) -{ - __page_dup_rmap(page, compound); -} - /** * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped * anonymous page diff --git a/mm/memory.c b/mm/memory.c index d60144b822a1..3beb0ef1688d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -968,7 +968,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, rss[MM_ANONPAGES]++; } else if (page) { folio_get(folio); - page_dup_file_rmap(page, false); + folio_dup_file_rmap_pte(folio, page); rss[mm_counter_file(page)]++; add_reliable_folio_counter(folio, dst_vma->vm_mm, 1); }
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 61d90309b7156d54c5d358cb5d8bf55b33d233d2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
The last users of page_needs_cow_for_dma() and __page_dup_rmap() are gone; remove them.
Add folio_try_dup_anon_rmap_ptes() right away; we want to perform rmap batching during fork() soon.
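As a rough illustration of the fallback logic the new __folio_try_dup_anon_rmap() implements (see the rmap.h hunk below), here is a hedged user-space sketch; "maybe pinned" and PageAnonExclusive are modelled as plain booleans, which glosses over the real GUP-pinning checks.

#include <stdbool.h>
#include <stdio.h>

#define MODEL_EBUSY 16		/* stand-in for the kernel's -EBUSY */

struct subpage { bool anon_exclusive; int mapcount; };

/* models the RMAP_LEVEL_PTE branch of __folio_try_dup_anon_rmap() */
static int try_dup_anon_rmap_ptes_model(struct subpage *page, int nr_pages,
					bool folio_maybe_pinned)
{
	int i;

	/*
	 * If the folio may be pinned and any subpage is still exclusive,
	 * refuse to share: fork() must copy those pages for the child.
	 */
	if (folio_maybe_pinned) {
		for (i = 0; i < nr_pages; i++)
			if (page[i].anon_exclusive)
				return -MODEL_EBUSY;
	}
	do {	/* safe to share: clear exclusivity and bump the mapcounts */
		page->anon_exclusive = false;
		page->mapcount++;
	} while (page++, --nr_pages > 0);
	return 0;
}

int main(void)
{
	struct subpage p[2] = { { .anon_exclusive = true }, { .anon_exclusive = false } };

	printf("maybe pinned: %d\n", try_dup_anon_rmap_ptes_model(p, 2, true));  /* -16 */
	printf("not pinned:   %d\n", try_dup_anon_rmap_ptes_model(p, 2, false)); /*   0 */
	return 0;
}

On success the duplicated PTEs have to stay R/O in both parent and child, exactly as the new kernel-doc in the hunk spells out.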
Link: https://lkml.kernel.org/r/20231220224504.646757-35-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 61d90309b7156d54c5d358cb5d8bf55b33d233d2) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/mm.h | 6 -- include/linux/rmap.h | 150 ++++++++++++++++++++++++++++++------------- 2 files changed, 106 insertions(+), 50 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 3f2fe3b85dc0..3452aa356a71 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1991,12 +1991,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma, return folio_maybe_dma_pinned(folio); }
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, - struct page *page) -{ - return folio_needs_cow_for_dma(vma, page_folio(page)); -} - /** * is_zero_page - Query if a page is a zero page * @page: The page to query diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 8f369fb3ed71..697cce4b5a93 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -359,68 +359,130 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio, #endif }
-static inline void __page_dup_rmap(struct page *page, bool compound) +static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio, + struct page *page, int nr_pages, struct vm_area_struct *src_vma, + enum rmap_level level) { - VM_WARN_ON(folio_test_hugetlb(page_folio(page))); + bool maybe_pinned; + int i; + + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + __folio_rmap_sanity_checks(folio, page, nr_pages, level);
- if (compound) { - struct folio *folio = (struct folio *)page; + /* + * If this folio may have been pinned by the parent process, + * don't allow to duplicate the mappings but instead require to e.g., + * copy the subpage immediately for the child so that we'll always + * guarantee the pinned folio won't be randomly replaced in the + * future on write faults. + */ + maybe_pinned = likely(!folio_is_device_private(folio)) && + unlikely(folio_needs_cow_for_dma(src_vma, folio));
- VM_BUG_ON_PAGE(compound && !PageHead(page), page); + /* + * No need to check+clear for already shared PTEs/PMDs of the + * folio. But if any page is PageAnonExclusive, we must fallback to + * copying if the folio maybe pinned. + */ + switch (level) { + case RMAP_LEVEL_PTE: + if (unlikely(maybe_pinned)) { + for (i = 0; i < nr_pages; i++) + if (PageAnonExclusive(page + i)) + return -EBUSY; + } + do { + if (PageAnonExclusive(page)) + ClearPageAnonExclusive(page); + atomic_inc(&page->_mapcount); + } while (page++, --nr_pages > 0); + break; + case RMAP_LEVEL_PMD: + if (PageAnonExclusive(page)) { + if (unlikely(maybe_pinned)) + return -EBUSY; + ClearPageAnonExclusive(page); + } atomic_inc(&folio->_entire_mapcount); - } else { - atomic_inc(&page->_mapcount); + break; } + return 0; }
/** - * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped - * anonymous page - * @page: the page to duplicate the mapping for - * @compound: the page is mapped as compound or as a small page - * @vma: the source vma + * folio_try_dup_anon_rmap_ptes - try duplicating PTE mappings of a page range + * of a folio + * @folio: The folio to duplicate the mappings of + * @page: The first page to duplicate the mappings of + * @nr_pages: The number of pages of which the mapping will be duplicated + * @src_vma: The vm area from which the mappings are duplicated * - * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq. + * The page range of the folio is defined by [page, page + nr_pages) * - * Duplicating the mapping can only fail if the page may be pinned; device - * private pages cannot get pinned and consequently this function cannot fail. + * The caller needs to hold the page table lock and the + * vma->vma_mm->write_protect_seq. + * + * Duplicating the mappings can only fail if the folio may be pinned; device + * private folios cannot get pinned and consequently this function cannot fail + * for them. + * + * If duplicating the mappings succeeded, the duplicated PTEs have to be R/O in + * the parent and the child. They must *not* be writable after this call + * succeeded. + * + * Returns 0 if duplicating the mappings succeeded. Returns -EBUSY otherwise. + */ +static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio, + struct page *page, int nr_pages, struct vm_area_struct *src_vma) +{ + return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma, + RMAP_LEVEL_PTE); +} +#define folio_try_dup_anon_rmap_pte(folio, page, vma) \ + folio_try_dup_anon_rmap_ptes(folio, page, 1, vma) + +/** + * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range + * of a folio + * @folio: The folio to duplicate the mapping of + * @page: The first page to duplicate the mapping of + * @src_vma: The vm area from which the mapping is duplicated + * + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR) * - * If duplicating the mapping succeeds, the page has to be mapped R/O into - * the parent and the child. It must *not* get mapped writable after this call. + * The caller needs to hold the page table lock and the + * vma->vma_mm->write_protect_seq. + * + * Duplicating the mapping can only fail if the folio may be pinned; device + * private folios cannot get pinned and consequently this function cannot fail + * for them. + * + * If duplicating the mapping succeeds, the duplicated PMD has to be R/O in + * the parent and the child. They must *not* be writable after this call + * succeeded. * * Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise. */ +static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio, + struct page *page, struct vm_area_struct *src_vma) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + return __folio_try_dup_anon_rmap(folio, page, HPAGE_PMD_NR, src_vma, + RMAP_LEVEL_PMD); +#else + WARN_ON_ONCE(true); + return -EBUSY; +#endif +} + static inline int page_try_dup_anon_rmap(struct page *page, bool compound, struct vm_area_struct *vma) { - VM_BUG_ON_PAGE(!PageAnon(page), page); - - /* - * No need to check+clear for already shared pages, including KSM - * pages. 
- */ - if (!PageAnonExclusive(page)) - goto dup; - - /* - * If this page may have been pinned by the parent process, - * don't allow to duplicate the mapping but instead require to e.g., - * copy the page immediately for the child so that we'll always - * guarantee the pinned page won't be randomly replaced in the - * future on write faults. - */ - if (likely(!is_device_private_page(page)) && - unlikely(page_needs_cow_for_dma(vma, page))) - return -EBUSY; + struct folio *folio = page_folio(page);
- ClearPageAnonExclusive(page); - /* - * It's okay to share the anon page between both processes, mapping - * the page R/O into both processes. - */ -dup: - __page_dup_rmap(page, compound); - return 0; + if (likely(!compound)) + return folio_try_dup_anon_rmap_pte(folio, page, vma); + return folio_try_dup_anon_rmap_pmd(folio, page, vma); }
/**
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 96c772c25c89f35091ce924117602d04de82a0fe category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert copy_huge_pmd() and fix up the comment in copy_huge_pud(). While at it, perform more folio conversion in copy_huge_pmd().
Link: https://lkml.kernel.org/r/20231220224504.646757-36-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 96c772c25c89f35091ce924117602d04de82a0fe) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/huge_memory.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a4947418480f..2938998d4933 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1249,6 +1249,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, { spinlock_t *dst_ptl, *src_ptl; struct page *src_page; + struct folio *src_folio; pmd_t pmd; pgtable_t pgtable = NULL; int ret = -ENOMEM; @@ -1315,11 +1316,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
src_page = pmd_page(pmd); VM_BUG_ON_PAGE(!PageHead(src_page), src_page); + src_folio = page_folio(src_page);
- get_page(src_page); - if (unlikely(page_try_dup_anon_rmap(src_page, true, src_vma))) { + folio_get(src_folio); + if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) { /* Page maybe pinned: split and retry the fault on PTEs. */ - put_page(src_page); + folio_put(src_folio); pte_free(dst_mm, pgtable); spin_unlock(src_ptl); spin_unlock(dst_ptl); @@ -1429,8 +1431,8 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, }
/* - * TODO: once we support anonymous pages, use page_try_dup_anon_rmap() - * and split if duplicating fails. + * TODO: once we support anonymous pages, use + * folio_try_dup_anon_rmap_*() and split if duplicating fails. */ pudp_set_wrprotect(src_mm, addr, src_pud); pud = pud_mkold(pud_wrprotect(pud));
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 08e7795e2444c3df9292f4ac7092be6168166a46 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert copy_nonpresent_pte(). While at it, perform some more folio conversion.
Link: https://lkml.kernel.org/r/20231220224504.646757-37-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 08e7795e2444c3df9292f4ac7092be6168166a46) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/memory.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c index 3beb0ef1688d..6755d8addbb3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -788,6 +788,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, unsigned long vm_flags = dst_vma->vm_flags; pte_t orig_pte = ptep_get(src_pte); pte_t pte = orig_pte; + struct folio *folio; struct page *page; swp_entry_t entry = pte_to_swp_entry(orig_pte);
@@ -832,6 +833,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, } } else if (is_device_private_entry(entry)) { page = pfn_swap_entry_to_page(entry); + folio = page_folio(page);
/* * Update rss count even for unaddressable pages, as @@ -842,10 +844,10 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, * for unaddressable pages, at some point. But for now * keep things as they are. */ - get_page(page); + folio_get(folio); rss[mm_counter(page)]++; /* Cannot fail as these pages cannot get pinned. */ - BUG_ON(page_try_dup_anon_rmap(page, false, src_vma)); + folio_try_dup_anon_rmap_pte(folio, page, src_vma);
/* * We do not preserve soft-dirty information, because so @@ -959,7 +961,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, * future. */ folio_get(folio); - if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) { + if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) { /* Page may be pinned, we have to copy. */ folio_put(folio); return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit a13d096471ec0ac5c6fc90fbcd57e8430024046a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
All users are gone; remove page_try_dup_anon_rmap() and any remaining traces.
Link: https://lkml.kernel.org/r/20231220224504.646757-38-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit a13d096471ec0ac5c6fc90fbcd57e8430024046a) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 697cce4b5a93..7f7eda68a7cf 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -250,7 +250,7 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *, void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address);
-/* See page_try_dup_anon_rmap() */ +/* See folio_try_dup_anon_rmap_*() */ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio, struct vm_area_struct *vma) { @@ -475,16 +475,6 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio, #endif }
-static inline int page_try_dup_anon_rmap(struct page *page, bool compound, - struct vm_area_struct *vma) -{ - struct folio *folio = page_folio(page); - - if (likely(!compound)) - return folio_try_dup_anon_rmap_pte(folio, page, vma); - return folio_try_dup_anon_rmap_pmd(folio, page, vma); -} - /** * page_try_share_anon_rmap - try marking an exclusive anonymous page possibly * shared to prepare for KSM or temporary unmapping @@ -493,8 +483,8 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound, * The caller needs to hold the PT lock and has to have the page table entry * cleared/invalidated. * - * This is similar to page_try_dup_anon_rmap(), however, not used during fork() - * to duplicate a mapping, but instead to prepare for KSM or temporarily + * This is similar to folio_try_dup_anon_rmap_*(), however, not used during + * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily * unmapping a page (swap, migration) via folio_remove_rmap_*(). * * Marking the page shared can only fail if the page may be pinned; device
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit e3b4b1374f87c71e9309efc6149f113cdd17af72 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's convert it like we converted all the other rmap functions. Don't introduce folio_try_share_anon_rmap_ptes() for now, as we don't have a user in sight that wants rmap batching; it is pretty easy to add later.
All users are easy to convert -- only ksm.c doesn't use folios yet but that is left for future work -- so let's just do it in a single shot.
While at it, turn the BUG_ON into a WARN_ON_ONCE.
Note that page_try_share_anon_rmap() so far did not distinguish between PTE and PMD mappings (it had no compound parameter). We're changing that so we can perform better sanity checks and make the code more readable and consistent. For example, __folio_rmap_sanity_checks() will make sure that a PMD range actually falls completely into the folio.
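A sanity check of that kind boils down to a simple range check; the sketch below is a user-space approximation (stand-in types, pfn arithmetic instead of struct page pointers), not the actual __folio_rmap_sanity_checks() implementation.

#include <assert.h>
#include <stdio.h>

struct folio_model { unsigned long first_pfn; unsigned long nr_pages; };

/* approximates the "range must lie within the folio" part of the checks */
static void rmap_range_check_model(const struct folio_model *folio,
				   unsigned long page_pfn, unsigned long nr_pages)
{
	/* the first mapped page must belong to the folio ... */
	assert(page_pfn >= folio->first_pfn &&
	       page_pfn < folio->first_pfn + folio->nr_pages);
	/* ... and the mapped range must not run past the folio's end */
	assert(page_pfn - folio->first_pfn + nr_pages <= folio->nr_pages);
}

int main(void)
{
	/* a PMD-sized THP: 512 pages with 4K base pages */
	struct folio_model thp = { .first_pfn = 0x1000, .nr_pages = 512 };

	rmap_range_check_model(&thp, 0x1000, 512);	/* whole-PMD mapping: fine */
	rmap_range_check_model(&thp, 0x1010, 16);	/* 16 PTEs inside the folio: fine */
	printf("range checks passed\n");
	return 0;
}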
Link: https://lkml.kernel.org/r/20231220224504.646757-39-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit e3b4b1374f87c71e9309efc6149f113cdd17af72) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 96 ++++++++++++++++++++++++++++++++------------ mm/gup.c | 2 +- mm/huge_memory.c | 9 +++-- mm/internal.h | 4 +- mm/ksm.c | 5 ++- mm/migrate_device.c | 2 +- mm/rmap.c | 11 ++--- 7 files changed, 89 insertions(+), 40 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 7f7eda68a7cf..7cf3b6a37766 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -266,7 +266,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio, return 0; }
-/* See page_try_share_anon_rmap() */ +/* See folio_try_share_anon_rmap_*() */ static inline int hugetlb_try_share_anon_rmap(struct folio *folio) { VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); @@ -475,31 +475,15 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio, #endif }
-/** - * page_try_share_anon_rmap - try marking an exclusive anonymous page possibly - * shared to prepare for KSM or temporary unmapping - * @page: the exclusive anonymous page to try marking possibly shared - * - * The caller needs to hold the PT lock and has to have the page table entry - * cleared/invalidated. - * - * This is similar to folio_try_dup_anon_rmap_*(), however, not used during - * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily - * unmapping a page (swap, migration) via folio_remove_rmap_*(). - * - * Marking the page shared can only fail if the page may be pinned; device - * private pages cannot get pinned and consequently this function cannot fail. - * - * Returns 0 if marking the page possibly shared succeeded. Returns -EBUSY - * otherwise. - */ -static inline int page_try_share_anon_rmap(struct page *page) +static __always_inline int __folio_try_share_anon_rmap(struct folio *folio, + struct page *page, int nr_pages, enum rmap_level level) { - VM_WARN_ON(folio_test_hugetlb(page_folio(page))); - VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio); + __folio_rmap_sanity_checks(folio, page, nr_pages, level);
- /* device private pages cannot get pinned via GUP. */ - if (unlikely(is_device_private_page(page))) { + /* device private folios cannot get pinned via GUP. */ + if (unlikely(folio_is_device_private(folio))) { ClearPageAnonExclusive(page); return 0; } @@ -550,7 +534,7 @@ static inline int page_try_share_anon_rmap(struct page *page) if (IS_ENABLED(CONFIG_HAVE_FAST_GUP)) smp_mb();
- if (unlikely(page_maybe_dma_pinned(page))) + if (unlikely(folio_maybe_dma_pinned(folio))) return -EBUSY; ClearPageAnonExclusive(page);
@@ -563,6 +547,68 @@ static inline int page_try_share_anon_rmap(struct page *page) return 0; }
+/** + * folio_try_share_anon_rmap_pte - try marking an exclusive anonymous page + * mapped by a PTE possibly shared to prepare + * for KSM or temporary unmapping + * @folio: The folio to share a mapping of + * @page: The mapped exclusive page + * + * The caller needs to hold the page table lock and has to have the page table + * entries cleared/invalidated. + * + * This is similar to folio_try_dup_anon_rmap_pte(), however, not used during + * fork() to duplicate mappings, but instead to prepare for KSM or temporarily + * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pte(). + * + * Marking the mapped page shared can only fail if the folio maybe pinned; + * device private folios cannot get pinned and consequently this function cannot + * fail. + * + * Returns 0 if marking the mapped page possibly shared succeeded. Returns + * -EBUSY otherwise. + */ +static inline int folio_try_share_anon_rmap_pte(struct folio *folio, + struct page *page) +{ + return __folio_try_share_anon_rmap(folio, page, 1, RMAP_LEVEL_PTE); +} + +/** + * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page + * range mapped by a PMD possibly shared to + * prepare for temporary unmapping + * @folio: The folio to share the mapping of + * @page: The first page to share the mapping of + * + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR) + * + * The caller needs to hold the page table lock and has to have the page table + * entries cleared/invalidated. + * + * This is similar to folio_try_dup_anon_rmap_pmd(), however, not used during + * fork() to duplicate a mapping, but instead to prepare for temporarily + * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pmd(). + * + * Marking the mapped pages shared can only fail if the folio maybe pinned; + * device private folios cannot get pinned and consequently this function cannot + * fail. + * + * Returns 0 if marking the mapped pages possibly shared succeeded. Returns + * -EBUSY otherwise. + */ +static inline int folio_try_share_anon_rmap_pmd(struct folio *folio, + struct page *page) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + return __folio_try_share_anon_rmap(folio, page, HPAGE_PMD_NR, + RMAP_LEVEL_PMD); +#else + WARN_ON_ONCE(true); + return -EBUSY; +#endif +} + /* * Called from mm/vmscan.c to handle paging out */ diff --git a/mm/gup.c b/mm/gup.c index 813add0d3a74..5332f37a79ee 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -177,7 +177,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) /* * Adjust the pincount before re-checking the PTE for changes. * This is essentially a smp_mb() and is paired with a memory - * barrier in page_try_share_anon_rmap(). + * barrier in folio_try_share_anon_rmap_*(). */ smp_mb__after_atomic();
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2938998d4933..152243e30561 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2379,10 +2379,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, * In case we cannot clear PageAnonExclusive(), split the PMD * only and let try_to_migrate_one() fail later. * - * See page_try_share_anon_rmap(): invalidate PMD first. + * See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */ anon_exclusive = PageAnonExclusive(page); - if (freeze && anon_exclusive && page_try_share_anon_rmap(page)) + if (freeze && anon_exclusive && + folio_try_share_anon_rmap_pmd(folio, page)) freeze = false; if (!freeze) { rmap_t rmap_flags = RMAP_NONE; @@ -3416,9 +3417,9 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
- /* See page_try_share_anon_rmap(): invalidate PMD first. */ + /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */ anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); - if (anon_exclusive && page_try_share_anon_rmap(page)) { + if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) { set_pmd_at(mm, address, pvmw->pmd, pmdval); return -EBUSY; } diff --git a/mm/internal.h b/mm/internal.h index 83cdb6ea72e3..2af059499f60 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1033,7 +1033,7 @@ enum { * * Ordinary GUP: Using the PT lock * * GUP-fast and fork(): mm->write_protect_seq * * GUP-fast and KSM or temporary unmapping (swap, migration): see - * page_try_share_anon_rmap() + * folio_try_share_anon_rmap_*() * * Must be called with the (sub)page that's actually referenced via the * page table entry, which might not necessarily be the head page for a @@ -1076,7 +1076,7 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma, return is_cow_mapping(vma->vm_flags); }
- /* Paired with a memory barrier in page_try_share_anon_rmap(). */ + /* Paired with a memory barrier in folio_try_share_anon_rmap_*(). */ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP)) smp_rmb();
diff --git a/mm/ksm.c b/mm/ksm.c index 3d3db4111262..94207293e6fc 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1148,8 +1148,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page, goto out_unlock; }
- /* See page_try_share_anon_rmap(): clear PTE first. */ - if (anon_exclusive && page_try_share_anon_rmap(page)) { + /* See folio_try_share_anon_rmap_pte(): clear PTE first. */ + if (anon_exclusive && + folio_try_share_anon_rmap_pte(page_folio(page), page)) { set_pte_at(mm, pvmw.address, pvmw.pte, entry); goto out_unlock; } diff --git a/mm/migrate_device.c b/mm/migrate_device.c index cee7957cecb8..40032b85ab4b 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -202,7 +202,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (anon_exclusive) { pte = ptep_clear_flush(vma, addr, ptep);
- if (page_try_share_anon_rmap(page)) { + if (folio_try_share_anon_rmap_pte(folio, page)) { set_pte_at(mm, addr, ptep, pte); folio_unlock(folio); folio_put(folio); diff --git a/mm/rmap.c b/mm/rmap.c index efbe0708adeb..6bb33498e9e3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1839,9 +1839,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; }
- /* See page_try_share_anon_rmap(): clear PTE first. */ + /* See folio_try_share_anon_rmap(): clear PTE first. */ if (anon_exclusive && - page_try_share_anon_rmap(subpage)) { + folio_try_share_anon_rmap_pte(folio, subpage)) { swap_free(entry); set_pte_at(mm, address, pvmw.pte, pteval); ret = false; @@ -2117,7 +2117,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte;
if (anon_exclusive) - BUG_ON(page_try_share_anon_rmap(subpage)); + WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio, + subpage));
/* * Store the pfn of the page in a special migration @@ -2190,7 +2191,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) && !anon_exclusive, subpage);
- /* See page_try_share_anon_rmap(): clear PTE first. */ + /* See folio_try_share_anon_rmap_pte(): clear PTE first. */ if (folio_test_hugetlb(folio)) { if (anon_exclusive && hugetlb_try_share_anon_rmap(folio)) { @@ -2201,7 +2202,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } else if (anon_exclusive && - page_try_share_anon_rmap(subpage)) { + folio_try_share_anon_rmap_pte(folio, subpage)) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit e78a13fd16bb9d9712f61be2bd6612a092ce66ea category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
We removed all "bool compound" and RMAP_COMPOUND parameters. Let's remove the remaining "compound" terminology by making COMPOUND_MAPPED match the "folio->_entire_mapcount" terminology, renaming it to ENTIRELY_MAPPED.
ENTIRELY_MAPPED is only used when the whole folio is mapped using a single page table entry (e.g., a single PMD mapping a PMD-sized THP). For now, we don't support mapping any THP bigger than that, so ENTIRELY_MAPPED applies to PMD-mapped PMD-sized THPs only.
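To make the accounting concrete, here is a small user-space model of how ENTIRELY_MAPPED folds into _nr_pages_mapped, loosely following the rmap.c hunks below; it only mirrors the uncontended path and ignores the raced-ahead corrections and atomics.

#include <stdio.h>

#define ENTIRELY_MAPPED		0x800000
#define FOLIO_PAGES_MAPPED	(ENTIRELY_MAPPED - 1)

struct folio_model {
	int nr_pages;		/* 512 for a PMD-sized THP with 4K base pages */
	int nr_pages_mapped;	/* low bits: PTE-mapped subpages, high bit: entire mapping */
	int entire_mapcount;
};

/* a PMD maps the whole folio: bump the entire mapcount, mark ENTIRELY_MAPPED once */
static void map_entire(struct folio_model *f)
{
	if (f->entire_mapcount++ == 0)
		f->nr_pages_mapped += ENTIRELY_MAPPED;
}

/* a subpage gains its first PTE mapping: count it in the low bits */
static void map_first_pte_of_subpage(struct folio_model *f)
{
	f->nr_pages_mapped++;
}

int main(void)
{
	struct folio_model thp = { .nr_pages = 512 };

	map_entire(&thp);
	map_first_pte_of_subpage(&thp);
	printf("PTE-mapped subpages: %d, entirely mapped: %s\n",
	       thp.nr_pages_mapped & FOLIO_PAGES_MAPPED,
	       (thp.nr_pages_mapped & ENTIRELY_MAPPED) ? "yes" : "no");
	return 0;
}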
Link: https://lkml.kernel.org/r/20231220224504.646757-40-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit e78a13fd16bb9d9712f61be2bd6612a092ce66ea) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- Documentation/mm/transhuge.rst | 2 +- mm/internal.h | 6 +++--- mm/rmap.c | 18 +++++++++--------- 3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst index cf81272a6b8b..93c9239b9ebe 100644 --- a/Documentation/mm/transhuge.rst +++ b/Documentation/mm/transhuge.rst @@ -117,7 +117,7 @@ pages:
- map/unmap of a PMD entry for the whole THP increment/decrement folio->_entire_mapcount and also increment/decrement - folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount + folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount goes from -1 to 0 or 0 to -1.
- map/unmap of individual pages with PTE entry increment/decrement diff --git a/mm/internal.h b/mm/internal.h index 2af059499f60..fc27ce997638 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -54,12 +54,12 @@ void page_writeback_init(void);
/* * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages, - * its nr_pages_mapped would be 0x400000: choose the COMPOUND_MAPPED bit + * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently * leaves nr_pages_mapped at 0, but avoid surprise if it participates later. */ -#define COMPOUND_MAPPED 0x800000 -#define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1) +#define ENTIRELY_MAPPED 0x800000 +#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)
/* * Flags passed to __show_mem() and show_free_areas() to suppress output in diff --git a/mm/rmap.c b/mm/rmap.c index 6bb33498e9e3..73f2b3a33158 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1142,7 +1142,7 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio, first = atomic_inc_and_test(&page->_mapcount); if (first && folio_test_large(folio)) { first = atomic_inc_return_relaxed(mapped); - first = (first < COMPOUND_MAPPED); + first = (first < ENTIRELY_MAPPED); }
if (first) @@ -1152,15 +1152,15 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio, case RMAP_LEVEL_PMD: first = atomic_inc_and_test(&folio->_entire_mapcount); if (first) { - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped); - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) { + nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped); + if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) { *nr_pmdmapped = folio_nr_pages(folio); nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); /* Raced ahead of a remove and another add? */ if (unlikely(nr < 0)) nr = 0; } else { - /* Raced ahead of a remove of COMPOUND_MAPPED */ + /* Raced ahead of a remove of ENTIRELY_MAPPED */ nr = 0; } } @@ -1403,7 +1403,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, } else { /* increment count (starts at -1) */ atomic_set(&folio->_entire_mapcount, 0); - atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED); + atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED); SetPageAnonExclusive(&folio->page); __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr); } @@ -1484,7 +1484,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, last = atomic_add_negative(-1, &page->_mapcount); if (last && folio_test_large(folio)) { last = atomic_dec_return_relaxed(mapped); - last = (last < COMPOUND_MAPPED); + last = (last < ENTIRELY_MAPPED); }
if (last) @@ -1494,15 +1494,15 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, case RMAP_LEVEL_PMD: last = atomic_add_negative(-1, &folio->_entire_mapcount); if (last) { - nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped); - if (likely(nr < COMPOUND_MAPPED)) { + nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped); + if (likely(nr < ENTIRELY_MAPPED)) { nr_pmdmapped = folio_nr_pages(folio); nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); /* Raced ahead of another remove and an add? */ if (unlikely(nr < 0)) nr = 0; } else { - /* An add of COMPOUND_MAPPED raced ahead */ + /* An add of ENTIRELY_MAPPED raced ahead */ nr = 0; } }
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 4a8ffab02db55c8a70063c57519cadf72d480ed4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Let's fix up one remaining comment. Note that the only remaining trace of the old rmap interface is in an example in Documentation/trace/ftrace.rst, which we'll just leave alone.
Link: https://lkml.kernel.org/r/20231220224504.646757-41-david@redhat.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 4a8ffab02db55c8a70063c57519cadf72d480ed4) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/internal.h b/mm/internal.h index fc27ce997638..ef68eddaa5d3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -647,7 +647,7 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma) * should be called with vma's mmap_lock held for read or write, * under page table lock for the pte/pmd being added or removed. * - * mlock is usually called at the end of page_add_*_rmap(), munlock at + * mlock is usually called at the end of folio_add_*_rmap_*(), munlock at * the end of folio_remove_rmap_*(); but new anon folios are managed by * folio_add_lru_vma() calling mlock_new_folio(). */
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc1 commit 9c5938694cd0e9e00bdfb7e60900673263daf4d5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
Unfortunately, vm_insert_page() and friends end up passing driver-allocated folios into folio_add_file_rmap_pte() via insert_page_into_pte_locked().
While these driver-allocated folios can be compound pages (large folios), they are not proper "rmappable" folios.
In these VM_MIXEDMAP VMAs, there isn't really a concept of a reverse mapping, so, long term, we should clean that up and not call into rmap code.
For the time being, document how we can end up in rmap code with large folios that are not marked rmappable.
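For context, a hypothetical driver mmap() handler of the kind that can reach this path might look like the sketch below (demo_dev, demo_mmap() and pages[] are made up for illustration): vm_insert_page() marks the VMA VM_MIXEDMAP and, through insert_page_into_pte_locked(), ends up in folio_add_file_rmap_pte(), even when the inserted pages belong to a driver-allocated compound page that was never set up as an rmappable folio.

/* Hypothetical driver example, not from the patch. */
#include <linux/fs.h>
#include <linux/mm.h>

struct demo_dev {
	struct page **pages;		/* driver-allocated backing pages */
	unsigned long npages;
};

static int demo_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct demo_dev *dev = file->private_data;
	unsigned long i, npages = vma_pages(vma);

	if (npages > dev->npages)
		return -EINVAL;

	for (i = 0; i < npages; i++) {
		/*
		 * vm_insert_page() sets VM_MIXEDMAP on the VMA and takes the
		 * file-rmap path for each inserted page, even if dev->pages[]
		 * comes from a compound (large folio) allocation.
		 */
		int err = vm_insert_page(vma, vma->vm_start + i * PAGE_SIZE,
					 dev->pages[i]);
		if (err)
			return err;
	}
	return 0;
}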
Link: https://lkml.kernel.org/r/793c5cee-d5fc-4eb1-86a2-39e05686233d@redhat.com Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()") Reported-by: syzbot+50ef73537bbc393a25bb@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000014174060e09316e@google.com Signed-off-by: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Ryan Roberts ryan.roberts@arm.com Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 9c5938694cd0e9e00bdfb7e60900673263daf4d5) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/rmap.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 7cf3b6a37766..5f750cdc458b 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -194,8 +194,15 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio, { /* hugetlb folios are handled separately. */ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); - VM_WARN_ON_FOLIO(folio_test_large(folio) && - !folio_test_large_rmappable(folio), folio); + + /* + * TODO: we get driver-allocated folios that have nothing to do with + * the rmap using vm_insert_page(); therefore, we cannot assume that + * folio_test_large_rmappable() holds for large folios. We should + * handle any desired mapcount+stats accounting for these folios in + * VM_MIXEDMAP VMAs separately, and then sanity-check here that + * we really only get rmappable folios. + */
VM_WARN_ON_ONCE(nr_pages <= 0); VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc3 commit db44c658f798ad907219f15e033229b8d1aadb93 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
The correct folio replacement for "set_page_dirty()" is "folio_mark_dirty()", not "folio_set_dirty()". Using the latter won't properly inform the filesystem via its dirty_folio() callback.
This was found by code inspection, but it can likely result in real trouble.
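As a rough sketch of the difference (simplified and hypothetical; the real implementations live in include/linux/page-flags.h and mm/page-writeback.c): folio_set_dirty() is just the PG_dirty flag setter, while folio_mark_dirty() goes through the address_space's dirty_folio() callback so the filesystem can track the dirtying for writeback.

/*
 * Hypothetical helper, for illustration only: transferring hardware dirty
 * state to a pagecache folio when tearing down a mapping.
 */
#include <linux/mm.h>
#include <linux/pagemap.h>

static void sketch_transfer_dirty(struct folio *folio, bool hw_dirty)
{
	if (!hw_dirty)
		return;

	/*
	 * Wrong: folio_set_dirty() only sets the PG_dirty flag.  The
	 * filesystem's dirty_folio() callback never runs, so it cannot do
	 * its own dirty tracking for later writeback.
	 *
	 * folio_set_dirty(folio);
	 */

	/*
	 * Right: for a pagecache folio, folio_mark_dirty() looks up
	 * folio_mapping(folio) and calls mapping->a_ops->dirty_folio() --
	 * the folio equivalent of the old set_page_dirty().
	 */
	folio_mark_dirty(folio);
}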
Link: https://lkml.kernel.org/r/20240122175407.307992-1-david@redhat.com Fixes: a8e61d584eda0 ("mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()") Signed-off-by: David Hildenbrand david@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit db44c658f798ad907219f15e033229b8d1aadb93) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/huge_memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 152243e30561..2d325daab411 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2292,7 +2292,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pmd_page(old_pmd); folio = page_folio(page); if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) - folio_set_dirty(folio); + folio_mark_dirty(folio); if (!folio_test_referenced(folio) && pmd_young(old_pmd)) folio_set_referenced(folio); add_reliable_page_counter(page, mm, -HPAGE_PMD_NR); @@ -3425,7 +3425,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, }
if (pmd_dirty(pmdval)) - folio_set_dirty(folio); + folio_mark_dirty(folio); if (pmd_write(pmdval)) entry = make_writable_migration_entry(page_to_pfn(page)); else if (anon_exclusive)
From: David Hildenbrand david@redhat.com
mainline inclusion from mainline-v6.8-rc3 commit e4e3df290f65da6cb27dac1309389c916f27db1a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CESE CVE: NA
-------------------------------------------------
The correct folio replacement for "set_page_dirty()" is "folio_mark_dirty()", not "folio_set_dirty()". Using the latter won't properly inform the filesystem via its dirty_folio() callback.
This was found by code inspection, but it can likely result in real trouble when zapping dirty PTEs that point at clean pagecache folios.
Yuezhang Mo said: "Without this fix, testing the latest exfat with xfstests, test cases generic/029 and generic/030 will fail."
Link: https://lkml.kernel.org/r/20240122171751.272074-1-david@redhat.com Fixes: c46265030b0f ("mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()") Signed-off-by: David Hildenbrand david@redhat.com Reported-by: Ryan Roberts ryan.roberts@arm.com Closes: https://lkml.kernel.org/r/2445cedb-61fb-422c-8bfb-caf0a2beed62@arm.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Reviewed-by: Yuezhang Mo Yuezhang.Mo@sony.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit e4e3df290f65da6cb27dac1309389c916f27db1a) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c index 6755d8addbb3..031ff37a91fb 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1465,7 +1465,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, delay_rmap = 0; if (!folio_test_anon(folio)) { if (pte_dirty(ptent)) { - folio_set_dirty(folio); + folio_mark_dirty(folio); if (tlb_delay_rmap(tlb)) { delay_rmap = 1; force_flush = 1;
Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/5613 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/P...
Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/5615 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/P...