hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAFONL
CVE: NA
--------------------------------
In the NUMA balancing scenario, support PMD-level THP migration by
following the flow of the do_huge_pmd_numa_page() function in AutoNUMA:

  1. Acquire the ptl.
  2. Prepare for migration (elevate the page refcount).
  3. Release the ptl.
  4. Isolate the page from the LRU and elevate the page refcount.
  5. Migrate the misplaced THP.
Elevating the page refcount while holding the ptl should prevent the THP
from being split underneath us.
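As a rough sketch of that ordering (simplified from the hunk below: the
page-table walk and error paths are omitted, and numa_migrate_prep() is
assumed to take the extra page reference while the ptl is held):

	/* Simplified ordering sketch; not the complete function added below. */
	ptl = pmd_lock(mm, pmd);			/* acquire ptl */
	hpage = vm_normal_page_pmd(vma, haddr, pmde);
	target_nid = numa_migrate_prep(hpage, vma, haddr, page_nid, &flags);
						/* refcount elevated here, so a
						 * concurrent THP split fails */
	spin_unlock(ptl);				/* release ptl */
	migrated = migrate_misplaced_page(hpage, vma, target_nid);
						/* isolate from LRU and migrate */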
In conclusion, as with AutoNUMA, the following pages will not be migrated
(the checks behind items 2-4 are sketched below):

  1. Read-only files, see do_numa_access().
  2. Non-normal pages such as huge zero pages and devmap pages, see
     vm_normal_page_pmd().
  3. Shared libraries and dirty pages, see migrate_misplaced_page().
  4. THPs mapped by multiple processes, see numamigrate_isolate_page().
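For orientation, the checks behind items 3 and 4 look roughly like the
following; this paraphrases the mainline helpers named above (exact
conditions vary by kernel version), and thp_may_migrate() is only a
hypothetical wrapper used for illustration, not something this patch adds.
Item 2 is covered by vm_normal_page_pmd() returning NULL for the huge zero
page and devmap pages, so do_thp_numa_access() bails out before these
checks are reached.

	/* Hypothetical illustration only; paraphrases existing mainline checks. */
	static bool thp_may_migrate(struct page *page, struct vm_area_struct *vma)
	{
		/* (3) Likely a shared library: file-backed, executable and
		 *     mapped by more than one process, see migrate_misplaced_page().
		 */
		if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
		    (vma->vm_flags & VM_EXEC))
			return false;

		/* (3) Dirty file pages cannot always be moved by async
		 *     migration, see migrate_misplaced_page().
		 */
		if (page_is_file_lru(page) && PageDirty(page))
			return false;

		/* (4) A THP mapped by multiple processes is not migrated,
		 *     see numamigrate_isolate_page().
		 */
		if (PageTransHuge(page) && total_mapcount(page) > 1)
			return false;

		return true;
	}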
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Ze Zuo <zuoze1@huawei.com>
---
 mm/mem_sampling.c | 78 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 2 deletions(-)
diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c
index 0eaea2680d83..e0470052ae9c 100644
--- a/mm/mem_sampling.c
+++ b/mm/mem_sampling.c
@@ -144,6 +144,79 @@ static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
 	return mpol_misplaced(page, vma, addr);
 }
+static inline void do_thp_numa_access(struct mm_struct *mm,
+				      struct vm_area_struct *vma,
+				      u64 vaddr, struct page *page)
+{
+	int page_nid = NUMA_NO_NODE;
+	int target_nid, last_cpupid = -1;
+	bool migrated = false;
+	int flags = 0;
+	struct page *hpage = NULL;
+	u64 haddr = vaddr & HPAGE_PMD_MASK;
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd, pmde;
+	spinlock_t *ptl;
+
+	pgd = pgd_offset(mm, vaddr);
+	if (!pgd_present(*pgd))
+		return;
+
+	p4d = p4d_offset(pgd, vaddr);
+	if (!p4d_present(*p4d))
+		return;
+
+	pud = pud_offset(p4d, vaddr);
+	if (!pud_present(*pud))
+		return;
+
+	pmd = pmd_offset(pud, vaddr);
+	pmde = READ_ONCE(*pmd);
+	/* TODO: handle PTE-mapped THP */
+	if (!pmd_trans_huge(pmde))
+		return;
+
+	ptl = pmd_lock(mm, pmd);
+	pmde = READ_ONCE(*pmd);
+	if (unlikely(!pmd_trans_huge(pmde)))
+		goto out_unlock;
+
+	hpage = vm_normal_page_pmd(vma, haddr, pmde);
+	if (!hpage || hpage != compound_head(page))
+		goto out_unlock;
+
+	page_nid = page_to_nid(hpage);
+	last_cpupid = page_cpupid_last(hpage);
+	target_nid = numa_migrate_prep(hpage, vma, haddr, page_nid,
+				       &flags);
+	spin_unlock(ptl);
+	if (target_nid == NUMA_NO_NODE) {
+		put_page(hpage);
+		goto out;
+	}
+
+	migrated = migrate_misplaced_page(hpage, vma, target_nid);
+	if (migrated) {
+		flags |= TNF_MIGRATED;
+		page_nid = target_nid;
+	} else {
+		flags |= TNF_MIGRATE_FAIL;
+	}
+
+out:
+	trace_mm_numa_migrating(haddr, page_nid, target_nid, flags & TNF_MIGRATED);
+	if (page_nid != NUMA_NO_NODE)
+		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
+				flags);
+
+	return;
+
+out_unlock:
+	spin_unlock(ptl);
+}
+
 /*
  * Called from task_work context to act upon the page access.
  *
@@ -190,9 +263,10 @@ static void do_numa_access(struct task_struct *p, u64 vaddr, u64 paddr)
 	if (unlikely(!PageLRU(page)))
 		goto out_unlock;
 
-	/* TODO: handle PTE-mapped THP or PMD-mapped THP*/
-	if (PageCompound(page))
+	if (PageCompound(page)) {
+		do_thp_numa_access(mm, vma, vaddr, page);
 		goto out_unlock;
+	}
 
 	/*
 	 * Flag if the page is shared between multiple address spaces. This