From: "Vishal Moola (Oracle)" vishal.moola@gmail.com
mainline inclusion from mainline-v6.9-rc1 commit 997f0ecb11da15602a3d34e10f9ca8418db794d0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9R3AY CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "Handle hugetlb faults under the VMA lock", v2.
It is generally safe to handle hugetlb faults under the VMA lock. The only time this is unsafe is when no anon_vma has been allocated to this vma yet, so we can use vmf_anon_prepare() instead of anon_vma_prepare() to bailout if necessary. This should only happen for the first hugetlb page in the vma.
Additionally, this patchset begins to use struct vm_fault within hugetlb_fault(). This works towards cleaning up hugetlb code, and should significantly reduce the number of arguments passed to functions.
The last patch in this series may cause ltp hugemmap10 to "fail". This is because vmf_anon_prepare() may bailout with no anon_vma under the VMA lock after allocating a folio for the hugepage. In free_huge_folio(), this folio is completely freed on bailout iff there is a surplus of hugetlb pages. This will remove a folio off the freelist and decrement the number of hugepages while ltp expects these counters to remain unchanged on failure. The rest of the ltp testcases pass.
This patch (of 2):
In order to handle hugetlb faults under the VMA lock, hugetlb can use vmf_anon_prepare() to ensure we can safely prepare an anon_vma. Change it to be a non-static function so it can be used within hugetlb as well.
Link: https://lkml.kernel.org/r/20240221234732.187629-6-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20240221234732.187629-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/internal.h | 1 + mm/memory.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/internal.h b/mm/internal.h index 65e06f06d26b..0ecbaa392054 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -282,6 +282,7 @@ static inline void wake_throttle_isolated(pg_data_t *pgdat) wake_up(wqh); }
+vm_fault_t vmf_anon_prepare(struct vm_fault *vmf); vm_fault_t do_swap_page(struct vm_fault *vmf); void folio_rotate_reclaimable(struct folio *folio); bool __folio_end_writeback(struct folio *folio); diff --git a/mm/memory.c b/mm/memory.c index ed275401e695..776f59299900 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3215,7 +3215,7 @@ static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf) return VM_FAULT_RETRY; }
-static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf) +vm_fault_t vmf_anon_prepare(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma;