Patches [1-3] are cleanups for shmem. Patches [4-5] fix the performance of shmem writes. Patches [6-11] fix the fault handler's handling of poisoned tail pages.
Baolin Wang (3):
  mm: shmem: simplify the suitable huge orders validation for tmpfs
  mm: shmem: rename shmem_is_huge() to shmem_huge_global_enabled()
  mm: shmem: move shmem_huge_global_enabled() into shmem_allowable_huge_orders()

Kefeng Wang (1):
  tmpfs: fault in smaller chunks if large folio allocation not allowed

Matthew Wilcox (Oracle) (6):
  mm: make mapping_evict_folio() the preferred way to evict clean folios
  mm: convert __do_fault() to use a folio
  mm: use mapping_evict_folio() in truncate_error_page()
  mm: convert soft_offline_in_use_page() to use a folio
  mm: convert isolate_page() to mf_isolate_folio()
  mm: remove invalidate_inode_page()

Rik van Riel (1):
  mm,tmpfs: consider end of file write in shmem_is_huge
 include/linux/fs.h       |   2 +
 include/linux/shmem_fs.h |  15 ++---
 mm/filemap.c             |   7 +-
 mm/huge_memory.c         |  11 +---
 mm/internal.h            |   2 +-
 mm/khugepaged.c          |   2 +-
 mm/memory-failure.c      |  54 ++++++++--------
 mm/memory.c              |  20 +++---
 mm/shmem.c               | 135 +++++++++++++++++++++------------
 mm/truncate.c            |  42 +++++-------
 mm/userfaultfd.c         |   2 +-
 11 files changed, 145 insertions(+), 147 deletions(-)
From: Baolin Wang <baolin.wang@linux.alibaba.com>
next inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------
Patch series "Some cleanups for shmem", v3.
This series does some cleanups to reuse code, rename functions and simplify logic, to make the code clearer. No functional changes are expected.
This patch (of 3):
Move the suitable huge orders validation into shmem_suitable_orders() for tmpfs, which can reuse some code to simplify the logic.
In addition, the error code -E2BIG returned when checking for conflicts with PMD-sized THP in the page cache for tmpfs had no special handling anyway; it simply led to a fallback to order-0 allocations, which is exactly what this patch does directly, so this simplification introduces no functional change.
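As an illustration (editorial, not part of the patch): a minimal userspace sketch of the order-fallback walk, assuming the highest_order()/next_order() semantics from mm/internal.h; cache_conflict() is a hypothetical stub standing in for the xa_find() check in shmem_suitable_orders().

#include <stdio.h>

/* Same semantics as the helpers in mm/internal.h. */
static int highest_order(unsigned long orders)
{
        return 8 * (int)sizeof(orders) - __builtin_clzl(orders) - 1;
}

static int next_order(unsigned long *orders, int prev)
{
        *orders &= ~(1UL << prev);
        return *orders ? highest_order(*orders) : 0;
}

/* Stub standing in for the xa_find() conflict check. */
static int cache_conflict(int order)
{
        return order > 4;       /* pretend only orders <= 4 are free */
}

int main(void)
{
        /* Bitmask of candidate folio orders, tried largest first. */
        unsigned long orders = (1UL << 9) | (1UL << 4) | (1UL << 2);
        int order = highest_order(orders);

        while (orders) {
                if (!cache_conflict(order)) {
                        printf("allocating order %d (%lu pages)\n",
                               order, 1UL << order);
                        break;
                }
                order = next_order(&orders, order);     /* fall back */
        }
        return 0;
}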
Link: https://lkml.kernel.org/r/cover.1721626645.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/965985dd6d322929d78a0beee0dafa1c2a1b81e2.172162664...
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
        mm/shmem.c
[ Context conflicts with shmem_prepare_alloc(). ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/shmem.c | 39 +++++++++++++++------------------------
 1 file changed, 15 insertions(+), 24 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 7b3fa6b9aa28..e9ee510b1247 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1697,20 +1697,30 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
                                            struct address_space *mapping, pgoff_t index,
                                            unsigned long orders)
 {
-        struct vm_area_struct *vma = vmf->vma;
+        struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
         pgoff_t aligned_index;
         unsigned long pages;
         int order;
 
-        orders = thp_vma_suitable_orders(vma, vmf->address, orders);
-        if (!orders)
-                return 0;
+        if (vma) {
+                orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+                if (!orders)
+                        return 0;
+        }
 
         /* Find the highest order that can add into the page cache */
         order = highest_order(orders);
         while (orders) {
                 pages = 1UL << order;
                 aligned_index = round_down(index, pages);
+                /*
+                 * Check for conflict before waiting on a huge allocation.
+                 * Conflict might be that a huge page has just been allocated
+                 * and added to page cache by a racing thread, or that there
+                 * is already at least one small page in the huge extent.
+                 * Be careful to retry when appropriate, but not forever!
+                 * Elsewhere -EEXIST would be the right code, but not here.
+                 */
                 if (!xa_find(&mapping->i_pages, &aligned_index,
                              aligned_index + pages - 1, XA_PRESENT))
                         break;
@@ -1747,7 +1757,6 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 {
         struct address_space *mapping = inode->i_mapping;
         struct shmem_inode_info *info = SHMEM_I(inode);
-        struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
         unsigned long suitable_orders = 0;
         struct folio *folio = NULL;
         long pages;
@@ -1760,26 +1769,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
                 goto no_mem;
 
         if (orders > 0) {
-                if (vma && vma_is_anon_shmem(vma)) {
-                        suitable_orders = shmem_suitable_orders(inode, vmf,
+                suitable_orders = shmem_suitable_orders(inode, vmf,
                                                         mapping, index, orders);
-                } else if (orders & BIT(HPAGE_PMD_ORDER)) {
-                        pages = HPAGE_PMD_NR;
-                        suitable_orders = BIT(HPAGE_PMD_ORDER);
-                        index = round_down(index, HPAGE_PMD_NR);
-
-                        /*
-                         * Check for conflict before waiting on a huge allocation.
-                         * Conflict might be that a huge page has just been allocated
-                         * and added to page cache by a racing thread, or that there
-                         * is already at least one small page in the huge extent.
-                         * Be careful to retry when appropriate, but not forever!
-                         * Elsewhere -EEXIST would be the right code, but not here.
-                         */
-                        if (xa_find(&mapping->i_pages, &index,
-                                    index + HPAGE_PMD_NR - 1, XA_PRESENT))
-                                return ERR_PTR(-E2BIG);
-                }
 
                 order = highest_order(suitable_orders);
                 while (suitable_orders) {
From: Baolin Wang <baolin.wang@linux.alibaba.com>
next inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------
shmem_is_huge() is now used only to check whether the top-level huge page option is enabled, so rename it to reflect that usage.
Link: https://lkml.kernel.org/r/da53296e0ab6359aa083561d9dc01e4223d60fbe.172162664...
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
        mm/shmem.c
[ Context conflicts with mm_in_dynamic_pool(). ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/shmem_fs.h |  9 +++++----
 mm/huge_memory.c         |  5 +++--
 mm/shmem.c               | 15 ++++++++-------
 3 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 41aa4e0d6dbc..83a4fd53df8c 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -115,14 +115,15 @@ extern void shmem_truncate_range(struct inode *inode, loff_t start,
                                  loff_t end);
 int shmem_unuse(unsigned int type);
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-extern bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
-                          struct mm_struct *mm, unsigned long vm_flags);
+extern bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index, bool shmem_huge_force,
+                                      struct mm_struct *mm, unsigned long vm_flags);
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
                                 struct vm_area_struct *vma, pgoff_t index,
                                 bool global_huge);
 #else
-static __always_inline bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
-                                          struct mm_struct *mm, unsigned long vm_flags)
+static __always_inline bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
+                                                      bool shmem_huge_force, struct mm_struct *mm,
+                                                      unsigned long vm_flags)
 {
         return false;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 862e1c135967..5e546ff35cdf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -157,8 +157,9 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
          * own flags.
          */
         if (!in_pf && shmem_file(vma->vm_file)) {
-                bool global_huge = shmem_is_huge(file_inode(vma->vm_file), vma->vm_pgoff,
-                                                 !enforce_sysfs, vma->vm_mm, vm_flags);
+                bool global_huge = shmem_huge_global_enabled(file_inode(vma->vm_file),
+                                                             vma->vm_pgoff, !enforce_sysfs,
+                                                             vma->vm_mm, vm_flags);
 
                 if (!vma_is_anon_shmem(vma))
                         return global_huge ? orders : 0;
diff --git a/mm/shmem.c b/mm/shmem.c
index e9ee510b1247..3aec08ff44f1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -542,9 +542,9 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
-static bool __shmem_is_huge(struct inode *inode, pgoff_t index,
-                            bool shmem_huge_force, struct mm_struct *mm,
-                            unsigned long vm_flags)
+static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
+                                        bool shmem_huge_force, struct mm_struct *mm,
+                                        unsigned long vm_flags)
 {
         loff_t i_size;
 
@@ -575,14 +575,15 @@ static bool __shmem_is_huge(struct inode *inode, pgoff_t index,
         }
 }
 
-bool shmem_is_huge(struct inode *inode, pgoff_t index,
+bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
                    bool shmem_huge_force, struct mm_struct *mm,
                    unsigned long vm_flags)
 {
         if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
                 return false;
 
-        return __shmem_is_huge(inode, index, shmem_huge_force, mm, vm_flags);
+        return __shmem_huge_global_enabled(inode, index, shmem_huge_force,
+                                           mm, vm_flags);
 }
 
 #if defined(CONFIG_SYSFS)
@@ -1152,7 +1153,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
                                   STATX_ATTR_NODUMP);
         generic_fillattr(idmap, request_mask, inode, stat);
 
-        if (shmem_is_huge(inode, 0, false, NULL, 0))
+        if (shmem_huge_global_enabled(inode, 0, false, NULL, 0))
                 stat->blksize = HPAGE_PMD_SIZE;
 
         if (request_mask & STATX_BTIME) {
@@ -2172,7 +2173,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
                 return 0;
         }
 
-        huge = shmem_is_huge(inode, index, false, fault_mm,
+        huge = shmem_huge_global_enabled(inode, index, false, fault_mm,
                              vma ? vma->vm_flags : 0);
 
         /* Find hugepage orders that are allowed for anonymous shmem. */
From: Baolin Wang <baolin.wang@linux.alibaba.com>
next inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------
Move shmem_huge_global_enabled() into shmem_allowable_huge_orders(), so that shmem_allowable_huge_orders() can also find the allowable huge orders for tmpfs. Moreover, shmem_huge_global_enabled() can now become static. While we are at it, passing the vma instead of the mm to shmem_huge_global_enabled() makes the code cleaner.
No functional changes.
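As an illustration (editorial, not part of the patch): the decision logic is now centralised roughly as follows; vma_disables_thp() and anon_shmem_orders() are hypothetical stand-ins condensing the checks visible in the diff below.

/* Condensed sketch of shmem_allowable_huge_orders() after this patch. */
unsigned long allowable_orders(struct inode *inode, struct vm_area_struct *vma,
                               pgoff_t index, bool shmem_huge_force)
{
        bool global_huge;

        if (vma_disables_thp(vma))      /* hypothetical: VM_NOHUGEPAGE etc. */
                return 0;

        global_huge = shmem_huge_global_enabled(inode, index, shmem_huge_force,
                                                vma, vma ? vma->vm_flags : 0);

        /* tmpfs (no vma, or non-anonymous shmem): PMD-sized THP or nothing */
        if (!vma || !vma_is_anon_shmem(vma))
                return global_huge ? BIT(HPAGE_PMD_ORDER) : 0;

        /* anonymous shmem: per-size sysfs knobs (hypothetical helper) */
        return anon_shmem_orders(inode, vma, index, global_huge);
}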
Link: https://lkml.kernel.org/r/8e825146bb29ee1a1c7bd64d2968ff3e19be7815.172162664...
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
        mm/shmem.c
[ Conflict with mm_in_dynamic_pool(); move it after shmem_allowable_huge_orders() to ensure orders = 0. ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/shmem_fs.h | 12 ++--------
 mm/huge_memory.c         | 12 +++-------
 mm/shmem.c               | 48 +++++++++++++++++++++++++---------------
 3 files changed, 35 insertions(+), 37 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 83a4fd53df8c..57e8a6689439 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -115,21 +115,13 @@ extern void shmem_truncate_range(struct inode *inode, loff_t start,
                                  loff_t end);
 int shmem_unuse(unsigned int type);
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-extern bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index, bool shmem_huge_force,
-                                      struct mm_struct *mm, unsigned long vm_flags);
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
                                 struct vm_area_struct *vma, pgoff_t index,
-                                bool global_huge);
+                                bool shmem_huge_force);
 #else
-static __always_inline bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-                                                      bool shmem_huge_force, struct mm_struct *mm,
-                                                      unsigned long vm_flags)
-{
-        return false;
-}
 static inline unsigned long shmem_allowable_huge_orders(struct inode *inode,
                                 struct vm_area_struct *vma, pgoff_t index,
-                                bool global_huge)
+                                bool shmem_huge_force)
 {
         return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e546ff35cdf..fbcfbd5fa914 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -156,16 +156,10 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
          * Must be done before hugepage flags check since shmem has its
          * own flags.
          */
-        if (!in_pf && shmem_file(vma->vm_file)) {
-                bool global_huge = shmem_huge_global_enabled(file_inode(vma->vm_file),
-                                                             vma->vm_pgoff, !enforce_sysfs,
-                                                             vma->vm_mm, vm_flags);
-
-                if (!vma_is_anon_shmem(vma))
-                        return global_huge ? orders : 0;
+        if (!in_pf && shmem_file(vma->vm_file))
                 return shmem_allowable_huge_orders(file_inode(vma->vm_file),
-                                                   vma, vma->vm_pgoff, global_huge);
-        }
+                                                   vma, vma->vm_pgoff,
+                                                   !enforce_sysfs);
 
         if (!vma_is_anonymous(vma)) {
                 /*
diff --git a/mm/shmem.c b/mm/shmem.c
index 3aec08ff44f1..9ab915d5c060 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -543,9 +543,10 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
 static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-                                        bool shmem_huge_force, struct mm_struct *mm,
+                                        bool shmem_huge_force, struct vm_area_struct *vma,
                                         unsigned long vm_flags)
 {
+        struct mm_struct *mm = vma ? vma->vm_mm : NULL;
         loff_t i_size;
 
         if (!S_ISREG(inode->i_mode))
@@ -575,15 +576,15 @@ static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
         }
 }
 
-bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-                   bool shmem_huge_force, struct mm_struct *mm,
+static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
+                   bool shmem_huge_force, struct vm_area_struct *vma,
                    unsigned long vm_flags)
 {
         if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
                 return false;
 
         return __shmem_huge_global_enabled(inode, index, shmem_huge_force,
-                                           mm, vm_flags);
+                                           vma, vm_flags);
 }
 
 #if defined(CONFIG_SYSFS)
@@ -766,6 +767,13 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 {
         return 0;
 }
+
+static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
+                                      bool shmem_huge_force, struct vm_area_struct *vma,
+                                      unsigned long vm_flags)
+{
+        return false;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 /*
@@ -1642,22 +1650,33 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
                                 struct vm_area_struct *vma, pgoff_t index,
-                                bool global_huge)
+                                bool shmem_huge_force)
 {
         unsigned long mask = READ_ONCE(huge_shmem_orders_always);
         unsigned long within_size_orders = READ_ONCE(huge_shmem_orders_within_size);
-        unsigned long vm_flags = vma->vm_flags;
+        unsigned long vm_flags = vma ? vma->vm_flags : 0;
+        bool global_huge;
         loff_t i_size;
         int order;
 
-        if ((vm_flags & VM_NOHUGEPAGE) ||
-            test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+        if (vma && ((vm_flags & VM_NOHUGEPAGE) ||
+            test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
                 return 0;
 
         /* If the hardware/firmware marked hugepage support disabled. */
         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
                 return 0;
 
+        global_huge = shmem_huge_global_enabled(inode, index, shmem_huge_force,
+                                                vma, vm_flags);
+        if (!vma || !vma_is_anon_shmem(vma)) {
+                /*
+                 * For tmpfs, we now only support PMD sized THP if huge page
+                 * is enabled, otherwise fallback to order 0.
+                 */
+                return global_huge ? BIT(HPAGE_PMD_ORDER) : 0;
+        }
+
         /*
          * Following the 'deny' semantics of the top level, force the huge
          * option off from all mounts.
@@ -2100,7 +2119,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
         struct mm_struct *fault_mm;
         struct folio *folio;
         int error;
-        bool alloced, huge;
+        bool alloced;
         unsigned long orders = 0;
 
         if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
@@ -2173,17 +2192,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
                 return 0;
         }
 
-        huge = shmem_huge_global_enabled(inode, index, false, fault_mm,
-                                         vma ? vma->vm_flags : 0);
-
-        /* Find hugepage orders that are allowed for anonymous shmem. */
+        /* Find hugepage orders that are allowed for anonymous shmem and tmpfs. */
+        orders = shmem_allowable_huge_orders(inode, vma, index, false);
         if (mm_in_dynamic_pool(vma ? vma->vm_mm : current->mm))
                 orders = 0;
-        else if (vma && vma_is_anon_shmem(vma))
-                orders = shmem_allowable_huge_orders(inode, vma, index, huge);
-        else if (huge)
-                orders = BIT(HPAGE_PMD_ORDER);
-
         if (orders > 0) {
                 gfp_t huge_gfp;
From: Rik van Riel <riel@surriel.com>
next inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------
Take the end of a file write into consideration when deciding whether or not to use huge pages for tmpfs files when the tmpfs filesystem is mounted with huge=within_size.
This allows large writes that append to the end of a file to automatically use large pages.
The benchmark: fio doing 4MB sequential writes without fallocate to a 16GB tmpfs file. The numbers without THP or with huge=always stay the same, but the performance with huge=within_size now matches that of huge=always.
huge            before          after
4kB pages       1560 MB/s       1560 MB/s
within_size     1560 MB/s       4720 MB/s
always:         4720 MB/s       4720 MB/s
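As a worked example (editorial, not part of the patch), assuming 4K pages and HPAGE_PMD_NR = 512 as on x86-64: appending 4MB at offset 0 to an empty file used to fail the within_size test because i_size was still 0; with write_end taken into account it passes.

#include <stdio.h>

#define PAGE_SIZE       4096UL
#define PAGE_SHIFT      12
#define HPAGE_PMD_NR    512UL

static unsigned long round_up(unsigned long x, unsigned long a)
{
        return (x + a - 1) / a * a;
}

int main(void)
{
        unsigned long index = 0;                /* first page of the write */
        unsigned long i_size_read = 0;          /* file is still empty */
        unsigned long write_end = 4UL << 20;    /* pos + len = 4MB */
        unsigned long i_size;

        index = round_up(index + 1, HPAGE_PMD_NR);              /* 512 */

        /* Old test: only the current i_size counts -> 0 >= 512, no THP. */
        i_size = round_up(i_size_read, PAGE_SIZE);
        printf("old: huge=%d\n", (i_size >> PAGE_SHIFT) >= index);

        /* New test: the end of the pending write counts -> 1024 >= 512. */
        i_size = round_up(write_end > i_size_read ? write_end : i_size_read,
                          PAGE_SIZE);
        printf("new: huge=%d\n", (i_size >> PAGE_SHIFT) >= index);
        return 0;
}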
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20240903111928.7171e60c@imladris.surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
        fs/xfs/scrub/xfile.c
        fs/xfs/xfs_buf_mem.c
        mm/khugepaged.c
        mm/shmem.c
[ Conflicts in xfile.c, xfs_buf_mem.c and shmem.c because shmem_get_folio()
  has not been exported and used in xfs. Conflict in khugepaged.c because
  some pages there have not been converted to folios. Context conflict with
  mm_in_dynamic_pool() in shmem.c. ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/shmem_fs.h |  8 +++---
 mm/huge_memory.c         |  2 +-
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 58 +++++++++++++++++++++-------------------
 mm/userfaultfd.c         |  2 +-
 5 files changed, 37 insertions(+), 35 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 57e8a6689439..0880504a781e 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -117,11 +117,11 @@ int shmem_unuse(unsigned int type);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
                                 struct vm_area_struct *vma, pgoff_t index,
-                                bool shmem_huge_force);
+                                loff_t write_end, bool shmem_huge_force);
 #else
 static inline unsigned long shmem_allowable_huge_orders(struct inode *inode,
                                 struct vm_area_struct *vma, pgoff_t index,
-                                bool shmem_huge_force)
+                                loff_t write_end, bool shmem_huge_force)
 {
         return 0;
 }
@@ -147,8 +147,8 @@ enum sgp_type {
         SGP_FALLOC,     /* like SGP_WRITE, but make existing page Uptodate */
 };
 
-int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-                enum sgp_type sgp);
+int shmem_get_folio(struct inode *inode, pgoff_t index, loff_t write_end,
+                struct folio **foliop, enum sgp_type sgp);
 struct folio *shmem_read_folio_gfp(struct address_space *mapping,
                 pgoff_t index, gfp_t gfp);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fbcfbd5fa914..ea560ea7b39e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -158,7 +158,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
          */
         if (!in_pf && shmem_file(vma->vm_file))
                 return shmem_allowable_huge_orders(file_inode(vma->vm_file),
-                                                   vma, vma->vm_pgoff,
+                                                   vma, vma->vm_pgoff, 0,
                                                    !enforce_sysfs);
 
         if (!vma_is_anonymous(vma)) {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8006b13304de..c6379a2d55bc 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1900,7 +1900,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
                 if (xa_is_value(page) || !PageUptodate(page)) {
                         xas_unlock_irq(&xas);
                         /* swap in or instantiate fallocated page */
-                        if (shmem_get_folio(mapping->host, index,
+                        if (shmem_get_folio(mapping->host, index, 0,
                                             &folio, SGP_NOALLOC)) {
                                 result = SCAN_FAIL;
                                 goto xa_unlocked;
diff --git a/mm/shmem.c b/mm/shmem.c
index 9ab915d5c060..700335e56e67 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -543,7 +543,8 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
 static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-                                        bool shmem_huge_force, struct vm_area_struct *vma,
+                                        loff_t write_end, bool shmem_huge_force,
+                                        struct vm_area_struct *vma,
                                         unsigned long vm_flags)
 {
         struct mm_struct *mm = vma ? vma->vm_mm : NULL;
@@ -563,7 +564,8 @@ static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
                 return true;
         case SHMEM_HUGE_WITHIN_SIZE:
                 index = round_up(index + 1, HPAGE_PMD_NR);
-                i_size = round_up(i_size_read(inode), PAGE_SIZE);
+                i_size = max(write_end, i_size_read(inode));
+                i_size = round_up(i_size, PAGE_SIZE);
                 if (i_size >> PAGE_SHIFT >= index)
                         return true;
                 fallthrough;
@@ -577,14 +579,14 @@ static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
 }
 
 static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-                   bool shmem_huge_force, struct vm_area_struct *vma,
-                   unsigned long vm_flags)
+                   loff_t write_end, bool shmem_huge_force,
+                   struct vm_area_struct *vma, unsigned long vm_flags)
 {
         if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
                 return false;
 
-        return __shmem_huge_global_enabled(inode, index, shmem_huge_force,
-                                           vma, vm_flags);
+        return __shmem_huge_global_enabled(inode, index, write_end,
+                                           shmem_huge_force, vma, vm_flags);
 }
 
 #if defined(CONFIG_SYSFS)
@@ -769,8 +771,8 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 }
 
 static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-                                      bool shmem_huge_force, struct vm_area_struct *vma,
-                                      unsigned long vm_flags)
+                                      loff_t write_end, bool shmem_huge_force,
+                                      struct vm_area_struct *vma, unsigned long vm_flags)
 {
         return false;
 }
@@ -976,7 +978,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
          * (although in some cases this is just a waste of time).
          */
         folio = NULL;
-        shmem_get_folio(inode, index, &folio, SGP_READ);
+        shmem_get_folio(inode, index, 0, &folio, SGP_READ);
         return folio;
 }
 
@@ -1161,7 +1163,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
                                   STATX_ATTR_NODUMP);
         generic_fillattr(idmap, request_mask, inode, stat);
 
-        if (shmem_huge_global_enabled(inode, 0, false, NULL, 0))
+        if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
                 stat->blksize = HPAGE_PMD_SIZE;
 
         if (request_mask & STATX_BTIME) {
@@ -1650,7 +1652,7 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
                                 struct vm_area_struct *vma, pgoff_t index,
-                                bool shmem_huge_force)
+                                loff_t write_end, bool shmem_huge_force)
 {
         unsigned long mask = READ_ONCE(huge_shmem_orders_always);
         unsigned long within_size_orders = READ_ONCE(huge_shmem_orders_within_size);
@@ -1667,8 +1669,8 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
                 return 0;
 
-        global_huge = shmem_huge_global_enabled(inode, index, shmem_huge_force,
-                                                vma, vm_flags);
+        global_huge = shmem_huge_global_enabled(inode, index, write_end,
+                                                shmem_huge_force, vma, vm_flags);
         if (!vma || !vma_is_anon_shmem(vma)) {
                 /*
                  * For tmpfs, we now only support PMD sized THP if huge page
@@ -2112,8 +2114,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
  * vmf and fault_type are only supplied by shmem_fault: otherwise they are NULL.
  */
 static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
-                struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
-                struct vm_fault *vmf, vm_fault_t *fault_type)
+                loff_t write_end, struct folio **foliop, enum sgp_type sgp,
+                gfp_t gfp, struct vm_fault *vmf, vm_fault_t *fault_type)
 {
         struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
         struct mm_struct *fault_mm;
@@ -2193,7 +2195,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
         }
 
         /* Find hugepage orders that are allowed for anonymous shmem and tmpfs. */
-        orders = shmem_allowable_huge_orders(inode, vma, index, false);
+        orders = shmem_allowable_huge_orders(inode, vma, index, write_end, false);
         if (mm_in_dynamic_pool(vma ? vma->vm_mm : current->mm))
                 orders = 0;
         if (orders > 0) {
@@ -2294,10 +2296,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
         return error;
 }
 
-int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-                enum sgp_type sgp)
+int shmem_get_folio(struct inode *inode, pgoff_t index, loff_t write_end,
+                struct folio **foliop, enum sgp_type sgp)
 {
-        return shmem_get_folio_gfp(inode, index, foliop, sgp,
+        return shmem_get_folio_gfp(inode, index, write_end, foliop, sgp,
                         mapping_gfp_mask(inode->i_mapping), NULL, NULL);
 }
 
@@ -2391,7 +2393,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
         }
 
         WARN_ON_ONCE(vmf->page != NULL);
-        err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
+        err = shmem_get_folio_gfp(inode, vmf->pgoff, 0, &folio, SGP_CACHE,
                                   gfp, vmf, &ret);
         if (err)
                 return vmf_error(err);
@@ -2876,7 +2878,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
                         return -EPERM;
         }
 
-        ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
+        ret = shmem_get_folio(inode, index, pos + len, &folio, SGP_WRITE);
         if (ret)
                 return ret;
 
@@ -2947,7 +2949,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
                         break;
                 }
 
-                error = shmem_get_folio(inode, index, &folio, SGP_READ);
+                error = shmem_get_folio(inode, index, 0, &folio, SGP_READ);
                 if (error) {
                         if (error == -EINVAL)
                                 error = 0;
@@ -3123,7 +3125,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
                 if (*ppos >= i_size_read(inode))
                         break;
 
-                error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
+                error = shmem_get_folio(inode, *ppos / PAGE_SIZE, 0, &folio,
                                         SGP_READ);
                 if (error) {
                         if (error == -EINVAL)
@@ -3310,8 +3312,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
                 else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
                         error = -ENOMEM;
                 else
-                        error = shmem_get_folio(inode, index, &folio,
-                                                SGP_FALLOC);
+                        error = shmem_get_folio(inode, index, offset + len,
+                                                &folio, SGP_FALLOC);
                 if (error) {
                         info->fallocend = undo_fallocend;
                         /* Remove the !uptodate folios we added */
@@ -3663,7 +3665,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
         } else {
                 inode_nohighmem(inode);
                 inode->i_mapping->a_ops = &shmem_aops;
-                error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
+                error = shmem_get_folio(inode, 0, 0, &folio, SGP_WRITE);
                 if (error)
                         goto out_remove_offset;
                 inode->i_op = &shmem_symlink_inode_operations;
@@ -3709,7 +3711,7 @@ static const char *shmem_get_link(struct dentry *dentry, struct inode *inode,
                         return ERR_PTR(-ECHILD);
                 }
         } else {
-                error = shmem_get_folio(inode, 0, &folio, SGP_READ);
+                error = shmem_get_folio(inode, 0, 0, &folio, SGP_READ);
                 if (error)
                         return ERR_PTR(error);
                 if (!folio)
@@ -5168,7 +5170,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
         struct folio *folio;
         int error;
 
-        error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
+        error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
                                     gfp, NULL, NULL);
         if (error)
                 return ERR_PTR(error);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 4ab24c56f660..8c22dd4e5e15 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -292,7 +292,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
         struct page *page;
         int ret;
 
-        ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
+        ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
         /* Our caller expects us to return -EFAULT if we failed to find folio */
         if (ret == -ENOENT)
                 ret = -EFAULT;
From: Kefeng Wang <wangkefeng.wang@huawei.com>
hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
--------------------------------
tmpfs supports large folios, but there are configurable options to enable or disable large folio allocation, and with huge=within_size a large folio is only allowed if it fits fully within i_size. So there is a performance problem when writes are performed while large folios are disallowed, very similar to the one addressed by commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large folio mappings").

Fix it by checking whether large folio allocation is allowed before performing the write, and faulting in the user buffer in smaller (PAGE_SIZE) chunks when it is not; see the condensed loop sketch below.
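To make the pathology concrete (editorial, condensed paraphrase of the generic_perform_write() loop in mm/filemap.c; error handling, dirty throttling and write_end() omitted): the loop faults in up to "chunk" bytes of the user buffer per iteration, but the copy is then clamped to the folio that write_begin() actually returned, so a PMD-sized chunk over order-0 folios re-faults the same user pages repeatedly for one page of progress each time.

do {
        size_t offset = pos & (chunk - 1);
        size_t bytes = min(chunk - offset, iov_iter_count(i));

        /* Touches every user page in the next "bytes" of the source ... */
        if (fault_in_iov_iter_readable(i, bytes) == bytes)
                break;                          /* nothing readable: -EFAULT */

        a_ops->write_begin(file, mapping, pos, bytes, &page, &fsdata);

        /* ... but the copy is clamped to the folio actually returned. */
        folio = page_folio(page);
        offset = offset_in_folio(folio, pos);
        if (bytes > folio_size(folio) - offset)
                bytes = folio_size(folio) - offset;
        copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
        pos += copied;
} while (iov_iter_count(i));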
Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/fs.h | 2 ++
 mm/filemap.c       | 7 ++++++-
 mm/shmem.c         | 5 +++++
 3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e7c55ebb9d71..5e37afb1b844 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -372,6 +372,8 @@ enum rw_hint {
 #define IOCB_DIO_CALLER_COMP    (1 << 22)
 /* kiocb is a read or write operation submitted by fs/aio.c. */
 #define IOCB_AIO_RW             (1 << 23)
+/* fault in small chunks (PAGE_SIZE) from userspace */
+#define IOCB_NO_LARGE_CHUNK     (1 << 24)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
diff --git a/mm/filemap.c b/mm/filemap.c
index 1d5d5f1c2b54..5d8e8810ae34 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4043,9 +4043,14 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
         loff_t pos = iocb->ki_pos;
         struct address_space *mapping = file->f_mapping;
         const struct address_space_operations *a_ops = mapping->a_ops;
-        size_t chunk = mapping_max_folio_size(mapping);
         long status = 0;
         ssize_t written = 0;
+        size_t chunk;
+
+        if (iocb->ki_flags & IOCB_NO_LARGE_CHUNK)
+                chunk = PAGE_SIZE;
+        else
+                chunk = mapping_max_folio_size(mapping);
 
         do {
                 struct page *page;
diff --git a/mm/shmem.c b/mm/shmem.c
index 700335e56e67..72e0ec87219e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3042,6 +3042,7 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
         struct file *file = iocb->ki_filp;
         struct inode *inode = file->f_mapping->host;
+        pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
         ssize_t ret;
 
         inode_lock(inode);
@@ -3054,6 +3055,10 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
         ret = file_update_time(file);
         if (ret)
                 goto unlock;
+
+        if (!shmem_allowable_huge_orders(inode, NULL, index, 0, false))
+                iocb->ki_flags |= IOCB_NO_LARGE_CHUNK;
+
         ret = generic_perform_write(iocb, from);
 unlock:
         inode_unlock(inode);
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
mainline inclusion
from mainline-v6.8-rc1
commit 1e12cbb9f69541181afab6b1ff358b4f1dd3e253
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "Fix fault handler's handling of poisoned tail pages".
Since introducing the ability to have large folios in the page cache, it's been possible to have a hwpoisoned tail page returned from the fault handler. We handle this situation poorly, failing to remove the affected page from use.

This isn't a minimal patch to fix it; it's a full conversion of all the code surrounding it.
This patch (of 6):
invalidate_inode_page() does very little beyond calling mapping_evict_folio(). Move the check for mapping being NULL into mapping_evict_folio() and make it available to the rest of the MM for use in the next few patches.
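The shape of the change at a call site, in brief (both fragments appear in the diff below):

/* Before: every caller had to fetch and NULL-check the mapping itself. */
struct address_space *mapping = folio_mapping(folio);

if (!mapping)   /* truncated before it was locked */
        return 0;
return mapping_evict_folio(mapping, folio);

/* After: the NULL check lives inside mapping_evict_folio(). */
return mapping_evict_folio(folio_mapping(folio), folio);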
Link: https://lkml.kernel.org/r/20231108182809.602073-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20231108182809.602073-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/internal.h |  1 +
 mm/truncate.c | 33 ++++++++++++++++-----------------
 2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index e5b541e3f67e..2782a9426147 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -343,6 +343,7 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio);
 int truncate_inode_folio(struct address_space *mapping, struct folio *folio);
 bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
                 loff_t end);
+long mapping_evict_folio(struct address_space *mapping, struct folio *folio);
 long invalidate_inode_page(struct page *page);
 unsigned long mapping_try_invalidate(struct address_space *mapping,
                 pgoff_t start, pgoff_t end, unsigned long *nr_failed);
diff --git a/mm/truncate.c b/mm/truncate.c
index 8e3aa9e8618e..1d516e51e29d 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -266,9 +266,22 @@ int generic_error_remove_page(struct address_space *mapping, struct page *page)
 }
 EXPORT_SYMBOL(generic_error_remove_page);
 
-static long mapping_evict_folio(struct address_space *mapping,
-                struct folio *folio)
+/**
+ * mapping_evict_folio() - Remove an unused folio from the page-cache.
+ * @mapping: The mapping this folio belongs to.
+ * @folio: The folio to remove.
+ *
+ * Safely remove one folio from the page cache.
+ * It only drops clean, unused folios.
+ *
+ * Context: Folio must be locked.
+ * Return: The number of pages successfully removed.
+ */
+long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
 {
+        /* The page may have been truncated before it was locked */
+        if (!mapping)
+                return 0;
         if (folio_test_dirty(folio) || folio_test_writeback(folio))
                 return 0;
         /* The refcount will be elevated if any page in the folio is mapped */
@@ -281,25 +294,11 @@ static long mapping_evict_folio(struct address_space *mapping,
         return remove_mapping(mapping, folio);
 }
 
-/**
- * invalidate_inode_page() - Remove an unused page from the pagecache.
- * @page: The page to remove.
- *
- * Safely invalidate one page from its pagecache mapping.
- * It only drops clean, unused pages.
- *
- * Context: Page must be locked.
- * Return: The number of pages successfully removed.
- */
 long invalidate_inode_page(struct page *page)
 {
         struct folio *folio = page_folio(page);
-        struct address_space *mapping = folio_mapping(folio);
 
-        /* The page may have been truncated before it was locked */
-        if (!mapping)
-                return 0;
-        return mapping_evict_folio(mapping, folio);
+        return mapping_evict_folio(folio_mapping(folio), folio);
 }
 
 /**
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
mainline inclusion
from mainline-v6.8-rc1
commit 01d1e0e6b7d99ebaf2e42d2205595080b7d0c271
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Convert vmf->page to a folio as soon as we're going to use it. This fixes a bug if the fault handler returns a tail page with hardware poison; tail pages have an invalid page->index, so we would fail to unmap the page from the page tables. We actually have to unmap the entire folio (or mapping_evict_folio() will fail), so use unmap_mapping_folio() instead.
This also saves various calls to compound_head() hidden in lock_page(), put_page(), etc.
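Concretely (editorial sketch; the two calls are from the diff below):

/*
 * Old code: on a large folio, only the head page carries a meaningful
 * ->index, so computing the unmap range from a poisoned tail page
 * targets the wrong file offset and the mapped range stays mapped:
 */
unmap_mapping_pages(page_mapping(page), page->index, 1, false);

/*
 * New code: derive the folio and let unmap_mapping_folio() work out
 * the whole, correct range itself:
 */
unmap_mapping_folio(page_folio(page));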
Link: https://lkml.kernel.org/r/20231108182809.602073-3-willy@infradead.org
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/memory.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 03fa94ae898a..ea6db8bdeeb9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4586,6 +4586,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 static vm_fault_t __do_fault(struct vm_fault *vmf)
 {
         struct vm_area_struct *vma = vmf->vma;
+        struct folio *folio;
         vm_fault_t ret;
 
         /*
@@ -4614,27 +4615,26 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
                             VM_FAULT_DONE_COW)))
                 return ret;
 
+        folio = page_folio(vmf->page);
         if (unlikely(PageHWPoison(vmf->page))) {
-                struct page *page = vmf->page;
                 vm_fault_t poisonret = VM_FAULT_HWPOISON;
                 if (ret & VM_FAULT_LOCKED) {
-                        if (page_mapped(page))
-                                unmap_mapping_pages(page_mapping(page),
-                                                    page->index, 1, false);
-                        /* Retry if a clean page was removed from the cache. */
-                        if (invalidate_inode_page(page))
+                        if (page_mapped(vmf->page))
+                                unmap_mapping_folio(folio);
+                        /* Retry if a clean folio was removed from the cache. */
+                        if (mapping_evict_folio(folio->mapping, folio))
                                 poisonret = VM_FAULT_NOPAGE;
-                        unlock_page(page);
+                        folio_unlock(folio);
                 }
-                put_page(page);
+                folio_put(folio);
                 vmf->page = NULL;
                 return poisonret;
         }
 
         if (unlikely(!(ret & VM_FAULT_LOCKED)))
-                lock_page(vmf->page);
+                folio_lock(folio);
         else
-                VM_BUG_ON_PAGE(!PageLocked(vmf->page), vmf->page);
+                VM_BUG_ON_PAGE(!folio_test_locked(folio), vmf->page);
 
         return ret;
 }
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
mainline inclusion
from mainline-v6.8-rc1
commit 19369d866a8b89788cdc9b10c7b8c9b2777f806b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We already have the folio and the mapping, so replace the call to invalidate_inode_page() with mapping_evict_folio().
Link: https://lkml.kernel.org/r/20231108182809.602073-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/memory-failure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index abfd98ee047b..20f3205135de 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -943,10 +943,10 @@ static int delete_from_lru_cache(struct page *p)
 static int truncate_error_page(struct page *p, unsigned long pfn,
                                 struct address_space *mapping)
 {
+        struct folio *folio = page_folio(p);
         int ret = MF_FAILED;
 
         if (mapping->a_ops->error_remove_page) {
-                struct folio *folio = page_folio(p);
                 int err = mapping->a_ops->error_remove_page(mapping, p);
 
                 if (err != 0)
@@ -960,7 +960,7 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
                  * If the file system doesn't support it just invalidate
                  * This fails on dirty or anything with private pages
                  */
-                if (invalidate_inode_page(p))
+                if (mapping_evict_folio(mapping, folio))
                         ret = MF_RECOVERED;
                 else
                         pr_info("%#lx: Failed to invalidate\n", pfn);
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
mainline inclusion
from mainline-v6.8-rc1
commit 049b26048dd287d52f6f6fbe5eafa301fdca5d37
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Replace the existing head-page logic with folio logic.
Link: https://lkml.kernel.org/r/20231108182809.602073-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/memory-failure.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 20f3205135de..a5dfc8cc5632 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2691,40 +2691,40 @@ static int soft_offline_in_use_page(struct page *page)
 {
         long ret = 0;
         unsigned long pfn = page_to_pfn(page);
-        struct page *hpage = compound_head(page);
+        struct folio *folio = page_folio(page);
         char const *msg_page[] = {"page", "hugepage"};
-        bool huge = PageHuge(page);
+        bool huge = folio_test_hugetlb(folio);
         LIST_HEAD(pagelist);
         struct migration_target_control mtc = {
                 .nid = NUMA_NO_NODE,
                 .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
         };
 
-        if (!huge && PageTransHuge(hpage)) {
+        if (!huge && folio_test_large(folio)) {
                 if (try_to_split_thp_page(page)) {
                         pr_info("soft offline: %#lx: thp split failed\n", pfn);
                         return -EBUSY;
                 }
-                hpage = page;
+                folio = page_folio(page);
         }
 
-        lock_page(page);
+        folio_lock(folio);
         if (!huge)
-                wait_on_page_writeback(page);
+                folio_wait_writeback(folio);
         if (PageHWPoison(page)) {
-                unlock_page(page);
-                put_page(page);
+                folio_unlock(folio);
+                folio_put(folio);
                 pr_info("soft offline: %#lx page already poisoned\n", pfn);
                 return 0;
         }
 
-        if (!huge && PageLRU(page) && !PageSwapCache(page))
+        if (!huge && folio_test_lru(folio) && !folio_test_swapcache(folio))
                 /*
                  * Try to invalidate first. This should work for
                  * non dirty unmapped page cache pages.
                  */
-                ret = invalidate_inode_page(page);
-        unlock_page(page);
+                ret = mapping_evict_folio(folio_mapping(folio), folio);
+        folio_unlock(folio);
 
         if (ret) {
                 pr_info("soft_offline: %#lx: invalidated\n", pfn);
@@ -2732,7 +2732,7 @@ static int soft_offline_in_use_page(struct page *page)
                 return 0;
         }
 
-        if (isolate_page(hpage, &pagelist)) {
+        if (isolate_page(&folio->page, &pagelist)) {
                 ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
                         (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE, NULL);
                 if (!ret) {
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
mainline inclusion
from mainline-v6.8-rc1
commit 761d79fbad2a424a240a351b898b54eb674d3bdc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The only caller now has a folio, so pass it in and operate on it. Saves many page->folio conversions and introduces only one folio->page conversion when calling isolate_movable_page().
Link: https://lkml.kernel.org/r/20231108182809.602073-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
        mm/memory-failure.c
[ Context conflicts with commit 6eedb5f34daf ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/memory-failure.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a5dfc8cc5632..607ff052158d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2648,37 +2648,37 @@ int soft_online_page(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(soft_online_page);
 
-static bool isolate_page(struct page *page, struct list_head *pagelist)
+static bool mf_isolate_folio(struct folio *folio, struct list_head *pagelist)
 {
         bool isolated = false;
 
-        if (PageHuge(page)) {
-                isolated = isolate_hugetlb(page_folio(page), pagelist);
+        if (folio_test_hugetlb(folio)) {
+                isolated = isolate_hugetlb(folio, pagelist);
         } else {
-                bool lru = !__PageMovable(page);
+                bool lru = !__folio_test_movable(folio);
 
                 if (lru)
-                        isolated = isolate_lru_page(page);
+                        isolated = folio_isolate_lru(folio);
                 else
-                        isolated = isolate_movable_page(page,
+                        isolated = isolate_movable_page(&folio->page,
                                                         ISOLATE_UNEVICTABLE);
 
                 if (isolated) {
-                        list_add(&page->lru, pagelist);
+                        list_add(&folio->lru, pagelist);
                         if (lru)
-                                inc_node_page_state(page, NR_ISOLATED_ANON +
-                                                    page_is_file_lru(page));
+                                node_stat_add_folio(folio, NR_ISOLATED_ANON +
+                                                    folio_is_file_lru(folio));
                 }
         }
 
         /*
-         * If we succeed to isolate the page, we grabbed another refcount on
-         * the page, so we can safely drop the one we got from get_any_page().
-         * If we failed to isolate the page, it means that we cannot go further
+         * If we succeed to isolate the folio, we grabbed another refcount on
+         * the folio, so we can safely drop the one we got from get_any_page().
+         * If we failed to isolate the folio, it means that we cannot go further
          * and we will return an error, so drop the reference we got from
         * get_any_page() as well.
         */
-        put_page(page);
+        folio_put(folio);
        return isolated;
 }
 
@@ -2732,7 +2732,7 @@ static int soft_offline_in_use_page(struct page *page)
                return 0;
        }
 
-        if (isolate_page(&folio->page, &pagelist)) {
+        if (mf_isolate_folio(folio, &pagelist)) {
                ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
                        (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE, NULL);
                if (!ret) {
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
mainline inclusion
from mainline-v6.8-rc1
commit 2033c98cce666b0d125ae956613ab5111bb8d202
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAO6NS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
All callers are now converted to call mapping_evict_folio().
Link: https://lkml.kernel.org/r/20231108182809.602073-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/internal.h |  1 -
 mm/truncate.c | 11 ++---------
 2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 2782a9426147..08e32fe3d481 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -344,7 +344,6 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio);
 bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
                 loff_t end);
 long mapping_evict_folio(struct address_space *mapping, struct folio *folio);
-long invalidate_inode_page(struct page *page);
 unsigned long mapping_try_invalidate(struct address_space *mapping,
                 pgoff_t start, pgoff_t end, unsigned long *nr_failed);
 
diff --git a/mm/truncate.c b/mm/truncate.c
index 1d516e51e29d..52e3a703e7b2 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -294,13 +294,6 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
         return remove_mapping(mapping, folio);
 }
 
-long invalidate_inode_page(struct page *page)
-{
-        struct folio *folio = page_folio(page);
-
-        return mapping_evict_folio(folio_mapping(folio), folio);
-}
-
 /**
  * truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
  * @mapping: mapping to truncate
@@ -559,9 +552,9 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 EXPORT_SYMBOL(invalidate_mapping_pages);
 
 /*
- * This is like invalidate_inode_page(), except it ignores the page's
+ * This is like mapping_evict_folio(), except it ignores the folio's
  * refcount. We do this because invalidate_inode_pages2() needs stronger
- * invalidation guarantees, and cannot afford to leave pages behind because
+ * invalidation guarantees, and cannot afford to leave folios behind because
  * shrink_page_list() has a temp ref on them, or because they're transiently
  * sitting in the folio_add_lru() caches.
  */