[PATCH OLK-6.6 00/21] swap-in support large folio

This mainly includes two patchsets plus bugfixes:
1. mm: enable large folios swap-in support
2. More swap folio conversions

Andy Shevchenko (1):
  mm: remove unused stub for can_swapin_thp()

Barry Song (4):
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to
    support large folios
  mm: zswap: fix zswap_never_enabled() for CONFIG_ZSWAP==N
  mm: fix PSWPIN counter for large folios swap-in
  mm: add per-order mTHP swpin counters

Chuanhua Han (1):
  mm: support large folios swap-in for sync io devices

Matthew Wilcox (Oracle) (13):
  mm: return the folio from __read_swap_cache_async()
  mm: pass a folio to __swap_writepage()
  mm: pass a folio to swap_writepage_fs()
  mm: pass a folio to swap_writepage_bdev_sync()
  mm: pass a folio to swap_writepage_bdev_async()
  mm: pass a folio to swap_readpage_fs()
  mm: pass a folio to swap_readpage_bdev_sync()
  mm: pass a folio to swap_readpage_bdev_async()
  mm: convert swap_page_sector() to swap_folio_sector()
  mm: convert swap_readpage() to swap_read_folio()
  mm: remove page_swap_info()
  mm: return a folio from read_swap_cache_async()
  mm: convert swap_cluster_readahead and swap_vma_readahead to return a
    folio

Wenchao Hao (1):
  mm: add per-order mTHP swap-in fallback/fallback_charge counters

Yosry Ahmed (1):
  mm: zswap: add zswap_never_enabled()

 Documentation/admin-guide/mm/transhuge.rst |  14 ++
 include/linux/huge_mm.h                    |   3 +
 include/linux/memcontrol.h                 |   5 +-
 include/linux/mmzone.h                     |   1 +
 include/linux/swap.h                       |   5 +-
 include/linux/zswap.h                      |   6 +
 mm/huge_memory.c                           |   9 +
 mm/madvise.c                               |  22 +-
 mm/memcontrol.c                            |   7 +-
 mm/memory.c                                | 264 ++++++++++++++++++---
 mm/page_io.c                               |  86 +++----
 mm/shmem.c                                 |   8 +-
 mm/swap.h                                  |  23 +-
 mm/swap_state.c                            |  76 +++---
 mm/swapfile.c                              |  16 +-
 mm/zswap.c                                 |  62 +++--
 16 files changed, 423 insertions(+), 184 deletions(-)

-- 
2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 96c7b0b42239e7b8987b2664b458dc74e825f760 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Patch series "More swap folio conversions". These all seem like fairly straightforward conversions to me. A lot of compound_head() calls get removed. And page_swap_info(), which is nice. This patch (of 13): Move the folio->page conversion into the callers that actually want that. Most of the callers are happier with the folio anyway. If the page_allocated boolean is set, the folio allocated is of order-0, so it is safe to pass the page directly to swap_readpage(). Link: https://lkml.kernel.org/r/20231213215842.671461-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231213215842.671461-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: include/linux/zswap.h mm/swap.h mm/swap_state.c mm/zswap.c [Context conflict: 1. patchset "workload-specific and memory pressure -driven zswap writeback" is not merged. 2. patchset "mempolicy: cleanups leading to NUMA mpol without vma" is not merged. 3. patchset "mm: zswap: cleanups" is not merged] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/swap.h | 2 +- mm/swap_state.c | 50 ++++++++++++++++++++++------------------------- mm/zswap.c | 52 ++++++++++++++++++++++++------------------------- 3 files changed, 50 insertions(+), 54 deletions(-) diff --git a/mm/swap.h b/mm/swap.h index 500f99202776..1d3ba5b45298 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -48,7 +48,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, struct swap_iocb **plug); -struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, +struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, bool *new_page_allocated); diff --git a/mm/swap_state.c b/mm/swap_state.c index bd2bc1d4cd86..9f4f52350655 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -426,13 +426,12 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping, return folio; } -struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, +struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, bool *new_page_allocated) { struct swap_info_struct *si; struct folio *folio; - struct page *page; void *shadow = NULL; *new_page_allocated = false; @@ -449,10 +448,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry)); - if (!IS_ERR(folio)) { - page = folio_file_page(folio, swp_offset(entry)); - goto got_page; - } + if (!IS_ERR(folio)) + goto got_folio; /* * Just skip read ahead for unused swap slot. @@ -466,7 +463,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, goto fail_put_swap; /* - * Get a new page to read into from swap. Allocate it now, + * Get a new folio to read into from swap. Allocate it now, * before marking swap_map SWAP_HAS_CACHE, when -EEXIST will * cause any racers to loop around until we add it to cache. 
*/ @@ -490,13 +487,13 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * stumble across a swap_map entry whose SWAP_HAS_CACHE * has not yet been cleared. Or race against another * __read_swap_cache_async(), which has set SWAP_HAS_CACHE - * in swap_map, but not yet added its page to swap cache. + * in swap_map, but not yet added its folio to swap cache. */ schedule_timeout_uninterruptible(1); } /* - * The swap entry is ours to swap in. Prepare the new page. + * The swap entry is ours to swap in. Prepare the new folio. */ __folio_set_locked(folio); @@ -517,10 +514,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* Caller will initiate read into locked folio */ folio_add_lru(folio); *new_page_allocated = true; - page = &folio->page; -got_page: +got_folio: put_swap_device(si); - return page; + return folio; fail_unlock: put_swap_folio(folio, entry); @@ -546,13 +542,13 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, unsigned long addr, struct swap_iocb **plug) { bool page_was_allocated; - struct page *retpage = __read_swap_cache_async(entry, gfp_mask, + struct folio *folio = __read_swap_cache_async(entry, gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, false, plug); + swap_readpage(&folio->page, false, plug); - return retpage; + return folio_file_page(folio, swp_offset(entry)); } static unsigned int __swapin_nr_pages(unsigned long prev_offset, @@ -637,7 +633,7 @@ static unsigned long swapin_nr_pages(unsigned long offset) struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf) { - struct page *page; + struct folio *folio; unsigned long entry_offset = swp_offset(entry); unsigned long offset = entry_offset; unsigned long start_offset, end_offset; @@ -664,19 +660,19 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, blk_start_plug(&plug); for (offset = start_offset; offset <= end_offset ; offset++) { /* Ok, do the async read-ahead now */ - page = __read_swap_cache_async( + folio = __read_swap_cache_async( swp_entry(swp_type(entry), offset), gfp_mask, vma, addr, &page_allocated); - if (!page) + if (!folio) continue; if (page_allocated) { - swap_readpage(page, false, &splug); + swap_readpage(&folio->page, false, &splug); if (offset != entry_offset) { - SetPageReadahead(page); + folio_set_readahead(folio); count_vm_event(SWAP_RA); } } - put_page(page); + folio_put(folio); } blk_finish_plug(&plug); swap_read_unplug(splug); @@ -800,7 +796,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct blk_plug plug; struct swap_iocb *splug = NULL; struct vm_area_struct *vma = vmf->vma; - struct page *page; + struct folio *folio; pte_t *pte = NULL, pentry; unsigned long addr; swp_entry_t entry; @@ -831,18 +827,18 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, continue; pte_unmap(pte); pte = NULL; - page = __read_swap_cache_async(entry, gfp_mask, vma, + folio = __read_swap_cache_async(entry, gfp_mask, vma, addr, &page_allocated); - if (!page) + if (!folio) continue; if (page_allocated) { - swap_readpage(page, false, &splug); + swap_readpage(&folio->page, false, &splug); if (i != ra_info.offset) { - SetPageReadahead(page); + folio_set_readahead(folio); count_vm_event(SWAP_RA); } } - put_page(page); + folio_put(folio); } if (pte) pte_unmap(pte); diff --git a/mm/zswap.c b/mm/zswap.c index 69681b9173fd..a2e3141bc985 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1041,14 
+1041,14 @@ static int zswap_enabled_param_set(const char *val, * writeback code **********************************/ /* - * Attempts to free an entry by adding a page to the swap cache, - * decompressing the entry data into the page, and issuing a - * bio write to write the page back to the swap device. + * Attempts to free an entry by adding a folio to the swap cache, + * decompressing the entry data into the folio, and issuing a + * bio write to write the folio back to the swap device. * - * This can be thought of as a "resumed writeback" of the page + * This can be thought of as a "resumed writeback" of the folio * to the swap device. We are basically resuming the same swap * writeback path that was intercepted with the zswap_store() - * in the first place. After the page has been decompressed into + * in the first place. After the folio has been decompressed into * the swap cache, the compressed version stored by zswap can be * freed. */ @@ -1056,11 +1056,11 @@ static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_tree *tree) { swp_entry_t swpentry = entry->swpentry; - struct page *page; + struct folio *folio; struct scatterlist input, output; struct crypto_acomp_ctx *acomp_ctx; struct zpool *pool = zswap_find_zpool(entry); - bool page_was_allocated; + bool folio_was_allocated; u8 *src, *tmp = NULL; unsigned int dlen; int ret; @@ -1074,34 +1074,34 @@ static int zswap_writeback_entry(struct zswap_entry *entry, return -ENOMEM; } - /* try to allocate swap cache page */ - page = __read_swap_cache_async(swpentry, GFP_KERNEL, NULL, 0, - &page_was_allocated); - if (!page) { + /* try to allocate swap cache folio */ + folio = __read_swap_cache_async(swpentry, GFP_KERNEL, NULL, 0, + &folio_was_allocated); + if (!folio) { ret = -ENOMEM; goto fail; } - /* Found an existing page, we raced with load/swapin */ - if (!page_was_allocated) { - put_page(page); + /* Found an existing folio, we raced with load/swapin */ + if (!folio_was_allocated) { + folio_put(folio); ret = -EEXIST; goto fail; } /* - * Page is locked, and the swapcache is now secured against + * folio is locked, and the swapcache is now secured against * concurrent swapping to and from the slot. Verify that the * swap entry hasn't been invalidated and recycled behind our * backs (our zswap_entry reference doesn't prevent that), to - * avoid overwriting a new swap page with old compressed data. + * avoid overwriting a new swap folio with old compressed data. 
*/ spin_lock(&tree->lock); if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) { spin_unlock(&tree->lock); - delete_from_swap_cache(page_folio(page)); - unlock_page(page); - put_page(page); + delete_from_swap_cache(folio); + folio_unlock(folio); + folio_put(folio); ret = -ENOMEM; goto fail; } @@ -1121,7 +1121,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, mutex_lock(acomp_ctx->mutex); sg_init_one(&input, src, entry->length); sg_init_table(&output, 1); - sg_set_page(&output, page, PAGE_SIZE, 0); + sg_set_page(&output, &folio->page, PAGE_SIZE, 0); acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen); ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait); dlen = acomp_ctx->req->dlen; @@ -1135,15 +1135,15 @@ static int zswap_writeback_entry(struct zswap_entry *entry, BUG_ON(ret); BUG_ON(dlen != PAGE_SIZE); - /* page is up to date */ - SetPageUptodate(page); + /* folio is up to date */ + folio_mark_uptodate(folio); /* move it to the tail of the inactive list after end_writeback */ - SetPageReclaim(page); + folio_set_reclaim(folio); /* start writeback */ - __swap_writepage(page, &wbc); - put_page(page); + __swap_writepage(&folio->page, &wbc); + folio_put(folio); zswap_written_back_pages++; return ret; @@ -1294,7 +1294,7 @@ bool zswap_store(struct folio *folio) dst = acomp_ctx->dstmem; sg_init_table(&input, 1); - sg_set_page(&input, page, PAGE_SIZE, 0); + sg_set_page(&input, &folio->page, PAGE_SIZE, 0); /* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */ sg_init_one(&output, dst, PAGE_SIZE * 2); -- 2.25.1
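For readers following the conversion, the caller-side shape this patch establishes looks roughly like the sketch below. example_swapin_one() is a hypothetical illustration, not a function added by the series; it mirrors the new read_swap_cache_async() body shown in the hunks above, with error handling trimmed.

	/* A minimal caller-side sketch, assuming the post-patch API. */
	static struct page *example_swapin_one(swp_entry_t entry, gfp_t gfp,
			struct vm_area_struct *vma, unsigned long addr,
			struct swap_iocb **plug)
	{
		bool page_was_allocated;
		struct folio *folio;

		/* The swap-cache helper now hands back a folio... */
		folio = __read_swap_cache_async(entry, gfp, vma, addr,
						&page_was_allocated);
		if (!folio)
			return NULL;
		if (page_was_allocated)
			swap_readpage(&folio->page, false, plug);

		/* ...and only callers that truly need a page convert at the end. */
		return folio_file_page(folio, swp_offset(entry));
	}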

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit b99b4e0d9d7f29b428bacd7a61188b2abf340c1e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Both callers now have a folio, so pass that in instead of the page. Removes a few hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_io.c mm/zswap.c [Context conflict: 1. patchset "workload-specific and memory pressure- driven zswap writeback" is not merged. 2. patch "mm: ignore data-race in __swap_writepage" is merged first.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 14 +++++++------- mm/swap.h | 2 +- mm/zswap.c | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 1887cd2093e3..0723ffe27aa7 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -201,7 +201,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) folio_end_writeback(folio); return 0; } - __swap_writepage(&folio->page, wbc); + __swap_writepage(folio, wbc); return 0; } @@ -369,27 +369,27 @@ static void swap_writepage_bdev_async(struct page *page, submit_bio(bio); } -void __swap_writepage(struct page *page, struct writeback_control *wbc) +void __swap_writepage(struct folio *folio, struct writeback_control *wbc) { - struct swap_info_struct *sis = page_swap_info(page); + struct swap_info_struct *sis = swp_swap_info(folio->swap); - VM_BUG_ON_PAGE(!PageSwapCache(page), page); + VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio); /* * ->flags can be updated non-atomicially (scan_swap_map_slots), * but that will never affect SWP_FS_OPS, so the data_race * is safe. */ if (data_race(sis->flags & SWP_FS_OPS)) - swap_writepage_fs(page, wbc); + swap_writepage_fs(&folio->page, wbc); /* * ->flags can be updated non-atomicially (scan_swap_map_slots), * but that will never affect SWP_SYNCHRONOUS_IO, so the data_race * is safe. */ else if (data_race(sis->flags & SWP_SYNCHRONOUS_IO)) - swap_writepage_bdev_sync(page, wbc, sis); + swap_writepage_bdev_sync(&folio->page, wbc, sis); else - swap_writepage_bdev_async(page, wbc, sis); + swap_writepage_bdev_async(&folio->page, wbc, sis); } void swap_write_unplug(struct swap_iocb *sio) diff --git a/mm/swap.h b/mm/swap.h index 1d3ba5b45298..685882e2edec 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -17,7 +17,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug) } void swap_write_unplug(struct swap_iocb *sio); int swap_writepage(struct page *page, struct writeback_control *wbc); -void __swap_writepage(struct page *page, struct writeback_control *wbc); +void __swap_writepage(struct folio *folio, struct writeback_control *wbc); /* linux/mm/swap_state.c */ /* One swap address space for each 64M swap space */ diff --git a/mm/zswap.c b/mm/zswap.c index a2e3141bc985..c7e558167a3e 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1142,7 +1142,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, folio_set_reclaim(folio); /* start writeback */ - __swap_writepage(&folio->page, &wbc); + __swap_writepage(folio, &wbc); folio_put(folio); zswap_written_back_pages++; -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit bfcd44d5f816b442feb27f59e9312ce38ac4b3cf category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Saves several calls to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_io.c [Context conflict: patch "mm: ignore data-race in __swap_writepage" is merged first.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 0723ffe27aa7..84e5342cf78e 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -289,16 +289,16 @@ static void sio_write_complete(struct kiocb *iocb, long ret) mempool_free(sio, sio_pool); } -static void swap_writepage_fs(struct page *page, struct writeback_control *wbc) +static void swap_writepage_fs(struct folio *folio, struct writeback_control *wbc) { struct swap_iocb *sio = NULL; - struct swap_info_struct *sis = page_swap_info(page); + struct swap_info_struct *sis = swp_swap_info(folio->swap); struct file *swap_file = sis->swap_file; - loff_t pos = page_file_offset(page); + loff_t pos = folio_file_pos(folio); - count_swpout_vm_event(page_folio(page)); - set_page_writeback(page); - unlock_page(page); + count_swpout_vm_event(folio); + folio_start_writeback(folio); + folio_unlock(folio); if (wbc->swap_plug) sio = *wbc->swap_plug; if (sio) { @@ -316,8 +316,8 @@ static void swap_writepage_fs(struct page *page, struct writeback_control *wbc) sio->pages = 0; sio->len = 0; } - bvec_set_page(&sio->bvec[sio->pages], page, thp_size(page), 0); - sio->len += thp_size(page); + bvec_set_folio(&sio->bvec[sio->pages], folio, folio_size(folio), 0); + sio->len += folio_size(folio); sio->pages += 1; if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->swap_plug) { swap_write_unplug(sio); @@ -380,7 +380,7 @@ void __swap_writepage(struct folio *folio, struct writeback_control *wbc) * is safe. */ if (data_race(sis->flags & SWP_FS_OPS)) - swap_writepage_fs(&folio->page, wbc); + swap_writepage_fs(folio, wbc); /* * ->flags can be updated non-atomicially (scan_swap_map_slots), * but that will never affect SWP_SYNCHRONOUS_IO, so the data_race -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 6de62c7bc4bc3444ce63490640efae965b637fe6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Saves a call to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_io.c [Context conflict: patch "mm: ignore data-race in __swap_writepage" is merged first.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 84e5342cf78e..21db4265f1e0 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -327,17 +327,16 @@ static void swap_writepage_fs(struct folio *folio, struct writeback_control *wbc *wbc->swap_plug = sio; } -static void swap_writepage_bdev_sync(struct page *page, +static void swap_writepage_bdev_sync(struct folio *folio, struct writeback_control *wbc, struct swap_info_struct *sis) { struct bio_vec bv; struct bio bio; - struct folio *folio = page_folio(page); bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc)); - bio.bi_iter.bi_sector = swap_page_sector(page); - __bio_add_page(&bio, page, thp_size(page), 0); + bio.bi_iter.bi_sector = swap_page_sector(&folio->page); + bio_add_folio_nofail(&bio, folio, folio_size(folio), 0); bio_associate_blkg_from_page(&bio, folio); count_swpout_vm_event(folio); @@ -387,7 +386,7 @@ void __swap_writepage(struct folio *folio, struct writeback_control *wbc) * is safe. */ else if (data_race(sis->flags & SWP_SYNCHRONOUS_IO)) - swap_writepage_bdev_sync(&folio->page, wbc, sis); + swap_writepage_bdev_sync(folio, wbc, sis); else swap_writepage_bdev_async(&folio->page, wbc, sis); } -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit ee1b1d9b46f206ffdef5ebe4086d925a5c43805b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Saves a call to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_io.c [Context conflict: patch "mm: ignore data-race in __swap_writepage" is merged first.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 21db4265f1e0..38a995474067 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -348,18 +348,17 @@ static void swap_writepage_bdev_sync(struct folio *folio, __end_swap_bio_write(&bio); } -static void swap_writepage_bdev_async(struct page *page, +static void swap_writepage_bdev_async(struct folio *folio, struct writeback_control *wbc, struct swap_info_struct *sis) { struct bio *bio; - struct folio *folio = page_folio(page); bio = bio_alloc(sis->bdev, 1, REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc), GFP_NOIO); - bio->bi_iter.bi_sector = swap_page_sector(page); + bio->bi_iter.bi_sector = swap_page_sector(&folio->page); bio->bi_end_io = end_swap_bio_write; - __bio_add_page(bio, page, thp_size(page), 0); + bio_add_folio_nofail(bio, folio, folio_size(folio), 0); bio_associate_blkg_from_page(bio, folio); count_swpout_vm_event(folio); @@ -388,7 +387,7 @@ void __swap_writepage(struct folio *folio, struct writeback_control *wbc) else if (data_race(sis->flags & SWP_SYNCHRONOUS_IO)) swap_writepage_bdev_sync(folio, wbc, sis); else - swap_writepage_bdev_async(&folio->page, wbc, sis); + swap_writepage_bdev_async(folio, wbc, sis); } void swap_write_unplug(struct swap_iocb *sio) -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 64a24e55e3f462836ee618be480bd1b0b018e557 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Saves a call to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 38a995474067..424620a2bc9c 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -426,12 +426,11 @@ static void sio_read_complete(struct kiocb *iocb, long ret) mempool_free(sio, sio_pool); } -static void swap_readpage_fs(struct page *page, - struct swap_iocb **plug) +static void swap_readpage_fs(struct folio *folio, struct swap_iocb **plug) { - struct swap_info_struct *sis = page_swap_info(page); + struct swap_info_struct *sis = swp_swap_info(folio->swap); struct swap_iocb *sio = NULL; - loff_t pos = page_file_offset(page); + loff_t pos = folio_file_pos(folio); if (plug) sio = *plug; @@ -450,8 +449,8 @@ static void swap_readpage_fs(struct page *page, sio->pages = 0; sio->len = 0; } - bvec_set_page(&sio->bvec[sio->pages], page, thp_size(page), 0); - sio->len += thp_size(page); + bvec_set_folio(&sio->bvec[sio->pages], folio, folio_size(folio), 0); + sio->len += folio_size(folio); sio->pages += 1; if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) { swap_read_unplug(sio); @@ -522,7 +521,7 @@ void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug) folio_mark_uptodate(folio); folio_unlock(folio); } else if (data_race(sis->flags & SWP_FS_OPS)) { - swap_readpage_fs(page, plug); + swap_readpage_fs(folio, plug); } else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) { swap_readpage_bdev_sync(page, sis); } else { -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 2c184d821eec55f9ea3c98c67dc2b0c5ec827c87 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Make it plain that this takes the head page (which before this point was just an assumption, but is now enforced by the compiler). Link: https://lkml.kernel.org/r/20231213215842.671461-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 424620a2bc9c..11caabfafe42 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -460,15 +460,15 @@ static void swap_readpage_fs(struct folio *folio, struct swap_iocb **plug) *plug = sio; } -static void swap_readpage_bdev_sync(struct page *page, +static void swap_readpage_bdev_sync(struct folio *folio, struct swap_info_struct *sis) { struct bio_vec bv; struct bio bio; bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_READ); - bio.bi_iter.bi_sector = swap_page_sector(page); - __bio_add_page(&bio, page, thp_size(page), 0); + bio.bi_iter.bi_sector = swap_page_sector(&folio->page); + bio_add_folio_nofail(&bio, folio, folio_size(folio), 0); /* * Keep this task valid during swap readpage because the oom killer may * attempt to access it in the page fault retry time check. @@ -523,7 +523,7 @@ void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug) } else if (data_race(sis->flags & SWP_FS_OPS)) { swap_readpage_fs(folio, plug); } else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) { - swap_readpage_bdev_sync(page, sis); + swap_readpage_bdev_sync(folio, sis); } else { swap_readpage_bdev_async(page, sis); } -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 3c3ebd82e0d1e77df7a3906e79b42d8f0793bdd7 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Make it plain that this takes the head page (which before this point was just an assumption, but is now enforced by the compiler). Link: https://lkml.kernel.org/r/20231213215842.671461-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 11caabfafe42..cde9b320be59 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -480,15 +480,15 @@ static void swap_readpage_bdev_sync(struct folio *folio, put_task_struct(current); } -static void swap_readpage_bdev_async(struct page *page, +static void swap_readpage_bdev_async(struct folio *folio, struct swap_info_struct *sis) { struct bio *bio; bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL); - bio->bi_iter.bi_sector = swap_page_sector(page); + bio->bi_iter.bi_sector = swap_page_sector(&folio->page); bio->bi_end_io = end_swap_bio_read; - __bio_add_page(bio, page, thp_size(page), 0); + bio_add_folio_nofail(bio, folio, folio_size(folio), 0); count_vm_event(PSWPIN); submit_bio(bio); } @@ -525,7 +525,7 @@ void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug) } else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) { swap_readpage_bdev_sync(folio, sis); } else { - swap_readpage_bdev_async(page, sis); + swap_readpage_bdev_async(folio, sis); } if (workingset) { -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 3a61e6f668120ee2c7840b91891c858d575d07e2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- All callers have a folio, so pass it in. Saves a couple of calls to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- include/linux/swap.h | 2 +- mm/page_io.c | 8 ++++---- mm/swapfile.c | 6 +++--- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index aa2261b5c772..c5dc488edd48 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -517,7 +517,7 @@ struct backing_dev_info; extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); extern void exit_swap_address_space(unsigned int type); extern struct swap_info_struct *get_swap_device(swp_entry_t entry); -sector_t swap_page_sector(struct page *page); +sector_t swap_folio_sector(struct folio *folio); static inline void put_swap_device(struct swap_info_struct *si) { diff --git a/mm/page_io.c b/mm/page_io.c index cde9b320be59..bd1add722e30 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -335,7 +335,7 @@ static void swap_writepage_bdev_sync(struct folio *folio, bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc)); - bio.bi_iter.bi_sector = swap_page_sector(&folio->page); + bio.bi_iter.bi_sector = swap_folio_sector(folio); bio_add_folio_nofail(&bio, folio, folio_size(folio), 0); bio_associate_blkg_from_page(&bio, folio); @@ -356,7 +356,7 @@ static void swap_writepage_bdev_async(struct folio *folio, bio = bio_alloc(sis->bdev, 1, REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc), GFP_NOIO); - bio->bi_iter.bi_sector = swap_page_sector(&folio->page); + bio->bi_iter.bi_sector = swap_folio_sector(folio); bio->bi_end_io = end_swap_bio_write; bio_add_folio_nofail(bio, folio, folio_size(folio), 0); @@ -467,7 +467,7 @@ static void swap_readpage_bdev_sync(struct folio *folio, struct bio bio; bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_READ); - bio.bi_iter.bi_sector = swap_page_sector(&folio->page); + bio.bi_iter.bi_sector = swap_folio_sector(folio); bio_add_folio_nofail(&bio, folio, folio_size(folio), 0); /* * Keep this task valid during swap readpage because the oom killer may @@ -486,7 +486,7 @@ static void swap_readpage_bdev_async(struct folio *folio, struct bio *bio; bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL); - bio->bi_iter.bi_sector = swap_page_sector(&folio->page); + bio->bi_iter.bi_sector = swap_folio_sector(folio); bio->bi_end_io = end_swap_bio_read; bio_add_folio_nofail(bio, folio, folio_size(folio), 0); count_vm_event(PSWPIN); diff --git a/mm/swapfile.c b/mm/swapfile.c index 3af5b6ebb241..9cc8e184ffd9 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -232,14 +232,14 @@ offset_to_swap_extent(struct swap_info_struct *sis, unsigned long offset) BUG(); } -sector_t swap_page_sector(struct page *page) +sector_t swap_folio_sector(struct folio *folio) { - struct swap_info_struct *sis = page_swap_info(page); + struct swap_info_struct *sis = swp_swap_info(folio->swap); struct swap_extent *se; sector_t sector; pgoff_t offset; - offset = __page_file_index(page); + offset = 
swp_offset(folio->swap); se = offset_to_swap_extent(sis, offset); sector = se->start_block + (offset - se->start_page); return sector << (PAGE_SHIFT - 9); -- 2.25.1
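The final shift deserves a note: block-layer sectors are 512 bytes, so with 4KiB pages PAGE_SHIFT - 9 == 3 and each page spans 8 sectors. A self-contained rendering of the arithmetic, with made-up extent numbers purely for illustration:

	#include <stdio.h>

	#define PAGE_SHIFT 12	/* assume 4KiB pages for this example */

	int main(void)
	{
		/* hypothetical extent: swap pages 96.. map to blocks 100.. */
		unsigned long start_block = 100, start_page = 96, offset = 99;
		/* page-sized block number, relative to the extent */
		unsigned long block = start_block + (offset - start_page);

		/* convert to 512-byte sectors: 103 << 3 == 824 */
		printf("sector = %lu\n", block << (PAGE_SHIFT - 9));
		return 0;
	}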

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit c9bdf768dd9319d2d80a334646e2c8116af9e430 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- All callers have a folio, so pass it in, saving two calls to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/swap_state.c mm/swap.h [Context conflict: patchset "mempolicy: cleanups leading to NUMA mpol without vma" is not merged.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/memory.c | 4 ++-- mm/page_io.c | 18 +++++++++--------- mm/swap.h | 5 +++-- mm/swap_state.c | 8 ++++---- mm/swapfile.c | 2 +- 5 files changed, 19 insertions(+), 18 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index e8daa5a3d369..e7ef11ca4dbe 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4096,9 +4096,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_add_lru(folio); - /* To provide entry to swap_readpage() */ + /* To provide entry to swap_read_folio() */ folio->swap = entry; - swap_readpage(page, true, NULL); + swap_read_folio(folio, true, NULL); folio->private = NULL; } } else { diff --git a/mm/page_io.c b/mm/page_io.c index bd1add722e30..acf6bee86339 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -426,7 +426,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret) mempool_free(sio, sio_pool); } -static void swap_readpage_fs(struct folio *folio, struct swap_iocb **plug) +static void swap_read_folio_fs(struct folio *folio, struct swap_iocb **plug) { struct swap_info_struct *sis = swp_swap_info(folio->swap); struct swap_iocb *sio = NULL; @@ -460,7 +460,7 @@ static void swap_readpage_fs(struct folio *folio, struct swap_iocb **plug) *plug = sio; } -static void swap_readpage_bdev_sync(struct folio *folio, +static void swap_read_folio_bdev_sync(struct folio *folio, struct swap_info_struct *sis) { struct bio_vec bv; @@ -480,7 +480,7 @@ static void swap_readpage_bdev_sync(struct folio *folio, put_task_struct(current); } -static void swap_readpage_bdev_async(struct folio *folio, +static void swap_read_folio_bdev_async(struct folio *folio, struct swap_info_struct *sis) { struct bio *bio; @@ -493,10 +493,10 @@ static void swap_readpage_bdev_async(struct folio *folio, submit_bio(bio); } -void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug) +void swap_read_folio(struct folio *folio, bool synchronous, + struct swap_iocb **plug) { - struct folio *folio = page_folio(page); - struct swap_info_struct *sis = page_swap_info(page); + struct swap_info_struct *sis = swp_swap_info(folio->swap); bool workingset = folio_test_workingset(folio); unsigned long pflags; bool in_thrashing; @@ -521,11 +521,11 @@ void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug) folio_mark_uptodate(folio); folio_unlock(folio); } else if (data_race(sis->flags & SWP_FS_OPS)) { - swap_readpage_fs(folio, plug); + swap_read_folio_fs(folio, plug); } else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) { - swap_readpage_bdev_sync(folio, sis); + swap_read_folio_bdev_sync(folio, sis); } else { - swap_readpage_bdev_async(folio, sis); + swap_read_folio_bdev_async(folio, sis); } if (workingset) { diff --git a/mm/swap.h b/mm/swap.h index 685882e2edec..0cf1df8887d6 
100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -8,7 +8,8 @@ /* linux/mm/page_io.c */ int sio_pool_init(void); struct swap_iocb; -void swap_readpage(struct page *page, bool do_poll, struct swap_iocb **plug); +void swap_read_folio(struct folio *folio, bool do_poll, + struct swap_iocb **plug); void __swap_read_unplug(struct swap_iocb *plug); static inline void swap_read_unplug(struct swap_iocb *plug) { @@ -63,7 +64,7 @@ static inline unsigned int folio_swap_flags(struct folio *folio) } #else /* CONFIG_SWAP */ struct swap_iocb; -static inline void swap_readpage(struct page *page, bool do_poll, +static inline void swap_read_folio(struct folio *folio, bool do_poll, struct swap_iocb **plug) { } diff --git a/mm/swap_state.c b/mm/swap_state.c index 9f4f52350655..d0fea9932d4b 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -534,7 +534,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * the swap entry is no longer in use. * * get/put_swap_device() aren't needed to call this function, because - * __read_swap_cache_async() call them and swap_readpage() holds the + * __read_swap_cache_async() call them and swap_read_folio() holds the * swap cache folio lock. */ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, @@ -546,7 +546,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(&folio->page, false, plug); + swap_read_folio(folio, false, plug); return folio_file_page(folio, swp_offset(entry)); } @@ -666,7 +666,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!folio) continue; if (page_allocated) { - swap_readpage(&folio->page, false, &splug); + swap_read_folio(folio, false, &splug); if (offset != entry_offset) { folio_set_readahead(folio); count_vm_event(SWAP_RA); @@ -832,7 +832,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!folio) continue; if (page_allocated) { - swap_readpage(&folio->page, false, &splug); + swap_read_folio(folio, false, &splug); if (i != ra_info.offset) { folio_set_readahead(folio); count_vm_event(SWAP_RA); diff --git a/mm/swapfile.c b/mm/swapfile.c index 9cc8e184ffd9..7f4c5f59c14f 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2423,7 +2423,7 @@ EXPORT_SYMBOL_GPL(add_swap_extent); /* * A `swap extent' is a simple thing which maps a contiguous range of pages * onto a contiguous range of disk blocks. A rbtree of swap extents is - * built at swapon time and is then used at swap_writepage/swap_readpage + * built at swapon time and is then used at swap_writepage/swap_read_folio * time for locating where on disk a page belongs. * * If the swapfile is an S_ISBLK block device, a single extent is installed. -- 2.25.1
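For orientation, the dispatch at the heart of the renamed entry point condenses to the sketch below (a restatement of the hunks above with the psi/workingset accounting elided, not new behavior):

	/* Condensed view of swap_read_folio() after this patch. */
	static void example_dispatch(struct folio *folio, bool synchronous,
				     struct swap_iocb **plug)
	{
		struct swap_info_struct *sis = swp_swap_info(folio->swap);

		if (zswap_load(folio)) {		/* served from zswap */
			folio_mark_uptodate(folio);
			folio_unlock(folio);
		} else if (data_race(sis->flags & SWP_FS_OPS)) {
			swap_read_folio_fs(folio, plug);
		} else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) {
			swap_read_folio_bdev_sync(folio, sis);
		} else {
			swap_read_folio_bdev_async(folio, sis);
		}
	}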

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 69fe7d67cb0c6eeab3d4c9a3bf950f9d12af4719 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- It's more efficient to get the swap_info_struct by calling swp_swap_info() directly. Link: https://lkml.kernel.org/r/20231213215842.671461-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/swapfile.c [Context conflict.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- include/linux/swap.h | 3 +-- mm/swap.h | 2 +- mm/swapfile.c | 8 +------- 3 files changed, 3 insertions(+), 10 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index c5dc488edd48..0b7ebe9b3e2c 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -511,8 +511,7 @@ extern sector_t swapdev_block(int, pgoff_t); extern int __swap_count(swp_entry_t entry); extern int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry); extern int swp_swapcount(swp_entry_t entry); -extern struct swap_info_struct *page_swap_info(struct page *); -extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); +struct swap_info_struct *swp_swap_info(swp_entry_t entry); struct backing_dev_info; extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); extern void exit_swap_address_space(unsigned int type); diff --git a/mm/swap.h b/mm/swap.h index 0cf1df8887d6..b6c6a3b46ff3 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -60,7 +60,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, static inline unsigned int folio_swap_flags(struct folio *folio) { - return page_swap_info(&folio->page)->flags; + return swp_swap_info(folio->swap)->flags; } #else /* CONFIG_SWAP */ struct swap_iocb; diff --git a/mm/swapfile.c b/mm/swapfile.c index 7f4c5f59c14f..5e77c6417ec6 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3595,18 +3595,12 @@ struct swap_info_struct *swp_swap_info(swp_entry_t entry) return swap_type_to_swap_info(swp_type(entry)); } -struct swap_info_struct *page_swap_info(struct page *page) -{ - swp_entry_t entry = page_swap_entry(page); - return swp_swap_info(entry); -} - /* * out-of-line methods to avoid include hell. */ struct address_space *swapcache_mapping(struct folio *folio) { - return page_swap_info(&folio->page)->swap_file->f_mapping; + return swp_swap_info(folio->swap)->swap_file->f_mapping; } EXPORT_SYMBOL_GPL(swapcache_mapping); -- 2.25.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 6e03492e9d288d9ce886064289e2768da5d7d967 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The only two callers simply call put_page() on the page returned, so they're happier calling folio_put(). Saves two calls to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/swap.h mm/swap_state.c [Context conflict: 1. patchset "mempolicy: cleanups leading to NUMA mpol without vma" is not merged. 2. patchset "workload-specific and memory pressure-driven zswap writeback" is not merged.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/madvise.c | 22 +++++++++++----------- mm/swap.h | 8 +++----- mm/swap_state.c | 16 ++++++++++------ 3 files changed, 24 insertions(+), 22 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 4fd5eb465bad..0ad321350cfe 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -203,7 +203,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, for (addr = start; addr < end; addr += PAGE_SIZE) { pte_t pte; swp_entry_t entry; - struct page *page; + struct folio *folio; if (!ptep++) { ptep = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); @@ -221,10 +221,10 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_unmap_unlock(ptep, ptl); ptep = NULL; - page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, + folio = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, addr, &splug); - if (page) - put_page(page); + if (folio) + folio_put(folio); } if (ptep) @@ -246,17 +246,17 @@ static void shmem_swapin_range(struct vm_area_struct *vma, { XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start)); pgoff_t end_index = linear_page_index(vma, end) - 1; - struct page *page; + struct folio *folio; struct swap_iocb *splug = NULL; rcu_read_lock(); - xas_for_each(&xas, page, end_index) { + xas_for_each(&xas, folio, end_index) { unsigned long addr; swp_entry_t entry; - if (!xa_is_value(page)) + if (!xa_is_value(folio)) continue; - entry = radix_to_swp_entry(page); + entry = radix_to_swp_entry(folio); /* There might be swapin error entries in shmem mapping. 
*/ if (non_swap_entry(entry)) continue; @@ -266,10 +266,10 @@ static void shmem_swapin_range(struct vm_area_struct *vma, xas_pause(&xas); rcu_read_unlock(); - page = read_swap_cache_async(entry, mapping_gfp_mask(mapping), + folio = read_swap_cache_async(entry, mapping_gfp_mask(mapping), vma, addr, &splug); - if (page) - put_page(page); + if (folio) + folio_put(folio); rcu_read_lock(); } diff --git a/mm/swap.h b/mm/swap.h index b6c6a3b46ff3..cf01b7cc2c60 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -44,11 +44,9 @@ struct folio *swap_cache_get_folio(swp_entry_t entry, struct vm_area_struct *vma, unsigned long addr); struct folio *filemap_get_incore_folio(struct address_space *mapping, pgoff_t index); - -struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, - unsigned long addr, - struct swap_iocb **plug); +struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, + struct vm_area_struct *vma, unsigned long addr, + struct swap_iocb **plug); struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, diff --git a/mm/swap_state.c b/mm/swap_state.c index d0fea9932d4b..ce8147cc4590 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -537,9 +537,9 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * __read_swap_cache_async() call them and swap_read_folio() holds the * swap cache folio lock. */ -struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, - unsigned long addr, struct swap_iocb **plug) +struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, + struct vm_area_struct *vma, unsigned long addr, + struct swap_iocb **plug) { bool page_was_allocated; struct folio *folio = __read_swap_cache_async(entry, gfp_mask, @@ -548,7 +548,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, if (page_was_allocated) swap_read_folio(folio, false, plug); - return folio_file_page(folio, swp_offset(entry)); + return folio; } static unsigned int __swapin_nr_pages(unsigned long prev_offset, @@ -680,7 +680,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, lru_add_drain(); /* Push any new pages onto the LRU now */ skip: /* The page was likely read above, so no need for plugging here */ - return read_swap_cache_async(entry, gfp_mask, vma, addr, NULL); + folio = read_swap_cache_async(entry, gfp_mask, vma, addr, NULL); + + return folio_file_page(folio, swp_offset(entry)); } int init_swap_address_space(unsigned int type, unsigned long nr_pages) @@ -847,8 +849,10 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, lru_add_drain(); skip: /* The page was likely read above, so no need for plugging here */ - return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, + folio = read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, NULL); + + return folio_file_page(folio, swp_offset(entry)); } /** -- 2.25.1
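The caller-side effect is that prefetch loops never touch the page at all: they take the folio reference from the swap cache read and drop it immediately. A hypothetical helper restating the madvise pattern above:

	/* Sketch: kick off swap-in for one entry and drop our reference. */
	static void example_prefetch_one(swp_entry_t entry,
			struct vm_area_struct *vma, unsigned long addr,
			struct swap_iocb **splug)
	{
		struct folio *folio = read_swap_cache_async(entry,
				GFP_HIGHUSER_MOVABLE, vma, addr, splug);

		if (folio)
			folio_put(folio);	/* stays in swap cache */
	}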

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit a4575c4138db887bd27dc7f87cf7cfb0224c6f5e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- shmem_swapin_cluster() immediately converts the page back to a folio, and swapin_readahead() may as well call folio_file_page() once instead of having each function call it. [willy@infradead.org: avoid NULL pointer deref] Link: https://lkml.kernel.org/r/ZYI7OcVlM1voKfBl@casper.infradead.org Link: https://lkml.kernel.org/r/20231213215842.671461-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/shmem.c mm/swap.h mm/swap_state.c [Context conflict: patchset "mempolicy: cleanups leading to NUMA mpol without vma" is not merged.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/shmem.c | 8 +++----- mm/swap.h | 4 ++-- mm/swap_state.c | 24 +++++++++++++----------- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 3f5ba0827ecf..6fdc7144ca91 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1719,18 +1719,16 @@ static struct folio *shmem_swapin(swp_entry_t swap, gfp_t gfp, struct shmem_inode_info *info, pgoff_t index) { struct vm_area_struct pvma; - struct page *page; + struct folio *folio; struct vm_fault vmf = { .vma = &pvma, }; shmem_pseudo_vma_init(&pvma, info, index); - page = swap_cluster_readahead(swap, gfp, &vmf); + folio = swap_cluster_readahead(swap, gfp, &vmf); shmem_pseudo_vma_destroy(&pvma); - if (!page) - return NULL; - return page_folio(page); + return folio; } /* diff --git a/mm/swap.h b/mm/swap.h index cf01b7cc2c60..af4dc6921b90 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -51,7 +51,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, bool *new_page_allocated); -struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, +struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); @@ -79,7 +79,7 @@ static inline void show_swap_cache_info(void) { } -static inline struct page *swap_cluster_readahead(swp_entry_t entry, +static inline struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf) { return NULL; diff --git a/mm/swap_state.c b/mm/swap_state.c index ce8147cc4590..a6136403a812 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -618,7 +618,7 @@ static unsigned long swapin_nr_pages(unsigned long offset) * @gfp_mask: memory allocation flags * @vmf: fault information * - * Returns the struct page for entry and addr, after queueing swapin. + * Returns the struct folio for entry and addr, after queueing swapin. * * Primitive swap readahead code. We simply read an aligned block of * (1 << page_cluster) entries in the swap area. This method is chosen @@ -630,7 +630,7 @@ static unsigned long swapin_nr_pages(unsigned long offset) * * Caller must hold read mmap_lock if vmf->vma is not NULL. 
*/ -struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, +struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf) { struct folio *folio; @@ -680,9 +680,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, lru_add_drain(); /* Push any new pages onto the LRU now */ skip: /* The page was likely read above, so no need for plugging here */ - folio = read_swap_cache_async(entry, gfp_mask, vma, addr, NULL); - - return folio_file_page(folio, swp_offset(entry)); + return read_swap_cache_async(entry, gfp_mask, vma, addr, NULL); } int init_swap_address_space(unsigned int type, unsigned long nr_pages) @@ -784,7 +782,7 @@ static void swap_ra_info(struct vm_fault *vmf, * @gfp_mask: memory allocation flags * @vmf: fault information * - * Returns the struct page for entry and addr, after queueing swapin. + * Returns the struct folio for entry and addr, after queueing swapin. * * Primitive swap readahead code. We simply read in a few pages whose * virtual addresses are around the fault address in the same vma. @@ -792,7 +790,7 @@ static void swap_ra_info(struct vm_fault *vmf, * Caller must hold read mmap_lock if vmf->vma is not NULL. * */ -static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, +static struct folio *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct vm_fault *vmf) { struct blk_plug plug; @@ -849,10 +847,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, lru_add_drain(); skip: /* The page was likely read above, so no need for plugging here */ - folio = read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, + return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, NULL); - - return folio_file_page(folio, swp_offset(entry)); } /** @@ -870,9 +866,15 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf) { - return swap_use_vma_readahead() ? + struct folio *folio; + + folio = swap_use_vma_readahead() ? swap_vma_readahead(entry, gfp_mask, vmf) : swap_cluster_readahead(entry, gfp_mask, vmf); + + if (!folio) + return NULL; + return folio_file_page(folio, swp_offset(entry)); } #ifdef CONFIG_SYSFS -- 2.25.1
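After this patch, only one page-returning entry point survives, and it does the folio-to-page conversion exactly once, guarded against readahead returning NULL (the deref fixed by the [willy@infradead.org] fixup noted above). A restated sketch of that shape:

	/* Sketch: the single remaining folio -> page conversion point. */
	struct page *example_swapin_readahead(swp_entry_t entry,
			gfp_t gfp_mask, struct vm_fault *vmf)
	{
		struct folio *folio = swap_use_vma_readahead() ?
			swap_vma_readahead(entry, gfp_mask, vmf) :
			swap_cluster_readahead(entry, gfp_mask, vmf);

		if (!folio)	/* readahead can legitimately fail */
			return NULL;
		return folio_file_page(folio, swp_offset(entry));
	}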

From: Yosry Ahmed <yosryahmed@google.com>

mainline inclusion
from mainline-v6.11-rc1
commit 2d4d2b1cfb85cc07f6d5619acb882d8b11e55cf4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Add zswap_never_enabled() to skip the xarray lookup in zswap_load() if
zswap was never enabled on the system.  It is implemented using static
branches for efficiency, as enabling zswap should be a rare event.  This
could shave some cycles off zswap_load() when CONFIG_ZSWAP is used but
zswap is never enabled.

However, the real motivation behind this patch is two-fold:

- Incoming large folio swap-in work will need to fall back to order-0
  folios if zswap was ever enabled, because any part of the folio could
  be in zswap, until proper handling of large folios with zswap is
  added.

- A warning and recovery attempt will be added in a following change in
  case the above is not done correctly.  Zswap will fail the read if
  the folio is large and it was ever enabled.

Expose zswap_never_enabled() in the header for the swap-in work to use
it later.

[yosryahmed@google.com: expose zswap_never_enabled() in the header]
Link: https://lkml.kernel.org/r/Zmjf0Dr8s9xSW41X@google.com
Link: https://lkml.kernel.org/r/20240611024516.1375191-2-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	include/linux/zswap.h
	mm/zswap.c
[Context conflicts: patch 25cd241408a2 "mm: zswap: fix data loss on
SWP_SYNCHRONOUS_IO devices" and 5b297f70bb26 "mm: zswap: inline and
remove zswap_entry_find_get()" are not merged.]
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 include/linux/zswap.h |  6 ++++++
 mm/zswap.c            | 10 ++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 2a60ce39cfde..4683a8fe0e5b 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -15,6 +15,7 @@ bool zswap_load(struct folio *folio);
 void zswap_invalidate(int type, pgoff_t offset);
 void zswap_swapon(int type);
 void zswap_swapoff(int type);
+bool zswap_never_enabled(void);
 
 #else
 
@@ -32,6 +33,11 @@ static inline void zswap_invalidate(int type, pgoff_t offset) {}
 static inline void zswap_swapon(int type) {}
 static inline void zswap_swapoff(int type) {}
 
+static inline bool zswap_never_enabled(void)
+{
+	return false;
+}
+
 #endif
 
 #endif /* _LINUX_ZSWAP_H */
diff --git a/mm/zswap.c b/mm/zswap.c
index c7e558167a3e..491487925192 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -84,6 +84,7 @@ static bool zswap_pool_reached_full;
 static int zswap_setup(void);
 
 /* Enable/disable zswap */
+static DEFINE_STATIC_KEY_MAYBE(CONFIG_ZSWAP_DEFAULT_ON, zswap_ever_enabled);
 static bool zswap_enabled = IS_ENABLED(CONFIG_ZSWAP_DEFAULT_ON);
 static int zswap_enabled_param_set(const char *,
 				   const struct kernel_param *);
@@ -144,6 +145,11 @@ module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);
 /* Number of zpools in zswap_pool (empirically determined for scalability) */
 #define ZSWAP_NR_ZPOOLS 32
 
+bool zswap_never_enabled(void)
+{
+	return !static_branch_maybe(CONFIG_ZSWAP_DEFAULT_ON, &zswap_ever_enabled);
+}
+
 /*********************************
 * data structures
 **********************************/
@@ -1410,6 +1416,9 @@ bool zswap_load(struct folio *folio)
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 
+	if (zswap_never_enabled())
+		return false;
+
 	/* find */
 	spin_lock(&tree->lock);
 	entry = zswap_entry_find_get(&tree->rbroot, offset);
@@ -1611,6 +1620,7 @@ static int zswap_setup(void)
 			zpool_get_type(pool->zpools[0]));
 		list_add(&pool->list, &zswap_pools);
 		zswap_has_pool = true;
+		static_branch_enable(&zswap_ever_enabled);
 	} else {
 		pr_err("pool creation failed\n");
 		zswap_enabled = false;
-- 
2.25.1
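The static-branch pattern is what makes this check close to free: when the key has never been flipped, the test compiles down to a patched jump or nop rather than a memory load and compare. A minimal sketch of the same idiom under the same CONFIG_ZSWAP_DEFAULT_ON default (example_* names are illustrative, not part of the patch):

	/* Key defaults to the build-time setting of CONFIG_ZSWAP_DEFAULT_ON. */
	static DEFINE_STATIC_KEY_MAYBE(CONFIG_ZSWAP_DEFAULT_ON,
				       example_ever_enabled);

	static bool example_never_enabled(void)
	{
		/* patched jump/nop at runtime, not a flag read */
		return !static_branch_maybe(CONFIG_ZSWAP_DEFAULT_ON,
					    &example_ever_enabled);
	}

	static void example_enable_once(void)
	{
		/* flipped at most once, e.g. on first pool creation */
		static_branch_enable(&example_ever_enabled);
	}

Note the asymmetry this encodes: the key records "ever enabled", never "currently enabled", which is exactly what the large folio swap-in fallback needs.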

From: Barry Song <v-songbaohua@oppo.com> mainline inclusion from mainline-v6.12-rc1 commit 325efb16da2c840e165d9b620fec8049d4d664cc category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- With large folios swap-in, we might need to uncharge multiple entries all together, add nr argument in mem_cgroup_swapin_uncharge_swap(). For the existing two users, just pass nr=1. Link: https://lkml.kernel.org/r/20240908232119.2157-3-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Gao Xiang <xiang@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Chuanhua Han <hanchuanhua@oppo.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/swap_state.c include/linux/memcontrol.h [Context conflict: patch 1d3440305e07 “mm: swap: allocate folio only first time in __read_swap_cache_async()”is not merged.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- include/linux/memcontrol.h | 5 +++-- mm/memcontrol.c | 7 ++++--- mm/memory.c | 2 +- mm/swap_state.c | 2 +- 4 files changed, 9 insertions(+), 7 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index abe236201e68..f6a1494ad9ac 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -820,7 +820,8 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp, int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry); -void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry); + +void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry, unsigned int nr_pages); void __mem_cgroup_uncharge(struct folio *folio); @@ -1419,7 +1420,7 @@ static inline int mem_cgroup_swapin_charge_folio(struct folio *folio, return 0; } -static inline void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry) +static inline void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry, unsigned int nr) { } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 023d85aabfe8..00b04c0ce172 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -8499,14 +8499,15 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, /* * mem_cgroup_swapin_uncharge_swap - uncharge swap slot - * @entry: swap entry for which the page is charged + * @entry: the first swap entry for which the pages are charged + * @nr_pages: number of pages which will be uncharged * * Call this function after successfully adding the charged page to swapcache. * * Note: This function assumes the page for which swap slot is being uncharged * is order 0 page. 
*/ -void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry) +void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry, unsigned int nr_pages) { /* * Cgroup1's unified memory+swap counter has been charged with the @@ -8526,7 +8527,7 @@ void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry) * let's not wait for it. The page already received a * memory+swap charge, drop the swap entry duplicate. */ - mem_cgroup_uncharge_swap(entry, 1); + mem_cgroup_uncharge_swap(entry, nr_pages); } } diff --git a/mm/memory.c b/mm/memory.c index e7ef11ca4dbe..2ce29a20c198 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4088,7 +4088,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) ret = VM_FAULT_OOM; goto out_page; } - mem_cgroup_swapin_uncharge_swap(entry); + mem_cgroup_swapin_uncharge_swap(entry, 1); shadow = get_shadow_from_swap_cache(entry); if (shadow) diff --git a/mm/swap_state.c b/mm/swap_state.c index a6136403a812..75c149a830c6 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -506,7 +506,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, if (add_to_swap_cache(folio, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) goto fail_unlock; - mem_cgroup_swapin_uncharge_swap(entry); + mem_cgroup_swapin_uncharge_swap(entry, 1); if (shadow) workingset_refault(folio, shadow); -- 2.25.1
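The new nr_pages argument carries an implicit contract: @entry must be the first of nr_pages contiguous swap entries, so a large-folio caller has to align the faulting entry down to the folio boundary before calling in (the swap-in patch later in this series does exactly that with ALIGN_DOWN). A small runnable demo of the arithmetic, with made-up values:

#include <stdio.h>

int main(void)
{
	unsigned long fault_entry = 0x12345;	/* swap offset of the faulting PTE */
	unsigned long nr_pages = 16;		/* order-4 (64KiB) folio */
	unsigned long first = fault_entry & ~(nr_pages - 1);	/* ALIGN_DOWN */

	/* one call then uncharges entries 0x12340..0x1234f together */
	printf("first entry: %#lx, nr_pages: %lu\n", first, nr_pages);
	return 0;
}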

From: Chuanhua Han <hanchuanhua@oppo.com> mainline inclusion from mainline-v6.12-rc1 commit 242d12c98174584a18965cfab95778893872d650 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Currently, we have mTHP features, but unfortunately, without support for large folio swap-ins, once these large folios are swapped out, they are lost because mTHP swap is a one-way process. The lack of mTHP swap-in functionality prevents mTHP from being used on devices like Android that heavily rely on swap. This patch introduces mTHP swap-in support. It starts from sync devices such as zRAM. This is probably the simplest and most common use case, benefiting billions of Android phones and similar devices with minimal implementation cost. In this straightforward scenario, large folios are always exclusive, eliminating the need to handle complex rmap and swapcache issues. It offers several benefits: 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after swap-out and swap-in. Large folios in the buddy system are also preserved as much as possible, rather than being fragmented due to swap-in. 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT. w/o this patch (Refer to the data from Chris's and Kairui's latest swap allocator optimization while running ./thp_swap_allocator_test w/o "-a" option [1]): ./thp_swap_allocator_test Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 2: swpout inc: 131, swpout fallback inc: 101, Fallback percentage: 43.53% Iteration 3: swpout inc: 71, swpout fallback inc: 155, Fallback percentage: 68.58% Iteration 4: swpout inc: 55, swpout fallback inc: 168, Fallback percentage: 75.34% Iteration 5: swpout inc: 35, swpout fallback inc: 191, Fallback percentage: 84.51% Iteration 6: swpout inc: 25, swpout fallback inc: 199, Fallback percentage: 88.84% Iteration 7: swpout inc: 23, swpout fallback inc: 205, Fallback percentage: 89.91% Iteration 8: swpout inc: 9, swpout fallback inc: 219, Fallback percentage: 96.05% Iteration 9: swpout inc: 13, swpout fallback inc: 213, Fallback percentage: 94.25% Iteration 10: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74% Iteration 11: swpout inc: 16, swpout fallback inc: 213, Fallback percentage: 93.01% Iteration 12: swpout inc: 10, swpout fallback inc: 210, Fallback percentage: 95.45% Iteration 13: swpout inc: 16, swpout fallback inc: 212, Fallback percentage: 92.98% Iteration 14: swpout inc: 12, swpout fallback inc: 212, Fallback percentage: 94.64% Iteration 15: swpout inc: 15, swpout fallback inc: 211, Fallback percentage: 93.36% Iteration 16: swpout inc: 15, swpout fallback inc: 200, Fallback percentage: 93.02% Iteration 17: swpout inc: 9, swpout fallback inc: 220, Fallback percentage: 96.07% w/ this patch (always 0%): Iteration 1: swpout inc: 948, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 2: swpout inc: 953, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 3: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 4: swpout inc: 952, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 5: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 6: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 7: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 8: swpout inc: 950, 
swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 9: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 10: swpout inc: 945, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 11: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00% ... 3. With both mTHP swap-out and swap-in supported, we offer the option to enable zsmalloc compression/decompression with larger granularity[2]. The upcoming optimization in zsmalloc will significantly increase swap speed and improve compression efficiency. Tested by running 100 iterations of swapping 100MiB of anon memory, the swap speed improved dramatically: time consumption of swapin(ms) time consumption of swapout(ms) lz4 4k 45274 90540 lz4 64k 22942 55667 zstdn 4k 85035 186585 zstdn 64k 46558 118533 The compression ratio also improved, as evaluated with 1 GiB of data: granularity orig_data_size compr_data_size 4KiB-zstd 1048576000 246876055 64KiB-zstd 1048576000 199763892 Without mTHP swap-in, the potential optimizations in zsmalloc cannot be realized. 4. Even mTHP swap-in itself can reduce swap-in page faults by a factor of nr_pages. Swapping in content filled with the same data 0x11, w/o and w/ the patch for five rounds (Since the content is the same, decompression will be very fast. This primarily assesses the impact of reduced page faults): swp in bandwidth(bytes/ms) w/o w/ round1 624152 1127501 round2 631672 1127501 round3 620459 1139756 round4 606113 1139756 round5 624152 1152281 avg 621310 1137359 +83% 5. With both mTHP swap-out and swap-in supported, we offer the option to enable hardware accelerators(Intel IAA) to do parallel decompression with which Kanchana reported 7X improvement on zRAM read latency[3]. [1] https://lore.kernel.org/all/20240730-swap-allocator-v5-0-cb9c148b9297@kernel... [2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/ [3] https://lore.kernel.org/all/cover.1714581792.git.andre.glover@linux.intel.co... Link: https://lkml.kernel.org/r/20240908232119.2157-4-21cnbao@gmail.com Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com> Co-developed-by: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Barry Song <v-songbaohua@oppo.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Gao Xiang <xiang@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Kairui Song <ryncsn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/memory.c include/linux/mmzone.h [Context conflict: 1. 01626a182302 “mm: avoid unconditional one-tick sleep when swapcache_prepare fails" is merged first. 2. compile error for zswap_never_enabled() not found, because of patch b5ba474f3f51 "swap: shrink zswap pool based on memory pressure" is not merged. 3. 
compile error for swap_zeromap_batch() not found because of 0ca0c24e3211 "mm: store zero pages to be swapped out in a bitmap" is not merged.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- include/linux/mmzone.h | 1 + mm/memory.c | 263 ++++++++++++++++++++++++++++++++++++----- 2 files changed, 235 insertions(+), 29 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 412167d6fb41..ebaa85d5ac70 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -22,6 +22,7 @@ #include <linux/mm_types.h> #include <linux/page-flags.h> #include <linux/local_lock.h> +#include <linux/zswap.h> #include <asm/page.h> /* Free memory management - zoned buddy allocator. */ diff --git a/mm/memory.c b/mm/memory.c index 2ce29a20c198..8ebfeec0ae79 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3965,6 +3965,192 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf) static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); +static struct folio *__alloc_swap_folio(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + struct folio *folio; + swp_entry_t entry; + + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, + vmf->address, false); + if (!folio) + return NULL; + + entry = pte_to_swp_entry(vmf->orig_pte); + if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm, + GFP_KERNEL, entry)) { + folio_put(folio); + return NULL; + } + + return folio; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static inline int non_swapcache_batch(swp_entry_t entry, int max_nr) +{ + struct swap_info_struct *si = swp_swap_info(entry); + pgoff_t offset = swp_offset(entry); + int i; + + /* + * While allocating a large folio and doing swap_read_folio, which is + * the case the being faulted pte doesn't have swapcache. We need to + * ensure all PTEs have no cache as well, otherwise, we might go to + * swap devices while the content is in swapcache. + */ + for (i = 0; i < max_nr; i++) { + if ((si->swap_map[offset + i] & SWAP_HAS_CACHE)) + return i; + } + + return i; +} + +/* + * Check if the PTEs within a range are contiguous swap entries + * and have consistent swapcache, zeromap. + */ +static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages) +{ + unsigned long addr; + swp_entry_t entry; + int idx; + pte_t pte; + + addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE); + idx = (vmf->address - addr) / PAGE_SIZE; + pte = ptep_get(ptep); + + if (!pte_same(pte, pte_move_swp_offset(vmf->orig_pte, -idx))) + return false; + entry = pte_to_swp_entry(pte); + if (swap_pte_batch(ptep, nr_pages, pte) != nr_pages) + return false; + + /* + * swap_read_folio() can't handle the case a large folio is hybridly + * from different backends. And they are likely corner cases. Similar + * things might be added once zswap support large folios. + */ + if (unlikely(non_swapcache_batch(entry, nr_pages) != nr_pages)) + return false; + + return true; +} + +static inline unsigned long thp_swap_suitable_orders(pgoff_t swp_offset, + unsigned long addr, + unsigned long orders) +{ + int order, nr; + + order = highest_order(orders); + + /* + * To swap in a THP with nr pages, we require that its first swap_offset + * is aligned with that number, as it was when the THP was swapped out. + * This helps filter out most invalid entries.
+ */ + while (orders) { + nr = 1 << order; + if ((addr >> PAGE_SHIFT) % nr == swp_offset % nr) + break; + order = next_order(&orders, order); + } + + return orders; +} + +static struct folio *alloc_swap_folio(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + unsigned long orders; + struct folio *folio; + unsigned long addr; + swp_entry_t entry; + spinlock_t *ptl; + pte_t *pte; + gfp_t gfp; + int order; + + /* + * If uffd is active for the vma we need per-page fault fidelity to + * maintain the uffd semantics. + */ + if (unlikely(userfaultfd_armed(vma))) + goto fallback; + + /* + * A large swapped out folio could be partially or fully in zswap. We + * lack handling for such cases, so fallback to swapping in order-0 + * folio. + */ + if (!zswap_never_enabled()) + goto fallback; + + entry = pte_to_swp_entry(vmf->orig_pte); + /* + * Get a list of all the (large) orders below PMD_ORDER that are enabled + * and suitable for swapping THP. + */ + orders = thp_vma_allowable_orders(vma, vma->vm_flags, + TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1); + orders = thp_vma_suitable_orders(vma, vmf->address, orders); + orders = thp_swap_suitable_orders(swp_offset(entry), + vmf->address, orders); + + if (!orders) + goto fallback; + + pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, + vmf->address & PMD_MASK, &ptl); + if (unlikely(!pte)) + goto fallback; + + /* + * For do_swap_page, find the highest order where the aligned range is + * completely swap entries with contiguous swap offsets. + */ + order = highest_order(orders); + while (orders) { + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); + if (can_swapin_thp(vmf, pte + pte_index(addr), 1 << order)) + break; + order = next_order(&orders, order); + } + + pte_unmap_unlock(pte, ptl); + + /* Try allocating the highest of the remaining orders. */ + gfp = vma_thp_gfp_mask(vma); + while (orders) { + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); + folio = vma_alloc_folio(gfp, order, vma, addr, true); + if (folio) { + if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm, + gfp, entry)) + return folio; + folio_put(folio); + } + order = next_order(&orders, order); + } + +fallback: + return __alloc_swap_folio(vmf); +} +#else /* !CONFIG_TRANSPARENT_HUGEPAGE */ +static inline bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages) +{ + return false; +} + +static struct folio *alloc_swap_folio(struct vm_fault *vmf) +{ + return __alloc_swap_folio(vmf); +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4058,37 +4244,37 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) if (!folio) { if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) { - /* - * Prevent parallel swapin from proceeding with - * the cache flag. Otherwise, another thread may - * finish swapin first, free the entry, and swapout - * reusing the same entry. It's undetectable as - * pte_same() returns true due to entry reuse. 
- */ - if (swapcache_prepare(entry, 1)) { - /* Relax a bit to prevent rapid repeated page faults */ - add_wait_queue(&swapcache_wq, &wait); - schedule_timeout_uninterruptible(1); - remove_wait_queue(&swapcache_wq, &wait); - goto out; - } - need_clear_cache = true; - /* skip swapcache */ - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, - vma, vmf->address, false); + folio = alloc_swap_folio(vmf); page = &folio->page; if (folio) { __folio_set_locked(folio); __folio_set_swapbacked(folio); - if (mem_cgroup_swapin_charge_folio(folio, - vma->vm_mm, GFP_KERNEL, - entry)) { - ret = VM_FAULT_OOM; + nr_pages = folio_nr_pages(folio); + if (folio_test_large(folio)) + entry.val = ALIGN_DOWN(entry.val, nr_pages); + /* + * Prevent parallel swapin from proceeding with + * the cache flag. Otherwise, another thread + * may finish swapin first, free the entry, and + * swapout reusing the same entry. It's + * undetectable as pte_same() returns true due + * to entry reuse. + */ + if (swapcache_prepare(entry, nr_pages)) { + /* + * Relax a bit to prevent rapid + * repeated page faults. + */ + add_wait_queue(&swapcache_wq, &wait); + schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } - mem_cgroup_swapin_uncharge_swap(entry, 1); + need_clear_cache = true; + + mem_cgroup_swapin_uncharge_swap(entry, nr_pages); shadow = get_shadow_from_swap_cache(entry); if (shadow) @@ -4195,6 +4381,24 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) goto out_nomap; } + /* allocated large folios for SWP_SYNCHRONOUS_IO */ + if (folio_test_large(folio) && !folio_test_swapcache(folio)) { + unsigned long nr = folio_nr_pages(folio); + unsigned long folio_start = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE); + unsigned long idx = (vmf->address - folio_start) / PAGE_SIZE; + pte_t *folio_ptep = vmf->pte - idx; + pte_t folio_pte = ptep_get(folio_ptep); + + if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) || + swap_pte_batch(folio_ptep, nr, folio_pte) != nr) + goto out_nomap; + + page_idx = idx; + address = folio_start; + ptep = folio_ptep; + goto check_folio; + } + nr_pages = 1; page_idx = 0; address = vmf->address; @@ -4327,11 +4531,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_add_lru_vma(folio, vma); } else if (!folio_test_anon(folio)) { /* - * We currently only expect small !anon folios, which are either - * fully exclusive or fully shared. If we ever get large folios - * here, we have to be careful. + * We currently only expect small !anon folios which are either + * fully exclusive or fully shared, or new allocated large + * folios which are fully exclusive. If we ever get large + * folios within swapcache here, we have to be careful. */ - VM_WARN_ON_ONCE(folio_test_large(folio)); + VM_WARN_ON_ONCE(folio_test_large(folio) && folio_test_swapcache(folio)); VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio); folio_add_new_anon_rmap(folio, vma, address, rmap_flags); } else { @@ -4374,7 +4579,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) out: /* Clear the swap cache pin for direct swapin after PTL unlock */ if (need_clear_cache) { - swapcache_clear(si, entry, 1); + swapcache_clear(si, entry, nr_pages); if (waitqueue_active(&swapcache_wq)) wake_up(&swapcache_wq); } @@ -4393,7 +4598,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_put(swapcache); } if (need_clear_cache) { - swapcache_clear(si, entry, 1); + swapcache_clear(si, entry, nr_pages); if (waitqueue_active(&swapcache_wq)) wake_up(&swapcache_wq); } -- 2.25.1
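A detail of the patch above that rewards a worked example: thp_swap_suitable_orders() keeps an order only when the virtual page number and the swap offset agree modulo the folio size, because the folio was naturally aligned in both the address space and its swap slots when it was swapped out. The surviving orders are then still vetted against the live PTEs by can_swapin_thp(). A user-space walk-through with illustrative values:

#include <stdio.h>

int main(void)
{
	unsigned long vpn = 0x7f123450;		/* vmf->address >> PAGE_SHIFT */
	unsigned long swp_off = 0x8a30;		/* swp_offset() of the faulting entry */
	int order;

	for (order = 4; order >= 0; order--) {
		unsigned long nr = 1UL << order;

		if ((vpn % nr) == (swp_off % nr))
			printf("order %d (nr=%lu) passes the alignment filter\n",
			       order, nr);
	}
	return 0;
}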

From: Barry Song <v-songbaohua@oppo.com> mainline inclusion from mainline-v6.11-rc1 commit 259043e3b730e0aa6408bff27af7edf7a5c9101c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- If CONFIG_ZSWAP is set to N, it means zswap cannot be enabled. zswap_never_enabled() should return true. The only effect of this issue is that with Barry's latest large folio swapin patches for zram ("mm: support mTHP swap-in for zRAM-like swapfile"), we will always fallback to order-0 swapin, even mistakenly when !CONFIG_ZSWAP. Basically this bug makes Barry's in progress patches not work at all. The API was created to inform the mm core that zswap has never been enabled, allowing the mm core to perform mTHP swap-in. This is a transitional solution until zswap supports mTHP. If zswap has been enabled, performing mTHP swap-in will result in corrupted data. You may find the answer in the mTHP swap-in series: https://lore.kernel.org/linux-mm/CAJD7tkZ4FQr6HZpduOdvmqgg_-whuZYE-Bz5O2t6yz... Link: https://lkml.kernel.org/r/20240629232231.42394-1-21cnbao@gmail.com Fixes: 0300e17d67c3 ("mm: zswap: add zswap_never_enabled()") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Chris Li <chrisl@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: include/linux/zswap.h [Context conflict.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- include/linux/zswap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/zswap.h b/include/linux/zswap.h index 4683a8fe0e5b..f647ee38598a 100644 --- a/include/linux/zswap.h +++ b/include/linux/zswap.h @@ -35,7 +35,7 @@ static inline void zswap_swapoff(int type) {} static inline bool zswap_never_enabled(void) { - return false; + return true; } #endif -- 2.25.1
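The invariant behind this one-liner: with CONFIG_ZSWAP=n zswap cannot ever have been enabled, so the stub must report "never enabled" as true; otherwise alloc_swap_folio() from the earlier patch would fall back to order-0 on every kernel built without zswap. The config-gated stub pattern, reduced to a standalone sketch (DEMO_ZSWAP is an invented stand-in for CONFIG_ZSWAP):

#include <stdbool.h>
#include <stdio.h>

#ifdef DEMO_ZSWAP
static bool demo_never_enabled(void)
{
	return false;	/* the real build consults the runtime latch here */
}
#else
static bool demo_never_enabled(void)
{
	return true;	/* feature compiled out: trivially never enabled */
}
#endif

int main(void)
{
	if (demo_never_enabled())
		puts("large folio swap-in is safe");
	return 0;
}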

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com> mainline inclusion from mainline-v6.12-rc4 commit a5e8eb25135a48d400e5a695ba9329bc632c3bb4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When can_swapin_thp() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y: mm/memory.c:4184:20: error: unused function 'can_swapin_thp' [-Werror,-Wunused-function] Fix this by removing the unused stub. See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). Link: https://lkml.kernel.org/r/20241008191329.2332346-1-andriy.shevchenko@linux.i... Fixes: 242d12c98174 ("mm: support large folios swap-in for sync io devices") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Barry Song <baohua@kernel.org> Cc: Bill Wendling <morbo@google.com> Cc: Chuanhua Han <hanchuanhua@oppo.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/memory.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 8ebfeec0ae79..195a80780974 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4140,11 +4140,6 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) return __alloc_swap_folio(vmf); } #else /* !CONFIG_TRANSPARENT_HUGEPAGE */ -static inline bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages) -{ - return false; -} - static struct folio *alloc_swap_folio(struct vm_fault *vmf) { return __alloc_swap_folio(vmf); -- 2.25.1
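For context on the warning class: gcc never warns about an unused static inline function in a .c file, while clang can once the W=1 flags from commit 6863f5643dd7 stop marking inline functions as __maybe_unused, and CONFIG_WERROR promotes the warning into a hard build failure. A minimal user-space reproducer (the exact clang invocation is illustrative):

/* demo.c: try "clang -Wunused-function -Werror -c demo.c" */
static inline int never_called(int x)
{
	return x + 1;	/* defined but with no callers */
}

int main(void)
{
	return 0;	/* never_called() is not referenced anywhere */
}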

From: Barry Song <v-songbaohua@oppo.com> mainline inclusion from mainline-v6.12-rc6 commit b54e1bfecc4b2775c184d2edb319232b853a686d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Similar to PSWPOUT, we should count the number of base pages instead of large folios. Link: https://lkml.kernel.org/r/20241023210201.2798-1-21cnbao@gmail.com Fixes: 242d12c98174 ("mm: support large folios swap-in for sync io devices") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kairui Song <kasong@tencent.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_io.c [Context conflict.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- mm/page_io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index acf6bee86339..167572f84e7a 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -474,7 +474,7 @@ static void swap_read_folio_bdev_sync(struct folio *folio, * attempt to access it in the page fault retry time check. */ get_task_struct(current); - count_vm_event(PSWPIN); + count_vm_events(PSWPIN, folio_nr_pages(folio)); submit_bio_wait(&bio); __end_swap_bio_read(&bio); put_task_struct(current); @@ -489,7 +489,7 @@ static void swap_read_folio_bdev_async(struct folio *folio, bio->bi_iter.bi_sector = swap_folio_sector(folio); bio->bi_end_io = end_swap_bio_read; bio_add_folio_nofail(bio, folio, folio_size(folio), 0); - count_vm_event(PSWPIN); + count_vm_events(PSWPIN, folio_nr_pages(folio)); submit_bio(bio); } -- 2.25.1
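The rationale: PSWPIN, like PSWPOUT, is defined in units of base pages, so swapping in one 64KiB folio on a 4KiB-page system must add 16, which is exactly the difference between count_vm_event() and count_vm_events(..., folio_nr_pages(folio)). The arithmetic as a runnable demo:

#include <stdio.h>

int main(void)
{
	unsigned int order = 4;			/* 64KiB folio, 4KiB base pages */
	unsigned long nr_pages = 1UL << order;	/* what folio_nr_pages() returns */

	printf("before the fix: PSWPIN += 1 per folio\n");
	printf("after the fix:  PSWPIN += %lu base pages\n", nr_pages);
	return 0;
}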

From: Barry Song <v-songbaohua@oppo.com> mainline inclusion from mainline-v6.13-rc1 commit aaf2914aec0fa67395574f6fa6b726168b049e60 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This helps profile the sizes of folios being swapped in. Currently, only mTHP swap-out is being counted. The new interface can be found at: /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats swpin For example, cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpin 12809 cat /sys/kernel/mm/transparent_hugepage/hugepages-32kB/stats/swpin 4763 [v-songbaohua@oppo.com: add a blank line in doc] Link: https://lkml.kernel.org/r/20241030233423.80759-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20241026082423.26298-1-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kairui Song <kasong@tencent.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: Documentation/admin-guide/mm/transhuge.rst include/linux/huge_mm.h mm/huge_memory.c mm/page_io.c [Context conflict: 1. patch 0c560dd86040 “mm: swap: count successful large folio zswap stores in hugepage zswpout stats” is not merged. 2. patch 15ff4d409e1a “mm/memcontrol: add per-memcg pgpgin/pswpin counter” is not merged.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- Documentation/admin-guide/mm/transhuge.rst | 4 ++++ include/linux/huge_mm.h | 1 + mm/huge_memory.c | 3 +++ mm/page_io.c | 3 +++ 4 files changed, 11 insertions(+) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 3c5cc91696eb..d0ac35d6c844 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -600,6 +600,10 @@ anon_fault_fallback_charge instead falls back to using huge pages with lower orders or small pages even though the allocation was successful. +swpin + is incremented every time a huge page is swapped in from a non-zswap + swap device in one piece. + swpout is incremented every time a huge page is swapped out in one piece without splitting.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 02910e742bb1..97c6ec2b968f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -124,6 +124,7 @@ enum mthp_stat_item { MTHP_STAT_ANON_FAULT_ALLOC, MTHP_STAT_ANON_FAULT_FALLBACK, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE, + MTHP_STAT_SWPIN, MTHP_STAT_SWPOUT, MTHP_STAT_SWPOUT_FALLBACK, MTHP_STAT_SHMEM_ALLOC, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4511d26fc28d..fe77b668df52 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -754,6 +754,7 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name) DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); +DEFINE_MTHP_STAT_ATTR(swpin, MTHP_STAT_SWPIN); DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT); DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK); #ifdef CONFIG_SHMEM @@ -771,6 +772,7 @@ static struct attribute *anon_stats_attrs[] = { &anon_fault_fallback_attr.attr, &anon_fault_fallback_charge_attr.attr, #ifndef CONFIG_SHMEM + &swpin_attr.attr, &swpout_attr.attr, &swpout_fallback_attr.attr, #endif @@ -800,6 +802,7 @@ static struct attribute_group file_stats_attr_grp = { static struct attribute *any_stats_attrs[] = { #ifdef CONFIG_SHMEM + &swpin_attr.attr, &swpout_attr.attr, &swpout_fallback_attr.attr, #endif diff --git a/mm/page_io.c b/mm/page_io.c index 167572f84e7a..839a3b0a90f0 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -411,6 +411,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret) for (p = 0; p < sio->pages; p++) { struct folio *folio = page_folio(sio->bvec[p].bv_page); + count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN); folio_mark_uptodate(folio); folio_unlock(folio); } @@ -474,6 +475,7 @@ static void swap_read_folio_bdev_sync(struct folio *folio, * attempt to access it in the page fault retry time check. */ get_task_struct(current); + count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN); count_vm_events(PSWPIN, folio_nr_pages(folio)); submit_bio_wait(&bio); __end_swap_bio_read(&bio); @@ -489,6 +491,7 @@ static void swap_read_folio_bdev_async(struct folio *folio, bio->bi_iter.bi_sector = swap_folio_sector(folio); bio->bi_end_io = end_swap_bio_read; bio_add_folio_nofail(bio, folio, folio_size(folio), 0); + count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN); count_vm_events(PSWPIN, folio_nr_pages(folio)); submit_bio(bio); } -- 2.25.1
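Since every enabled order gets its own stats directory, the new counter is easy to consume programmatically; a small user-space reader (the sysfs path is taken from the commit message above, error handling kept minimal):

#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpin";
	unsigned long long swpin;
	FILE *f = fopen(path, "r");

	if (!f)
		return 1;
	if (fscanf(f, "%llu", &swpin) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);
	printf("64kB mTHP swap-ins: %llu\n", swpin);
	return 0;
}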

From: Wenchao Hao <haowenchao22@gmail.com> mainline inclusion from mainline-v6.14-rc1 commit 67c8b11bd58aee4644c9a6e495d0c234771e9175 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A3N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Currently, large folio swap-in is supported, but we lack a method to analyze their success ratio. Similar to anon_fault_fallback, we introduce per-order mTHP swpin_fallback and swpin_fallback_charge counters for calculating their success ratio. The new counters are located at: /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats/ swpin_fallback swpin_fallback_charge Link: https://lkml.kernel.org/r/20241202124730.2407037-1-haowenchao22@gmail.com Signed-off-by: Wenchao Hao <haowenchao22@gmail.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Lance Yang <ioworker0@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: Documentation/admin-guide/mm/transhuge.rst include/linux/huge_mm.h mm/huge_memory.c mm/memory.c [Context conflict: patch 0c560dd86040 “mm: swap: count successful large folio zswap stores in hugepage zswpout stats” is not merged.] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- Documentation/admin-guide/mm/transhuge.rst | 10 ++++++++++ include/linux/huge_mm.h | 2 ++ mm/huge_memory.c | 6 ++++++ mm/memory.c | 2 ++ 4 files changed, 20 insertions(+) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index d0ac35d6c844..9c9d938cfae4 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -604,6 +604,16 @@ swpin is incremented every time a huge page is swapped in from a non-zswap swap device in one piece. +swpin_fallback + is incremented if swapin fails to allocate or charge a huge page + and instead falls back to using huge pages with lower orders or + small pages. + +swpin_fallback_charge + is incremented if swapin fails to charge a huge page and instead + falls back to using huge pages with lower orders or small pages + even though the allocation was successful. + swpout is incremented every time a huge page is swapped out in one piece without splitting.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 97c6ec2b968f..cfe42c43b55b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -125,6 +125,8 @@ enum mthp_stat_item { MTHP_STAT_ANON_FAULT_FALLBACK, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE, MTHP_STAT_SWPIN, + MTHP_STAT_SWPIN_FALLBACK, + MTHP_STAT_SWPIN_FALLBACK_CHARGE, MTHP_STAT_SWPOUT, MTHP_STAT_SWPOUT_FALLBACK, MTHP_STAT_SHMEM_ALLOC, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index fe77b668df52..f22a0dc3fb13 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -755,6 +755,8 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); DEFINE_MTHP_STAT_ATTR(swpin, MTHP_STAT_SWPIN); +DEFINE_MTHP_STAT_ATTR(swpin_fallback, MTHP_STAT_SWPIN_FALLBACK); +DEFINE_MTHP_STAT_ATTR(swpin_fallback_charge, MTHP_STAT_SWPIN_FALLBACK_CHARGE); DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT); DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK); #ifdef CONFIG_SHMEM @@ -773,6 +775,8 @@ static struct attribute *anon_stats_attrs[] = { &anon_fault_fallback_charge_attr.attr, #ifndef CONFIG_SHMEM &swpin_attr.attr, + &swpin_fallback_attr.attr, + &swpin_fallback_charge_attr.attr, &swpout_attr.attr, &swpout_fallback_attr.attr, #endif @@ -803,6 +807,8 @@ static struct attribute_group file_stats_attr_grp = { static struct attribute *any_stats_attrs[] = { #ifdef CONFIG_SHMEM &swpin_attr.attr, + &swpin_fallback_attr.attr, + &swpin_fallback_charge_attr.attr, &swpout_attr.attr, &swpout_fallback_attr.attr, #endif diff --git a/mm/memory.c b/mm/memory.c index 195a80780974..54f5b1044051 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4131,8 +4131,10 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm, gfp, entry)) return folio; + count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK_CHARGE); folio_put(folio); } + count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK); order = next_order(&orders, order); } -- 2.25.1
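These counters exist precisely so the success ratio becomes computable: swpin counts folios brought in whole, swpin_fallback counts attempts at a given order that dropped to a lower order, and swpin_fallback_charge the subset that allocated fine but failed the memcg charge. A worked example, treating swpin/(swpin + swpin_fallback) as a first-order success ratio (numbers are illustrative, not measured):

#include <stdio.h>

int main(void)
{
	unsigned long swpin = 12809;	/* e.g. the 64kB swpin value quoted earlier */
	unsigned long fallback = 3191;	/* illustrative */
	double ratio = (double)swpin / (double)(swpin + fallback);

	printf("64kB swap-in success: %.1f%%\n", ratio * 100.0);	/* 80.1% */
	return 0;
}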

Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/16034 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/QMO...