v1->v2: 1. Introduce NR_PAGE_ORDERS and MAX_PAGE_ORDER to avoid conflicts.
Optimize compaction v2.
Baolin Wang (2): mm: compaction: update the cc->nr_migratepages when allocating or freeing the freepages mm: compaction: limit the suitable target page order to be less than cc->order
Barry Song (1): mm: compaction: avoid fast_isolate_freepages blindly choose improper pageblock
Hugh Dickins (1): mm: add page_rmappable_folio() wrapper
Hyesoo Yu (1): mm: page_alloc: check the order of compound page even when the order is zero
Kemeng Shi (6): mm/compaction: use correct list in move_freelist_{head}/{tail} mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail} mm/compaction: correctly return failure with bogus compound_order in strict mode mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable mm/compaction: improve comment of is_via_compact_memory mm/compaction: factor out code to test if we should run compaction for target order
Liu Shixin (1): mm/compaction: introduce NR_PAGE_ORDERS and MAX_PAGE_ORDER
Zi Yan (4): mm/page_alloc: remove unused fpi_flags in free_pages_prepare() mm/compaction: enable compacting >0 order folios. mm/compaction: add support for >0 order folio memory compaction. mm/compaction: optimize >0 order folio compaction with free page split.
include/trace/events/compaction.h | 6 +- mm/compaction.c | 365 ++++++++++++++++++++---------- mm/internal.h | 13 +- mm/mempolicy.c | 17 +- mm/page_alloc.c | 28 +-- 5 files changed, 274 insertions(+), 155 deletions(-)
From: Kemeng Shi shikemeng@huaweicloud.com
mainline inclusion from mainline-v6.7-rc1 commit bbefa0fc04bab21e85f6b2ee7984c59694366f6a category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "Fixes and cleanups to compaction", v3.
This is a series to do fix and clean up to compaction. Patch 1-2 fix and clean up freepage list operation. Patch 3-4 fix and clean up isolation of freepages Patch 7 factor code to check if compaction is needed for allocation order.
More details can be found in respective patches.
This patch (of 6):
The freepage is chained with buddy_list in freelist head. Use buddy_list instead of lru to correct the list operation.
Link: https://lkml.kernel.org/r/20230901155141.249860-1-shikemeng@huaweicloud.com Link: https://lkml.kernel.org/r/20230901155141.249860-2-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Acked-by: Mel Gorman mgorman@techsingularity.net Cc: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index e9fe6777c8a3..1b8182398474 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1396,8 +1396,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage) { LIST_HEAD(sublist);
- if (!list_is_last(freelist, &freepage->lru)) { - list_cut_before(&sublist, freelist, &freepage->lru); + if (!list_is_last(freelist, &freepage->buddy_list)) { + list_cut_before(&sublist, freelist, &freepage->buddy_list); list_splice_tail(&sublist, freelist); } } @@ -1413,8 +1413,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage) { LIST_HEAD(sublist);
- if (!list_is_first(freelist, &freepage->lru)) { - list_cut_position(&sublist, freelist, &freepage->lru); + if (!list_is_first(freelist, &freepage->buddy_list)) { + list_cut_position(&sublist, freelist, &freepage->buddy_list); list_splice_tail(&sublist, freelist); } }
From: Kemeng Shi shikemeng@huaweicloud.com
mainline inclusion from mainline-v6.7-rc1 commit 4c17989116cb0a6a91f4184077c342a9097b748e category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We use move_freelist_head after list_for_each_entry_reverse to skip recent pages. And there is no need to do actual move if all freepages are searched in list_for_each_entry_reverse, e.g. freepage point to first page in freelist. It's more intuitively to call list_is_first with list entry as the first argument and list head as the second argument to check if list entry is the first list entry instead of call list_is_last with list entry and list head passed in reverse.
Similarly, call list_is_last in move_freelist_tail is more intuitively.
Link: https://lkml.kernel.org/r/20230901155141.249860-3-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Acked-by: Mel Gorman mgorman@techsingularity.net Cc: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 1b8182398474..e6961e221d58 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1396,7 +1396,7 @@ move_freelist_head(struct list_head *freelist, struct page *freepage) { LIST_HEAD(sublist);
- if (!list_is_last(freelist, &freepage->buddy_list)) { + if (!list_is_first(&freepage->buddy_list, freelist)) { list_cut_before(&sublist, freelist, &freepage->buddy_list); list_splice_tail(&sublist, freelist); } @@ -1413,7 +1413,7 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage) { LIST_HEAD(sublist);
- if (!list_is_first(freelist, &freepage->buddy_list)) { + if (!list_is_last(&freepage->buddy_list, freelist)) { list_cut_position(&sublist, freelist, &freepage->buddy_list); list_splice_tail(&sublist, freelist); }
From: Kemeng Shi shikemeng@huaweicloud.com
mainline inclusion from mainline-v6.7-rc1 commit 3da0272a4c7d0d37b47b28e87014f421296fc2be category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In strict mode, we should return 0 if there is any hole in pageblock. If we successfully isolated pages at beginning at pageblock and then have a bogus compound_order outside pageblock in next page. We will abort search loop with blockpfn > end_pfn. Although we will limit blockpfn to end_pfn, we will treat it as a successful isolation in strict mode as blockpfn is not < end_pfn and return partial isolated pages. Then isolate_freepages_range may success unexpectly with hole in isolated range.
Link: https://lkml.kernel.org/r/20230901155141.249860-4-shikemeng@huaweicloud.com Fixes: 9fcd6d2e052e ("mm, compaction: skip compound pages by order in free scanner") Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Acked-by: Mel Gorman mgorman@techsingularity.net Cc: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index e6961e221d58..1cfdb57e41c9 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -627,11 +627,12 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (PageCompound(page)) { const unsigned int order = compound_order(page);
- if (likely(order <= MAX_ORDER)) { + if (blockpfn + (1UL << order) <= end_pfn) { blockpfn += (1UL << order) - 1; page += (1UL << order) - 1; nr_scanned += (1UL << order) - 1; } + goto isolate_fail; }
@@ -679,8 +680,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, spin_unlock_irqrestore(&cc->zone->lock, flags);
/* - * There is a tiny chance that we have read bogus compound_order(), - * so be careful to not go outside of the pageblock. + * Be careful to not go outside of the pageblock. */ if (unlikely(blockpfn > end_pfn)) blockpfn = end_pfn;
From: Kemeng Shi shikemeng@huaweicloud.com
mainline inclusion from mainline-v6.7-rc1 commit 8df4e28c64188911fba33789bf2cb882b3ae524e category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We have compact_blockskip_flush check in __reset_isolation_suitable, just remove repeat check before __reset_isolation_suitable in compact_blockskip_flush.
Link: https://lkml.kernel.org/r/20230901155141.249860-5-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Acked-by: Mel Gorman mgorman@techsingularity.net Cc: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 1cfdb57e41c9..fa928bea6b1e 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -383,6 +383,7 @@ static void __reset_isolation_suitable(struct zone *zone) bool source_set = false; bool free_set = false;
+ /* Only flush if a full compaction finished recently */ if (!zone->compact_blockskip_flush) return;
@@ -435,9 +436,7 @@ void reset_isolation_suitable(pg_data_t *pgdat) if (!populated_zone(zone)) continue;
- /* Only flush if a full compaction finished recently */ - if (zone->compact_blockskip_flush) - __reset_isolation_suitable(zone); + __reset_isolation_suitable(zone); } }
From: Kemeng Shi shikemeng@huaweicloud.com
mainline inclusion from mainline-v6.7-rc1 commit 9cc17ede5125933ab47f8f359c2cce3aca8ee757 category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We do proactive compaction with order == -1 via 1. /proc/sys/vm/compact_memory 2. /sys/devices/system/node/nodex/compact 3. /proc/sys/vm/compaction_proactiveness Add missed situation in which order == -1.
Link: https://lkml.kernel.org/r/20230901155141.249860-6-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Acked-by: Mel Gorman mgorman@techsingularity.net Cc: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index fa928bea6b1e..9096e5b8a7e0 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2069,8 +2069,10 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc) }
/* - * order == -1 is expected when compacting via - * /proc/sys/vm/compact_memory + * order == -1 is expected when compacting proactively via + * 1. /proc/sys/vm/compact_memory + * 2. /sys/devices/system/node/nodex/compact + * 3. /proc/sys/vm/compaction_proactiveness */ static inline bool is_via_compact_memory(int order) {
From: Kemeng Shi shikemeng@huaweicloud.com
mainline inclusion from mainline-v6.7-rc1 commit e19a3f595ae47bd8c034b98eb0b28a3877413387 category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We always do zone_watermark_ok check and compaction_suitable check together to test if compaction for target order should be ran. Factor these code out to remove repeat code.
Link: https://lkml.kernel.org/r/20230901155141.249860-7-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Cc: David Hildenbrand david@redhat.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 66 +++++++++++++++++++++++++++++-------------------- 1 file changed, 39 insertions(+), 27 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 9096e5b8a7e0..be4747ce1ea3 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2382,6 +2382,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order, return false; }
+/* + * Should we do compaction for target allocation order. + * Return COMPACT_SUCCESS if allocation for target order can be already + * satisfied + * Return COMPACT_SKIPPED if compaction for target order is likely to fail + * Return COMPACT_CONTINUE if compaction for target order should be ran + */ +static enum compact_result +compaction_suit_allocation_order(struct zone *zone, unsigned int order, + int highest_zoneidx, unsigned int alloc_flags) +{ + unsigned long watermark; + + watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK); + if (zone_watermark_ok(zone, order, watermark, highest_zoneidx, + alloc_flags)) + return COMPACT_SUCCESS; + + if (!compaction_suitable(zone, order, highest_zoneidx)) + return COMPACT_SKIPPED; + + return COMPACT_CONTINUE; +} + static enum compact_result compact_zone(struct compact_control *cc, struct capture_control *capc) { @@ -2407,19 +2431,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) cc->migratetype = gfp_migratetype(cc->gfp_mask);
if (!is_via_compact_memory(cc->order)) { - unsigned long watermark; - - /* Allocation can already succeed, nothing to do */ - watermark = wmark_pages(cc->zone, - cc->alloc_flags & ALLOC_WMARK_MASK); - if (zone_watermark_ok(cc->zone, cc->order, watermark, - cc->highest_zoneidx, cc->alloc_flags)) - return COMPACT_SUCCESS; - - /* Compaction is likely to fail */ - if (!compaction_suitable(cc->zone, cc->order, - cc->highest_zoneidx)) - return COMPACT_SKIPPED; + ret = compaction_suit_allocation_order(cc->zone, cc->order, + cc->highest_zoneidx, + cc->alloc_flags); + if (ret != COMPACT_CONTINUE) + return ret; }
/* @@ -2918,6 +2934,7 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat) int zoneid; struct zone *zone; enum zone_type highest_zoneidx = pgdat->kcompactd_highest_zoneidx; + enum compact_result ret;
for (zoneid = 0; zoneid <= highest_zoneidx; zoneid++) { zone = &pgdat->node_zones[zoneid]; @@ -2925,14 +2942,10 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat) if (!populated_zone(zone)) continue;
- /* Allocation can already succeed, check other zones */ - if (zone_watermark_ok(zone, pgdat->kcompactd_max_order, - min_wmark_pages(zone), - highest_zoneidx, 0)) - continue; - - if (compaction_suitable(zone, pgdat->kcompactd_max_order, - highest_zoneidx)) + ret = compaction_suit_allocation_order(zone, + pgdat->kcompactd_max_order, + highest_zoneidx, ALLOC_WMARK_MIN); + if (ret == COMPACT_CONTINUE) return true; }
@@ -2955,6 +2968,8 @@ static void kcompactd_do_work(pg_data_t *pgdat) .ignore_skip_hint = false, .gfp_mask = GFP_KERNEL, }; + enum compact_result ret; + trace_mm_compaction_kcompactd_wake(pgdat->node_id, cc.order, cc.highest_zoneidx); count_compact_event(KCOMPACTD_WAKE); @@ -2969,12 +2984,9 @@ static void kcompactd_do_work(pg_data_t *pgdat) if (compaction_deferred(zone, cc.order)) continue;
- /* Allocation can already succeed, nothing to do */ - if (zone_watermark_ok(zone, cc.order, - min_wmark_pages(zone), zoneid, 0)) - continue; - - if (!compaction_suitable(zone, cc.order, zoneid)) + ret = compaction_suit_allocation_order(zone, + cc.order, zoneid, ALLOC_WMARK_MIN); + if (ret != COMPACT_CONTINUE) continue;
if (kthread_should_stop())
From: Hyesoo Yu hyesoo.yu@samsung.com
mainline inclusion from mainline-v6.7-rc1 commit 76f26535d1446373d4735a252ea4247c39d64ba6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
For compound pages, the head sets the PG_head flag and the tail sets the compound_head to indicate the head page. If a user allocates a compound page and frees it with a different order, the compound page information will not be properly initialized. To detect this problem, compound_order(page) and the order argument are compared, but this is not checked when the order argument is zero. That error should be checked regardless of the order.
Link: https://lkml.kernel.org/r/20231023083217.1866451-1-hyesoo.yu@samsung.com Signed-off-by: Hyesoo Yu hyesoo.yu@samsung.com Reviewed-by: Vishal Moola (Oracle) vishal.moola@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/page_alloc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 10facd4d65ec..ce79b4e63f9b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1086,6 +1086,7 @@ static __always_inline bool free_pages_prepare(struct page *page, int bad = 0; bool skip_kasan_poison = should_skip_kasan_poison(page, fpi_flags); bool init = want_init_on_free(); + bool compound = PageCompound(page);
VM_BUG_ON_PAGE(PageTail(page), page);
@@ -1104,16 +1105,15 @@ static __always_inline bool free_pages_prepare(struct page *page, return false; }
+ VM_BUG_ON_PAGE(compound && compound_order(page) != order, page); + /* * Check tail pages before head page information is cleared to * avoid checking PageCompound for order-0 pages. */ if (unlikely(order)) { - bool compound = PageCompound(page); int i;
- VM_BUG_ON_PAGE(compound && compound_order(page) != order, page); - if (compound) page[1].flags &= ~PAGE_FLAGS_SECOND; for (i = 1; i < (1 << order); i++) {
From: Hugh Dickins hughd@google.com
mainline inclusion from mainline-v6.7-rc1 commit 23e4883248f0472d806c8b3422ba6257e67bf1a5 category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
folio_prep_large_rmappable() is being used repeatedly along with a conversion from page to folio, a check non-NULL, a check order > 1: wrap it all up into struct folio *page_rmappable_folio(struct page *).
Link: https://lkml.kernel.org/r/8d92c6cf-eebe-748-e29c-c8ab224c741@google.com Signed-off-by: Hugh Dickins hughd@google.com Cc: Andi Kleen ak@linux.intel.com Cc: Christoph Lameter cl@linux.com Cc: David Hildenbrand david@redhat.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: "Huang, Ying" ying.huang@intel.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@suse.com Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Nhat Pham nphamcs@gmail.com Cc: Sidhartha Kumar sidhartha.kumar@oracle.com Cc: Suren Baghdasaryan surenb@google.com Cc: Tejun heo tj@kernel.org Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: Yosry Ahmed yosryahmed@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/internal.h | 9 +++++++++ mm/mempolicy.c | 17 +++-------------- mm/page_alloc.c | 8 ++------ 3 files changed, 14 insertions(+), 20 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h index 58abd8489f34..3730156306da 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -508,6 +508,15 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
void folio_undo_large_rmappable(struct folio *folio);
+static inline struct folio *page_rmappable_folio(struct page *page) +{ + struct folio *folio = (struct folio *)page; + + if (folio && folio_order(folio) > 1) + folio_prep_large_rmappable(folio); + return folio; +} + static inline void prep_compound_head(struct page *page, unsigned int order) { struct folio *folio = (struct folio *)page; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 707dde78d753..f28f4c277099 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2211,10 +2211,7 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, mpol_cond_put(pol); gfp |= __GFP_COMP; page = alloc_page_interleave(gfp, order, nid); - folio = (struct folio *)page; - if (folio && order > 1) - folio_prep_large_rmappable(folio); - goto out; + return page_rmappable_folio(page); }
if (pol->mode == MPOL_PREFERRED_MANY) { @@ -2224,10 +2221,7 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, gfp |= __GFP_COMP; page = alloc_pages_preferred_many(gfp, order, node, pol); mpol_cond_put(pol); - folio = (struct folio *)page; - if (folio && order > 1) - folio_prep_large_rmappable(folio); - goto out; + return page_rmappable_folio(page); }
if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) { @@ -2323,12 +2317,7 @@ EXPORT_SYMBOL(alloc_pages);
struct folio *folio_alloc(gfp_t gfp, unsigned order) { - struct page *page = alloc_pages(gfp | __GFP_COMP, order); - struct folio *folio = (struct folio *)page; - - if (folio && order > 1) - folio_prep_large_rmappable(folio); - return folio; + return page_rmappable_folio(alloc_pages(gfp | __GFP_COMP, order)); } EXPORT_SYMBOL(folio_alloc);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ce79b4e63f9b..9d24b2156c28 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4845,12 +4845,8 @@ struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid, nodemask_t *nodemask) { struct page *page = __alloc_pages(gfp | __GFP_COMP, order, - preferred_nid, nodemask); - struct folio *folio = (struct folio *)page; - - if (folio && order > 1) - folio_prep_large_rmappable(folio); - return folio; + preferred_nid, nodemask); + return page_rmappable_folio(page); } EXPORT_SYMBOL(__folio_alloc);
From: Barry Song 21cnbao@gmail.com
mainline inclusion from mainline-v6.8-rc1 commit d19b1a1797d8e73eebce7eced289e0c7c1b5de80 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Testing shows fast_isolate_freepages can blindly choose an unsuitable pageblock from time to time particularly while the min mark is used from XXX path:
if (!page) { cc->fast_search_fail++; if (scan_start) { /* * Use the highest PFN found above min. If one was * not found, be pessimistic for direct compaction * and use the min mark. */ if (highest >= min_pfn) { page = pfn_to_page(highest); cc->free_pfn = highest; } else { if (cc->direct_compaction && pfn_valid(min_pfn)) { /* XXX */ page = pageblock_pfn_to_page(min_pfn, min(pageblock_end_pfn(min_pfn), zone_end_pfn(cc->zone)), cc->zone); cc->free_pfn = min_pfn; } } } }
The reason is that no code is doing any check on the min_pfn min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
In contrast, slow path of isolate_freepages() is always skipping unsuitable pageblocks in a decent way.
This issue doesn't happen quite often. When running 25 machines with 16GiB memory for one night, most of them can hit this unexpected code path. However the frequency isn't like many times per second. It might be one time in a couple of hours. Thus, it is very hard to measure the visible performance impact in my machines though the affection of choosing the unsuitable migration_target should be negative in theory.
I feel it's still worth fixing this to at least make the code theoretically self-explanatory as it is quite odd an unsuitable migration_target can be still migration_target.
Link: https://lkml.kernel.org/r/20231206110054.61617-1-v-songbaohua@oppo.com Signed-off-by: Barry Song v-songbaohua@oppo.com Reported-by: Zhanyuan Hu huzhanyuan@oppo.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Cc: David Hildenbrand david@redhat.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Kemeng Shi shikemeng@huaweicloud.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c index be4747ce1ea3..3df971352e05 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1612,6 +1612,9 @@ static void fast_isolate_freepages(struct compact_control *cc) min(pageblock_end_pfn(min_pfn), zone_end_pfn(cc->zone)), cc->zone); + if (page && !suitable_migration_target(cc, page)) + page = NULL; + cc->free_pfn = min_pfn; } }
From: Baolin Wang baolin.wang@linux.alibaba.com
mainline inclusion from mainline-v6.9-rc1 commit ab755bf4249b992fc2140d615ab0a686d50765b4 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently we will use 'cc->nr_freepages >= cc->nr_migratepages' comparison to ensure that enough freepages are isolated in isolate_freepages(), however it just decreases the cc->nr_freepages without updating cc->nr_migratepages in compaction_alloc(), which will waste more CPU cycles and cause too many freepages to be isolated.
So we should also update the cc->nr_migratepages when allocating or freeing the freepages to avoid isolating excess freepages. And I can see fewer free pages are scanned and isolated when running thpcompact on my Arm64 server:
k6.7 k6.7_patched Ops Compaction pages isolated 120692036.00 118160797.00 Ops Compaction migrate scanned 131210329.00 154093268.00 Ops Compaction free scanned 1090587971.00 1080632536.00 Ops Compact scan efficiency 12.03 14.26
Moreover, I did not see an obvious latency improvements, this is likely because isolating freepages is not the bottleneck in the thpcompact test case.
k6.7 k6.7_patched Amean fault-both-1 1089.76 ( 0.00%) 1080.16 * 0.88%* Amean fault-both-3 1616.48 ( 0.00%) 1636.65 * -1.25%* Amean fault-both-5 2266.66 ( 0.00%) 2219.20 * 2.09%* Amean fault-both-7 2909.84 ( 0.00%) 2801.90 * 3.71%* Amean fault-both-12 4861.26 ( 0.00%) 4733.25 * 2.63%* Amean fault-both-18 7351.11 ( 0.00%) 6950.51 * 5.45%* Amean fault-both-24 9059.30 ( 0.00%) 9159.99 * -1.11%* Amean fault-both-30 10685.68 ( 0.00%) 11399.02 * -6.68%*
Link: https://lkml.kernel.org/r/6440493f18da82298152b6305d6b41c2962a3ce6.170840924... Signed-off-by: Baolin Wang baolin.wang@linux.alibaba.com Acked-by: Mel Gorman mgorman@techsingularity.net Reviewed-by: Vlastimil Babka vbabka@suse.cz Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Zi Yan ziy@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- include/trace/events/compaction.h | 6 +++--- mm/compaction.c | 12 ++++++++++-- 2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h index 2b2a975efd20..d05759d18538 100644 --- a/include/trace/events/compaction.h +++ b/include/trace/events/compaction.h @@ -78,10 +78,10 @@ DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_fast_isolate_freepage #ifdef CONFIG_COMPACTION TRACE_EVENT(mm_compaction_migratepages,
- TP_PROTO(struct compact_control *cc, + TP_PROTO(unsigned int nr_migratepages, unsigned int nr_succeeded),
- TP_ARGS(cc, nr_succeeded), + TP_ARGS(nr_migratepages, nr_succeeded),
TP_STRUCT__entry( __field(unsigned long, nr_migrated) @@ -90,7 +90,7 @@ TRACE_EVENT(mm_compaction_migratepages,
TP_fast_assign( __entry->nr_migrated = nr_succeeded; - __entry->nr_failed = cc->nr_migratepages - nr_succeeded; + __entry->nr_failed = nr_migratepages - nr_succeeded; ),
TP_printk("nr_migrated=%lu nr_failed=%lu", diff --git a/mm/compaction.c b/mm/compaction.c index 3df971352e05..923c9a480ea8 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1778,6 +1778,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data) dst = list_entry(cc->freepages.next, struct folio, lru); list_del(&dst->lru); cc->nr_freepages--; + cc->nr_migratepages--;
return dst; } @@ -1793,6 +1794,7 @@ static void compaction_free(struct folio *dst, unsigned long data)
list_add(&dst->lru, &cc->freepages); cc->nr_freepages++; + cc->nr_migratepages++; }
/* possible outcome of isolate_migratepages */ @@ -2418,7 +2420,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) unsigned long last_migrated_pfn; const bool sync = cc->mode != MIGRATE_ASYNC; bool update_cached; - unsigned int nr_succeeded = 0; + unsigned int nr_succeeded = 0, nr_migratepages;
/* * These counters track activities during zone compaction. Initialize @@ -2536,11 +2538,17 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) pageblock_start_pfn(cc->migrate_pfn - 1)); }
+ /* + * Record the number of pages to migrate since the + * compaction_alloc/free() will update cc->nr_migratepages + * properly. + */ + nr_migratepages = cc->nr_migratepages; err = migrate_pages(&cc->migratepages, compaction_alloc, compaction_free, (unsigned long)cc, cc->mode, MR_COMPACTION, &nr_succeeded);
- trace_mm_compaction_migratepages(cc, nr_succeeded); + trace_mm_compaction_migratepages(nr_migratepages, nr_succeeded);
/* All pages were either migrated or will be released */ cc->nr_migratepages = 0;
From: Baolin Wang baolin.wang@linux.alibaba.com
mainline inclusion from mainline-v6.9-rc1 commit 1883e8ac96ddd73a87db7f2f8c06111148a3db6f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
It can not improve the fragmentation if we isolate the target free pages exceeding cc->order, especially when the cc->order is less than pageblock_order. For example, suppose the pageblock_order is MAX_ORDER (size is 4M) and cc->order is 2M THP size, we should not isolate other 2M free pages to be the migration target, which can not improve the fragmentation.
Moreover this is also applicable for large folio compaction.
Link: https://lkml.kernel.org/r/afcd9377351c259df7a25a388a4a0d5862b986f4.170592839... Signed-off-by: Baolin Wang baolin.wang@linux.alibaba.com Acked-by: Mel Gorman mgorman@techsingularity.net Cc: Vlastimil Babka vbabka@suse.cz Cc: Zi Yan ziy@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 923c9a480ea8..1a2ddc1e1a91 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1347,12 +1347,14 @@ static bool suitable_migration_target(struct compact_control *cc, { /* If the page is a large free page, then disallow migration */ if (PageBuddy(page)) { + int order = cc->order > 0 ? cc->order : pageblock_order; + /* * We are checking page_order without zone->lock taken. But * the only small danger is that we skip a potentially suitable * pageblock, so it's not worth to check order for valid range. */ - if (buddy_order_unsafe(page) >= pageblock_order) + if (buddy_order_unsafe(page) >= order) return false; }
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6
--------------------------------
Define NR_PAGE_ORDERS and MAX_PAGE_ORDER to avoid backport conflicts. This is part of commit fd37721803c6 ("mm, treewide: introduce NR_PAGE_ORDERS") and commit 5e0a760b4441 ("mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER")
Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 1a2ddc1e1a91..719481430d24 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -26,6 +26,9 @@ #include <linux/dynamic_pool.h> #include "internal.h"
+#define NR_PAGE_ORDERS (MAX_ORDER + 1) +#define MAX_PAGE_ORDER MAX_ORDER + #ifdef CONFIG_COMPACTION /* * Fragmentation score check interval for proactive compaction purposes. @@ -1000,7 +1003,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, * a valid page order. Consider only values in the * valid order range to prevent low_pfn overflow. */ - if (freepage_order > 0 && freepage_order <= MAX_ORDER) { + if (freepage_order > 0 && freepage_order <= MAX_PAGE_ORDER) { low_pfn += (1UL << freepage_order) - 1; nr_scanned += (1UL << freepage_order) - 1; } @@ -1018,7 +1021,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (PageCompound(page) && !cc->alloc_contig) { const unsigned int order = compound_order(page);
- if (likely(order <= MAX_ORDER)) { + if (likely(order <= MAX_PAGE_ORDER)) { low_pfn += (1UL << order) - 1; nr_scanned += (1UL << order) - 1; } @@ -2237,7 +2240,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
/* Direct compactor: Is a suitable page free? */ ret = COMPACT_NO_SUITABLE_PAGE; - for (order = cc->order; order <= MAX_ORDER; order++) { + for (order = cc->order; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &cc->zone->free_area[order]; bool can_steal;
From: Zi Yan ziy@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit 5267fe5d092e80a83740e5a1f6d5638d88ac7309 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "Enable >0 order folio memory compaction", v7.
This patchset enables >0 order folio memory compaction, which is one of the prerequisitions for large folio support[1].
I am aware of that split free pages is necessary for folio migration in compaction, since if >0 order free pages are never split and no order-0 free page is scanned, compaction will end prematurely due to migration returns -ENOMEM. Free page split becomes a must instead of an optimization.
lkp ncompare results (on a 8-CPU (Intel Xeon E5-2650 v4 @2.20GHz) 16G VM) for default LRU (-no-mglru) and CONFIG_LRU_GEN are shown at the bottom, copied from V3[4]. In sum, most of vm-scalability applications do not see performance change, and the others see ~4% to ~26% performance boost under default LRU and ~2% to ~6% performance boost under CONFIG_LRU_GEN.
Overview ===
To support >0 order folio compaction, the patchset changes how free pages used for migration are kept during compaction. Free pages used to be split into order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared, page order stored in page->private is zeroed, and page reference is set to 1). Now all free pages are kept in a NR_PAGE_ORDER array of page lists based on their order without post allocation process. When migrate_pages() asks for a new page, one of the free pages, based on the requested page order, is then processed and given out. And THP <2MB would need this feature.
[1] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.co... [2] https://lore.kernel.org/linux-mm/20231113170157.280181-1-zi.yan@sent.com/ [3] https://lore.kernel.org/linux-mm/20240123034636.1095672-1-zi.yan@sent.com/ [4] https://lore.kernel.org/linux-mm/20240202161554.565023-1-zi.yan@sent.com/ [5] https://lore.kernel.org/linux-mm/20240212163510.859822-1-zi.yan@sent.com/ [6] https://lore.kernel.org/linux-mm/20240214220420.1229173-1-zi.yan@sent.com/ [7] https://lore.kernel.org/linux-mm/20240216170432.1268753-1-zi.yan@sent.com/
This patch (of 4):
Commit 0a54864f8dfb ("kasan: remove PG_skip_kasan_poison flag") removes the use of fpi_flags in should_skip_kasan_poison() and fpi_flags is only passed to should_skip_kasan_poison() in free_pages_prepare(). Remove the unused parameter.
Link: https://lkml.kernel.org/r/20240220183220.1451315-1-zi.yan@sent.com Link: https://lkml.kernel.org/r/20240220183220.1451315-2-zi.yan@sent.com Signed-off-by: Zi Yan ziy@nvidia.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: David Hildenbrand david@redhat.com Cc: Adam Manzanares a.manzanares@samsung.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Kemeng Shi shikemeng@huaweicloud.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Luis Chamberlain mcgrof@kernel.org Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Ryan Roberts ryan.roberts@arm.com Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yin Fengwei fengwei.yin@intel.com Cc: Yu Zhao yuzhao@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/page_alloc.c [ Compatible with dpool_free_page_prepare(). ] Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/page_alloc.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9d24b2156c28..048d0bd2c8f3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1061,7 +1061,7 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) * on-demand allocation and then freed again before the deferred pages * initialization is done, but this is not likely to happen. */ -static inline bool should_skip_kasan_poison(struct page *page, fpi_t fpi_flags) +static inline bool should_skip_kasan_poison(struct page *page) { if (IS_ENABLED(CONFIG_KASAN_GENERIC)) return deferred_pages_enabled(); @@ -1081,10 +1081,10 @@ static void kernel_init_pages(struct page *page, int numpages) }
static __always_inline bool free_pages_prepare(struct page *page, - unsigned int order, fpi_t fpi_flags) + unsigned int order) { int bad = 0; - bool skip_kasan_poison = should_skip_kasan_poison(page, fpi_flags); + bool skip_kasan_poison = should_skip_kasan_poison(page); bool init = want_init_on_free(); bool compound = PageCompound(page);
@@ -1271,7 +1271,7 @@ static void __free_pages_ok(struct page *page, unsigned int order, unsigned long pfn = page_to_pfn(page); struct zone *zone = page_zone(page);
- if (!free_pages_prepare(page, order, fpi_flags)) + if (!free_pages_prepare(page, order)) return;
/* @@ -1571,7 +1571,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags */ bool dpool_free_page_prepare(struct page *page) { - return free_pages_prepare(page, 0, 0); + return free_pages_prepare(page, 0); }
int dpool_check_new_page(struct page *page) @@ -2372,7 +2372,7 @@ static bool free_unref_page_prepare(struct page *page, unsigned long pfn, { int migratetype;
- if (!free_pages_prepare(page, order, FPI_NONE)) + if (!free_pages_prepare(page, order)) return false;
migratetype = get_pfnblock_migratetype(page, pfn);
From: Zi Yan ziy@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit ee6f62fd34f0bb99ef93f799bcf5fc6a6b24945b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
migrate_pages() supports >0 order folio migration and during compaction, even if compaction_alloc() cannot provide >0 order free pages, migrate_pages() can split the source page and try to migrate the base pages from the split. It can be a baseline and start point for adding support for compacting >0 order folios.
Link: https://lkml.kernel.org/r/20240220183220.1451315-3-zi.yan@sent.com Signed-off-by: Zi Yan ziy@nvidia.com Suggested-by: Huang Ying ying.huang@intel.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Tested-by: Baolin Wang baolin.wang@linux.alibaba.com Tested-by: Yu Zhao yuzhao@google.com Cc: Adam Manzanares a.manzanares@samsung.com Cc: David Hildenbrand david@redhat.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Kemeng Shi shikemeng@huaweicloud.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Luis Chamberlain mcgrof@kernel.org Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Ryan Roberts ryan.roberts@arm.com Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 101 ++++++++++++++++++++++++++++++++++++------------ 1 file changed, 76 insertions(+), 25 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 719481430d24..a135a50e7591 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -44,9 +44,22 @@ static inline void count_compact_events(enum vm_event_item item, long delta) { count_vm_events(item, delta); } + +/* + * order == -1 is expected when compacting proactively via + * 1. /proc/sys/vm/compact_memory + * 2. /sys/devices/system/node/nodex/compact + * 3. /proc/sys/vm/compaction_proactiveness + */ +static inline bool is_via_compact_memory(int order) +{ + return order == -1; +} + #else #define count_compact_event(item) do { } while (0) #define count_compact_events(item, delta) do { } while (0) +static inline bool is_via_compact_memory(int order) { return false; } #endif
#if defined CONFIG_COMPACTION || defined CONFIG_CMA @@ -820,6 +833,32 @@ static bool too_many_isolated(struct compact_control *cc) return too_many; }
+/** + * skip_isolation_on_order() - determine when to skip folio isolation based on + * folio order and compaction target order + * @order: to-be-isolated folio order + * @target_order: compaction target order + * + * This avoids unnecessary folio isolations during compaction. + */ +static bool skip_isolation_on_order(int order, int target_order) +{ + /* + * Unless we are performing global compaction (i.e., + * is_via_compact_memory), skip any folios that are larger than the + * target order: we wouldn't be here if we'd have a free folio with + * the desired target_order, so migrating this folio would likely fail + * later. + */ + if (!is_via_compact_memory(target_order) && order >= target_order) + return true; + /* + * We limit memory compaction to pageblocks and won't try + * creating free blocks of memory that are larger than that. + */ + return order >= pageblock_order; +} + /** * isolate_migratepages_block() - isolate all migrate-able pages within * a single pageblock @@ -950,7 +989,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, valid_page = page; }
- if (PageHuge(page) && cc->alloc_contig) { + if (PageHuge(page)) { + /* + * skip hugetlbfs if we are not compacting for pages + * bigger than its order. THPs and other compound pages + * are handled below. + */ + if (!cc->alloc_contig) { + const unsigned int order = compound_order(page); + + if (order <= MAX_PAGE_ORDER) { + low_pfn += (1UL << order) - 1; + nr_scanned += (1UL << order) - 1; + } + goto isolate_fail; + } + /* for alloc_contig case */ if (locked) { unlock_page_lruvec_irqrestore(locked, flags); locked = NULL; @@ -1011,21 +1065,24 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, }
/* - * Regardless of being on LRU, compound pages such as THP and - * hugetlbfs are not to be compacted unless we are attempting - * an allocation much larger than the huge page size (eg CMA). - * We can potentially save a lot of iterations if we skip them - * at once. The check is racy, but we can consider only valid - * values and the only danger is skipping too much. + * Regardless of being on LRU, compound pages such as THP + * (hugetlbfs is handled above) are not to be compacted unless + * we are attempting an allocation larger than the compound + * page size. We can potentially save a lot of iterations if we + * skip them at once. The check is racy, but we can consider + * only valid values and the only danger is skipping too much. */ if (PageCompound(page) && !cc->alloc_contig) { const unsigned int order = compound_order(page);
- if (likely(order <= MAX_PAGE_ORDER)) { - low_pfn += (1UL << order) - 1; - nr_scanned += (1UL << order) - 1; + /* Skip based on page order and compaction target order. */ + if (skip_isolation_on_order(order, cc->order)) { + if (order <= MAX_PAGE_ORDER) { + low_pfn += (1UL << order) - 1; + nr_scanned += (1UL << order) - 1; + } + goto isolate_fail; } - goto isolate_fail; }
/* @@ -1150,10 +1207,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, }
/* - * folio become large since the non-locked check, - * and it's on LRU. + * Check LRU folio order under the lock */ - if (unlikely(folio_test_large(folio) && !cc->alloc_contig)) { + if (unlikely(skip_isolation_on_order(folio_order(folio), + cc->order) && + !cc->alloc_contig)) { low_pfn += folio_nr_pages(folio) - 1; nr_scanned += folio_nr_pages(folio) - 1; folio_set_lru(folio); @@ -1773,6 +1831,10 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data) struct compact_control *cc = (struct compact_control *)data; struct folio *dst;
+ /* this makes migrate_pages() split the source page and retry */ + if (folio_test_large(src)) + return NULL; + if (list_empty(&cc->freepages)) { isolate_freepages(cc);
@@ -2078,17 +2140,6 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc) return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE; }
-/* - * order == -1 is expected when compacting proactively via - * 1. /proc/sys/vm/compact_memory - * 2. /sys/devices/system/node/nodex/compact - * 3. /proc/sys/vm/compaction_proactiveness - */ -static inline bool is_via_compact_memory(int order) -{ - return order == -1; -} - /* * Determine whether kswapd is (or recently was!) running on this node. *
From: Zi Yan ziy@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit 733aea0b3a7bba0451dfc19322665de13a5b7af4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Before last commit, memory compaction only migrates order-0 folios and skips >0 order folios. Last commit splits all >0 order folios during compaction. This commit migrates >0 order folios during compaction by keeping isolated free pages at their original size without splitting them into order-0 pages and using them directly during migration process.
What is different from the prior implementation: 1. All isolated free pages are kept in a NR_PAGE_ORDERS array of page lists, where each page list stores free pages in the same order. 2. All free pages are not post_alloc_hook() processed nor buddy pages, although their orders are stored in first page's private like buddy pages. 3. During migration, in new page allocation time (i.e., in compaction_alloc()), free pages are then processed by post_alloc_hook(). When migration fails and a new page is returned (i.e., in compaction_free()), free pages are restored by reversing the post_alloc_hook() operations using newly added free_pages_prepare_fpi_none().
Step 3 is done for a latter optimization that splitting and/or merging free pages during compaction becomes easier.
Note: without splitting free pages, compaction can end prematurely due to migration will return -ENOMEM even if there is free pages. This happens when no order-0 free page exist and compaction_alloc() return NULL.
Link: https://lkml.kernel.org/r/20240220183220.1451315-4-zi.yan@sent.com Signed-off-by: Zi Yan ziy@nvidia.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Tested-by: Baolin Wang baolin.wang@linux.alibaba.com Tested-by: Yu Zhao yuzhao@google.com Cc: Adam Manzanares a.manzanares@samsung.com Cc: David Hildenbrand david@redhat.com Cc: Huang Ying ying.huang@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Kemeng Shi shikemeng@huaweicloud.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Luis Chamberlain mcgrof@kernel.org Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Ryan Roberts ryan.roberts@arm.com Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/internal.h [ Fix context conflict with dpool_* functions ]. Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 140 +++++++++++++++++++++++++++--------------------- mm/internal.h | 4 +- mm/page_alloc.c | 2 +- 3 files changed, 83 insertions(+), 63 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index a135a50e7591..44c1b5f32e3d 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -83,45 +83,56 @@ static inline bool is_via_compact_memory(int order) { return false; } #define COMPACTION_HPAGE_ORDER (PMD_SHIFT - PAGE_SHIFT) #endif
-static unsigned long release_freepages(struct list_head *freelist) +static void split_map_pages(struct list_head *freepages) { + unsigned int i, order; struct page *page, *next; - unsigned long high_pfn = 0; + LIST_HEAD(tmp_list);
- list_for_each_entry_safe(page, next, freelist, lru) { - unsigned long pfn = page_to_pfn(page); - list_del(&page->lru); - __free_page(page); - if (pfn > high_pfn) - high_pfn = pfn; - } + for (order = 0; order < NR_PAGE_ORDERS; order++) { + list_for_each_entry_safe(page, next, &freepages[order], lru) { + unsigned int nr_pages;
- return high_pfn; + list_del(&page->lru); + + nr_pages = 1 << order; + + post_alloc_hook(page, order, __GFP_MOVABLE); + if (order) + split_page(page, order); + + for (i = 0; i < nr_pages; i++) { + list_add(&page->lru, &tmp_list); + page++; + } + } + list_splice_init(&tmp_list, &freepages[0]); + } }
-static void split_map_pages(struct list_head *list) +static unsigned long release_free_list(struct list_head *freepages) { - unsigned int i, order, nr_pages; - struct page *page, *next; - LIST_HEAD(tmp_list); - - list_for_each_entry_safe(page, next, list, lru) { - list_del(&page->lru); + int order; + unsigned long high_pfn = 0;
- order = page_private(page); - nr_pages = 1 << order; + for (order = 0; order < NR_PAGE_ORDERS; order++) { + struct page *page, *next;
- post_alloc_hook(page, order, __GFP_MOVABLE); - if (order) - split_page(page, order); + list_for_each_entry_safe(page, next, &freepages[order], lru) { + unsigned long pfn = page_to_pfn(page);
- for (i = 0; i < nr_pages; i++) { - list_add(&page->lru, &tmp_list); - page++; + list_del(&page->lru); + /* + * Convert free pages into post allocation pages, so + * that we can free them via __free_page. + */ + post_alloc_hook(page, order, __GFP_MOVABLE); + __free_pages(page, order); + if (pfn > high_pfn) + high_pfn = pfn; } } - - list_splice(&tmp_list, list); + return high_pfn; }
#ifdef CONFIG_COMPACTION @@ -674,7 +685,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, nr_scanned += isolated - 1; total_isolated += isolated; cc->nr_freepages += isolated; - list_add_tail(&page->lru, freelist); + list_add_tail(&page->lru, &freelist[order]);
if (!strict && cc->nr_migratepages <= cc->nr_freepages) { blockpfn += isolated; @@ -739,7 +750,11 @@ isolate_freepages_range(struct compact_control *cc, unsigned long start_pfn, unsigned long end_pfn) { unsigned long isolated, pfn, block_start_pfn, block_end_pfn; - LIST_HEAD(freelist); + int order; + struct list_head tmp_freepages[NR_PAGE_ORDERS]; + + for (order = 0; order < NR_PAGE_ORDERS; order++) + INIT_LIST_HEAD(&tmp_freepages[order]);
pfn = start_pfn; block_start_pfn = pageblock_start_pfn(pfn); @@ -770,7 +785,7 @@ isolate_freepages_range(struct compact_control *cc, break;
isolated = isolate_freepages_block(cc, &isolate_start_pfn, - block_end_pfn, &freelist, 0, true); + block_end_pfn, tmp_freepages, 0, true);
/* * In strict mode, isolate_freepages_block() returns 0 if @@ -787,15 +802,15 @@ isolate_freepages_range(struct compact_control *cc, */ }
- /* __isolate_free_page() does not map the pages */ - split_map_pages(&freelist); - if (pfn < end_pfn) { /* Loop terminated early, cleanup. */ - release_freepages(&freelist); + release_free_list(tmp_freepages); return 0; }
+ /* __isolate_free_page() does not map the pages */ + split_map_pages(tmp_freepages); + /* We don't use freelists for anything. */ return pfn; } @@ -1503,7 +1518,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn) if (!page) return;
- isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false); + isolate_freepages_block(cc, &start_pfn, end_pfn, cc->freepages, 1, false);
/* Skip this pageblock in the future as it's full or nearly full */ if (start_pfn == end_pfn && !cc->no_set_skip_hint) @@ -1632,7 +1647,7 @@ static void fast_isolate_freepages(struct compact_control *cc) nr_scanned += nr_isolated - 1; total_isolated += nr_isolated; cc->nr_freepages += nr_isolated; - list_add_tail(&page->lru, &cc->freepages); + list_add_tail(&page->lru, &cc->freepages[order]); count_compact_events(COMPACTISOLATED, nr_isolated); } else { /* If isolation fails, abort the search */ @@ -1709,13 +1724,12 @@ static void isolate_freepages(struct compact_control *cc) unsigned long isolate_start_pfn; /* exact pfn we start at */ unsigned long block_end_pfn; /* end of current pageblock */ unsigned long low_pfn; /* lowest pfn scanner is able to scan */ - struct list_head *freelist = &cc->freepages; unsigned int stride;
/* Try a small search of the free lists for a candidate */ fast_isolate_freepages(cc); if (cc->nr_freepages) - goto splitmap; + return;
/* * Initialise the free scanner. The starting point is where we last @@ -1775,7 +1789,7 @@ static void isolate_freepages(struct compact_control *cc)
/* Found a block suitable for isolating free pages from. */ nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn, - block_end_pfn, freelist, stride, false); + block_end_pfn, cc->freepages, stride, false);
/* Update the skip hint if the full pageblock was scanned */ if (isolate_start_pfn == block_end_pfn) @@ -1816,10 +1830,6 @@ static void isolate_freepages(struct compact_control *cc) * and the loop terminated due to isolate_start_pfn < low_pfn */ cc->free_pfn = isolate_start_pfn; - -splitmap: - /* __isolate_free_page() does not map the pages */ - split_map_pages(freelist); }
/* @@ -1830,24 +1840,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data) { struct compact_control *cc = (struct compact_control *)data; struct folio *dst; + int order = folio_order(src);
- /* this makes migrate_pages() split the source page and retry */ - if (folio_test_large(src)) - return NULL; - - if (list_empty(&cc->freepages)) { + if (list_empty(&cc->freepages[order])) { isolate_freepages(cc); - - if (list_empty(&cc->freepages)) + if (list_empty(&cc->freepages[order])) return NULL; }
- dst = list_entry(cc->freepages.next, struct folio, lru); + dst = list_first_entry(&cc->freepages[order], struct folio, lru); list_del(&dst->lru); - cc->nr_freepages--; - cc->nr_migratepages--; - - return dst; + post_alloc_hook(&dst->page, order, __GFP_MOVABLE); + if (order) + prep_compound_page(&dst->page, order); + cc->nr_freepages -= 1 << order; + cc->nr_migratepages -= 1 << order; + return page_rmappable_folio(&dst->page); }
/* @@ -1858,10 +1866,19 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data) static void compaction_free(struct folio *dst, unsigned long data) { struct compact_control *cc = (struct compact_control *)data; + int order = folio_order(dst); + struct page *page = &dst->page;
- list_add(&dst->lru, &cc->freepages); - cc->nr_freepages++; - cc->nr_migratepages++; + if (folio_put_testzero(dst)) { + free_pages_prepare(page, order); + list_add(&dst->lru, &cc->freepages[order]); + cc->nr_freepages += 1 << order; + } + cc->nr_migratepages += 1 << order; + /* + * someone else has referenced the page, we cannot take it back to our + * free list. + */ }
/* possible outcome of isolate_migratepages */ @@ -2477,6 +2494,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) const bool sync = cc->mode != MIGRATE_ASYNC; bool update_cached; unsigned int nr_succeeded = 0, nr_migratepages; + int order;
/* * These counters track activities during zone compaction. Initialize @@ -2486,7 +2504,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) cc->total_free_scanned = 0; cc->nr_migratepages = 0; cc->nr_freepages = 0; - INIT_LIST_HEAD(&cc->freepages); + for (order = 0; order < NR_PAGE_ORDERS; order++) + INIT_LIST_HEAD(&cc->freepages[order]); INIT_LIST_HEAD(&cc->migratepages);
cc->migratetype = gfp_migratetype(cc->gfp_mask); @@ -2678,7 +2697,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) * so we don't leave any returned pages behind in the next attempt. */ if (cc->nr_freepages > 0) { - unsigned long free_pfn = release_freepages(&cc->freepages); + unsigned long free_pfn = release_free_list(cc->freepages);
cc->nr_freepages = 0; VM_BUG_ON(free_pfn == 0); @@ -2697,7 +2716,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
- VM_BUG_ON(!list_empty(&cc->freepages)); VM_BUG_ON(!list_empty(&cc->migratepages));
return ret; diff --git a/mm/internal.h b/mm/internal.h index 3730156306da..176d94508d29 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -540,6 +540,8 @@ extern void prep_compound_page(struct page *page, unsigned int order);
extern void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags); +extern bool free_pages_prepare(struct page *page, unsigned int order); + #ifdef CONFIG_DYNAMIC_POOL extern bool dpool_free_page_prepare(struct page *page); extern int dpool_check_new_page(struct page *page); @@ -580,7 +582,7 @@ int split_free_page(struct page *free_page, * completes when free_pfn <= migrate_pfn */ struct compact_control { - struct list_head freepages; /* List of free pages to migrate to */ + struct list_head freepages[NR_PAGE_ORDERS]; /* List of free pages to migrate to */ struct list_head migratepages; /* List of pages being migrated */ unsigned int nr_freepages; /* Number of isolated free pages */ unsigned int nr_migratepages; /* Number of pages to migrate */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 048d0bd2c8f3..fafdbf5ae169 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1080,7 +1080,7 @@ static void kernel_init_pages(struct page *page, int numpages) kasan_enable_current(); }
-static __always_inline bool free_pages_prepare(struct page *page, +__always_inline bool free_pages_prepare(struct page *page, unsigned int order) { int bad = 0;
From: Zi Yan ziy@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit 73318e2cafe53e8b7c8899d990cf8eaca32184d0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CXS6 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
During migration in a memory compaction, free pages are placed in an array of page lists based on their order. But the desired free page order (i.e., the order of a source page) might not be always present, thus leading to migration failures and premature compaction termination. Split a high order free pages when source migration page has a lower order to increase migration successful rate.
Note: merging free pages when a migration fails and a lower order free page is returned via compaction_free() is possible, but there is too much work. Since the free pages are not buddy pages, it is hard to identify these free pages using existing PFN-based page merging algorithm.
Link: https://lkml.kernel.org/r/20240220183220.1451315-5-zi.yan@sent.com Signed-off-by: Zi Yan ziy@nvidia.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Tested-by: Baolin Wang baolin.wang@linux.alibaba.com Tested-by: Yu Zhao yuzhao@google.com Cc: Adam Manzanares a.manzanares@samsung.com Cc: David Hildenbrand david@redhat.com Cc: Huang Ying ying.huang@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Kemeng Shi shikemeng@huaweicloud.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Luis Chamberlain mcgrof@kernel.org Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Ryan Roberts ryan.roberts@arm.com Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/compaction.c | 35 ++++++++++++++++++++++++++++++----- 1 file changed, 30 insertions(+), 5 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 44c1b5f32e3d..52c5f9e2b2c3 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1841,15 +1841,40 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data) struct compact_control *cc = (struct compact_control *)data; struct folio *dst; int order = folio_order(src); + bool has_isolated_pages = false; + int start_order; + struct page *freepage; + unsigned long size; + +again: + for (start_order = order; start_order < NR_PAGE_ORDERS; start_order++) + if (!list_empty(&cc->freepages[start_order])) + break;
- if (list_empty(&cc->freepages[order])) { - isolate_freepages(cc); - if (list_empty(&cc->freepages[order])) + /* no free pages in the list */ + if (start_order == NR_PAGE_ORDERS) { + if (has_isolated_pages) return NULL; + isolate_freepages(cc); + has_isolated_pages = true; + goto again; + } + + freepage = list_first_entry(&cc->freepages[start_order], struct page, + lru); + size = 1 << start_order; + + list_del(&freepage->lru); + + while (start_order > order) { + start_order--; + size >>= 1; + + list_add(&freepage[size].lru, &cc->freepages[start_order]); + set_page_private(&freepage[size], start_order); } + dst = (struct folio *)freepage;
- dst = list_first_entry(&cc->freepages[order], struct folio, lru); - list_del(&dst->lru); post_alloc_hook(&dst->page, order, __GFP_MOVABLE); if (order) prep_compound_page(&dst->page, order);