[PATCH OLK-6.6 00/32] mm: page_alloc: freelist migratetype hygiene

older
[PATCH OLK-6.6] sdei_watchdog: add...

Jinjiang Tu

2 Sep 2025 2 Sep '25

2:19 p.m.

Brendan Jackman (2): mm/page_alloc: clarify terminology in migratetype fallback code mm/page_alloc: clarify should_claim_block() commentary David Hildenbrand (5): mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() mm/page_alloc: make __alloc_contig_migrate_range() static mm/page_alloc: sort out the alloc_contig_range() gfp flags mess mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook() mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy() Huan Yang (1): mm: page_alloc: simpify page del and expand Jinjiang Tu (1): mm: page_alloc: don't steal single pages from biggest buddy Johannes Weiner (13): mm: page_alloc: remove pcppage migratetype caching mm: page_alloc: optimize free_unref_folios() mm: page_alloc: fix up block types when merging compatible blocks mm: page_alloc: move free pages when converting block during isolation mm: page_alloc: fix move_freepages_block() range error mm: page_alloc: fix freelist movement during block conversion mm: page_alloc: close migratetype race between freeing and stealing mm: page_isolation: prepare for hygienic freelists mm: page_alloc: consolidate free page accounting mm: page_alloc: batch vmstat updates in expand() mm: page_alloc: fix highatomic typing in multi-block buddies mm: page_alloc: remove remnants of unlocked migratetype updates mm: page_alloc: speed up fallbacks in rmqueue_bulk() Kefeng Wang (1): mm: remove migration for HugePage in isolate_single_pageblock() Kemeng Shi (2): mm/page_alloc: remove unnecessary check in break_down_buddy_pages mm/page_alloc: remove unnecessary next_page in break_down_buddy_pages Luoxi Li (1): mm: remove unused has_isolate_pageblock Vlastimil Babka (1): mm: page_alloc: change move_freepages() to __move_freepages_block() Yajun Deng (1): mm: page_alloc: simplify __free_pages_ok() Yu Zhao (2): mm/page_alloc: keep track of free highatomic mm/contig_alloc: support __GFP_COMP Zi Yan (1): mm: page_alloc: set migratetype inside move_freepages() gaoxiang17 (1): mm/page_alloc: add some detailed comments in can_steal_fallback include/linux/gfp.h | 21 + include/linux/mm.h | 18 +- include/linux/mmzone.h | 2 +- include/linux/page-isolation.h | 13 +- include/linux/vmstat.h | 8 - mm/compaction.c | 45 +- mm/debug_page_alloc.c | 12 +- mm/internal.h | 15 +- mm/page_alloc.c | 1071 +++++++++++++++++++------------- mm/page_isolation.c | 143 ++--- 10 files changed, 731 insertions(+), 617 deletions(-) -- 2.43.0

Show replies by date

Jinjiang Tu

Jinjiang Tu

2:20 p.m.

New subject: [PATCH OLK-6.6 31/32] mm/page_alloc: clarify should_claim_block() commentary

From: Brendan Jackman <jackmanb@google.com> mainline inclusion from mainline-v6.13-rc4 commit a14efee04796dd3f614eaf5348ca1ac099c21349 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There's lots of text here but it's a little hard to follow, this is an attempt to break it up and align its structure more closely with the code. Reword the top-level function comment to just explain what question the function answers from the point of view of the caller. Break up the internal logic into different sections that can have their own commentary describing why that part of the rationale is present. Note the page_group_by_mobility_disabled logic is not explained in the commentary, that is outside the scope of this patch... Link: https://lkml.kernel.org/r/20250228-clarify-steal-v4-2-cb2ef1a4e610@google.co... Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 46 ++++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 83f8ec48f2ef..d934070471ba 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1825,16 +1825,9 @@ static void change_pageblock_range(struct page *pageblock_page, } /* - * When we are falling back to another migratetype during allocation, try to - * claim entire blocks to satisfy further allocations, instead of polluting - * multiple pageblocks. - * - * If we are stealing a relatively large buddy page, it is likely there will be - * more free pages in the pageblock, so try to claim the whole block. For - * reclaimable and unmovable allocations, we try to claim the whole block - * regardless of page size, as fragmentation caused by those allocations - * polluting movable pageblocks is worse than movable allocations stealing from - * unmovable and reclaimable pageblocks. + * When we are falling back to another migratetype during allocation, should we + * try to claim an entire block to satisfy further allocations, instead of + * polluting multiple pageblocks? */ static bool should_try_claim_block(unsigned int order, int start_mt) { @@ -1849,19 +1842,32 @@ static bool should_try_claim_block(unsigned int order, int start_mt) return true; /* - * Movable pages won't cause permanent fragmentation, so when you alloc - * small pages, you just need to temporarily steal unmovable or - * reclaimable pages that are closest to the request size. After a - * while, memory compaction may occur to form large contiguous pages, - * and the next movable allocation may not need to steal. Unmovable and - * reclaimable allocations need to actually claim the whole block. + * Above a certain threshold, always try to claim, as it's likely there + * will be more free pages in the pageblock. + */ + if (order >= pageblock_order / 2) + return true; + + /* + * Unmovable/reclaimable allocations would cause permanent + * fragmentations if they fell back to allocating from a movable block + * (polluting it), so we try to claim the whole block regardless of the + * allocation size. Later movable allocations can always steal from this + * block, which is less problematic. */ - if (order >= pageblock_order / 2 || - start_mt == MIGRATE_RECLAIMABLE || - start_mt == MIGRATE_UNMOVABLE || - page_group_by_mobility_disabled) + if (start_mt == MIGRATE_RECLAIMABLE || start_mt == MIGRATE_UNMOVABLE) return true; + if (page_group_by_mobility_disabled) + return true; + + /* + * Movable pages won't cause permanent fragmentation, so when you alloc + * small pages, we just need to temporarily steal unmovable or + * reclaimable pages that are closest to the request size. After a + * while, memory compaction may occur to form large contiguous pages, + * and the next movable allocation may not need to steal. + */ return false; } -- 2.43.0

Jinjiang Tu

2:20 p.m.

New subject: [PATCH OLK-6.6 32/32] mm: page_alloc: speed up fallbacks in rmqueue_bulk()

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.15-rc3 commit 90abee6d7895d5eef18c91d870d8168be4e76e9d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") as the root cause of a 56.4% regression in vm-scalability::lru-file-mmap-read. Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion"), as the root cause for a regression in worst-case zone->lock+irqoff hold times. Both of these patches modify the page allocator's fallback path to be less greedy in an effort to stave off fragmentation. The flip side of this is that fallbacks are also less productive each time around, which means the fallback search can run much more frequently. Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the percpu cache by allocating a large batch of pages in a loop. It highlights how once the native freelists are exhausted, the fallback code first scans orders top-down for whole blocks to claim, then falls back to a bottom-up search for the smallest buddy to steal. For the next batch page, it goes through the same thing again. This can be made more efficient. Since rmqueue_bulk() holds the zone->lock over the entire batch, the freelists are not subject to outside changes; when the search for a block to claim has already failed, there is no point in trying again for the next page. Modify __rmqueue() to remember the last successful fallback mode, and restart directly from there on the next rmqueue_bulk() iteration. Oliver confirms that this improves beyond the regression that the test robot reported against c2f6ea38fc1b: commit: f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap") c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy") acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5 ---------------- --------------------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput Carlos confirms that worst-case times are almost fully recovered compared to before the earlier culprit patch: 2dd482ba627d (before freelist hygiene): 1ms c0cd6f557b90 (after freelist hygiene): 90ms next-20250319 (steal smallest buddy): 280ms this patch : 8ms [jackmanb@google.com: comment updates] Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com [hannes@cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin] Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Brendan Jackman <jackmanb@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Carlos Song <carlos.song@nxp.com> Tested-by: Carlos Song <carlos.song@nxp.com> Tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com Reviewed-by: Brendan Jackman <jackmanb@google.com> Tested-by: Shivank Garg <shivankg@amd.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 113 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 79 insertions(+), 34 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d934070471ba..385f6e48e44b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2150,23 +2150,15 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, } /* - * Try finding a free buddy page on the fallback list. - * - * This will attempt to claim a whole pageblock for the requested type - * to ensure grouping of such requests in the future. - * - * If a whole block cannot be claimed, steal an individual page, regressing to - * __rmqueue_smallest() logic to at least break up as little contiguity as - * possible. + * Try to allocate from some fallback migratetype by claiming the entire block, + * i.e. converting it to the allocation's start migratetype. * * The use of signed ints for order and current_order is a deliberate * deviation from the rest of this file, to make the for loop * condition simpler. - * - * Return the stolen page, or NULL if none can be found. */ static __always_inline struct page * -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype, +__rmqueue_claim(struct zone *zone, int order, int start_migratetype, unsigned int alloc_flags) { struct free_area *area; @@ -2204,14 +2196,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = try_to_claim_block(zone, page, current_order, order, start_migratetype, fallback_mt, alloc_flags); - if (page) - goto got_one; + if (page) { + trace_mm_page_alloc_extfrag(page, order, current_order, + start_migratetype, fallback_mt); + return page; + } } - if (alloc_flags & ALLOC_NOFRAGMENT) - return NULL; + return NULL; +} + +/* + * Try to steal a single page from some fallback migratetype. Leave the rest of + * the block as its current migratetype, potentially causing fragmentation. + */ +static __always_inline struct page * +__rmqueue_steal(struct zone *zone, int order, int start_migratetype) +{ + struct free_area *area; + int current_order; + struct page *page; + int fallback_mt; + bool claim_block; - /* No luck claiming pageblock. Find the smallest fallback page */ for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, @@ -2221,25 +2228,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = get_page_from_free_area(area, fallback_mt); page_del_and_expand(zone, page, order, current_order, fallback_mt); - goto got_one; + trace_mm_page_alloc_extfrag(page, order, current_order, + start_migratetype, fallback_mt); + return page; } return NULL; - -got_one: - trace_mm_page_alloc_extfrag(page, order, current_order, - start_migratetype, fallback_mt); - - return page; } +enum rmqueue_mode { + RMQUEUE_NORMAL, + RMQUEUE_CMA, + RMQUEUE_CLAIM, + RMQUEUE_STEAL, +}; + /* * Do the hard work of removing an element from the buddy allocator. * Call me with the zone->lock already held. */ static __always_inline struct page * __rmqueue(struct zone *zone, unsigned int order, int migratetype, - unsigned int alloc_flags) + unsigned int alloc_flags, enum rmqueue_mode *mode) { struct page *page; @@ -2258,16 +2268,48 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, } } - page = __rmqueue_smallest(zone, order, migratetype); - if (unlikely(!page)) { - if (alloc_flags & ALLOC_CMA) + /* + * First try the freelists of the requested migratetype, then try + * fallbacks modes with increasing levels of fragmentation risk. + * + * The fallback logic is expensive and rmqueue_bulk() calls in + * a loop with the zone->lock held, meaning the freelists are + * not subject to any outside changes. Remember in *mode where + * we found pay dirt, to save us the search on the next call. + */ + switch (*mode) { + case RMQUEUE_NORMAL: + page = __rmqueue_smallest(zone, order, migratetype); + if (page) + return page; + fallthrough; + case RMQUEUE_CMA: + if (alloc_flags & ALLOC_CMA) { page = __rmqueue_cma_fallback(zone, order); - - if (!page) - page = __rmqueue_fallback(zone, order, migratetype, - alloc_flags); + if (page) { + *mode = RMQUEUE_CMA; + return page; + } + } + fallthrough; + case RMQUEUE_CLAIM: + page = __rmqueue_claim(zone, order, migratetype, alloc_flags); + if (page) { + /* Replenished preferred freelist, back to normal mode. */ + *mode = RMQUEUE_NORMAL; + return page; + } + fallthrough; + case RMQUEUE_STEAL: + if (!(alloc_flags & ALLOC_NOFRAGMENT)) { + page = __rmqueue_steal(zone, order, migratetype); + if (page) { + *mode = RMQUEUE_STEAL; + return page; + } + } } - return page; + return NULL; } /* @@ -2279,13 +2321,14 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, int migratetype, unsigned int alloc_flags) { + enum rmqueue_mode rmqm = RMQUEUE_NORMAL; unsigned long flags; int i; spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, - alloc_flags); + alloc_flags, &rmqm); if (unlikely(page == NULL)) break; @@ -2896,7 +2939,9 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); if (!page) { - page = __rmqueue(zone, order, migratetype, alloc_flags); + enum rmqueue_mode rmqm = RMQUEUE_NORMAL; + + page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm); /* * If the allocation fails, allow OOM handling and -- 2.43.0

patchwork bot

2:41 p.m.

反馈：您发送到kernel@openeuler.org的补丁/补丁集，已成功转换为PR！ PR链接地址： https://gitee.com/openeuler/kernel/pulls/17857 邮件列表地址：https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ZV7... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/17857 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ZV7...

211

Age (days ago)

211

Last active (days ago)

List overview

33 comments

2 participants

participants (2)

Jinjiang Tu
patchwork bot

[PATCH OLK-6.6 00/32] mm: page_alloc: freelist migratetype hygiene

tags

participants (2)