[PATCH OLK-6.6 00/32] mm: page_alloc: freelist migratetype hygiene

Brendan Jackman (2): mm/page_alloc: clarify terminology in migratetype fallback code mm/page_alloc: clarify should_claim_block() commentary David Hildenbrand (5): mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() mm/page_alloc: make __alloc_contig_migrate_range() static mm/page_alloc: sort out the alloc_contig_range() gfp flags mess mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook() mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy() Huan Yang (1): mm: page_alloc: simpify page del and expand Jinjiang Tu (1): mm: page_alloc: don't steal single pages from biggest buddy Johannes Weiner (13): mm: page_alloc: remove pcppage migratetype caching mm: page_alloc: optimize free_unref_folios() mm: page_alloc: fix up block types when merging compatible blocks mm: page_alloc: move free pages when converting block during isolation mm: page_alloc: fix move_freepages_block() range error mm: page_alloc: fix freelist movement during block conversion mm: page_alloc: close migratetype race between freeing and stealing mm: page_isolation: prepare for hygienic freelists mm: page_alloc: consolidate free page accounting mm: page_alloc: batch vmstat updates in expand() mm: page_alloc: fix highatomic typing in multi-block buddies mm: page_alloc: remove remnants of unlocked migratetype updates mm: page_alloc: speed up fallbacks in rmqueue_bulk() Kefeng Wang (1): mm: remove migration for HugePage in isolate_single_pageblock() Kemeng Shi (2): mm/page_alloc: remove unnecessary check in break_down_buddy_pages mm/page_alloc: remove unnecessary next_page in break_down_buddy_pages Luoxi Li (1): mm: remove unused has_isolate_pageblock Vlastimil Babka (1): mm: page_alloc: change move_freepages() to __move_freepages_block() Yajun Deng (1): mm: page_alloc: simplify __free_pages_ok() Yu Zhao (2): mm/page_alloc: keep track of free highatomic mm/contig_alloc: support __GFP_COMP Zi Yan (1): mm: page_alloc: set migratetype inside move_freepages() gaoxiang17 (1): mm/page_alloc: add some detailed comments in can_steal_fallback include/linux/gfp.h | 21 + include/linux/mm.h | 18 +- include/linux/mmzone.h | 2 +- include/linux/page-isolation.h | 13 +- include/linux/vmstat.h | 8 - mm/compaction.c | 45 +- mm/debug_page_alloc.c | 12 +- mm/internal.h | 15 +- mm/page_alloc.c | 1071 +++++++++++++++++++------------- mm/page_isolation.c | 143 ++--- 10 files changed, 731 insertions(+), 617 deletions(-) -- 2.43.0

From: Kemeng Shi <shikemeng@huaweicloud.com> mainline inclusion from mainline-v6.7-rc1 commit 27e0db3c21aaf1422980e64b77956e15b839306f category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Patch series "Two minor cleanups to break_down_buddy_pages", v2. Two minor cleanups to break_down_buddy_pages. This patch (of 2): 1. We always have target in range started with next_page and full free range started with current_buddy. 2. The last split range size is 1 << low and low should be >= 0, then size >= 1. So page + size != page is always true (because size > 0). As summary, current_page will not equal to target page. Link: https://lkml.kernel.org/r/20230927103514.98281-1-shikemeng@huaweicloud.com Link: https://lkml.kernel.org/r/20230927103514.98281-2-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c233d61d0d06..55ea3b2cafa2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6986,10 +6986,8 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, if (set_page_guard(zone, current_buddy, high, migratetype)) continue; - if (current_buddy != target) { - add_to_free_list(current_buddy, zone, high, migratetype); - set_buddy_order(current_buddy, high); - } + add_to_free_list(current_buddy, zone, high, migratetype); + set_buddy_order(current_buddy, high); } } -- 2.43.0

From: Kemeng Shi <shikemeng@huaweicloud.com> mainline inclusion from mainline-v6.7-rc1 commit 0dfca313a009c83e2ad44b3719dc1222df6c6db5 category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The next_page is only used to forward page in case target is in second half range. Move forward page directly to remove unnecessary next_page. Link: https://lkml.kernel.org/r/20230927103514.98281-3-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 55ea3b2cafa2..3581c3e01335 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6968,20 +6968,18 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, int migratetype) { unsigned long size = 1 << high; - struct page *current_buddy, *next_page; + struct page *current_buddy; while (high > low) { high--; size >>= 1; if (target >= &page[size]) { - next_page = page + size; current_buddy = page; + page = page + size; } else { - next_page = page; current_buddy = page + size; } - page = next_page; if (set_page_guard(zone, current_buddy, high, migratetype)) continue; -- 2.43.0

From: Yajun Deng <yajun.deng@linux.dev> mainline inclusion from mainline-v6.8-rc1 commit 250ae189d98290d0539b4f9b8c4703e0bf24f9d3 category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There is redundant code in __free_pages_ok(). Use free_one_page() simplify it. Link: https://lkml.kernel.org/r/20231216030503.2126130-1-yajun.deng@linux.dev Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3581c3e01335..776fabffb96e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1273,7 +1273,6 @@ static void free_one_page(struct zone *zone, static void __free_pages_ok(struct page *page, unsigned int order, fpi_t fpi_flags) { - unsigned long flags; int migratetype; unsigned long pfn = page_to_pfn(page); struct zone *zone = page_zone(page); @@ -1288,13 +1287,7 @@ static void __free_pages_ok(struct page *page, unsigned int order, */ migratetype = get_pfnblock_migratetype(page, pfn); - spin_lock_irqsave(&zone->lock, flags); - if (unlikely(has_isolate_pageblock(zone) || - is_migrate_isolate(migratetype))) { - migratetype = get_pfnblock_migratetype(page, pfn); - } - __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); - spin_unlock_irqrestore(&zone->lock, flags); + free_one_page(zone, page, pfn, order, migratetype, fpi_flags); __count_vm_events(PGFREE, 1 << order); } -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit 17edeb5d3f761c20fd28f6002f5a9faa53c0a0d8 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Patch series "mm: page_alloc: freelist migratetype hygiene", v4. The page allocator's mobility grouping is intended to keep unmovable pages separate from reclaimable/compactable ones to allow on-demand defragmentation for higher-order allocations and huge pages. Currently, there are several places where accidental type mixing occurs: an allocation asks for a page of a certain migratetype and receives another. This ruins pageblocks for compaction, which in turn makes allocating huge pages more expensive and less reliable. The series addresses those causes. The last patch adds type checks on all freelist movements to prevent new violations being introduced. The benefits can be seen in a mixed workload that stresses the machine with a memcache-type workload and a kernel build job while periodically attempting to allocate batches of THP. The following data is aggregated over 50 consecutive defconfig builds: VANILLA PATCHED Hugealloc Time mean 165843.93 ( +0.00%) 113025.88 ( -31.85%) Hugealloc Time stddev 158957.35 ( +0.00%) 114716.07 ( -27.83%) Kbuild Real time 310.24 ( +0.00%) 300.73 ( -3.06%) Kbuild User time 1271.13 ( +0.00%) 1259.42 ( -0.92%) Kbuild System time 582.02 ( +0.00%) 559.79 ( -3.81%) THP fault alloc 30585.14 ( +0.00%) 40853.62 ( +33.57%) THP fault fallback 36626.46 ( +0.00%) 26357.62 ( -28.04%) THP fault fail rate % 54.49 ( +0.00%) 39.22 ( -27.53%) Pagealloc fallback 1328.00 ( +0.00%) 1.00 ( -99.85%) Pagealloc type mismatch 181009.50 ( +0.00%) 0.00 ( -100.00%) Direct compact stall 434.56 ( +0.00%) 257.66 ( -40.61%) Direct compact fail 421.70 ( +0.00%) 249.94 ( -40.63%) Direct compact success 12.86 ( +0.00%) 7.72 ( -37.09%) Direct compact success rate % 2.86 ( +0.00%) 2.82 ( -0.96%) Compact daemon scanned migrate 3370059.62 ( +0.00%) 3612054.76 ( +7.18%) Compact daemon scanned free 7718439.20 ( +0.00%) 5386385.02 ( -30.21%) Compact direct scanned migrate 309248.62 ( +0.00%) 176721.04 ( -42.85%) Compact direct scanned free 433582.84 ( +0.00%) 315727.66 ( -27.18%) Compact migrate scanned daemon % 91.20 ( +0.00%) 94.48 ( +3.56%) Compact free scanned daemon % 94.58 ( +0.00%) 94.42 ( -0.16%) Compact total migrate scanned 3679308.24 ( +0.00%) 3788775.80 ( +2.98%) Compact total free scanned 8152022.04 ( +0.00%) 5702112.68 ( -30.05%) Alloc stall 872.04 ( +0.00%) 5156.12 ( +490.71%) Pages kswapd scanned 510645.86 ( +0.00%) 3394.94 ( -99.33%) Pages kswapd reclaimed 134811.62 ( +0.00%) 2701.26 ( -98.00%) Pages direct scanned 99546.06 ( +0.00%) 376407.52 ( +278.12%) Pages direct reclaimed 62123.40 ( +0.00%) 289535.70 ( +366.06%) Pages total scanned 610191.92 ( +0.00%) 379802.46 ( -37.76%) Pages scanned kswapd % 76.36 ( +0.00%) 0.10 ( -98.58%) Swap out 12057.54 ( +0.00%) 15022.98 ( +24.59%) Swap in 209.16 ( +0.00%) 256.48 ( +22.52%) File refaults 17701.64 ( +0.00%) 11765.40 ( -33.53%) Huge page success rate is higher, allocation latencies are shorter and more predictable. Stealing (fallback) rate is drastically reduced. Notably, while the vanilla kernel keeps doing fallbacks on an ongoing basis, the patched kernel enters a steady state once the distribution of block types is adequate for the workload. Steals over 50 runs: VANILLA PATCHED 1504.0 227.0 1557.0 6.0 1391.0 13.0 1080.0 26.0 1057.0 40.0 1156.0 6.0 805.0 46.0 736.0 20.0 1747.0 2.0 1699.0 34.0 1269.0 13.0 1858.0 12.0 907.0 4.0 727.0 2.0 563.0 2.0 3094.0 2.0 10211.0 3.0 2621.0 1.0 5508.0 2.0 1060.0 2.0 538.0 3.0 5773.0 2.0 2199.0 0.0 3781.0 2.0 1387.0 1.0 4977.0 0.0 2865.0 1.0 1814.0 1.0 3739.0 1.0 6857.0 0.0 382.0 0.0 407.0 1.0 3784.0 0.0 297.0 0.0 298.0 0.0 6636.0 0.0 4188.0 0.0 242.0 0.0 9960.0 0.0 5816.0 0.0 354.0 0.0 287.0 0.0 261.0 0.0 140.0 1.0 2065.0 0.0 312.0 0.0 331.0 0.0 164.0 0.0 465.0 1.0 219.0 0.0 Type mismatches are down too. Those count every time an allocation request asks for one migratetype and gets another. This can still occur minimally in the patched kernel due to non-stealing fallbacks, but it's quite rare and follows the pattern of overall fallbacks - once the block type distribution settles, mismatches cease as well: VANILLA: PATCHED: 182602.0 268.0 135794.0 20.0 88619.0 19.0 95973.0 0.0 129590.0 0.0 129298.0 0.0 147134.0 0.0 230854.0 0.0 239709.0 0.0 137670.0 0.0 132430.0 0.0 65712.0 0.0 57901.0 0.0 67506.0 0.0 63565.0 4.0 34806.0 0.0 42962.0 0.0 32406.0 0.0 38668.0 0.0 61356.0 0.0 57800.0 0.0 41435.0 0.0 83456.0 0.0 65048.0 0.0 28955.0 0.0 47597.0 0.0 75117.0 0.0 55564.0 0.0 38280.0 0.0 52404.0 0.0 26264.0 0.0 37538.0 0.0 19671.0 0.0 30936.0 0.0 26933.0 0.0 16962.0 0.0 44554.0 0.0 46352.0 0.0 24995.0 0.0 35152.0 0.0 12823.0 0.0 21583.0 0.0 18129.0 0.0 31693.0 0.0 28745.0 0.0 33308.0 0.0 31114.0 0.0 35034.0 0.0 12111.0 0.0 24885.0 0.0 Compaction work is markedly reduced despite much better THP rates. In the vanilla kernel, reclaim seems to have been driven primarily by watermark boosting that happens as a result of fallbacks. With those all but eliminated, watermarks average lower and kswapd does less work. The uptick in direct reclaim is because THP requests have to fend for themselves more often - which is intended policy right now. Aggregate reclaim activity is lowered significantly, though. This patch (of 10): The idea behind the cache is to save get_pageblock_migratetype() lookups during bulk freeing. A microbenchmark suggests this isn't helping, though. The pcp migratetype can get stale, which means that bulk freeing has an extra branch to check if the pageblock was isolated while on the pcp. While the variance overlaps, the cache write and the branch seem to make this a net negative. The following test allocates and frees batches of 10,000 pages (~3x the pcp high marks to trigger flushing): Before: 8,668.48 msec task-clock # 99.735 CPUs utilized ( +- 2.90% ) 19 context-switches # 4.341 /sec ( +- 3.24% ) 0 cpu-migrations # 0.000 /sec 17,440 page-faults # 3.984 K/sec ( +- 2.90% ) 41,758,692,473 cycles # 9.541 GHz ( +- 2.90% ) 126,201,294,231 instructions # 5.98 insn per cycle ( +- 2.90% ) 25,348,098,335 branches # 5.791 G/sec ( +- 2.90% ) 33,436,921 branch-misses # 0.26% of all branches ( +- 2.90% ) 0.0869148 +- 0.0000302 seconds time elapsed ( +- 0.03% ) After: 8,444.81 msec task-clock # 99.726 CPUs utilized ( +- 2.90% ) 22 context-switches # 5.160 /sec ( +- 3.23% ) 0 cpu-migrations # 0.000 /sec 17,443 page-faults # 4.091 K/sec ( +- 2.90% ) 40,616,738,355 cycles # 9.527 GHz ( +- 2.90% ) 126,383,351,792 instructions # 6.16 insn per cycle ( +- 2.90% ) 25,224,985,153 branches # 5.917 G/sec ( +- 2.90% ) 32,236,793 branch-misses # 0.25% of all branches ( +- 2.90% ) 0.0846799 +- 0.0000412 seconds time elapsed ( +- 0.05% ) A side effect is that this also ensures that pages whose pageblock gets stolen while on the pcplist end up on the right freelist and we don't perform potentially type-incompatible buddy merges (or skip merges when we shouldn't), which is likely beneficial to long-term fragmentation management, although the effects would be harder to measure. Settle for simpler and faster code as justification here. Link: https://lkml.kernel.org/r/20240320180429.678181-1-hannes@cmpxchg.org Link: https://lkml.kernel.org/r/20240320180429.678181-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: "Huang, Ying" <ying.huang@intel.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [ Context conflicts with commit 62b208c4859c and ae577de78c12. ] Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 66 +++++++++++-------------------------------------- 1 file changed, 14 insertions(+), 52 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 776fabffb96e..6fced070bd35 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -207,24 +207,6 @@ EXPORT_SYMBOL(node_states); gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK; -/* - * A cached value of the page's pageblock's migratetype, used when the page is - * put on a pcplist. Used to avoid the pageblock migratetype lookup when - * freeing from pcplists in most cases, at the cost of possibly becoming stale. - * Also the migratetype set in the page does not necessarily match the pcplist - * index, e.g. page might have MIGRATE_CMA set but be on a pcplist with any - * other index - this ensures that it will be put on the correct CMA freelist. - */ -static inline int get_pcppage_migratetype(struct page *page) -{ - return page->index; -} - -static inline void set_pcppage_migratetype(struct page *page, int migratetype) -{ - page->index = migratetype; -} - #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE unsigned int pageblock_order __read_mostly; #endif @@ -1201,7 +1183,6 @@ static void free_pcppages_bulk(struct zone *zone, int count, { unsigned long flags; unsigned int order; - bool isolated_pageblocks; struct page *page; /* @@ -1214,7 +1195,6 @@ static void free_pcppages_bulk(struct zone *zone, int count, pindex = pindex - 1; spin_lock_irqsave(&zone->lock, flags); - isolated_pageblocks = has_isolate_pageblock(zone); while (count > 0) { struct list_head *list; @@ -1230,23 +1210,19 @@ static void free_pcppages_bulk(struct zone *zone, int count, order = pindex_to_order(pindex); nr_pages = 1 << order; do { + unsigned long pfn; int mt; page = list_last_entry(list, struct page, pcp_list); - mt = get_pcppage_migratetype(page); + pfn = page_to_pfn(page); + mt = get_pfnblock_migratetype(page, pfn); /* must delete to avoid corrupting pcp list */ list_del(&page->pcp_list); count -= nr_pages; pcp->count -= nr_pages; - /* MIGRATE_ISOLATE page should not go to pcplists */ - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); - /* Pageblock could have been isolated meanwhile */ - if (unlikely(isolated_pageblocks)) - mt = get_pageblock_migratetype(page); - - __free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE); + __free_one_page(page, pfn, zone, order, mt, FPI_NONE); trace_mm_page_pcpu_drain(page, order, mt); } while (count > 0 && !list_empty(list)); } @@ -1606,7 +1582,6 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, continue; del_page_from_free_list(page, zone, current_order); expand(zone, page, order, current_order, migratetype); - set_pcppage_migratetype(page, migratetype); trace_mm_page_alloc_zone_locked(page, order, migratetype, pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES); @@ -2177,7 +2152,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages are ordered properly. */ list_add_tail(&page->pcp_list, list); - if (is_migrate_cma(get_pcppage_migratetype(page))) + if (is_migrate_cma(get_pageblock_migratetype(page))) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, -(1 << order)); } @@ -2377,19 +2352,6 @@ void drain_all_pages(struct zone *zone) __drain_all_pages(zone, false); } -static bool free_unref_page_prepare(struct page *page, unsigned long pfn, - unsigned int order) -{ - int migratetype; - - if (!free_pages_prepare(page, order)) - return false; - - migratetype = get_pfnblock_migratetype(page, pfn); - set_pcppage_migratetype(page, migratetype); - return true; -} - static int nr_pcp_free(struct per_cpu_pages *pcp, int batch, int high, bool free_high) { int min_nr_free, max_nr_free; @@ -2532,7 +2494,7 @@ void free_unref_page(struct page *page, unsigned int order) return; } - if (!free_unref_page_prepare(page, pfn, order)) + if (!free_pages_prepare(page, order)) return; /* @@ -2542,7 +2504,7 @@ void free_unref_page(struct page *page, unsigned int order) * get those areas back if necessary. Otherwise, we may have to free * excessively into the page allocator */ - migratetype = pcpmigratetype = get_pcppage_migratetype(page); + migratetype = pcpmigratetype = get_pfnblock_migratetype(page, pfn); if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { if (unlikely(is_migrate_isolate(migratetype))) { free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); @@ -2584,14 +2546,14 @@ void free_unref_folios(struct folio_batch *folios) continue; } - if (!free_unref_page_prepare(&folio->page, pfn, order)) + if (!free_pages_prepare(&folio->page, order)) continue; /* * Free isolated folios and orders not handled on the PCP * directly to the allocator, see comment in free_unref_page. */ - migratetype = get_pcppage_migratetype(&folio->page); + migratetype = get_pfnblock_migratetype(&folio->page, pfn); if (!pcp_allowed_order(order) || is_migrate_isolate(migratetype)) { free_one_page(folio_zone(folio), &folio->page, pfn, @@ -2608,10 +2570,11 @@ void free_unref_folios(struct folio_batch *folios) for (i = 0; i < folios->nr; i++) { struct folio *folio = folios->folios[i]; struct zone *zone = folio_zone(folio); + unsigned long pfn = folio_pfn(folio); unsigned int order = (unsigned long)folio->private; folio->private = NULL; - migratetype = get_pcppage_migratetype(&folio->page); + migratetype = get_pfnblock_migratetype(&folio->page, pfn); /* Different zone requires a different pcp lock */ if (zone != locked_zone) { @@ -2628,9 +2591,8 @@ void free_unref_folios(struct folio_batch *folios) pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (unlikely(!pcp)) { pcp_trylock_finish(UP_flags); - free_one_page(zone, &folio->page, - folio_pfn(folio), order, - migratetype, FPI_NONE); + free_one_page(zone, &folio->page, pfn, + order, migratetype, FPI_NONE); locked_zone = NULL; continue; } @@ -2799,7 +2761,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, } } __mod_zone_freepage_state(zone, -(1 << order), - get_pcppage_migratetype(page)); + get_pageblock_migratetype(page)); spin_unlock_irqrestore(&zone->lock, flags); } while (check_new_pages(page, order)); -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit 9cbe97bad5cd75b5b493734bd2695febb8e95281 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Move direct freeing of isolated pages to the lock-breaking block in the second loop. This saves an unnecessary migratetype reassessment. Minor comment and local variable scoping cleanups. Link: https://lkml.kernel.org/r/20240320180429.678181-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [ Context conflict with commit ae577de78c12. ] Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6fced070bd35..4261ba077229 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2533,7 +2533,7 @@ void free_unref_folios(struct folio_batch *folios) unsigned long __maybe_unused UP_flags; struct per_cpu_pages *pcp = NULL; struct zone *locked_zone = NULL; - int i, j, migratetype; + int i, j; /* Prepare folios for freeing */ for (i = 0, j = 0; i < folios->nr; i++) { @@ -2550,12 +2550,14 @@ void free_unref_folios(struct folio_batch *folios) continue; /* - * Free isolated folios and orders not handled on the PCP - * directly to the allocator, see comment in free_unref_page. + * Free orders not handled on the PCP directly to the + * allocator. */ - migratetype = get_pfnblock_migratetype(&folio->page, pfn); - if (!pcp_allowed_order(order) || - is_migrate_isolate(migratetype)) { + if (!pcp_allowed_order(order)) { + int migratetype; + + migratetype = get_pfnblock_migratetype(&folio->page, + pfn); free_one_page(folio_zone(folio), &folio->page, pfn, order, migratetype, FPI_NONE); continue; @@ -2572,15 +2574,29 @@ void free_unref_folios(struct folio_batch *folios) struct zone *zone = folio_zone(folio); unsigned long pfn = folio_pfn(folio); unsigned int order = (unsigned long)folio->private; + int migratetype; folio->private = NULL; migratetype = get_pfnblock_migratetype(&folio->page, pfn); /* Different zone requires a different pcp lock */ - if (zone != locked_zone) { + if (zone != locked_zone || + is_migrate_isolate(migratetype)) { if (pcp) { pcp_spin_unlock(pcp); pcp_trylock_finish(UP_flags); + locked_zone = NULL; + pcp = NULL; + } + + /* + * Free isolated pages directly to the + * allocator, see comment in free_unref_page. + */ + if (is_migrate_isolate(migratetype)) { + free_one_page(zone, &folio->page, pfn, + order, migratetype, FPI_NONE); + continue; } /* @@ -2593,7 +2609,6 @@ void free_unref_folios(struct folio_batch *folios) pcp_trylock_finish(UP_flags); free_one_page(zone, &folio->page, pfn, order, migratetype, FPI_NONE); - locked_zone = NULL; continue; } locked_zone = zone; -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit e6cf9e1c4cde8a53385423ecb8ca581097f42e02 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The buddy allocator coalesces compatible blocks during freeing, but it doesn't update the types of the subblocks to match. When an allocation later breaks the chunk down again, its pieces will be put on freelists of the wrong type. This encourages incompatible page mixing (ask for one type, get another), and thus long-term fragmentation. Update the subblocks when merging a larger chunk, such that a later expand() will maintain freelist type hygiene. Link: https://lkml.kernel.org/r/20240320180429.678181-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: "Huang, Ying" <ying.huang@intel.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4261ba077229..fa3d2c5a08e0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -779,10 +779,17 @@ static inline void __free_one_page(struct page *page, */ int buddy_mt = get_pfnblock_migratetype(buddy, buddy_pfn); - if (migratetype != buddy_mt - && (!migratetype_is_mergeable(migratetype) || - !migratetype_is_mergeable(buddy_mt))) - goto done_merging; + if (migratetype != buddy_mt) { + if (!migratetype_is_mergeable(migratetype) || + !migratetype_is_mergeable(buddy_mt)) + goto done_merging; + /* + * Match buddy type. This ensures that + * an expand() down the line puts the + * sub-blocks on the right freelists. + */ + set_pageblock_migratetype(buddy, migratetype); + } } /* -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit b54ccd3c6bacbc571f7e61797fb5ff9fe3861413 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When claiming a block during compaction isolation, move any remaining free pages to the correct freelists as well, instead of stranding them on the wrong list. Otherwise, this encourages incompatible page mixing down the line, and thus long-term fragmentation. Link: https://lkml.kernel.org/r/20240320180429.678181-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: "Huang, Ying" <ying.huang@intel.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index fa3d2c5a08e0..c8047135965e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2696,9 +2696,12 @@ int __isolate_free_page(struct page *page, unsigned int order) * Only change normal pageblocks (i.e., they can merge * with others) */ - if (migratetype_is_mergeable(mt)) + if (migratetype_is_mergeable(mt)) { set_pageblock_migratetype(page, MIGRATE_MOVABLE); + move_freepages_block(zone, page, + MIGRATE_MOVABLE, NULL); + } } } -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit 2dd482ba627de15d67f0c0ed445133c8ae9b201b category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When a block is partially outside the zone of the cursor page, the function cuts the range to the pivot page instead of the zone start. This can leave large parts of the block behind, which encourages incompatible page mixing down the line (ask for one type, get another), and thus long-term fragmentation. This triggers reliably on the first block in the DMA zone, whose start_pfn is 1. The block is stolen, but everything before the pivot page (which was often hundreds of pages) is left on the old list. Link: https://lkml.kernel.org/r/20240320180429.678181-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c8047135965e..82181d2a34dc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1676,9 +1676,15 @@ int move_freepages_block(struct zone *zone, struct page *page, start_pfn = pageblock_start_pfn(pfn); end_pfn = pageblock_end_pfn(pfn) - 1; - /* Do not cross zone boundaries */ + /* + * The caller only has the lock for @zone, don't touch ranges + * that straddle into other zones. While we could move part of + * the range that's inside the zone, this call is usually + * accompanied by other operations such as migratetype updates + * which also should be locked. + */ if (!zone_spans_pfn(zone, start_pfn)) - start_pfn = pfn; + return 0; if (!zone_spans_pfn(zone, end_pfn)) return 0; -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit c0cd6f557b9090525d288806cccbc73440ac235a category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Currently, page block type conversion during fallbacks, atomic reservations and isolation can strand various amounts of free pages on incorrect freelists. For example, fallback stealing moves free pages in the block to the new type's freelists, but then may not actually claim the block for that type if there aren't enough compatible pages already allocated. In all cases, free page moving might fail if the block straddles more than one zone, in which case no free pages are moved at all, but the block type is changed anyway. This is detrimental to type hygiene on the freelists. It encourages incompatible page mixing down the line (ask for one type, get another) and thus contributes to long-term fragmentation. Split the process into a proper transaction: check first if conversion will happen, then try to move the free pages, and only if that was successful convert the block to the new type. [baolin.wang@linux.alibaba.com: fix allocation failures with CONFIG_CMA] Link: https://lkml.kernel.org/r/a97697e0-45b0-4f71-b087-fdc7a1d43c0e@linux.alibaba... Link: https://lkml.kernel.org/r/20240320180429.678181-7-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: mm/page_alloc.c include/linux/page-isolation.h [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- include/linux/page-isolation.h | 3 +- mm/page_alloc.c | 174 ++++++++++++++++++++------------- mm/page_isolation.c | 22 +++-- 3 files changed, 121 insertions(+), 78 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 4ac34392823a..8550b3c91480 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -34,8 +34,7 @@ static inline bool is_migrate_isolate(int migratetype) #define REPORT_FAILURE 0x2 void set_pageblock_migratetype(struct page *page, int migratetype); -int move_freepages_block(struct zone *zone, struct page *page, - int migratetype, int *num_movable); +int move_freepages_block(struct zone *zone, struct page *page, int migratetype); int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, int migratetype, int flags, gfp_t gfp_flags); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 82181d2a34dc..1ded3dd753e9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1627,9 +1627,8 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone, * Note that start_page and end_pages are not aligned on a pageblock * boundary. If alignment is required, use move_freepages_block() */ -static int move_freepages(struct zone *zone, - unsigned long start_pfn, unsigned long end_pfn, - int migratetype, int *num_movable) +static int move_freepages(struct zone *zone, unsigned long start_pfn, + unsigned long end_pfn, int migratetype) { struct page *page; unsigned long pfn; @@ -1639,14 +1638,6 @@ static int move_freepages(struct zone *zone, for (pfn = start_pfn; pfn <= end_pfn;) { page = pfn_to_page(pfn); if (!PageBuddy(page)) { - /* - * We assume that pages that could be isolated for - * migration are movable. But we don't actually try - * isolating, as that would be expensive. - */ - if (num_movable && - (PageLRU(page) || __PageMovable(page))) - (*num_movable)++; pfn++; continue; } @@ -1664,17 +1655,16 @@ static int move_freepages(struct zone *zone, return pages_moved; } -int move_freepages_block(struct zone *zone, struct page *page, - int migratetype, int *num_movable) +static bool prep_move_freepages_block(struct zone *zone, struct page *page, + unsigned long *start_pfn, + unsigned long *end_pfn, + int *num_free, int *num_movable) { - unsigned long start_pfn, end_pfn, pfn; - - if (num_movable) - *num_movable = 0; + unsigned long pfn, start, end; pfn = page_to_pfn(page); - start_pfn = pageblock_start_pfn(pfn); - end_pfn = pageblock_end_pfn(pfn) - 1; + start = pageblock_start_pfn(pfn); + end = pageblock_end_pfn(pfn) - 1; /* * The caller only has the lock for @zone, don't touch ranges @@ -1683,13 +1673,50 @@ int move_freepages_block(struct zone *zone, struct page *page, * accompanied by other operations such as migratetype updates * which also should be locked. */ - if (!zone_spans_pfn(zone, start_pfn)) - return 0; - if (!zone_spans_pfn(zone, end_pfn)) - return 0; + if (!zone_spans_pfn(zone, start)) + return false; + if (!zone_spans_pfn(zone, end)) + return false; + + *start_pfn = start; + *end_pfn = end; + + if (num_free) { + *num_free = 0; + *num_movable = 0; + for (pfn = start; pfn <= end;) { + page = pfn_to_page(pfn); + if (PageBuddy(page)) { + int nr = 1 << buddy_order(page); - return move_freepages(zone, start_pfn, end_pfn, migratetype, - num_movable); + *num_free += nr; + pfn += nr; + continue; + } + /* + * We assume that pages that could be isolated for + * migration are movable. But we don't actually try + * isolating, as that would be expensive. + */ + if (PageLRU(page) || __PageMovable(page)) + (*num_movable)++; + pfn++; + } + } + + return true; +} + +int move_freepages_block(struct zone *zone, struct page *page, + int migratetype) +{ + unsigned long start_pfn, end_pfn; + + if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, + NULL, NULL)) + return -1; + + return move_freepages(zone, start_pfn, end_pfn, migratetype); } static void change_pageblock_range(struct page *pageblock_page, @@ -1774,33 +1801,37 @@ static inline bool boost_watermark(struct zone *zone) } /* - * This function implements actual steal behaviour. If order is large enough, - * we can steal whole pageblock. If not, we first move freepages in this - * pageblock to our migratetype and determine how many already-allocated pages - * are there in the pageblock with a compatible migratetype. If at least half - * of pages are free or compatible, we can change migratetype of the pageblock - * itself, so pages freed in the future will be put on the correct free list. + * This function implements actual steal behaviour. If order is large enough, we + * can claim the whole pageblock for the requested migratetype. If not, we check + * the pageblock for constituent pages; if at least half of the pages are free + * or compatible, we can still claim the whole block, so pages freed in the + * future will be put on the correct free list. Otherwise, we isolate exactly + * the order we need from the fallback block and leave its migratetype alone. */ -static void steal_suitable_fallback(struct zone *zone, struct page *page, - unsigned int alloc_flags, int start_type, bool whole_block) +static struct page * +steal_suitable_fallback(struct zone *zone, struct page *page, + int current_order, int order, int start_type, + unsigned int alloc_flags, bool whole_block) { - unsigned int current_order = buddy_order(page); int free_pages, movable_pages, alike_pages; - int old_block_type; + unsigned long start_pfn, end_pfn; + int block_type; - old_block_type = get_pageblock_migratetype(page); + block_type = get_pageblock_migratetype(page); /* * This can happen due to races and we want to prevent broken * highatomic accounting. */ - if (is_migrate_highatomic(old_block_type)) + if (is_migrate_highatomic(block_type)) goto single_page; /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { + del_page_from_free_list(page, zone, current_order); change_pageblock_range(page, current_order, start_type); - goto single_page; + expand(zone, page, order, current_order, start_type); + return page; } /* @@ -1815,10 +1846,9 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, if (!whole_block) goto single_page; - free_pages = move_freepages_block(zone, page, start_type, - &movable_pages); /* moving whole block can fail due to zone boundary conditions */ - if (!free_pages) + if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, + &free_pages, &movable_pages)) goto single_page; /* @@ -1836,7 +1866,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, * vice versa, be conservative since we can't distinguish the * exact migratetype of non-movable pages. */ - if (old_block_type == MIGRATE_MOVABLE) + if (block_type == MIGRATE_MOVABLE) alike_pages = pageblock_nr_pages - (free_pages + movable_pages); else @@ -1847,13 +1877,16 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, * compatible migratability as our allocation, claim the whole block. */ if (free_pages + alike_pages >= (1 << (pageblock_order-1)) || - page_group_by_mobility_disabled) + page_group_by_mobility_disabled) { + move_freepages(zone, start_pfn, end_pfn, start_type); set_pageblock_migratetype(page, start_type); - - return; + return __rmqueue_smallest(zone, order, start_type); + } single_page: - move_to_free_list(page, zone, current_order, start_type); + del_page_from_free_list(page, zone, current_order); + expand(zone, page, order, current_order, block_type); + return page; } /* @@ -1921,9 +1954,10 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone) mt = get_pageblock_migratetype(page); /* Only reserve normal pageblocks (i.e., they can merge with others) */ if (migratetype_is_mergeable(mt)) { - zone->nr_reserved_highatomic += pageblock_nr_pages; - set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC); - move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL); + if (move_freepages_block(zone, page, MIGRATE_HIGHATOMIC) != -1) { + set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC); + zone->nr_reserved_highatomic += pageblock_nr_pages; + } } out_unlock: @@ -1948,7 +1982,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, struct zone *zone; struct page *page; int order; - bool ret; + int ret; for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->highest_zoneidx, ac->nodemask) { @@ -1997,10 +2031,14 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * of pageblocks that cannot be completely freed * may increase. */ + ret = move_freepages_block(zone, page, ac->migratetype); + /* + * Reserving this block already succeeded, so this should + * not fail on zone boundaries. + */ + WARN_ON_ONCE(ret == -1); set_pageblock_migratetype(page, ac->migratetype); - ret = move_freepages_block(zone, page, ac->migratetype, - NULL); - if (ret) { + if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); return ret; } @@ -2021,7 +2059,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * deviation from the rest of this file, to make the for loop * condition simpler. */ -static __always_inline bool +static __always_inline struct page * __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, unsigned int alloc_flags) { @@ -2068,7 +2106,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, goto do_steal; } - return false; + return NULL; find_smallest: for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { @@ -2088,14 +2126,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, do_steal: page = get_page_from_free_area(area, fallback_mt); - steal_suitable_fallback(zone, page, alloc_flags, start_migratetype, - can_steal); + /* take off list, maybe claim block, expand remainder */ + page = steal_suitable_fallback(zone, page, current_order, order, + start_migratetype, alloc_flags, can_steal); trace_mm_page_alloc_extfrag(page, order, current_order, start_migratetype, fallback_mt); - return true; - + return page; } /* @@ -2122,15 +2160,15 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, return page; } } -retry: + page = __rmqueue_smallest(zone, order, migratetype); if (unlikely(!page)) { if (alloc_flags & ALLOC_CMA) page = __rmqueue_cma_fallback(zone, order); - if (!page && __rmqueue_fallback(zone, order, migratetype, - alloc_flags)) - goto retry; + if (!page) + page = __rmqueue_fallback(zone, order, migratetype, + alloc_flags); } return page; } @@ -2702,12 +2740,10 @@ int __isolate_free_page(struct page *page, unsigned int order) * Only change normal pageblocks (i.e., they can merge * with others) */ - if (migratetype_is_mergeable(mt)) { - set_pageblock_migratetype(page, - MIGRATE_MOVABLE); - move_freepages_block(zone, page, - MIGRATE_MOVABLE, NULL); - } + if (migratetype_is_mergeable(mt) && + move_freepages_block(zone, page, + MIGRATE_MOVABLE) != -1) + set_pageblock_migratetype(page, MIGRATE_MOVABLE); } } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index c634db349783..bdce807d6c2e 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -179,15 +179,18 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ unmovable = has_unmovable_pages(check_unmovable_start, check_unmovable_end, migratetype, isol_flags); if (!unmovable) { - unsigned long nr_pages; + int nr_pages; int mt = get_pageblock_migratetype(page); + nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE); + /* Block spans zone boundaries? */ + if (nr_pages == -1) { + spin_unlock_irqrestore(&zone->lock, flags); + return -EBUSY; + } + __mod_zone_freepage_state(zone, -nr_pages, mt); set_pageblock_migratetype(page, MIGRATE_ISOLATE); zone->nr_isolate_pageblock++; - nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, - NULL); - - __mod_zone_freepage_state(zone, -nr_pages, mt); spin_unlock_irqrestore(&zone->lock, flags); return 0; } @@ -207,7 +210,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ static void unset_migratetype_isolate(struct page *page, int migratetype) { struct zone *zone; - unsigned long flags, nr_pages; + unsigned long flags; bool isolated_page = false; unsigned int order; struct page *buddy; @@ -253,7 +256,12 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) * allocation. */ if (!isolated_page) { - nr_pages = move_freepages_block(zone, page, migratetype, NULL); + int nr_pages = move_freepages_block(zone, page, migratetype); + /* + * Isolating this block already succeeded, so this + * should not fail on zone boundaries. + */ + WARN_ON_ONCE(nr_pages == -1); __mod_zone_freepage_state(zone, nr_pages, migratetype); } set_pageblock_migratetype(page, migratetype); -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit 55612e80e722ac554cc5e80df05555b4f8d40c37 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There are three freeing paths that read the page's migratetype optimistically before grabbing the zone lock. When this races with block stealing, those pages go on the wrong freelist. The paths in question are: - when freeing >costly orders that aren't THP - when freeing pages to the buddy upon pcp lock contention - when freeing pages that are isolated - when freeing pages initially during boot - when freeing the remainder in alloc_pages_exact() - when "accepting" unaccepted VM host memory before first use - when freeing pages during unpoisoning None of these are so hot that they would need this optimization at the cost of hampering defrag efforts. Especially when contrasted with the fact that the most common buddy freeing path - free_pcppages_bulk - is checking the migratetype under the zone->lock just fine. In addition, isolated pages need to look up the migratetype under the lock anyway, which adds branches to the locked section, and results in a double lookup when the pages are in fact isolated. Move the lookups into the lock. Link: https://lkml.kernel.org/r/20240320180429.678181-8-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [ Context conflict with commit 2ae116c3257d. ] Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 52 ++++++++++++++++++------------------------------- 1 file changed, 19 insertions(+), 33 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1ded3dd753e9..65db6e6a280b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1237,18 +1237,15 @@ static void free_pcppages_bulk(struct zone *zone, int count, spin_unlock_irqrestore(&zone->lock, flags); } -static void free_one_page(struct zone *zone, - struct page *page, unsigned long pfn, - unsigned int order, - int migratetype, fpi_t fpi_flags) +static void free_one_page(struct zone *zone, struct page *page, + unsigned long pfn, unsigned int order, + fpi_t fpi_flags) { unsigned long flags; + int migratetype; spin_lock_irqsave(&zone->lock, flags); - if (unlikely(has_isolate_pageblock(zone) || - is_migrate_isolate(migratetype))) { - migratetype = get_pfnblock_migratetype(page, pfn); - } + migratetype = get_pfnblock_migratetype(page, pfn); __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); } @@ -1256,21 +1253,13 @@ static void free_one_page(struct zone *zone, static void __free_pages_ok(struct page *page, unsigned int order, fpi_t fpi_flags) { - int migratetype; unsigned long pfn = page_to_pfn(page); struct zone *zone = page_zone(page); if (!free_pages_prepare(page, order)) return; - /* - * Calling get_pfnblock_migratetype() without spin_lock_irqsave() here - * is used to avoid calling get_pfnblock_migratetype() under the lock. - * This will reduce the lock holding time. - */ - migratetype = get_pfnblock_migratetype(page, pfn); - - free_one_page(zone, page, pfn, order, migratetype, fpi_flags); + free_one_page(zone, page, pfn, order, fpi_flags); __count_vm_events(PGFREE, 1 << order); } @@ -2533,7 +2522,7 @@ void free_unref_page(struct page *page, unsigned int order) struct per_cpu_pages *pcp; struct zone *zone; unsigned long pfn = page_to_pfn(page); - int migratetype, pcpmigratetype; + int migratetype; if (page_from_dynamic_pool(page)) { dynamic_pool_free_page(page); @@ -2555,23 +2544,23 @@ void free_unref_page(struct page *page, unsigned int order) * get those areas back if necessary. Otherwise, we may have to free * excessively into the page allocator */ - migratetype = pcpmigratetype = get_pfnblock_migratetype(page, pfn); + migratetype = get_pfnblock_migratetype(page, pfn); if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { if (unlikely(is_migrate_isolate(migratetype))) { - free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); + free_one_page(page_zone(page), page, pfn, order, FPI_NONE); return; } - pcpmigratetype = MIGRATE_MOVABLE; + migratetype = MIGRATE_MOVABLE; } zone = page_zone(page); pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (pcp) { - free_unref_page_commit(zone, pcp, page, pcpmigratetype, order); + free_unref_page_commit(zone, pcp, page, migratetype, order); pcp_spin_unlock(pcp); } else { - free_one_page(zone, page, pfn, order, migratetype, FPI_NONE); + free_one_page(zone, page, pfn, order, FPI_NONE); } pcp_trylock_finish(UP_flags); } @@ -2605,12 +2594,8 @@ void free_unref_folios(struct folio_batch *folios) * allocator. */ if (!pcp_allowed_order(order)) { - int migratetype; - - migratetype = get_pfnblock_migratetype(&folio->page, - pfn); - free_one_page(folio_zone(folio), &folio->page, pfn, - order, migratetype, FPI_NONE); + free_one_page(folio_zone(folio), &folio->page, + pfn, order, FPI_NONE); continue; } folio->private = (void *)(unsigned long)order; @@ -2646,7 +2631,7 @@ void free_unref_folios(struct folio_batch *folios) */ if (is_migrate_isolate(migratetype)) { free_one_page(zone, &folio->page, pfn, - order, migratetype, FPI_NONE); + order, FPI_NONE); continue; } @@ -2659,7 +2644,7 @@ void free_unref_folios(struct folio_batch *folios) if (unlikely(!pcp)) { pcp_trylock_finish(UP_flags); free_one_page(zone, &folio->page, pfn, - order, migratetype, FPI_NONE); + order, FPI_NONE); continue; } locked_zone = zone; @@ -7054,13 +7039,14 @@ bool take_page_off_buddy(struct page *page) bool put_page_back_buddy(struct page *page) { struct zone *zone = page_zone(page); - unsigned long pfn = page_to_pfn(page); unsigned long flags; - int migratetype = get_pfnblock_migratetype(page, pfn); bool ret = false; spin_lock_irqsave(&zone->lock, flags); if (put_page_testzero(page)) { + unsigned long pfn = page_to_pfn(page); + int migratetype = get_pfnblock_migratetype(page, pfn); + ClearPageHWPoisonTakenOff(page); __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE); if (TestClearPageHWPoison(page)) { -- 2.43.0

From: Zi Yan <ziy@nvidia.com> mainline inclusion from mainline-v6.10-rc1 commit f37c0f6876a8eabe1477c87860460bc181f6cdbb category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This avoids changing migratetype after move_freepages() or move_freepages_block(), which is error prone. It also prepares for upcoming changes to fix move_freepages() not moving free pages partially in the range. Link: https://lkml.kernel.org/r/20240320180429.678181-9-hannes@cmpxchg.org Signed-off-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: include/linux/page-isolation.h [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 27 +++++++++++++-------------- mm/page_isolation.c | 7 +++---- 2 files changed, 16 insertions(+), 18 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 65db6e6a280b..68257e88d5c3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1612,9 +1612,8 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone, #endif /* - * Move the free pages in a range to the freelist tail of the requested type. - * Note that start_page and end_pages are not aligned on a pageblock - * boundary. If alignment is required, use move_freepages_block() + * Change the type of a block and move all its free pages to that + * type's freelist. */ static int move_freepages(struct zone *zone, unsigned long start_pfn, unsigned long end_pfn, int migratetype) @@ -1624,6 +1623,9 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn, unsigned int order; int pages_moved = 0; + VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1)); + VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn); + for (pfn = start_pfn; pfn <= end_pfn;) { page = pfn_to_page(pfn); if (!PageBuddy(page)) { @@ -1641,6 +1643,8 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn, pages_moved += 1 << order; } + set_pageblock_migratetype(pfn_to_page(start_pfn), migratetype); + return pages_moved; } @@ -1868,7 +1872,6 @@ steal_suitable_fallback(struct zone *zone, struct page *page, if (free_pages + alike_pages >= (1 << (pageblock_order-1)) || page_group_by_mobility_disabled) { move_freepages(zone, start_pfn, end_pfn, start_type); - set_pageblock_migratetype(page, start_type); return __rmqueue_smallest(zone, order, start_type); } @@ -1942,12 +1945,10 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone) /* Yoink! */ mt = get_pageblock_migratetype(page); /* Only reserve normal pageblocks (i.e., they can merge with others) */ - if (migratetype_is_mergeable(mt)) { - if (move_freepages_block(zone, page, MIGRATE_HIGHATOMIC) != -1) { - set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC); + if (migratetype_is_mergeable(mt)) + if (move_freepages_block(zone, page, + MIGRATE_HIGHATOMIC) != -1) zone->nr_reserved_highatomic += pageblock_nr_pages; - } - } out_unlock: spin_unlock_irqrestore(&zone->lock, flags); @@ -2026,7 +2027,6 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * not fail on zone boundaries. */ WARN_ON_ONCE(ret == -1); - set_pageblock_migratetype(page, ac->migratetype); if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); return ret; @@ -2725,10 +2725,9 @@ int __isolate_free_page(struct page *page, unsigned int order) * Only change normal pageblocks (i.e., they can merge * with others) */ - if (migratetype_is_mergeable(mt) && - move_freepages_block(zone, page, - MIGRATE_MOVABLE) != -1) - set_pageblock_migratetype(page, MIGRATE_MOVABLE); + if (migratetype_is_mergeable(mt)) + move_freepages_block(zone, page, + MIGRATE_MOVABLE); } } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index bdce807d6c2e..478389b74063 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -189,7 +189,6 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ return -EBUSY; } __mod_zone_freepage_state(zone, -nr_pages, mt); - set_pageblock_migratetype(page, MIGRATE_ISOLATE); zone->nr_isolate_pageblock++; spin_unlock_irqrestore(&zone->lock, flags); return 0; @@ -263,10 +262,10 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) */ WARN_ON_ONCE(nr_pages == -1); __mod_zone_freepage_state(zone, nr_pages, migratetype); - } - set_pageblock_migratetype(page, migratetype); - if (isolated_page) + } else { + set_pageblock_migratetype(page, migratetype); __putback_isolated_page(page, order, migratetype); + } zone->nr_isolate_pageblock--; out: spin_unlock_irqrestore(&zone->lock, flags); -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit fd919a85cd55be5d00a6a7372071f44c8eafb825 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Page isolation currently sets MIGRATE_ISOLATE on a block, then drops zone->lock and scans the block for straddling buddies to split up. Because this happens non-atomically wrt the page allocator, it's possible for allocations to get a buddy whose first block is a regular pcp migratetype but whose tail is isolated. This means that in certain cases memory can still be allocated after isolation. It will also trigger the freelist type hygiene warnings in subsequent patches. start_isolate_page_range() isolate_single_pageblock() set_migratetype_isolate(tail) lock zone->lock move_freepages_block(tail) // nop set_pageblock_migratetype(tail) unlock zone->lock __rmqueue_smallest() del_page_from_freelist(head) expand(head, head_mt) WARN(head_mt != tail_mt) start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES) for (pfn = start_pfn, pfn < end_pfn) if (PageBuddy()) split_free_page(head) Introduce a variant of move_freepages_block() provided by the allocator specifically for page isolation; it moves free pages, converts the block, and handles the splitting of straddling buddies while holding zone->lock. The allocator knows that pageblocks and buddies are always naturally aligned, which means that buddies can only straddle blocks if they're actually >pageblock_order. This means the search-and-split part can be simplified compared to what page isolation used to do. Also tighten up the page isolation code around the expectations of which pages can be large, and how they are freed. Based on extensive discussions with and invaluable input from Zi Yan. [hannes@cmpxchg.org: work around older gcc warning] Link: https://lkml.kernel.org/r/20240321142426.GB777580@cmpxchg.org Link: https://lkml.kernel.org/r/20240320180429.678181-10-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: include/linux/page-isolation.h mm/internal.h mm/page_alloc.c mm/page_isolation.c [ Context conflict due to miss MAX_PAGE_ORDER. ] Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- include/linux/page-isolation.h | 4 +- mm/internal.h | 4 - mm/page_alloc.c | 204 +++++++++++++++++++-------------- mm/page_isolation.c | 106 ++++++----------- 4 files changed, 155 insertions(+), 163 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 8550b3c91480..c16db0067090 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -34,7 +34,9 @@ static inline bool is_migrate_isolate(int migratetype) #define REPORT_FAILURE 0x2 void set_pageblock_migratetype(struct page *page, int migratetype); -int move_freepages_block(struct zone *zone, struct page *page, int migratetype); + +bool move_freepages_block_isolate(struct zone *zone, struct page *page, + int migratetype); int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, int migratetype, int flags, gfp_t gfp_flags); diff --git a/mm/internal.h b/mm/internal.h index ba2346a41447..785c6aff532c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -714,10 +714,6 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align, void memmap_init_range(unsigned long, int, unsigned long, unsigned long, unsigned long, enum meminit_context, struct vmem_altmap *, int); - -int split_free_page(struct page *free_page, - unsigned int order, unsigned long split_pfn_offset); - #if defined CONFIG_COMPACTION || defined CONFIG_CMA #define MAX_PAGE_ORDER MAX_ORDER diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 68257e88d5c3..ad4f1c4a43bd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -826,64 +826,6 @@ static inline void __free_one_page(struct page *page, page_reporting_notify_free(order); } -/** - * split_free_page() -- split a free page at split_pfn_offset - * @free_page: the original free page - * @order: the order of the page - * @split_pfn_offset: split offset within the page - * - * Return -ENOENT if the free page is changed, otherwise 0 - * - * It is used when the free page crosses two pageblocks with different migratetypes - * at split_pfn_offset within the page. The split free page will be put into - * separate migratetype lists afterwards. Otherwise, the function achieves - * nothing. - */ -int split_free_page(struct page *free_page, - unsigned int order, unsigned long split_pfn_offset) -{ - struct zone *zone = page_zone(free_page); - unsigned long free_page_pfn = page_to_pfn(free_page); - unsigned long pfn; - unsigned long flags; - int free_page_order; - int mt; - int ret = 0; - - if (split_pfn_offset == 0) - return ret; - - spin_lock_irqsave(&zone->lock, flags); - - if (!PageBuddy(free_page) || buddy_order(free_page) != order) { - ret = -ENOENT; - goto out; - } - - mt = get_pfnblock_migratetype(free_page, free_page_pfn); - if (likely(!is_migrate_isolate(mt))) - __mod_zone_freepage_state(zone, -(1UL << order), mt); - - del_page_from_free_list(free_page, zone, order); - for (pfn = free_page_pfn; - pfn < free_page_pfn + (1UL << order);) { - int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn); - - free_page_order = min_t(unsigned int, - pfn ? __ffs(pfn) : order, - __fls(split_pfn_offset)); - __free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order, - mt, FPI_NONE); - pfn += 1UL << free_page_order; - split_pfn_offset -= (1UL << free_page_order); - /* we have done the first part, now switch to second part */ - if (split_pfn_offset == 0) - split_pfn_offset = (1UL << order) - (pfn - free_page_pfn); - } -out: - spin_unlock_irqrestore(&zone->lock, flags); - return ret; -} /* * A bad page could be due to a number of fields. Instead of multiple branches, * try and check multiple fields with one check. The caller must do a detailed @@ -1700,8 +1642,8 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, return true; } -int move_freepages_block(struct zone *zone, struct page *page, - int migratetype) +static int move_freepages_block(struct zone *zone, struct page *page, + int migratetype) { unsigned long start_pfn, end_pfn; @@ -1712,6 +1654,123 @@ int move_freepages_block(struct zone *zone, struct page *page, return move_freepages(zone, start_pfn, end_pfn, migratetype); } +#ifdef CONFIG_MEMORY_ISOLATION +/* Look for a buddy that straddles start_pfn */ +static unsigned long find_large_buddy(unsigned long start_pfn) +{ + int order = 0; + struct page *page; + unsigned long pfn = start_pfn; + + while (!PageBuddy(page = pfn_to_page(pfn))) { + /* Nothing found */ + if (++order > MAX_PAGE_ORDER) + return start_pfn; + pfn &= ~0UL << order; + } + + /* + * Found a preceding buddy, but does it straddle? + */ + if (pfn + (1 << buddy_order(page)) > start_pfn) + return pfn; + + /* Nothing found */ + return start_pfn; +} + +/* Split a multi-block free page into its individual pageblocks */ +static void split_large_buddy(struct zone *zone, struct page *page, + unsigned long pfn, int order) +{ + unsigned long end_pfn = pfn + (1 << order); + + VM_WARN_ON_ONCE(order <= pageblock_order); + VM_WARN_ON_ONCE(pfn & (pageblock_nr_pages - 1)); + + /* Caller removed page from freelist, buddy info cleared! */ + VM_WARN_ON_ONCE(PageBuddy(page)); + + while (pfn != end_pfn) { + int mt = get_pfnblock_migratetype(page, pfn); + + __free_one_page(page, pfn, zone, pageblock_order, mt, FPI_NONE); + pfn += pageblock_nr_pages; + page = pfn_to_page(pfn); + } +} + +/** + * move_freepages_block_isolate - move free pages in block for page isolation + * @zone: the zone + * @page: the pageblock page + * @migratetype: migratetype to set on the pageblock + * + * This is similar to move_freepages_block(), but handles the special + * case encountered in page isolation, where the block of interest + * might be part of a larger buddy spanning multiple pageblocks. + * + * Unlike the regular page allocator path, which moves pages while + * stealing buddies off the freelist, page isolation is interested in + * arbitrary pfn ranges that may have overlapping buddies on both ends. + * + * This function handles that. Straddling buddies are split into + * individual pageblocks. Only the block of interest is moved. + * + * Returns %true if pages could be moved, %false otherwise. + */ +bool move_freepages_block_isolate(struct zone *zone, struct page *page, + int migratetype) +{ + unsigned long start_pfn, end_pfn, pfn; + int nr_moved, mt; + + if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, + NULL, NULL)) + return false; + + /* No splits needed if buddies can't span multiple blocks */ + if (pageblock_order == MAX_PAGE_ORDER) + goto move; + + /* We're a tail block in a larger buddy */ + pfn = find_large_buddy(start_pfn); + if (pfn != start_pfn) { + struct page *buddy = pfn_to_page(pfn); + int order = buddy_order(buddy); + int mt = get_pfnblock_migratetype(buddy, pfn); + + if (!is_migrate_isolate(mt)) + __mod_zone_freepage_state(zone, -(1UL << order), mt); + del_page_from_free_list(buddy, zone, order); + set_pageblock_migratetype(page, migratetype); + split_large_buddy(zone, buddy, pfn, order); + return true; + } + + /* We're the starting block of a larger buddy */ + if (PageBuddy(page) && buddy_order(page) > pageblock_order) { + int mt = get_pfnblock_migratetype(page, pfn); + int order = buddy_order(page); + + if (!is_migrate_isolate(mt)) + __mod_zone_freepage_state(zone, -(1UL << order), mt); + del_page_from_free_list(page, zone, order); + set_pageblock_migratetype(page, migratetype); + split_large_buddy(zone, page, pfn, order); + return true; + } +move: + mt = get_pfnblock_migratetype(page, start_pfn); + nr_moved = move_freepages(zone, start_pfn, end_pfn, migratetype); + if (!is_migrate_isolate(mt)) + __mod_zone_freepage_state(zone, -nr_moved, mt); + else if (!is_migrate_isolate(migratetype)) + __mod_zone_freepage_state(zone, nr_moved, migratetype); + return true; +} +#endif /* CONFIG_MEMORY_ISOLATION */ + static void change_pageblock_range(struct page *pageblock_page, int start_order, int migratetype) { @@ -6597,7 +6656,6 @@ int alloc_contig_range(unsigned long start, unsigned long end, unsigned migratetype, gfp_t gfp_mask) { unsigned long outer_start, outer_end; - int order; int ret = 0; struct compact_control cc = { @@ -6680,29 +6738,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, * We don't have to hold zone->lock here because the pages are * isolated thus they won't get removed from buddy. */ - - order = 0; - outer_start = start; - while (!PageBuddy(pfn_to_page(outer_start))) { - if (++order > MAX_ORDER) { - outer_start = start; - break; - } - outer_start &= ~0UL << order; - } - - if (outer_start != start) { - order = buddy_order(pfn_to_page(outer_start)); - - /* - * outer_start page could be small order buddy page and - * it doesn't include start page. Adjust outer_start - * in this case to report failed page properly - * on tracepoint in test_pages_isolated() - */ - if (outer_start + (1UL << order) <= start) - outer_start = start; - } + outer_start = find_large_buddy(start); /* Make sure the range is really isolated. */ if (test_pages_isolated(outer_start, end, 0)) { diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 478389b74063..fa446a57b98d 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -179,16 +179,10 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ unmovable = has_unmovable_pages(check_unmovable_start, check_unmovable_end, migratetype, isol_flags); if (!unmovable) { - int nr_pages; - int mt = get_pageblock_migratetype(page); - - nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE); - /* Block spans zone boundaries? */ - if (nr_pages == -1) { + if (!move_freepages_block_isolate(zone, page, MIGRATE_ISOLATE)) { spin_unlock_irqrestore(&zone->lock, flags); return -EBUSY; } - __mod_zone_freepage_state(zone, -nr_pages, mt); zone->nr_isolate_pageblock++; spin_unlock_irqrestore(&zone->lock, flags); return 0; @@ -255,13 +249,11 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) * allocation. */ if (!isolated_page) { - int nr_pages = move_freepages_block(zone, page, migratetype); /* * Isolating this block already succeeded, so this * should not fail on zone boundaries. */ - WARN_ON_ONCE(nr_pages == -1); - __mod_zone_freepage_state(zone, nr_pages, migratetype); + WARN_ON_ONCE(!move_freepages_block_isolate(zone, page, migratetype)); } else { set_pageblock_migratetype(page, migratetype); __putback_isolated_page(page, order, migratetype); @@ -377,26 +369,29 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, VM_BUG_ON(!page); pfn = page_to_pfn(page); - /* - * start_pfn is MAX_ORDER_NR_PAGES aligned, if there is any - * free pages in [start_pfn, boundary_pfn), its head page will - * always be in the range. - */ + if (PageBuddy(page)) { int order = buddy_order(page); - if (pfn + (1UL << order) > boundary_pfn) { - /* free page changed before split, check it again */ - if (split_free_page(page, order, boundary_pfn - pfn)) - continue; - } + /* move_freepages_block_isolate() handled this */ + VM_WARN_ON_ONCE(pfn + (1 << order) > boundary_pfn); pfn += 1UL << order; continue; } + /* - * migrate compound pages then let the free page handling code - * above do the rest. If migration is not possible, just fail. + * If a compound page is straddling our block, attempt + * to migrate it out of the way. + * + * We don't have to worry about this creating a large + * free page that straddles into our block: gigantic + * pages are freed as order-0 chunks, and LRU pages + * (currently) do not exceed pageblock_order. + * + * The block of interest has already been marked + * MIGRATE_ISOLATE above, so when migration is done it + * will free its pages onto the correct freelists. */ if (PageCompound(page)) { struct page *head = compound_head(page); @@ -407,16 +402,10 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, pfn = head_pfn + nr_pages; continue; } + #if defined CONFIG_COMPACTION || defined CONFIG_CMA - /* - * hugetlb, lru compound (THP), and movable compound pages - * can be migrated. Otherwise, fail the isolation. - */ - if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) { - int order; - unsigned long outer_pfn; + if (PageHuge(page)) { int page_mt = get_pageblock_migratetype(page); - bool isolate_page = !is_migrate_isolate_page(page); struct compact_control cc = { .nr_migratepages = 0, .order = -1, @@ -429,56 +418,25 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, }; INIT_LIST_HEAD(&cc.migratepages); - /* - * XXX: mark the page as MIGRATE_ISOLATE so that - * no one else can grab the freed page after migration. - * Ideally, the page should be freed as two separate - * pages to be added into separate migratetype free - * lists. - */ - if (isolate_page) { - ret = set_migratetype_isolate(page, page_mt, - flags, head_pfn, head_pfn + nr_pages); - if (ret) - goto failed; - } - ret = __alloc_contig_migrate_range(&cc, head_pfn, head_pfn + nr_pages, page_mt); - - /* - * restore the page's migratetype so that it can - * be split into separate migratetype free lists - * later. - */ - if (isolate_page) - unset_migratetype_isolate(page, page_mt); - if (ret) goto failed; - /* - * reset pfn to the head of the free page, so - * that the free page handling code above can split - * the free page to the right migratetype list. - * - * head_pfn is not used here as a hugetlb page order - * can be bigger than MAX_ORDER, but after it is - * freed, the free page order is not. Use pfn within - * the range to find the head of the free page. - */ - order = 0; - outer_pfn = pfn; - while (!PageBuddy(pfn_to_page(outer_pfn))) { - /* stop if we cannot find the free page */ - if (++order > MAX_ORDER) - goto failed; - outer_pfn &= ~0UL << order; - } - pfn = outer_pfn; + pfn = head_pfn + nr_pages; continue; - } else + } + + /* + * These pages are movable too, but they're + * not expected to exceed pageblock_order. + * + * Let us know when they do, so we can add + * proper free and split handling for them. + */ + VM_WARN_ON_ONCE_PAGE(PageLRU(page), page); + VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page); #endif - goto failed; + goto failed; } pfn++; -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit e0932b6c1f942fa747258e152cdce0d0b2b5be5c category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Free page accounting currently happens a bit too high up the call stack, where it has to deal with guard pages, compaction capturing, block stealing and even page isolation. This is subtle and fragile, and makes it difficult to hack on the code. Now that type violations on the freelists have been fixed, push the accounting down to where pages enter and leave the freelist. [hannes@cmpxchg.org: undo unrelated drive-by line wrap] Link: https://lkml.kernel.org/r/20240327185736.GA7597@cmpxchg.org [hannes@cmpxchg.org: remove unused page parameter from account_freepages()] Link: https://lkml.kernel.org/r/20240327185831.GB7597@cmpxchg.org [baolin.wang@linux.alibaba.com: fix free page accounting] Link: https://lkml.kernel.org/r/a2a48baca69f103aa431fd201f8a06e3b95e203d.171264844... [andriy.shevchenko@linux.intel.com: avoid defining unused function] Link: https://lkml.kernel.org/r/20240423161506.2637177-1-andriy.shevchenko@linux.i... Link: https://lkml.kernel.org/r/20240320180429.678181-11-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [fix compile error due to add_to_free_list isn't declared for expand.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- include/linux/mm.h | 18 ++-- include/linux/vmstat.h | 8 -- mm/debug_page_alloc.c | 12 +-- mm/internal.h | 5 -- mm/page_alloc.c | 194 +++++++++++++++++++++++------------------ 5 files changed, 121 insertions(+), 116 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 1f36bf9ee02f..85f9162436ff 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3820,24 +3820,22 @@ static inline bool page_is_guard(struct page *page) return PageGuard(page); } -bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order, - int migratetype); +bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order); static inline bool set_page_guard(struct zone *zone, struct page *page, - unsigned int order, int migratetype) + unsigned int order) { if (!debug_guardpage_enabled()) return false; - return __set_page_guard(zone, page, order, migratetype); + return __set_page_guard(zone, page, order); } -void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order, - int migratetype); +void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order); static inline void clear_page_guard(struct zone *zone, struct page *page, - unsigned int order, int migratetype) + unsigned int order) { if (!debug_guardpage_enabled()) return; - __clear_page_guard(zone, page, order, migratetype); + __clear_page_guard(zone, page, order); } #else /* CONFIG_DEBUG_PAGEALLOC */ @@ -3847,9 +3845,9 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; } static inline bool debug_guardpage_enabled(void) { return false; } static inline bool page_is_guard(struct page *page) { return false; } static inline bool set_page_guard(struct zone *zone, struct page *page, - unsigned int order, int migratetype) { return false; } + unsigned int order) { return false; } static inline void clear_page_guard(struct zone *zone, struct page *page, - unsigned int order, int migratetype) {} + unsigned int order) {} #endif /* CONFIG_DEBUG_PAGEALLOC */ #ifdef __HAVE_ARCH_GATE_AREA diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index d783d5a27352..72ecd46fd0c4 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -487,14 +487,6 @@ static inline void node_stat_sub_folio(struct folio *folio, mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio)); } -static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages, - int migratetype) -{ - __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages); - if (is_migrate_cma(migratetype)) - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages); -} - extern const char * const vmstat_text[]; static inline const char *zone_stat_name(enum zone_stat_item item) diff --git a/mm/debug_page_alloc.c b/mm/debug_page_alloc.c index f9d145730fd1..03a810927d0a 100644 --- a/mm/debug_page_alloc.c +++ b/mm/debug_page_alloc.c @@ -32,8 +32,7 @@ static int __init debug_guardpage_minorder_setup(char *buf) } early_param("debug_guardpage_minorder", debug_guardpage_minorder_setup); -bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order, - int migratetype) +bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order) { if (order >= debug_guardpage_minorder()) return false; @@ -41,19 +40,12 @@ bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order, __SetPageGuard(page); INIT_LIST_HEAD(&page->buddy_list); set_page_private(page, order); - /* Guard pages are not available for any usage */ - if (!is_migrate_isolate(migratetype)) - __mod_zone_freepage_state(zone, -(1 << order), migratetype); return true; } -void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order, - int migratetype) +void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order) { __ClearPageGuard(page); - set_page_private(page, 0); - if (!is_migrate_isolate(migratetype)) - __mod_zone_freepage_state(zone, (1 << order), migratetype); } diff --git a/mm/internal.h b/mm/internal.h index 785c6aff532c..b1b1b9c5dc0c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1192,11 +1192,6 @@ static inline bool is_migrate_highatomic(enum migratetype migratetype) return migratetype == MIGRATE_HIGHATOMIC; } -static inline bool is_migrate_highatomic_page(struct page *page) -{ - return get_pageblock_migratetype(page) == MIGRATE_HIGHATOMIC; -} - void setup_zone_pageset(struct zone *zone); struct migration_target_control { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ad4f1c4a43bd..e5e7651698ad 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -636,23 +636,33 @@ compaction_capture(struct capture_control *capc, struct page *page, } #endif /* CONFIG_COMPACTION */ -/* Used for pages not on another list */ -static inline void add_to_free_list(struct page *page, struct zone *zone, - unsigned int order, int migratetype) +static inline void account_freepages(struct zone *zone, int nr_pages, + int migratetype) { - struct free_area *area = &zone->free_area[order]; + if (is_migrate_isolate(migratetype)) + return; - list_add(&page->buddy_list, &area->free_list[migratetype]); - area->nr_free++; + __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages); + + if (is_migrate_cma(migratetype)) + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages); } /* Used for pages not on another list */ -static inline void add_to_free_list_tail(struct page *page, struct zone *zone, - unsigned int order, int migratetype) +static inline void __add_to_free_list(struct page *page, struct zone *zone, + unsigned int order, int migratetype, + bool tail) { struct free_area *area = &zone->free_area[order]; - list_add_tail(&page->buddy_list, &area->free_list[migratetype]); + VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, + "page type is %lu, passed migratetype is %d (nr=%d)\n", + get_pageblock_migratetype(page), migratetype, 1 << order); + + if (tail) + list_add_tail(&page->buddy_list, &area->free_list[migratetype]); + else + list_add(&page->buddy_list, &area->free_list[migratetype]); area->nr_free++; } @@ -662,16 +672,28 @@ static inline void add_to_free_list_tail(struct page *page, struct zone *zone, * allocation again (e.g., optimization for memory onlining). */ static inline void move_to_free_list(struct page *page, struct zone *zone, - unsigned int order, int migratetype) + unsigned int order, int old_mt, int new_mt) { struct free_area *area = &zone->free_area[order]; - list_move_tail(&page->buddy_list, &area->free_list[migratetype]); + /* Free page moving can fail, so it happens before the type update */ + VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt, + "page type is %lu, passed migratetype is %d (nr=%d)\n", + get_pageblock_migratetype(page), old_mt, 1 << order); + + list_move_tail(&page->buddy_list, &area->free_list[new_mt]); + + account_freepages(zone, -(1 << order), old_mt); + account_freepages(zone, 1 << order, new_mt); } -static inline void del_page_from_free_list(struct page *page, struct zone *zone, - unsigned int order) +static inline void __del_page_from_free_list(struct page *page, struct zone *zone, + unsigned int order, int migratetype) { + VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, + "page type is %lu, passed migratetype is %d (nr=%d)\n", + get_pageblock_migratetype(page), migratetype, 1 << order); + /* clear reported state and update reported page count */ if (page_reported(page)) __ClearPageReported(page); @@ -682,6 +704,13 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, zone->free_area[order].nr_free--; } +static inline void del_page_from_free_list(struct page *page, struct zone *zone, + unsigned int order, int migratetype) +{ + __del_page_from_free_list(page, zone, order, migratetype); + account_freepages(zone, -(1 << order), migratetype); +} + static inline struct page *get_page_from_free_area(struct free_area *area, int migratetype) { @@ -753,16 +782,17 @@ static inline void __free_one_page(struct page *page, VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page); VM_BUG_ON(migratetype == -1); - if (likely(!is_migrate_isolate(migratetype))) - __mod_zone_freepage_state(zone, 1 << order, migratetype); VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page); VM_BUG_ON_PAGE(bad_range(zone, page), page); + account_freepages(zone, 1 << order, migratetype); + while (order < MAX_ORDER) { + int buddy_mt = migratetype; + if (compaction_capture(capc, page, order, migratetype)) { - __mod_zone_freepage_state(zone, -(1 << order), - migratetype); + account_freepages(zone, -(1 << order), migratetype); return; } @@ -777,19 +807,12 @@ static inline void __free_one_page(struct page *page, * pageblock isolation could cause incorrect freepage or CMA * accounting or HIGHATOMIC accounting. */ - int buddy_mt = get_pfnblock_migratetype(buddy, buddy_pfn); + buddy_mt = get_pfnblock_migratetype(buddy, buddy_pfn); - if (migratetype != buddy_mt) { - if (!migratetype_is_mergeable(migratetype) || - !migratetype_is_mergeable(buddy_mt)) - goto done_merging; - /* - * Match buddy type. This ensures that - * an expand() down the line puts the - * sub-blocks on the right freelists. - */ - set_pageblock_migratetype(buddy, migratetype); - } + if (migratetype != buddy_mt && + (!migratetype_is_mergeable(migratetype) || + !migratetype_is_mergeable(buddy_mt))) + goto done_merging; } /* @@ -797,9 +820,19 @@ static inline void __free_one_page(struct page *page, * merge with it and move up one order. */ if (page_is_guard(buddy)) - clear_page_guard(zone, buddy, order, migratetype); + clear_page_guard(zone, buddy, order); else - del_page_from_free_list(buddy, zone, order); + __del_page_from_free_list(buddy, zone, order, buddy_mt); + + if (unlikely(buddy_mt != migratetype)) { + /* + * Match buddy type. This ensures that an + * expand() down the line puts the sub-blocks + * on the right freelists. + */ + set_pageblock_migratetype(buddy, migratetype); + } + combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); pfn = combined_pfn; @@ -816,10 +849,7 @@ static inline void __free_one_page(struct page *page, else to_tail = buddy_merge_likely(pfn, buddy_pfn, page, order); - if (to_tail) - add_to_free_list_tail(page, zone, order, migratetype); - else - add_to_free_list(page, zone, order, migratetype); + __add_to_free_list(page, zone, order, migratetype, to_tail); /* Notify page reporting subsystem of freed page */ if (!(fpi_flags & FPI_SKIP_REPORT_NOTIFY)) @@ -1308,6 +1338,9 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn, * * -- nyc */ +static inline void add_to_free_list(struct page *page, struct zone *zone, + unsigned int order, int migratetype, + bool tail); static inline void expand(struct zone *zone, struct page *page, int low, int high, int migratetype) { @@ -1324,10 +1357,10 @@ static inline void expand(struct zone *zone, struct page *page, * Corresponding page table entries will not be touched, * pages will stay not present in virtual address space */ - if (set_page_guard(zone, &page[size], high, migratetype)) + if (set_page_guard(zone, &page[size], high)) continue; - add_to_free_list(&page[size], zone, high, migratetype); + add_to_free_list(&page[size], zone, high, migratetype, false); set_buddy_order(&page[size], high); } } @@ -1518,7 +1551,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, page = get_page_from_free_area(area, migratetype); if (!page) continue; - del_page_from_free_list(page, zone, current_order); + del_page_from_free_list(page, zone, current_order, migratetype); expand(zone, page, order, current_order, migratetype); trace_mm_page_alloc_zone_locked(page, order, migratetype, pcp_allowed_order(order) && @@ -1558,7 +1591,7 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone, * type's freelist. */ static int move_freepages(struct zone *zone, unsigned long start_pfn, - unsigned long end_pfn, int migratetype) + unsigned long end_pfn, int old_mt, int new_mt) { struct page *page; unsigned long pfn; @@ -1580,12 +1613,14 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn, VM_BUG_ON_PAGE(page_zone(page) != zone, page); order = buddy_order(page); - move_to_free_list(page, zone, order, migratetype); + + move_to_free_list(page, zone, order, old_mt, new_mt); + pfn += 1 << order; pages_moved += 1 << order; } - set_pageblock_migratetype(pfn_to_page(start_pfn), migratetype); + set_pageblock_migratetype(pfn_to_page(start_pfn), new_mt); return pages_moved; } @@ -1643,7 +1678,7 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, } static int move_freepages_block(struct zone *zone, struct page *page, - int migratetype) + int old_mt, int new_mt) { unsigned long start_pfn, end_pfn; @@ -1651,7 +1686,7 @@ static int move_freepages_block(struct zone *zone, struct page *page, NULL, NULL)) return -1; - return move_freepages(zone, start_pfn, end_pfn, migratetype); + return move_freepages(zone, start_pfn, end_pfn, old_mt, new_mt); } #ifdef CONFIG_MEMORY_ISOLATION @@ -1723,7 +1758,6 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, int migratetype) { unsigned long start_pfn, end_pfn, pfn; - int nr_moved, mt; if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, NULL, NULL)) @@ -1738,11 +1772,9 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, if (pfn != start_pfn) { struct page *buddy = pfn_to_page(pfn); int order = buddy_order(buddy); - int mt = get_pfnblock_migratetype(buddy, pfn); - if (!is_migrate_isolate(mt)) - __mod_zone_freepage_state(zone, -(1UL << order), mt); - del_page_from_free_list(buddy, zone, order); + del_page_from_free_list(buddy, zone, order, + get_pfnblock_migratetype(buddy, pfn)); set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, buddy, pfn, order); return true; @@ -1750,23 +1782,17 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, /* We're the starting block of a larger buddy */ if (PageBuddy(page) && buddy_order(page) > pageblock_order) { - int mt = get_pfnblock_migratetype(page, pfn); int order = buddy_order(page); - if (!is_migrate_isolate(mt)) - __mod_zone_freepage_state(zone, -(1UL << order), mt); - del_page_from_free_list(page, zone, order); + del_page_from_free_list(page, zone, order, + get_pfnblock_migratetype(page, pfn)); set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, page, pfn, order); return true; } move: - mt = get_pfnblock_migratetype(page, start_pfn); - nr_moved = move_freepages(zone, start_pfn, end_pfn, migratetype); - if (!is_migrate_isolate(mt)) - __mod_zone_freepage_state(zone, -nr_moved, mt); - else if (!is_migrate_isolate(migratetype)) - __mod_zone_freepage_state(zone, nr_moved, migratetype); + move_freepages(zone, start_pfn, end_pfn, + get_pfnblock_migratetype(page, start_pfn), migratetype); return true; } #endif /* CONFIG_MEMORY_ISOLATION */ @@ -1880,7 +1906,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { - del_page_from_free_list(page, zone, current_order); + del_page_from_free_list(page, zone, current_order, block_type); change_pageblock_range(page, current_order, start_type); expand(zone, page, order, current_order, start_type); return page; @@ -1930,12 +1956,12 @@ steal_suitable_fallback(struct zone *zone, struct page *page, */ if (free_pages + alike_pages >= (1 << (pageblock_order-1)) || page_group_by_mobility_disabled) { - move_freepages(zone, start_pfn, end_pfn, start_type); + move_freepages(zone, start_pfn, end_pfn, block_type, start_type); return __rmqueue_smallest(zone, order, start_type); } single_page: - del_page_from_free_list(page, zone, current_order); + del_page_from_free_list(page, zone, current_order, block_type); expand(zone, page, order, current_order, block_type); return page; } @@ -2005,7 +2031,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone) mt = get_pageblock_migratetype(page); /* Only reserve normal pageblocks (i.e., they can merge with others) */ if (migratetype_is_mergeable(mt)) - if (move_freepages_block(zone, page, + if (move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC) != -1) zone->nr_reserved_highatomic += pageblock_nr_pages; @@ -2046,11 +2072,13 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); + int mt; page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); if (!page) continue; + mt = get_pageblock_migratetype(page); /* * In page freeing path, migratetype change is racy so * we can counter several free pages in a pageblock @@ -2058,7 +2086,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * from highatomic to ac->migratetype. So we should * adjust the count once. */ - if (is_migrate_highatomic_page(page)) { + if (is_migrate_highatomic(mt)) { /* * It should never happen but changes to * locking could inadvertently allow a per-cpu @@ -2080,7 +2108,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * of pageblocks that cannot be completely freed * may increase. */ - ret = move_freepages_block(zone, page, ac->migratetype); + ret = move_freepages_block(zone, page, mt, + ac->migratetype); /* * Reserving this block already succeeded, so this should * not fail on zone boundaries. @@ -2251,12 +2280,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages are ordered properly. */ list_add_tail(&page->pcp_list, list); - if (is_migrate_cma(get_pageblock_migratetype(page))) - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, - -(1 << order)); } - - __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); spin_unlock_irqrestore(&zone->lock, flags); return i; @@ -2766,11 +2790,9 @@ int __isolate_free_page(struct page *page, unsigned int order) watermark = zone->_watermark[WMARK_MIN] + (1UL << order); if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA)) return 0; - - __mod_zone_freepage_state(zone, -(1UL << order), mt); } - del_page_from_free_list(page, zone, order); + del_page_from_free_list(page, zone, order, mt); /* * Set the pageblock if the isolated page is at least half of a @@ -2785,7 +2807,7 @@ int __isolate_free_page(struct page *page, unsigned int order) * with others) */ if (migratetype_is_mergeable(mt)) - move_freepages_block(zone, page, + move_freepages_block(zone, page, mt, MIGRATE_MOVABLE); } } @@ -2870,8 +2892,6 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, return NULL; } } - __mod_zone_freepage_state(zone, -(1 << order), - get_pageblock_migratetype(page)); spin_unlock_irqrestore(&zone->lock, flags); } while (check_new_pages(page, order)); @@ -6972,8 +6992,9 @@ void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) BUG_ON(page_count(page)); BUG_ON(!PageBuddy(page)); + VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); order = buddy_order(page); - del_page_from_free_list(page, zone, order); + del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE); pfn += (1 << order); } spin_unlock_irqrestore(&zone->lock, flags); @@ -7001,6 +7022,14 @@ bool is_free_buddy_page(struct page *page) EXPORT_SYMBOL(is_free_buddy_page); #ifdef CONFIG_MEMORY_FAILURE +static inline void add_to_free_list(struct page *page, struct zone *zone, + unsigned int order, int migratetype, + bool tail) +{ + __add_to_free_list(page, zone, order, migratetype, tail); + account_freepages(zone, 1 << order, migratetype); +} + /* * Break down a higher-order page in sub-pages, and keep our target out of * buddy allocator. @@ -7023,10 +7052,10 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, current_buddy = page + size; } - if (set_page_guard(zone, current_buddy, high, migratetype)) + if (set_page_guard(zone, current_buddy, high)) continue; - add_to_free_list(current_buddy, zone, high, migratetype); + add_to_free_list(current_buddy, zone, high, migratetype, false); set_buddy_order(current_buddy, high); } } @@ -7052,12 +7081,11 @@ bool take_page_off_buddy(struct page *page) int migratetype = get_pfnblock_migratetype(page_head, pfn_head); - del_page_from_free_list(page_head, zone, page_order); + del_page_from_free_list(page_head, zone, page_order, + migratetype); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); SetPageHWPoisonTakenOff(page); - if (!is_migrate_isolate(migratetype)) - __mod_zone_freepage_state(zone, -1, migratetype); ret = true; break; } @@ -7162,7 +7190,7 @@ static bool try_to_accept_memory_one(struct zone *zone) list_del(&page->lru); last = list_empty(&zone->unaccepted_pages); - __mod_zone_freepage_state(zone, -MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE); + account_freepages(zone, -MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE); __mod_zone_page_state(zone, NR_UNACCEPTED, -MAX_ORDER_NR_PAGES); spin_unlock_irqrestore(&zone->lock, flags); @@ -7220,7 +7248,7 @@ static bool __free_unaccepted(struct page *page) spin_lock_irqsave(&zone->lock, flags); first = list_empty(&zone->unaccepted_pages); list_add_tail(&page->lru, &zone->unaccepted_pages); - __mod_zone_freepage_state(zone, MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE); + account_freepages(zone, MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE); __mod_zone_page_state(zone, NR_UNACCEPTED, MAX_ORDER_NR_PAGES); spin_unlock_irqrestore(&zone->lock, flags); -- 2.43.0

From: Vlastimil Babka <vbabka@suse.cz> mainline inclusion from mainline-v6.10-rc1 commit e1f42a577f63647dadf1abe4583053c03d6be045 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The function is now supposed to be called only on a single pageblock and checks start_pfn and end_pfn accordingly. Rename it to make this more obvious and drop the end_pfn parameter which can be determined trivially and none of the callers use it for anything else. Also make the (now internal) end_pfn exclusive, which is more common. Link: https://lkml.kernel.org/r/81b1d642-2ec0-49f5-89fc-19a3828419ff@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 43 ++++++++++++++++++++----------------------- 1 file changed, 20 insertions(+), 23 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e5e7651698ad..83e1cdcd7d24 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1590,18 +1590,18 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone, * Change the type of a block and move all its free pages to that * type's freelist. */ -static int move_freepages(struct zone *zone, unsigned long start_pfn, - unsigned long end_pfn, int old_mt, int new_mt) +static int __move_freepages_block(struct zone *zone, unsigned long start_pfn, + int old_mt, int new_mt) { struct page *page; - unsigned long pfn; + unsigned long pfn, end_pfn; unsigned int order; int pages_moved = 0; VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1)); - VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn); + end_pfn = pageblock_end_pfn(start_pfn); - for (pfn = start_pfn; pfn <= end_pfn;) { + for (pfn = start_pfn; pfn < end_pfn;) { page = pfn_to_page(pfn); if (!PageBuddy(page)) { pfn++; @@ -1627,14 +1627,13 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn, static bool prep_move_freepages_block(struct zone *zone, struct page *page, unsigned long *start_pfn, - unsigned long *end_pfn, int *num_free, int *num_movable) { unsigned long pfn, start, end; pfn = page_to_pfn(page); start = pageblock_start_pfn(pfn); - end = pageblock_end_pfn(pfn) - 1; + end = pageblock_end_pfn(pfn); /* * The caller only has the lock for @zone, don't touch ranges @@ -1645,16 +1644,15 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, */ if (!zone_spans_pfn(zone, start)) return false; - if (!zone_spans_pfn(zone, end)) + if (!zone_spans_pfn(zone, end - 1)) return false; *start_pfn = start; - *end_pfn = end; if (num_free) { *num_free = 0; *num_movable = 0; - for (pfn = start; pfn <= end;) { + for (pfn = start; pfn < end;) { page = pfn_to_page(pfn); if (PageBuddy(page)) { int nr = 1 << buddy_order(page); @@ -1680,13 +1678,12 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, static int move_freepages_block(struct zone *zone, struct page *page, int old_mt, int new_mt) { - unsigned long start_pfn, end_pfn; + unsigned long start_pfn; - if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, - NULL, NULL)) + if (!prep_move_freepages_block(zone, page, &start_pfn, NULL, NULL)) return -1; - return move_freepages(zone, start_pfn, end_pfn, old_mt, new_mt); + return __move_freepages_block(zone, start_pfn, old_mt, new_mt); } #ifdef CONFIG_MEMORY_ISOLATION @@ -1757,10 +1754,9 @@ static void split_large_buddy(struct zone *zone, struct page *page, bool move_freepages_block_isolate(struct zone *zone, struct page *page, int migratetype) { - unsigned long start_pfn, end_pfn, pfn; + unsigned long start_pfn, pfn; - if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, - NULL, NULL)) + if (!prep_move_freepages_block(zone, page, &start_pfn, NULL, NULL)) return false; /* No splits needed if buddies can't span multiple blocks */ @@ -1791,8 +1787,9 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, return true; } move: - move_freepages(zone, start_pfn, end_pfn, - get_pfnblock_migratetype(page, start_pfn), migratetype); + __move_freepages_block(zone, start_pfn, + get_pfnblock_migratetype(page, start_pfn), + migratetype); return true; } #endif /* CONFIG_MEMORY_ISOLATION */ @@ -1892,7 +1889,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int alloc_flags, bool whole_block) { int free_pages, movable_pages, alike_pages; - unsigned long start_pfn, end_pfn; + unsigned long start_pfn; int block_type; block_type = get_pageblock_migratetype(page); @@ -1925,8 +1922,8 @@ steal_suitable_fallback(struct zone *zone, struct page *page, goto single_page; /* moving whole block can fail due to zone boundary conditions */ - if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, - &free_pages, &movable_pages)) + if (!prep_move_freepages_block(zone, page, &start_pfn, &free_pages, + &movable_pages)) goto single_page; /* @@ -1956,7 +1953,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, */ if (free_pages + alike_pages >= (1 << (pageblock_order-1)) || page_group_by_mobility_disabled) { - move_freepages(zone, start_pfn, end_pfn, block_type, start_type); + __move_freepages_block(zone, start_pfn, block_type, start_type); return __rmqueue_smallest(zone, order, start_type); } -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc1 commit 883dd161e9a83e188487debc562b1928917a4b39 category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- expand() currently updates vmstat for every subpage. This is unnecessary, since they're all of the same zone and migratetype. Count added pages locally, then do a single vmstat update. Link: https://lkml.kernel.org/r/20240327190111.GC7597@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 83e1cdcd7d24..16f8b153d78b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1338,13 +1338,11 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn, * * -- nyc */ -static inline void add_to_free_list(struct page *page, struct zone *zone, - unsigned int order, int migratetype, - bool tail); static inline void expand(struct zone *zone, struct page *page, int low, int high, int migratetype) { unsigned long size = 1 << high; + unsigned long nr_added = 0; while (high > low) { high--; @@ -1360,9 +1358,11 @@ static inline void expand(struct zone *zone, struct page *page, if (set_page_guard(zone, &page[size], high)) continue; - add_to_free_list(&page[size], zone, high, migratetype, false); + __add_to_free_list(&page[size], zone, high, migratetype, false); set_buddy_order(&page[size], high); + nr_added += size; } + account_freepages(zone, nr_added, migratetype); } static void check_new_page_bad(struct page *page) -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.10-rc3 commit 7cc5a5d65011983952a9c62f170f5b79e24b1239 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Christoph reports a page allocator splat triggered by xfstests: generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30 XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273 XFS (nvme0n1): Ending clean mount XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850 XFS (nvme1n1): Ending clean mount XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850 XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9 XFS (nvme1n1): Ending clean mount ------------[ cut here ]------------ page type is 3, passed migratetype is 1 (nr=512) WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0 Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug] CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:expand+0x1c5/0x1f0 Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3 RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082 RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000 RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009 R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000 FS: 00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x7b/0x120 ? expand+0x1c5/0x1f0 ? report_bug+0x191/0x1c0 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? expand+0x1c5/0x1f0 ? expand+0x1c5/0x1f0 __rmqueue_pcplist+0x3a9/0x730 get_page_from_freelist+0x7a0/0xf00 __alloc_pages_noprof+0x153/0x2e0 __folio_alloc_noprof+0x10/0xa0 __filemap_get_folio+0x16b/0x370 iomap_write_begin+0x496/0x680 While trying to service a movable allocation (page type 1), the page allocator runs into a two-pageblock buddy on the movable freelist whose second block is typed as highatomic (page type 3). This inconsistency is caused by the highatomic reservation system operating on single pageblocks, while MAX_ORDER can be bigger than that - in this configuration, pageblock_order is 9 while MAX_PAGE_ORDER is 10. The test case is observed to make several adjacent order-3 requests with __GFP_DIRECT_RECLAIM cleared, which marks the surrounding block as highatomic. Upon freeing, the blocks merge into an order-10 buddy. When the highatomic pool is drained later on, this order-10 buddy gets moved back to the movable list, but only the first pageblock is marked movable again. A subsequent expand() of this buddy warns about the tail being of a different type. This is a long-standing bug that's surfaced by the recent block type warnings added to the allocator. The consequences seem mostly benign, it just results in odd behavior: the highatomic tail blocks are not properly drained, instead they end up on the movable list first, then go back to the highatomic list after an alloc-free cycle. To fix this, make the highatomic reservation code aware that allocations/buddies can be larger than a pageblock. While it's an old quirk, the recently added type consistency warnings seem to be the most prominent consequence of it. Set the Fixes: tag accordingly to highlight this backporting dependency. Link: https://lkml.kernel.org/r/20240530114203.GA1222079@cmpxchg.org Fixes: e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Christoph Hellwig <hch@lst.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 16 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 16f8b153d78b..f43076d49e7e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1998,10 +1998,12 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, } /* - * Reserve a pageblock for exclusive use of high-order atomic allocations if - * there are no empty page blocks that contain a page with a suitable order + * Reserve the pageblock(s) surrounding an allocation request for + * exclusive use of high-order atomic allocations if there are no + * empty page blocks that contain a page with a suitable order */ -static void reserve_highatomic_pageblock(struct page *page, struct zone *zone) +static void reserve_highatomic_pageblock(struct page *page, int order, + struct zone *zone) { int mt; unsigned long max_managed, flags; @@ -2027,10 +2029,17 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone) /* Yoink! */ mt = get_pageblock_migratetype(page); /* Only reserve normal pageblocks (i.e., they can merge with others) */ - if (migratetype_is_mergeable(mt)) - if (move_freepages_block(zone, page, mt, - MIGRATE_HIGHATOMIC) != -1) - zone->nr_reserved_highatomic += pageblock_nr_pages; + if (!migratetype_is_mergeable(mt)) + goto out_unlock; + + if (order < pageblock_order) { + if (move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC) == -1) + goto out_unlock; + zone->nr_reserved_highatomic += pageblock_nr_pages; + } else { + change_pageblock_range(page, order, MIGRATE_HIGHATOMIC); + zone->nr_reserved_highatomic += 1 << order; + } out_unlock: spin_unlock_irqrestore(&zone->lock, flags); @@ -2042,7 +2051,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone) * intense memory pressure but failed atomic allocations should be easier * to recover from than an OOM. * - * If @force is true, try to unreserve a pageblock even though highatomic + * If @force is true, try to unreserve pageblocks even though highatomic * pageblock is exhausted. */ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, @@ -2084,6 +2093,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * adjust the count once. */ if (is_migrate_highatomic(mt)) { + unsigned long size; /* * It should never happen but changes to * locking could inadvertently allow a per-cpu @@ -2091,9 +2101,9 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * while unreserving so be safe and watch for * underflows. */ - zone->nr_reserved_highatomic -= min( - pageblock_nr_pages, - zone->nr_reserved_highatomic); + size = max(pageblock_nr_pages, 1UL << order); + size = min(size, zone->nr_reserved_highatomic); + zone->nr_reserved_highatomic -= size; } /* @@ -2105,11 +2115,19 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * of pageblocks that cannot be completely freed * may increase. */ - ret = move_freepages_block(zone, page, mt, - ac->migratetype); + if (order < pageblock_order) + ret = move_freepages_block(zone, page, mt, + ac->migratetype); + else { + move_to_free_list(page, zone, order, mt, + ac->migratetype); + change_pageblock_range(page, order, + ac->migratetype); + ret = 1; + } /* - * Reserving this block already succeeded, so this should - * not fail on zone boundaries. + * Reserving the block(s) already succeeded, + * so this should not fail on zone boundaries. */ WARN_ON_ONCE(ret == -1); if (ret > 0) { @@ -3456,7 +3474,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, * if the pageblock should be reserved for the future */ if (unlikely(alloc_flags & ALLOC_HIGHATOMIC)) - reserve_highatomic_pageblock(page, zone); + reserve_highatomic_pageblock(page, order, zone); return page; } else { -- 2.43.0

From: Yu Zhao <yuzhao@google.com> mainline inclusion from mainline-v6.12-rc7 commit c928807f6f6b6d595a7e199591ae297c81de3aeb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- OOM kills due to vastly overestimated free highatomic reserves were observed: ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ... Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ... Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB The second line above shows that the OOM kill was due to the following condition: free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB) And the third line shows there were no free pages in any MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. Therefore __zone_watermark_unusable_free() underestimated the usable free memory by over 1GB, which resulted in the unnecessary OOM kill above. The comments in __zone_watermark_unusable_free() warns about the potential risk, i.e., If the caller does not have rights to reserves below the min watermark then subtract the high-atomic reserves. This will over-estimate the size of the atomic reserve but it avoids a search. However, it is possible to keep track of free pages in reserved highatomic pageblocks with a new per-zone counter nr_free_highatomic protected by the zone lock, to avoid a search when calculating the usable free memory. And the cost would be minimal, i.e., simple arithmetics in the highatomic alloc/free/move paths. Note that since nr_free_highatomic can be relatively small, using a per-cpu counter might cause too much drift and defeat its purpose, in addition to the extra memory overhead. Dependson e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") - see [1] [akpm@linux-foundation.org: s/if/else if/, per Johannes, stealth whitespace tweak] Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com Link: https://lkml.kernel.org/r/0d0ddb33-fcdc-43e2-801f-0c1df2031afb@suse.cz [1] Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: Link Lin <linkl@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: include/linux/mmzone.h [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- include/linux/mmzone.h | 2 +- mm/page_alloc.c | 10 +++++++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index ebaa85d5ac70..775e8cfc186c 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1032,7 +1032,7 @@ struct zone { CACHELINE_PADDING(_pad3_); - KABI_RESERVE(1) + KABI_USE(1, unsigned long nr_free_highatomic) KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f43076d49e7e..ce4051c7da26 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -639,6 +639,8 @@ compaction_capture(struct capture_control *capc, struct page *page, static inline void account_freepages(struct zone *zone, int nr_pages, int migratetype) { + lockdep_assert_held(&zone->lock); + if (is_migrate_isolate(migratetype)) return; @@ -646,6 +648,9 @@ static inline void account_freepages(struct zone *zone, int nr_pages, if (is_migrate_cma(migratetype)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages); + else if (is_migrate_highatomic(migratetype)) + WRITE_ONCE(zone->nr_free_highatomic, + zone->nr_free_highatomic + nr_pages); } /* Used for pages not on another list */ @@ -3088,11 +3093,10 @@ static inline long __zone_watermark_unusable_free(struct zone *z, /* * If the caller does not have rights to reserves below the min - * watermark then subtract the high-atomic reserves. This will - * over-estimate the size of the atomic reserve but it avoids a search. + * watermark then subtract the free pages reserved for highatomic. */ if (likely(!(alloc_flags & ALLOC_RESERVES))) - unusable_free += z->nr_reserved_highatomic; + unusable_free += READ_ONCE(z->nr_free_highatomic); #ifdef CONFIG_CMA /* If allocation can't use CMA areas don't use free CMA pages */ -- 2.43.0

From: Kefeng Wang <wangkefeng.wang@huawei.com> mainline inclusion from mainline-v6.12-rc1 commit cd5f3193b432cd70cc1c19aba790300dd11ae934 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The gigantic page size may larger than memory block size, so memory offline always fails in this case after commit b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity"), offline_pages start_isolate_page_range start_isolate_page_range(isolate_before=true) isolate [isolate_start, isolate_start + pageblock_nr_pages) start_isolate_page_range(isolate_before=false) isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock __alloc_contig_migrate_range isolate_migratepages_range isolate_migratepages_block isolate_or_dissolve_huge_page if (hstate_is_gigantic(h)) return -ENOMEM; [ 15.815756] memory offlining [mem 0x3c0000000-0x3c7ffffff] failed due to failure to isolate range Gigantic PageHuge is bigger than a pageblock, but since it is freed as order-0 pages, its pageblocks after being freed will get to the right free list. There is no need to have special handling code for them in start_isolate_page_range(). For both alloc_contig_range() and memory offline cases, the migration code after start_isolate_page_range() will be able to migrate gigantic PageHuge when possible. Let's clean up start_isolate_page_range() and fix the aforementioned memory offline failure issue all together. Let's clean up start_isolate_page_range() and fix the aforementioned memory offline failure issue all together. Link: https://lkml.kernel.org/r/20240820032630.1894770-1-wangkefeng.wang@huawei.co... Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_isolation.c | 28 +++------------------------- 1 file changed, 3 insertions(+), 25 deletions(-) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index fa446a57b98d..c942aa452755 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -398,30 +398,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, unsigned long head_pfn = page_to_pfn(head); unsigned long nr_pages = compound_nr(head); - if (head_pfn + nr_pages <= boundary_pfn) { - pfn = head_pfn + nr_pages; - continue; - } - -#if defined CONFIG_COMPACTION || defined CONFIG_CMA - if (PageHuge(page)) { - int page_mt = get_pageblock_migratetype(page); - struct compact_control cc = { - .nr_migratepages = 0, - .order = -1, - .zone = page_zone(pfn_to_page(head_pfn)), - .mode = MIGRATE_SYNC, - .ignore_skip_hint = true, - .no_set_skip_hint = true, - .gfp_mask = gfp_flags, - .alloc_contig = true, - }; - INIT_LIST_HEAD(&cc.migratepages); - - ret = __alloc_contig_migrate_range(&cc, head_pfn, - head_pfn + nr_pages, page_mt); - if (ret) - goto failed; + if (head_pfn + nr_pages <= boundary_pfn || + PageHuge(page)) { pfn = head_pfn + nr_pages; continue; } @@ -435,7 +413,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, */ VM_WARN_ON_ONCE_PAGE(PageLRU(page), page); VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page); -#endif + goto failed; } -- 2.43.0

From: Huan Yang <link@vivo.com> mainline inclusion from mainline-v6.12-rc1 commit 94deaf69dcd33462c61fa8cabb0883e3085a1046 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When page del from buddy and need expand, it will account free_pages in zone's migratetype. The current way is to subtract the page number of the current order when deleting, and then add it back when expanding. This is unnecessary, as when migrating the same type, we can directly record the difference between the high-order pages and the expand added, and then subtract it directly. This patch merge that, only when del and expand done, then account free_pages. Link: https://lkml.kernel.org/r/20240826064048.187790-1-link@vivo.com Signed-off-by: Huan Yang <link@vivo.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Conflicts: mm/page_allo.c [remove add_to_free_list() declaration which was used to fix compile error .] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 35 +++++++++++++++++++++++++---------- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ce4051c7da26..9d0b508dbc12 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1343,11 +1343,11 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn, * * -- nyc */ -static inline void expand(struct zone *zone, struct page *page, - int low, int high, int migratetype) +static inline unsigned int expand(struct zone *zone, struct page *page, int low, + int high, int migratetype) { - unsigned long size = 1 << high; - unsigned long nr_added = 0; + unsigned int size = 1 << high; + unsigned int nr_added = 0; while (high > low) { high--; @@ -1367,7 +1367,19 @@ static inline void expand(struct zone *zone, struct page *page, set_buddy_order(&page[size], high); nr_added += size; } - account_freepages(zone, nr_added, migratetype); + + return nr_added; +} + +static __always_inline void page_del_and_expand(struct zone *zone, + struct page *page, int low, + int high, int migratetype) +{ + int nr_pages = 1 << high; + + __del_page_from_free_list(page, zone, high, migratetype); + nr_pages -= expand(zone, page, low, high, migratetype); + account_freepages(zone, -nr_pages, migratetype); } static void check_new_page_bad(struct page *page) @@ -1556,8 +1568,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, page = get_page_from_free_area(area, migratetype); if (!page) continue; - del_page_from_free_list(page, zone, current_order, migratetype); - expand(zone, page, order, current_order, migratetype); + + page_del_and_expand(zone, page, order, current_order, + migratetype); trace_mm_page_alloc_zone_locked(page, order, migratetype, pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES); @@ -1908,9 +1921,12 @@ steal_suitable_fallback(struct zone *zone, struct page *page, /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { + unsigned int nr_added; + del_page_from_free_list(page, zone, current_order, block_type); change_pageblock_range(page, current_order, start_type); - expand(zone, page, order, current_order, start_type); + nr_added = expand(zone, page, order, current_order, start_type); + account_freepages(zone, nr_added, start_type); return page; } @@ -1963,8 +1979,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, } single_page: - del_page_from_free_list(page, zone, current_order, block_type); - expand(zone, page, order, current_order, block_type); + page_del_and_expand(zone, page, order, current_order, block_type); return page; } -- 2.43.0

From: Luoxi Li <kaixa@kiloview.com> mainline inclusion from mainline-v6.13-rc1 commit b7f058f827392022d8c689329f88c7b324d71dad category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- has_isolate_pageblock() has been unused since commit 55612e80e722 ("mm: page_alloc: close migratetype race between freeing and stealing") Remove it. Link: https://lkml.kernel.org/r/20241018092235.2764859-1-kaixa@kiloview.com Signed-off-by: Luoxi Li <kaixa@kiloview.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- include/linux/page-isolation.h | 8 -------- 1 file changed, 8 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index c16db0067090..73dc2c1841ec 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -3,10 +3,6 @@ #define __LINUX_PAGEISOLATION_H #ifdef CONFIG_MEMORY_ISOLATION -static inline bool has_isolate_pageblock(struct zone *zone) -{ - return zone->nr_isolate_pageblock; -} static inline bool is_migrate_isolate_page(struct page *page) { return get_pageblock_migratetype(page) == MIGRATE_ISOLATE; @@ -16,10 +12,6 @@ static inline bool is_migrate_isolate(int migratetype) return migratetype == MIGRATE_ISOLATE; } #else -static inline bool has_isolate_pageblock(struct zone *zone) -{ - return false; -} static inline bool is_migrate_isolate_page(struct page *page) { return false; -- 2.43.0

From: Yu Zhao <yuzhao@google.com> mainline inclusion from mainline-v6.12-rc1 commit e98337d11bbdfa3e3f0fb99aa93e40f97549e0cd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Patch series "mm/hugetlb: alloc/free gigantic folios", v2. Use __GFP_COMP for gigantic folios can greatly reduce not only the amount of code but also the allocation and free time. Approximate LOC to mm/hugetlb.c: +60, -240 Allocate and free 500 1GB hugeTLB memory without HVO by: time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages Before After Alloc ~13s ~10s Free ~15s <1s The above magnitude generally holds for multiple x86 and arm64 CPU models. Perf profile before: Alloc - 99.99% alloc_pool_huge_folio - __alloc_fresh_hugetlb_folio - 83.23% alloc_contig_pages_noprof - 47.46% alloc_contig_range_noprof - 20.96% isolate_freepages_range 16.10% split_page - 14.10% start_isolate_page_range - 12.02% undo_isolate_page_range Free - update_and_free_pages_bulk - 87.71% free_contig_range - 76.02% free_unref_page - 41.30% free_unref_page_commit - 32.58% free_pcppages_bulk - 24.75% __free_one_page 13.96% _raw_spin_trylock 12.27% __update_and_free_hugetlb_folio Perf profile after: Alloc - 99.99% alloc_pool_huge_folio alloc_gigantic_folio - alloc_contig_pages_noprof - 59.15% alloc_contig_range_noprof - 20.72% start_isolate_page_range 20.64% prep_new_page - 17.13% undo_isolate_page_range Free - update_and_free_pages_bulk - __folio_put - __free_pages_ok 7.46% free_tail_page_prepare - 1.97% free_one_page 1.86% __free_one_page This patch (of 3): Support __GFP_COMP in alloc_contig_range(). When the flag is set, upon success the function returns a large folio prepared by prep_new_page(), rather than a range of order-0 pages prepared by split_free_pages() (which is renamed from split_map_pages()). alloc_contig_range() can be used to allocate folios larger than MAX_PAGE_ORDER, e.g., gigantic hugeTLB folios. So on the free path, free_one_page() needs to handle that by split_large_buddy(). [akpm@linux-foundation.org: fix folio_alloc_gigantic_noprof() WARN expression, per Yu Liao] Link: https://lkml.kernel.org/r/20240814035451.773331-1-yuzhao@google.com Link: https://lkml.kernel.org/r/20240814035451.773331-2-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Frank van der Linden <fvdl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c mm/compaction.c include/linux/gfp.h [Conflicts due to commit 34a023dc8869 ("mm: handle profiling for fake memory allocations during compaction") and commit 34a023dc8869 ("mm: handle profiling for fake memory allocations during compaction") isn't merged.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- include/linux/gfp.h | 21 +++++++++ mm/compaction.c | 41 ++-------------- mm/page_alloc.c | 111 +++++++++++++++++++++++++++++++------------- 3 files changed, 106 insertions(+), 67 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 81ad5276030f..4fb856c6958e 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -380,4 +380,25 @@ extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask, #endif void free_contig_range(unsigned long pfn, unsigned long nr_pages); +#ifdef CONFIG_CONTIG_ALLOC +/* This should be paired with folio_put() rather than free_contig_range(). */ +static inline struct folio *folio_alloc_gigantic(int order, gfp_t gfp, + int nid, nodemask_t *node) +{ + struct page *page; + + if (WARN_ON(!order || !(gfp & __GFP_COMP))) + return NULL; + + page = alloc_contig_pages(1 << order, gfp, nid, node); + + return page ? page_folio(page) : NULL; +} +#else +static inline struct folio *folio_alloc_gigantic(int order, gfp_t gfp, + int nid, nodemask_t *node) +{ + return NULL; +} +#endif #endif /* __LINUX_GFP_H */ diff --git a/mm/compaction.c b/mm/compaction.c index ed0baf7404c8..e9053f27711c 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -80,33 +80,6 @@ static inline bool is_via_compact_memory(int order) { return false; } #define COMPACTION_HPAGE_ORDER (PMD_SHIFT - PAGE_SHIFT) #endif -static void split_map_pages(struct list_head *freepages) -{ - unsigned int i, order; - struct page *page, *next; - LIST_HEAD(tmp_list); - - for (order = 0; order < NR_PAGE_ORDERS; order++) { - list_for_each_entry_safe(page, next, &freepages[order], lru) { - unsigned int nr_pages; - - list_del(&page->lru); - - nr_pages = 1 << order; - - post_alloc_hook(page, order, __GFP_MOVABLE); - if (order) - split_page(page, order); - - for (i = 0; i < nr_pages; i++) { - list_add(&page->lru, &tmp_list); - page++; - } - } - list_splice_init(&tmp_list, &freepages[0]); - } -} - static unsigned long release_free_list(struct list_head *freepages) { int order; @@ -737,11 +710,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, * * Non-free pages, invalid PFNs, or zone boundaries within the * [start_pfn, end_pfn) range are considered errors, cause function to - * undo its actions and return zero. + * undo its actions and return zero. cc->freepages[] are empty. * * Otherwise, function returns one-past-the-last PFN of isolated page * (which may be greater then end_pfn if end fell in a middle of - * a free page). + * a free page). cc->freepages[] contain free pages isolated. */ unsigned long isolate_freepages_range(struct compact_control *cc, @@ -749,10 +722,9 @@ isolate_freepages_range(struct compact_control *cc, { unsigned long isolated, pfn, block_start_pfn, block_end_pfn; int order; - struct list_head tmp_freepages[NR_PAGE_ORDERS]; for (order = 0; order < NR_PAGE_ORDERS; order++) - INIT_LIST_HEAD(&tmp_freepages[order]); + INIT_LIST_HEAD(&cc->freepages[order]); pfn = start_pfn; block_start_pfn = pageblock_start_pfn(pfn); @@ -783,7 +755,7 @@ isolate_freepages_range(struct compact_control *cc, break; isolated = isolate_freepages_block(cc, &isolate_start_pfn, - block_end_pfn, tmp_freepages, 0, true); + block_end_pfn, cc->freepages, 0, true); /* * In strict mode, isolate_freepages_block() returns 0 if @@ -802,13 +774,10 @@ isolate_freepages_range(struct compact_control *cc, if (pfn < end_pfn) { /* Loop terminated early, cleanup. */ - release_free_list(tmp_freepages); + release_free_list(cc->freepages); return 0; } - /* __isolate_free_page() does not map the pages */ - split_map_pages(tmp_freepages); - /* We don't use freelists for anything. */ return pfn; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9d0b508dbc12..5a41c5199dcf 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1214,16 +1214,36 @@ static void free_pcppages_bulk(struct zone *zone, int count, spin_unlock_irqrestore(&zone->lock, flags); } +/* Split a multi-block free page into its individual pageblocks. */ +static void split_large_buddy(struct zone *zone, struct page *page, + unsigned long pfn, int order, fpi_t fpi) +{ + unsigned long end = pfn + (1 << order); + + VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order)); + /* Caller removed page from freelist, buddy info cleared! */ + VM_WARN_ON_ONCE(PageBuddy(page)); + + if (order > pageblock_order) + order = pageblock_order; + + while (pfn != end) { + int mt = get_pfnblock_migratetype(page, pfn); + + __free_one_page(page, pfn, zone, order, mt, fpi); + pfn += 1 << order; + page = pfn_to_page(pfn); + } +} + static void free_one_page(struct zone *zone, struct page *page, unsigned long pfn, unsigned int order, fpi_t fpi_flags) { unsigned long flags; - int migratetype; spin_lock_irqsave(&zone->lock, flags); - migratetype = get_pfnblock_migratetype(page, pfn); - __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); + split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); } @@ -1729,27 +1749,6 @@ static unsigned long find_large_buddy(unsigned long start_pfn) return start_pfn; } -/* Split a multi-block free page into its individual pageblocks */ -static void split_large_buddy(struct zone *zone, struct page *page, - unsigned long pfn, int order) -{ - unsigned long end_pfn = pfn + (1 << order); - - VM_WARN_ON_ONCE(order <= pageblock_order); - VM_WARN_ON_ONCE(pfn & (pageblock_nr_pages - 1)); - - /* Caller removed page from freelist, buddy info cleared! */ - VM_WARN_ON_ONCE(PageBuddy(page)); - - while (pfn != end_pfn) { - int mt = get_pfnblock_migratetype(page, pfn); - - __free_one_page(page, pfn, zone, pageblock_order, mt, FPI_NONE); - pfn += pageblock_nr_pages; - page = pfn_to_page(pfn); - } -} - /** * move_freepages_block_isolate - move free pages in block for page isolation * @zone: the zone @@ -1790,7 +1789,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(buddy, zone, order, get_pfnblock_migratetype(buddy, pfn)); set_pageblock_migratetype(page, migratetype); - split_large_buddy(zone, buddy, pfn, order); + split_large_buddy(zone, buddy, pfn, order, FPI_NONE); return true; } @@ -1801,7 +1800,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(page, zone, order, get_pfnblock_migratetype(page, pfn)); set_pageblock_migratetype(page, migratetype); - split_large_buddy(zone, page, pfn, order); + split_large_buddy(zone, page, pfn, order, FPI_NONE); return true; } move: @@ -6685,6 +6684,31 @@ int __alloc_contig_migrate_range(struct compact_control *cc, return (ret < 0) ? ret : 0; } +static void split_free_pages(struct list_head *list) +{ + int order; + + for (order = 0; order < NR_PAGE_ORDERS; order++) { + struct page *page, *next; + int nr_pages = 1 << order; + + list_for_each_entry_safe(page, next, &list[order], lru) { + int i; + + post_alloc_hook(page, order, __GFP_MOVABLE); + if (!order) + continue; + + split_page(page, order); + + /* Add all subpages to the order-0 head, in sequence. */ + list_del(&page->lru); + for (i = 0; i < nr_pages; i++) + list_add_tail(&page[i].lru, &list[0]); + } + } +} + /** * alloc_contig_range() -- tries to allocate given range of pages * @start: start PFN to allocate @@ -6807,12 +6831,25 @@ int alloc_contig_range(unsigned long start, unsigned long end, goto done; } - /* Free head and tail (if any) */ - if (start != outer_start) - free_contig_range(outer_start, start - outer_start); - if (end != outer_end) - free_contig_range(end, outer_end - end); + if (!(gfp_mask & __GFP_COMP)) { + split_free_pages(cc.freepages); + /* Free head and tail (if any) */ + if (start != outer_start) + free_contig_range(outer_start, start - outer_start); + if (end != outer_end) + free_contig_range(end, outer_end - end); + } else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) { + struct page *head = pfn_to_page(start); + int order = ilog2(end - start); + + check_new_pages(head, order); + prep_new_page(head, order, gfp_mask, 0); + } else { + ret = -EINVAL; + WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n", + start, end, outer_start, outer_end); + } done: undo_isolate_page_range(start, end, migratetype); return ret; @@ -6921,6 +6958,18 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask, void free_contig_range(unsigned long pfn, unsigned long nr_pages) { unsigned long count = 0; + struct folio *folio = pfn_folio(pfn); + + if (folio_test_large(folio)) { + int expected = folio_nr_pages(folio); + + if (nr_pages == expected) + folio_put(folio); + else + WARN(true, "PFN %lu: nr_pages %lu != expected %d\n", + pfn, nr_pages, expected); + return; + } for (; nr_pages--; pfn++) { struct page *page = pfn_to_page(pfn); -- 2.43.0

From: David Hildenbrand <david@redhat.com> mainline inclusion from mainline-v6.14-rc1 commit b7bc81e83bb40143ab4b8c5879054c32009f53d2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Patch series "mm/page_alloc: gfp flags cleanups for alloc_contig_*()", v2. Let's clean up the gfp flags handling, and support __GFP_ZERO, such that we can finally remove the TODO in memtrace code. This patch (of 6): The flags are no longer used, we can stop passing them to isolate_single_pageblock(). Link: https://lkml.kernel.org/r/20241203094732.200195-1-david@redhat.com Link: https://lkml.kernel.org/r/20241203094732.200195-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_isolation.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index c942aa452755..ddee0901c77f 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -284,7 +284,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * within a free or in-use page. * @boundary_pfn: pageblock-aligned pfn that a page might cross * @flags: isolation flags - * @gfp_flags: GFP flags used for migrating pages * @isolate_before: isolate the pageblock before the boundary_pfn * @skip_isolation: the flag to skip the pageblock isolation in second * isolate_single_pageblock() @@ -303,8 +302,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * the in-use page then splitting the free page. */ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, - gfp_t gfp_flags, bool isolate_before, bool skip_isolation, - int migratetype) + bool isolate_before, bool skip_isolation, int migratetype) { unsigned long start_pfn; unsigned long isolate_pageblock; @@ -484,7 +482,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, bool skip_isolation = false; /* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */ - ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false, + ret = isolate_single_pageblock(isolate_start, flags, false, skip_isolation, migratetype); if (ret) return ret; @@ -493,7 +491,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, skip_isolation = true; /* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */ - ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true, + ret = isolate_single_pageblock(isolate_end, flags, true, skip_isolation, migratetype); if (ret) { unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype); -- 2.43.0

From: David Hildenbrand <david@redhat.com> mainline inclusion from mainline-v6.14-rc1 commit 2202f51e9215bf9601402d6aa09e10c1dbb2a72d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The single user is in page_alloc.c. Link: https://lkml.kernel.org/r/20241203094732.200195-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/internal.h | 4 ---- mm/page_alloc.c | 5 ++--- 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index b1b1b9c5dc0c..a6272a6dea06 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -784,10 +784,6 @@ int isolate_migratepages_range(struct compact_control *cc, unsigned long low_pfn, unsigned long end_pfn); -int __alloc_contig_migrate_range(struct compact_control *cc, - unsigned long start, unsigned long end, - int migratetype); - /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ void init_cma_reserved_pageblock(struct page *page); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5a41c5199dcf..28bf21ae0ebb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6608,9 +6608,8 @@ static void alloc_contig_dump_pages(struct list_head *page_list) * @migratetype: using migratetype to filter the type of migration in * trace_mm_alloc_contig_migrate_range_info. */ -int __alloc_contig_migrate_range(struct compact_control *cc, - unsigned long start, unsigned long end, - int migratetype) +static int __alloc_contig_migrate_range(struct compact_control *cc, + unsigned long start, unsigned long end, int migratetype) { /* This function is based on compact_zone() from compaction.c. */ unsigned int nr_reclaimed; -- 2.43.0

From: David Hildenbrand <david@redhat.com> mainline inclusion from mainline-v6.14-rc1 commit f6037a4a686523dee1967ef7620349822e019ff8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- It's all a bit complicated for alloc_contig_range(). For example, we don't support many flags, so let's start bailing out on unsupported ones -- ignoring the placement hints, as we are already given the range to allocate. While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we simply create yet another GFP mask whereby we ignore the reclaim flags specify by the caller. That looks very inconsistent. Let's clean it up, constructing the gfp flags used for compaction/migration exactly once. Update the documentation of the gfp_mask parameter for alloc_contig_range() and alloc_contig_pages(). Link: https://lkml.kernel.org/r/20241203094732.200195-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [Context conflicts in __alloc_contig_migrate_range().] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 48 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 44 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 28bf21ae0ebb..f1901eaf711f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6618,7 +6618,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, int ret = 0; struct migration_target_control mtc = { .nid = zone_to_nid(cc->zone), - .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL, + .gfp_mask = cc->gfp_mask, }; struct page *page; unsigned long total_mapped = 0; @@ -6708,6 +6708,39 @@ static void split_free_pages(struct list_head *list) } } +static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask) +{ + const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM; + const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN; + const gfp_t cc_action_mask = __GFP_RETRY_MAYFAIL | __GFP_NOWARN; + + /* + * We are given the range to allocate; node, mobility and placement + * hints are irrelevant at this point. We'll simply ignore them. + */ + gfp_mask &= ~(GFP_ZONEMASK | __GFP_RECLAIMABLE | __GFP_WRITE | + __GFP_HARDWALL | __GFP_THISNODE | __GFP_MOVABLE); + + /* + * We only support most reclaim flags (but not NOFAIL/NORETRY), and + * selected action flags. + */ + if (gfp_mask & ~(reclaim_mask | action_mask)) + return -EINVAL; + + /* + * Flags to control page compaction/migration/reclaim, to free up our + * page range. Migratable pages are movable, __GFP_MOVABLE is implied + * for them. + * + * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set, + * keep doing that to not degrade callers. + */ + *gfp_cc_mask = (gfp_mask & (reclaim_mask | cc_action_mask)) | + __GFP_HARDWALL | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL; + return 0; +} + /** * alloc_contig_range() -- tries to allocate given range of pages * @start: start PFN to allocate @@ -6716,7 +6749,9 @@ static void split_free_pages(struct list_head *list) * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks * in range must have the same migratetype and it must * be either of the two. - * @gfp_mask: GFP mask to use during compaction + * @gfp_mask: GFP mask. Node/zone/placement hints are ignored; only some + * action and reclaim modifiers are supported. Reclaim modifiers + * control allocation behavior during compaction/migration/reclaim. * * The PFN range does not have to be pageblock aligned. The PFN range must * belong to a single zone. @@ -6742,11 +6777,14 @@ int alloc_contig_range(unsigned long start, unsigned long end, .mode = MIGRATE_SYNC, .ignore_skip_hint = true, .no_set_skip_hint = true, - .gfp_mask = current_gfp_context(gfp_mask), .alloc_contig = true, }; INIT_LIST_HEAD(&cc.migratepages); + gfp_mask = current_gfp_context(gfp_mask); + if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask)) + return -EINVAL; + /* * What we do here is we mark all pageblocks in range as * MIGRATE_ISOLATE. Because pageblock and max order pages may @@ -6898,7 +6936,9 @@ static bool zone_spans_last_pfn(const struct zone *zone, /** * alloc_contig_pages() -- tries to find and allocate contiguous range of pages * @nr_pages: Number of contiguous pages to allocate - * @gfp_mask: GFP mask to limit search and used during compaction + * @gfp_mask: GFP mask. Node/zone/placement hints limit the search; only some + * action and reclaim modifiers are supported. Reclaim modifiers + * control allocation behavior during compaction/migration/reclaim. * @nid: Target node * @nodemask: Mask for other possible nodes * -- 2.43.0

From: David Hildenbrand <david@redhat.com> mainline inclusion from mainline-v6.14-rc1 commit 7b755570064fcb9cde37afd48f6bc65151097ba7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- In the __GFP_COMP case, we already pass the gfp_flags to prep_new_page()->post_alloc_hook(). However, in the !__GFP_COMP case, we essentially pass only hardcoded __GFP_MOVABLE to post_alloc_hook(), preventing some action modifiers from being effective.. Let's pass our now properly adjusted gfp flags there as well. This way, we can now support __GFP_ZERO for alloc_contig_*(). As a side effect, we now also support __GFP_SKIP_ZERO and__GFP_ZEROTAGS; but we'll keep the more special stuff (KASAN, NOLOCKDEP) disabled for now. It's worth noting that with __GFP_ZERO, we might unnecessarily zero pages when we have to release part of our range using free_contig_range() again. This can be optimized in the future, if ever required; the caller we'll be converting (powernv/memtrace) next won't trigger this. Link: https://lkml.kernel.org/r/20241203094732.200195-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f1901eaf711f..eaae4051f2c7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6683,7 +6683,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, return (ret < 0) ? ret : 0; } -static void split_free_pages(struct list_head *list) +static void split_free_pages(struct list_head *list, gfp_t gfp_mask) { int order; @@ -6694,7 +6694,7 @@ static void split_free_pages(struct list_head *list) list_for_each_entry_safe(page, next, &list[order], lru) { int i; - post_alloc_hook(page, order, __GFP_MOVABLE); + post_alloc_hook(page, order, gfp_mask); if (!order) continue; @@ -6711,7 +6711,8 @@ static void split_free_pages(struct list_head *list) static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask) { const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM; - const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN; + const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN | + __GFP_ZERO | __GFP_ZEROTAGS | __GFP_SKIP_ZERO; const gfp_t cc_action_mask = __GFP_RETRY_MAYFAIL | __GFP_NOWARN; /* @@ -6869,7 +6870,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, } if (!(gfp_mask & __GFP_COMP)) { - split_free_pages(cc.freepages); + split_free_pages(cc.freepages, gfp_mask); /* Free head and tail (if any) */ if (start != outer_start) -- 2.43.0

From: David Hildenbrand <david@redhat.com> mainline inclusion from mainline-v6.13-rc4 commit faeec8e23c10bd30e8aa759a2eb3018dae00f924 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- In split_large_buddy(), we might call pfn_to_page() on a PFN that might not exist. In corner cases, such as when freeing the highest pageblock in the last memory section, this could result with CONFIG_SPARSEMEM && !CONFIG_SPARSEMEM_EXTREME in __pfn_to_section() returning NULL and and __section_mem_map_addr() dereferencing that NULL pointer. Let's fix it, and avoid doing a pfn_to_page() call for the first iteration, where we already have the page. So far this was found by code inspection, but let's just CC stable as the fix is easy. Link: https://lkml.kernel.org/r/20241210093437.174413-1-david@redhat.com Fixes: fd919a85cd55 ("mm: page_isolation: prepare for hygienic freelists") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Closes: https://lkml.kernel.org/r/e1a898ba-a717-4d20-9144-29df1a6c8813@suse.cz Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index eaae4051f2c7..ba8d280c3b72 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1227,13 +1227,15 @@ static void split_large_buddy(struct zone *zone, struct page *page, if (order > pageblock_order) order = pageblock_order; - while (pfn != end) { + do { int mt = get_pfnblock_migratetype(page, pfn); __free_one_page(page, pfn, zone, order, mt, fpi); pfn += 1 << order; + if (pfn == end) + break; page = pfn_to_page(pfn); - } + } while (1); } static void free_one_page(struct zone *zone, struct page *page, -- 2.43.0

mainline inclusion from mainline-v6.15-rc1 commit c2f6ea38fc1b640aa7a2e155cc1c0410ff91afa2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The fallback code searches for the biggest buddy first in an attempt to steal the whole block and encourage type grouping down the line. The approach used to be this: - Non-movable requests will split the largest buddy and steal the remainder. This splits up contiguity, but it allows subsequent requests of this type to fall back into adjacent space. - Movable requests go and look for the smallest buddy instead. The thinking is that movable requests can be compacted, so grouping is less important than retaining contiguity. c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") enforces freelist type hygiene, which restricts stealing to either claiming the whole block or just taking the requested chunk; no additional pages or buddy remainders can be stolen any more. The patch mishandled when to switch to finding the smallest buddy in that new reality. As a result, it may steal the exact request size, but from the biggest buddy. This causes fracturing for no good reason. Fix this by committing to the new behavior: either steal the whole block, or fall back to the smallest buddy. Remove single-page stealing from steal_suitable_fallback(). Rename it to try_to_steal_block() to make the intentions clear. If this fails, always fall back to the smallest buddy. The following is from 4 runs of mmtest's thpchallenge. "Pollute" is single page fallback, "steal" is conversion of a partially used block. The numbers for free block conversions (omitted) are comparable. vanilla patched @pollute[unmovable from reclaimable]: 27 106 @pollute[unmovable from movable]: 82 46 @pollute[reclaimable from unmovable]: 256 83 @pollute[reclaimable from movable]: 46 8 @pollute[movable from unmovable]: 4841 868 @pollute[movable from reclaimable]: 5278 12568 @steal[unmovable from reclaimable]: 11 12 @steal[unmovable from movable]: 113 49 @steal[reclaimable from unmovable]: 19 34 @steal[reclaimable from movable]: 47 21 @steal[movable from unmovable]: 250 183 @steal[movable from reclaimable]: 81 93 The allocator appears to do a better job at keeping stealing and polluting to the first fallback preference. As a result, the numbers for "from movable" - the least preferred fallback option, and most detrimental to compactability - are down across the board. Link: https://lkml.kernel.org/r/20250225001023.1494422-2-hannes@cmpxchg.org Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [Context conflicts due to MAX_ORDER] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 80 +++++++++++++++++++++---------------------------- 1 file changed, 34 insertions(+), 46 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ba8d280c3b72..eecb3600fac3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1899,13 +1899,12 @@ static inline bool boost_watermark(struct zone *zone) * can claim the whole pageblock for the requested migratetype. If not, we check * the pageblock for constituent pages; if at least half of the pages are free * or compatible, we can still claim the whole block, so pages freed in the - * future will be put on the correct free list. Otherwise, we isolate exactly - * the order we need from the fallback block and leave its migratetype alone. + * future will be put on the correct free list. */ static struct page * -steal_suitable_fallback(struct zone *zone, struct page *page, - int current_order, int order, int start_type, - unsigned int alloc_flags, bool whole_block) +try_to_steal_block(struct zone *zone, struct page *page, + int current_order, int order, int start_type, + unsigned int alloc_flags) { int free_pages, movable_pages, alike_pages; unsigned long start_pfn; @@ -1918,7 +1917,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, * highatomic accounting. */ if (is_migrate_highatomic(block_type)) - goto single_page; + return NULL; /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { @@ -1939,14 +1938,10 @@ steal_suitable_fallback(struct zone *zone, struct page *page, if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD)) set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); - /* We are not allowed to try stealing from the whole block */ - if (!whole_block) - goto single_page; - /* moving whole block can fail due to zone boundary conditions */ if (!prep_move_freepages_block(zone, page, &start_pfn, &free_pages, &movable_pages)) - goto single_page; + return NULL; /* * Determine how many pages are compatible with our allocation. @@ -1979,9 +1974,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, return __rmqueue_smallest(zone, order, start_type); } -single_page: - page_del_and_expand(zone, page, order, current_order, block_type); - return page; + return NULL; } /* @@ -2163,14 +2156,19 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, } /* - * Try finding a free buddy page on the fallback list and put it on the free - * list of requested migratetype, possibly along with other pages from the same - * block, depending on fragmentation avoidance heuristics. Returns true if - * fallback was found so that __rmqueue_smallest() can grab it. + * Try finding a free buddy page on the fallback list. + * + * This will attempt to steal a whole pageblock for the requested type + * to ensure grouping of such requests in the future. + * + * If a whole block cannot be stolen, regress to __rmqueue_smallest() + * logic to at least break up as little contiguity as possible. * * The use of signed ints for order and current_order is a deliberate * deviation from the rest of this file, to make the for loop * condition simpler. + * + * Return the stolen page, or NULL if none can be found. */ static __always_inline struct page * __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, @@ -2204,45 +2202,35 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, if (fallback_mt == -1) continue; - /* - * We cannot steal all free pages from the pageblock and the - * requested migratetype is movable. In that case it's better to - * steal and split the smallest available page instead of the - * largest available page, because even if the next movable - * allocation falls back into a different pageblock than this - * one, it won't cause permanent fragmentation. - */ - if (!can_steal && start_migratetype == MIGRATE_MOVABLE - && current_order > order) - goto find_smallest; + if (!can_steal) + break; - goto do_steal; + page = get_page_from_free_area(area, fallback_mt); + page = try_to_steal_block(zone, page, current_order, order, + start_migratetype, alloc_flags); + if (page) + goto got_one; } - return NULL; + if (alloc_flags & ALLOC_NOFRAGMENT) + return NULL; -find_smallest: + /* No luck stealing blocks. Find the smallest fallback page */ for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, start_migratetype, false, &can_steal); - if (fallback_mt != -1) - break; - } - - /* - * This should not happen - we already found a suitable fallback - * when looking for the largest page. - */ - VM_BUG_ON(current_order > MAX_ORDER); + if (fallback_mt == -1) + continue; -do_steal: - page = get_page_from_free_area(area, fallback_mt); + page = get_page_from_free_area(area, fallback_mt); + page_del_and_expand(zone, page, order, current_order, fallback_mt); + goto got_one; + } - /* take off list, maybe claim block, expand remainder */ - page = steal_suitable_fallback(zone, page, current_order, order, - start_migratetype, alloc_flags, can_steal); + return NULL; +got_one: trace_mm_page_alloc_extfrag(page, order, current_order, start_migratetype, fallback_mt); -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.15-rc1 commit 020396a581dc69be2d30939fabde6c029d847034 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The freelist hygiene patches made migratetype accesses fully protected under the zone->lock. Remove remnants of handling the race conditions that existed before from the MIGRATE_HIGHATOMIC code. Link: https://lkml.kernel.org/r/20250225001023.1494422-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 50 ++++++++++++++++--------------------------------- 1 file changed, 16 insertions(+), 34 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index eecb3600fac3..a82685824455 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1904,20 +1904,10 @@ static inline bool boost_watermark(struct zone *zone) static struct page * try_to_steal_block(struct zone *zone, struct page *page, int current_order, int order, int start_type, - unsigned int alloc_flags) + int block_type, unsigned int alloc_flags) { int free_pages, movable_pages, alike_pages; unsigned long start_pfn; - int block_type; - - block_type = get_pageblock_migratetype(page); - - /* - * This can happen due to races and we want to prevent broken - * highatomic accounting. - */ - if (is_migrate_highatomic(block_type)) - return NULL; /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { @@ -2092,33 +2082,22 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); - int mt; + unsigned long size; page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); if (!page) continue; - mt = get_pageblock_migratetype(page); /* - * In page freeing path, migratetype change is racy so - * we can counter several free pages in a pageblock - * in this loop although we changed the pageblock type - * from highatomic to ac->migratetype. So we should - * adjust the count once. + * It should never happen but changes to + * locking could inadvertently allow a per-cpu + * drain to add pages to MIGRATE_HIGHATOMIC + * while unreserving so be safe and watch for + * underflows. */ - if (is_migrate_highatomic(mt)) { - unsigned long size; - /* - * It should never happen but changes to - * locking could inadvertently allow a per-cpu - * drain to add pages to MIGRATE_HIGHATOMIC - * while unreserving so be safe and watch for - * underflows. - */ - size = max(pageblock_nr_pages, 1UL << order); - size = min(size, zone->nr_reserved_highatomic); - zone->nr_reserved_highatomic -= size; - } + size = max(pageblock_nr_pages, 1UL << order); + size = min(size, zone->nr_reserved_highatomic); + zone->nr_reserved_highatomic -= size; /* * Convert to ac->migratetype and avoid the normal @@ -2130,10 +2109,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * may increase. */ if (order < pageblock_order) - ret = move_freepages_block(zone, page, mt, + ret = move_freepages_block(zone, page, + MIGRATE_HIGHATOMIC, ac->migratetype); else { - move_to_free_list(page, zone, order, mt, + move_to_free_list(page, zone, order, + MIGRATE_HIGHATOMIC, ac->migratetype); change_pageblock_range(page, order, ac->migratetype); @@ -2207,7 +2188,8 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = get_page_from_free_area(area, fallback_mt); page = try_to_steal_block(zone, page, current_order, order, - start_migratetype, alloc_flags); + start_migratetype, fallback_mt, + alloc_flags); if (page) goto got_one; } -- 2.43.0

From: gaoxiang17 <gaoxiang17@xiaomi.com> mainline inclusion from mainline-v6.14-rc1 commit 6025ea5abbe5d813d6a41c78e6ea14259fb503f4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- [akpm@linux-foundation.org: tweak grammar, fit to 80 cols] Link: https://lkml.kernel.org/r/20240920122030.159751-1-gxxa03070307@gmail.com Signed-off-by: gaoxiang17 <gaoxiang17@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a82685824455..867a346b2fa8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1848,6 +1848,14 @@ static bool can_steal_fallback(unsigned int order, int start_mt) if (order >= pageblock_order) return true; + /* + * Movable pages won't cause permanent fragmentation, so when you alloc + * small pages, you just need to temporarily steal unmovable or + * reclaimable pages that are closest to the request size. After a + * while, memory compaction may occur to form large contiguous pages, + * and the next movable allocation may not need to steal. Unmovable and + * reclaimable allocations need to actually steal pages. + */ if (order >= pageblock_order / 2 || start_mt == MIGRATE_RECLAIMABLE || start_mt == MIGRATE_UNMOVABLE || -- 2.43.0

From: Brendan Jackman <jackmanb@google.com> mainline inclusion from mainline-v6.15-rc1 commit e47f1f56dd82cc6d91f5c4d914a534aa03cd12ca category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Patch series "mm/page_alloc: Some clarifications for migratetype fallback", v4. A couple of patches to try and make the code easier to follow. This patch (of 2): This code is rather confusing because: 1. "Steal" is sometimes used to refer to the general concept of allocating from a from a block of a fallback migratetype (steal_suitable_fallback()) but sometimes it refers specifically to converting a whole block's migratetype (can_steal_fallback()). 2. can_steal_fallback() sounds as though it's answering the question "am I functionally permitted to allocate from that other type" but in fact it is encoding a heuristic preference. 3. The same piece of data has different names in different places: can_steal vs whole_block. This reinforces point 2 because it looks like the different names reflect a shift in intent from "am I allowed to steal" to "do I want to steal", but no such shift exists. Fix 1. by avoiding the term "steal" in ambiguous contexts. Start using the term "claim" to refer to the special case of stealing the entire block. Fix 2. by using "should" instead of "can", and also rename its parameters and add some commentary to make it more explicit what they mean. Fix 3. by adopting the new "claim" terminology universally for this set of variables. Link: https://lkml.kernel.org/r/20250228-clarify-steal-v4-0-cb2ef1a4e610@google.co... Link: https://lkml.kernel.org/r/20250228-clarify-steal-v4-1-cb2ef1a4e610@google.co... Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/internal.h mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/compaction.c | 4 +-- mm/internal.h | 2 +- mm/page_alloc.c | 72 ++++++++++++++++++++++++------------------------- 3 files changed, 39 insertions(+), 39 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index e9053f27711c..98be4f7c07dd 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2302,7 +2302,7 @@ static enum compact_result __compact_finished(struct compact_control *cc) ret = COMPACT_NO_SUITABLE_PAGE; for (order = cc->order; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &cc->zone->free_area[order]; - bool can_steal; + bool claim_block; /* Job done if page is free of the right migratetype */ if (!free_area_empty(area, migratetype)) @@ -2319,7 +2319,7 @@ static enum compact_result __compact_finished(struct compact_control *cc) * other migratetype buddy lists. */ if (find_suitable_fallback(area, order, migratetype, - true, &can_steal) != -1) + true, &claim_block) != -1) /* * Movable pages are OK in any pageblock. If we are * stealing for a non-movable allocation, make sure diff --git a/mm/internal.h b/mm/internal.h index a6272a6dea06..3e68202070af 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -790,7 +790,7 @@ void init_cma_reserved_pageblock(struct page *page); #endif /* CONFIG_COMPACTION || CONFIG_CMA */ int find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool only_stealable, bool *can_steal); + int migratetype, bool claim_only, bool *claim_block); static inline bool free_area_empty(struct free_area *area, int migratetype) { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 867a346b2fa8..83f8ec48f2ef 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1826,22 +1826,22 @@ static void change_pageblock_range(struct page *pageblock_page, /* * When we are falling back to another migratetype during allocation, try to - * steal extra free pages from the same pageblocks to satisfy further - * allocations, instead of polluting multiple pageblocks. + * claim entire blocks to satisfy further allocations, instead of polluting + * multiple pageblocks. * - * If we are stealing a relatively large buddy page, it is likely there will - * be more free pages in the pageblock, so try to steal them all. For - * reclaimable and unmovable allocations, we steal regardless of page size, - * as fragmentation caused by those allocations polluting movable pageblocks - * is worse than movable allocations stealing from unmovable and reclaimable - * pageblocks. + * If we are stealing a relatively large buddy page, it is likely there will be + * more free pages in the pageblock, so try to claim the whole block. For + * reclaimable and unmovable allocations, we try to claim the whole block + * regardless of page size, as fragmentation caused by those allocations + * polluting movable pageblocks is worse than movable allocations stealing from + * unmovable and reclaimable pageblocks. */ -static bool can_steal_fallback(unsigned int order, int start_mt) +static bool should_try_claim_block(unsigned int order, int start_mt) { /* * Leaving this order check is intended, although there is * relaxed order check in next check. The reason is that - * we can actually steal whole pageblock if this condition met, + * we can actually claim the whole pageblock if this condition met, * but, below check doesn't guarantee it and that is just heuristic * so could be changed anytime. */ @@ -1854,7 +1854,7 @@ static bool can_steal_fallback(unsigned int order, int start_mt) * reclaimable pages that are closest to the request size. After a * while, memory compaction may occur to form large contiguous pages, * and the next movable allocation may not need to steal. Unmovable and - * reclaimable allocations need to actually steal pages. + * reclaimable allocations need to actually claim the whole block. */ if (order >= pageblock_order / 2 || start_mt == MIGRATE_RECLAIMABLE || @@ -1903,14 +1903,14 @@ static inline bool boost_watermark(struct zone *zone) } /* - * This function implements actual steal behaviour. If order is large enough, we - * can claim the whole pageblock for the requested migratetype. If not, we check - * the pageblock for constituent pages; if at least half of the pages are free - * or compatible, we can still claim the whole block, so pages freed in the - * future will be put on the correct free list. + * This function implements actual block claiming behaviour. If order is large + * enough, we can claim the whole pageblock for the requested migratetype. If + * not, we check the pageblock for constituent pages; if at least half of the + * pages are free or compatible, we can still claim the whole block, so pages + * freed in the future will be put on the correct free list. */ static struct page * -try_to_steal_block(struct zone *zone, struct page *page, +try_to_claim_block(struct zone *zone, struct page *page, int current_order, int order, int start_type, int block_type, unsigned int alloc_flags) { @@ -1977,12 +1977,14 @@ try_to_steal_block(struct zone *zone, struct page *page, /* * Check whether there is a suitable fallback freepage with requested order. - * If only_stealable is true, this function returns fallback_mt only if - * we can steal other freepages all together. This would help to reduce + * Sets *claim_block to instruct the caller whether it should convert a whole + * pageblock to the returned migratetype. + * If only_claim is true, this function returns fallback_mt only if + * we would do this whole-block claiming. This would help to reduce * fragmentation due to mixed migratetype pages in one pageblock. */ int find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool only_stealable, bool *can_steal) + int migratetype, bool only_claim, bool *claim_block) { int i; int fallback_mt; @@ -1990,19 +1992,16 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (area->nr_free == 0) return -1; - *can_steal = false; + *claim_block = false; for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) { fallback_mt = fallbacks[migratetype][i]; if (free_area_empty(area, fallback_mt)) continue; - if (can_steal_fallback(order, migratetype)) - *can_steal = true; + if (should_try_claim_block(order, migratetype)) + *claim_block = true; - if (!only_stealable) - return fallback_mt; - - if (*can_steal) + if (*claim_block || !only_claim) return fallback_mt; } @@ -2147,11 +2146,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, /* * Try finding a free buddy page on the fallback list. * - * This will attempt to steal a whole pageblock for the requested type + * This will attempt to claim a whole pageblock for the requested type * to ensure grouping of such requests in the future. * - * If a whole block cannot be stolen, regress to __rmqueue_smallest() - * logic to at least break up as little contiguity as possible. + * If a whole block cannot be claimed, steal an individual page, regressing to + * __rmqueue_smallest() logic to at least break up as little contiguity as + * possible. * * The use of signed ints for order and current_order is a deliberate * deviation from the rest of this file, to make the for loop @@ -2168,7 +2168,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, int min_order = order; struct page *page; int fallback_mt; - bool can_steal; + bool claim_block; /* * Do not steal pages from freelists belonging to other pageblocks @@ -2187,15 +2187,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, --current_order) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, - start_migratetype, false, &can_steal); + start_migratetype, false, &claim_block); if (fallback_mt == -1) continue; - if (!can_steal) + if (!claim_block) break; page = get_page_from_free_area(area, fallback_mt); - page = try_to_steal_block(zone, page, current_order, order, + page = try_to_claim_block(zone, page, current_order, order, start_migratetype, fallback_mt, alloc_flags); if (page) @@ -2205,11 +2205,11 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, if (alloc_flags & ALLOC_NOFRAGMENT) return NULL; - /* No luck stealing blocks. Find the smallest fallback page */ + /* No luck claiming pageblock. Find the smallest fallback page */ for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, - start_migratetype, false, &can_steal); + start_migratetype, false, &claim_block); if (fallback_mt == -1) continue; -- 2.43.0

From: Brendan Jackman <jackmanb@google.com> mainline inclusion from mainline-v6.13-rc4 commit a14efee04796dd3f614eaf5348ca1ac099c21349 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There's lots of text here but it's a little hard to follow, this is an attempt to break it up and align its structure more closely with the code. Reword the top-level function comment to just explain what question the function answers from the point of view of the caller. Break up the internal logic into different sections that can have their own commentary describing why that part of the rationale is present. Note the page_group_by_mobility_disabled logic is not explained in the commentary, that is outside the scope of this patch... Link: https://lkml.kernel.org/r/20250228-clarify-steal-v4-2-cb2ef1a4e610@google.co... Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 46 ++++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 83f8ec48f2ef..d934070471ba 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1825,16 +1825,9 @@ static void change_pageblock_range(struct page *pageblock_page, } /* - * When we are falling back to another migratetype during allocation, try to - * claim entire blocks to satisfy further allocations, instead of polluting - * multiple pageblocks. - * - * If we are stealing a relatively large buddy page, it is likely there will be - * more free pages in the pageblock, so try to claim the whole block. For - * reclaimable and unmovable allocations, we try to claim the whole block - * regardless of page size, as fragmentation caused by those allocations - * polluting movable pageblocks is worse than movable allocations stealing from - * unmovable and reclaimable pageblocks. + * When we are falling back to another migratetype during allocation, should we + * try to claim an entire block to satisfy further allocations, instead of + * polluting multiple pageblocks? */ static bool should_try_claim_block(unsigned int order, int start_mt) { @@ -1849,19 +1842,32 @@ static bool should_try_claim_block(unsigned int order, int start_mt) return true; /* - * Movable pages won't cause permanent fragmentation, so when you alloc - * small pages, you just need to temporarily steal unmovable or - * reclaimable pages that are closest to the request size. After a - * while, memory compaction may occur to form large contiguous pages, - * and the next movable allocation may not need to steal. Unmovable and - * reclaimable allocations need to actually claim the whole block. + * Above a certain threshold, always try to claim, as it's likely there + * will be more free pages in the pageblock. + */ + if (order >= pageblock_order / 2) + return true; + + /* + * Unmovable/reclaimable allocations would cause permanent + * fragmentations if they fell back to allocating from a movable block + * (polluting it), so we try to claim the whole block regardless of the + * allocation size. Later movable allocations can always steal from this + * block, which is less problematic. */ - if (order >= pageblock_order / 2 || - start_mt == MIGRATE_RECLAIMABLE || - start_mt == MIGRATE_UNMOVABLE || - page_group_by_mobility_disabled) + if (start_mt == MIGRATE_RECLAIMABLE || start_mt == MIGRATE_UNMOVABLE) return true; + if (page_group_by_mobility_disabled) + return true; + + /* + * Movable pages won't cause permanent fragmentation, so when you alloc + * small pages, we just need to temporarily steal unmovable or + * reclaimable pages that are closest to the request size. After a + * while, memory compaction may occur to form large contiguous pages, + * and the next movable allocation may not need to steal. + */ return false; } -- 2.43.0

From: Johannes Weiner <hannes@cmpxchg.org> mainline inclusion from mainline-v6.15-rc3 commit 90abee6d7895d5eef18c91d870d8168be4e76e9d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBC4T0 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") as the root cause of a 56.4% regression in vm-scalability::lru-file-mmap-read. Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion"), as the root cause for a regression in worst-case zone->lock+irqoff hold times. Both of these patches modify the page allocator's fallback path to be less greedy in an effort to stave off fragmentation. The flip side of this is that fallbacks are also less productive each time around, which means the fallback search can run much more frequently. Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the percpu cache by allocating a large batch of pages in a loop. It highlights how once the native freelists are exhausted, the fallback code first scans orders top-down for whole blocks to claim, then falls back to a bottom-up search for the smallest buddy to steal. For the next batch page, it goes through the same thing again. This can be made more efficient. Since rmqueue_bulk() holds the zone->lock over the entire batch, the freelists are not subject to outside changes; when the search for a block to claim has already failed, there is no point in trying again for the next page. Modify __rmqueue() to remember the last successful fallback mode, and restart directly from there on the next rmqueue_bulk() iteration. Oliver confirms that this improves beyond the regression that the test robot reported against c2f6ea38fc1b: commit: f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap") c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy") acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5 ---------------- --------------------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput Carlos confirms that worst-case times are almost fully recovered compared to before the earlier culprit patch: 2dd482ba627d (before freelist hygiene): 1ms c0cd6f557b90 (after freelist hygiene): 90ms next-20250319 (steal smallest buddy): 280ms this patch : 8ms [jackmanb@google.com: comment updates] Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com [hannes@cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin] Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Brendan Jackman <jackmanb@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Carlos Song <carlos.song@nxp.com> Tested-by: Carlos Song <carlos.song@nxp.com> Tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com Reviewed-by: Brendan Jackman <jackmanb@google.com> Tested-by: Shivank Garg <shivankg@amd.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/page_alloc.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- mm/page_alloc.c | 113 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 79 insertions(+), 34 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d934070471ba..385f6e48e44b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2150,23 +2150,15 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, } /* - * Try finding a free buddy page on the fallback list. - * - * This will attempt to claim a whole pageblock for the requested type - * to ensure grouping of such requests in the future. - * - * If a whole block cannot be claimed, steal an individual page, regressing to - * __rmqueue_smallest() logic to at least break up as little contiguity as - * possible. + * Try to allocate from some fallback migratetype by claiming the entire block, + * i.e. converting it to the allocation's start migratetype. * * The use of signed ints for order and current_order is a deliberate * deviation from the rest of this file, to make the for loop * condition simpler. - * - * Return the stolen page, or NULL if none can be found. */ static __always_inline struct page * -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype, +__rmqueue_claim(struct zone *zone, int order, int start_migratetype, unsigned int alloc_flags) { struct free_area *area; @@ -2204,14 +2196,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = try_to_claim_block(zone, page, current_order, order, start_migratetype, fallback_mt, alloc_flags); - if (page) - goto got_one; + if (page) { + trace_mm_page_alloc_extfrag(page, order, current_order, + start_migratetype, fallback_mt); + return page; + } } - if (alloc_flags & ALLOC_NOFRAGMENT) - return NULL; + return NULL; +} + +/* + * Try to steal a single page from some fallback migratetype. Leave the rest of + * the block as its current migratetype, potentially causing fragmentation. + */ +static __always_inline struct page * +__rmqueue_steal(struct zone *zone, int order, int start_migratetype) +{ + struct free_area *area; + int current_order; + struct page *page; + int fallback_mt; + bool claim_block; - /* No luck claiming pageblock. Find the smallest fallback page */ for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, @@ -2221,25 +2228,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = get_page_from_free_area(area, fallback_mt); page_del_and_expand(zone, page, order, current_order, fallback_mt); - goto got_one; + trace_mm_page_alloc_extfrag(page, order, current_order, + start_migratetype, fallback_mt); + return page; } return NULL; - -got_one: - trace_mm_page_alloc_extfrag(page, order, current_order, - start_migratetype, fallback_mt); - - return page; } +enum rmqueue_mode { + RMQUEUE_NORMAL, + RMQUEUE_CMA, + RMQUEUE_CLAIM, + RMQUEUE_STEAL, +}; + /* * Do the hard work of removing an element from the buddy allocator. * Call me with the zone->lock already held. */ static __always_inline struct page * __rmqueue(struct zone *zone, unsigned int order, int migratetype, - unsigned int alloc_flags) + unsigned int alloc_flags, enum rmqueue_mode *mode) { struct page *page; @@ -2258,16 +2268,48 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, } } - page = __rmqueue_smallest(zone, order, migratetype); - if (unlikely(!page)) { - if (alloc_flags & ALLOC_CMA) + /* + * First try the freelists of the requested migratetype, then try + * fallbacks modes with increasing levels of fragmentation risk. + * + * The fallback logic is expensive and rmqueue_bulk() calls in + * a loop with the zone->lock held, meaning the freelists are + * not subject to any outside changes. Remember in *mode where + * we found pay dirt, to save us the search on the next call. + */ + switch (*mode) { + case RMQUEUE_NORMAL: + page = __rmqueue_smallest(zone, order, migratetype); + if (page) + return page; + fallthrough; + case RMQUEUE_CMA: + if (alloc_flags & ALLOC_CMA) { page = __rmqueue_cma_fallback(zone, order); - - if (!page) - page = __rmqueue_fallback(zone, order, migratetype, - alloc_flags); + if (page) { + *mode = RMQUEUE_CMA; + return page; + } + } + fallthrough; + case RMQUEUE_CLAIM: + page = __rmqueue_claim(zone, order, migratetype, alloc_flags); + if (page) { + /* Replenished preferred freelist, back to normal mode. */ + *mode = RMQUEUE_NORMAL; + return page; + } + fallthrough; + case RMQUEUE_STEAL: + if (!(alloc_flags & ALLOC_NOFRAGMENT)) { + page = __rmqueue_steal(zone, order, migratetype); + if (page) { + *mode = RMQUEUE_STEAL; + return page; + } + } } - return page; + return NULL; } /* @@ -2279,13 +2321,14 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, int migratetype, unsigned int alloc_flags) { + enum rmqueue_mode rmqm = RMQUEUE_NORMAL; unsigned long flags; int i; spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, - alloc_flags); + alloc_flags, &rmqm); if (unlikely(page == NULL)) break; @@ -2896,7 +2939,9 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); if (!page) { - page = __rmqueue(zone, order, migratetype, alloc_flags); + enum rmqueue_mode rmqm = RMQUEUE_NORMAL; + + page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm); /* * If the allocation fails, allow OOM handling and -- 2.43.0

反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/17857 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ZV7... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/17857 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ZV7...
participants (2)
-
Jinjiang Tu
-
patchwork bot