From: David Hildenbrand <david@redhat.com>
mainline inclusion
from mainline-v5.10-rc1
commit 7fef431be9c9ac255838a9578331567b9dba4477
category: feature
bugzilla: 182882
CVE: NA
__free_pages_core() is used when exposing fresh memory to the buddy during system boot and when onlining memory in generic_online_page().
generic_online_page() is used in two cases:
1. Direct memory onlining in online_pages().
2. Deferred memory onlining in memory-ballooning-like mechanisms (HyperV balloon and virtio-mem), when parts of a section are kept fake-offline to be fake-onlined later on.
In 1, we already place pages to the tail of the freelist. Pages will be freed to MIGRATE_ISOLATE lists first and moved to the tail of the freelists via undo_isolate_page_range().
In 2, we currently don't implement a proper rule. In case of virtio-mem, where we currently always online MAX_ORDER - 1 pages, the pages will be placed to the HEAD of the freelist, which is undesirable. While the Hyper-V balloon calls generic_online_page() with single pages, it will usually call it on successive single pages in a larger block.
The pages are fresh, so place them to the tail of the freelist and avoid the PCP. In __free_pages_core(), remove the now superfluous call to set_page_refcounted() and add a comment regarding page initialization and the refcount.
Note: In 2. we currently don't shuffle. If ever relevant (page shuffling is usually of limited use in virtualized environments), we might want to shuffle after a sequence of generic_online_page() calls in the relevant callers.
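As an illustration only, the head-versus-tail placement described above can be modelled in user space. The sketch below is not kernel code: every demo_* name and DEMO_FPI_* flag is a hypothetical stand-in, and the real placement decision lives in __free_one_page(). It assumes allocation always takes from the head of the list, so a page freed to the tail is handed out last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef int demo_fpi_t;
#define DEMO_FPI_NONE		0
#define DEMO_FPI_TO_TAIL	1

struct demo_page {
	unsigned long pfn;
	struct demo_page *prev, *next;
};

/* Circular doubly linked list with a sentinel node, like list_head. */
struct demo_freelist {
	struct demo_page head;
};

static void demo_freelist_init(struct demo_freelist *fl)
{
	fl->head.prev = fl->head.next = &fl->head;
}

static void demo_insert_between(struct demo_page *page,
				struct demo_page *prev,
				struct demo_page *next)
{
	page->prev = prev;
	page->next = next;
	prev->next = page;
	next->prev = page;
}

/* Free a page to the head by default, or to the tail when requested. */
static void demo_free_page(struct demo_freelist *fl, struct demo_page *page,
			   demo_fpi_t fpi_flags)
{
	if (fpi_flags & DEMO_FPI_TO_TAIL)
		demo_insert_between(page, fl->head.prev, &fl->head);
	else
		demo_insert_between(page, &fl->head, fl->head.next);
}

/* Allocation takes from the head; returns NULL when the list is empty. */
static struct demo_page *demo_alloc_page(struct demo_freelist *fl)
{
	struct demo_page *page = fl->head.next;

	if (page == &fl->head)
		return NULL;
	page->prev->next = page->next;
	page->next->prev = page->prev;
	page->prev = page->next = NULL;
	return page;
}
```

In this model, a fresh page freed with DEMO_FPI_TO_TAIL is only allocated after every page that was already on the list, whereas the default head placement would hand it out first.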
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Link: https://lkml.kernel.org/r/20201005121534.15649-5-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/page_alloc.c
[Peng Liu: adjust context]
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/page_alloc.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9baa829f6a29c..a914fc3b589be 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -221,7 +221,8 @@ bool pm_suspended_storage(void)
 unsigned int pageblock_order __read_mostly;
 #endif

-static void __free_pages_ok(struct page *page, unsigned int order);
+static void __free_pages_ok(struct page *page, unsigned int order,
+			    fpi_t fpi_flags);

 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
@@ -622,7 +623,7 @@ static void bad_page(struct page *page, const char *reason,
 void free_compound_page(struct page *page)
 {
 	mem_cgroup_uncharge(page);
-	__free_pages_ok(page, compound_order(page));
+	__free_pages_ok(page, compound_order(page), FPI_NONE);
 }

 void prep_compound_page(struct page *page, unsigned int order)
@@ -1225,14 +1226,14 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 static void free_one_page(struct zone *zone,
 				struct page *page, unsigned long pfn,
 				unsigned int order,
-				int migratetype)
+				int migratetype, fpi_t fpi_flags)
 {
 	spin_lock(&zone->lock);
 	if (unlikely(has_isolate_pageblock(zone) ||
 		is_migrate_isolate(migratetype))) {
 		migratetype = get_pfnblock_migratetype(page, pfn);
 	}
-	__free_one_page(page, pfn, zone, order, migratetype, FPI_NONE);
+	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
 	spin_unlock(&zone->lock);
 }

@@ -1304,7 +1305,8 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
 	}
 }

-static void __free_pages_ok(struct page *page, unsigned int order)
+static void __free_pages_ok(struct page *page, unsigned int order,
+			    fpi_t fpi_flags)
 {
 	unsigned long flags;
 	int migratetype;
@@ -1316,7 +1318,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
-	free_one_page(page_zone(page), page, pfn, order, migratetype);
+	free_one_page(page_zone(page), page, pfn, order, migratetype,
+		      fpi_flags);
 	local_irq_restore(flags);
 }

@@ -1326,6 +1329,11 @@ void __free_pages_core(struct page *page, unsigned int order)
 	struct page *p = page;
 	unsigned int loop;

+	/*
+	 * When initializing the memmap, __init_single_page() sets the refcount
+	 * of all pages to 1 ("allocated"/"not free"). We have to set the
+	 * refcount of all involved pages to 0.
+	 */
 	prefetchw(p);
 	for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
 		prefetchw(p + 1);
@@ -1335,8 +1343,11 @@ void __free_pages_core(struct page *page, unsigned int order)
 	__ClearPageReserved(p);
 	set_page_count(p, 0);

-	set_page_refcounted(page);
-	__free_pages(page, order);
+	/*
+	 * Bypass PCP and place fresh pages right to the tail, primarily
+	 * relevant for memory onlining.
+	 */
+	__free_pages_ok(page, order, FPI_TO_TAIL);
 }

 #if defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) || \
@@ -2892,7 +2903,8 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(zone, page, pfn, 0, migratetype);
+			free_one_page(zone, page, pfn, 0, migratetype,
+				      FPI_NONE);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
@@ -4645,7 +4657,7 @@ static inline void free_the_page(struct page *page, unsigned int order)
 	if (order == 0)		/* Via pcp? */
 		free_unref_page(page);
 	else
-		__free_pages_ok(page, order);
+		__free_pages_ok(page, order, FPI_NONE);
 }

 void __free_pages(struct page *page, unsigned int order)