Gavin Shan (1):
  mm: migrate: fix THP's mapcount on isolation

Kemeng Shi (1):
  mm/compaction: correctly return failure with bogus compound_order in strict mode

Liu Shixin (2):
  bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
  bootmem: use kmemleak_free_part_phys in free_bootmem_page

Naoya Horiguchi (1):
  mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()

Xueshi Hu (1):
  mm/hugetlb: fix nodes huge page allocation when there are surplus pages

Yuan Can (1):
  mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
 include/linux/bootmem_info.h |  2 ++
 mm/compaction.c              | 28 ++++++++++++++--------------
 mm/huge_memory.c             |  7 ++-----
 mm/hugetlb.c                 |  4 +++-
 mm/hugetlb_vmemmap.c         |  2 +-
 5 files changed, 22 insertions(+), 21 deletions(-)
From: Naoya Horiguchi naoya.horiguchi@nec.com
mainline inclusion
from mainline-v6.0
commit 2b7aa91ba0e86b8643f5d3c83874c80599c731d7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAR7B3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
A NULL pointer dereference is triggered when a THP split is requested via debugfs on a system with offlined memory blocks. With the debug option enabled, the following kernel messages are printed:

  page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
  flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
  raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:000000007d7ab72e is uninitialized and poisoned
  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
  ------------[ cut here ]------------
  kernel BUG at include/linux/mm.h:1248!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 16 PID: 20964 Comm: bash Tainted: G I 6.0.0-rc3-foll-numa+ #41
  ...
  RIP: 0010:split_huge_pages_write+0xcf4/0xe30
This shows that page_to_nid() in page_zone() is unexpectedly called for an offlined memmap.
Use pfn_to_online_page() to get the struct page in the PFN walker.
Link: https://lkml.kernel.org/r/20220908041150.3430269-1-naoya.horiguchi@linux.dev
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: Naoya Horiguchi naoya.horiguchi@nec.com
Co-developed-by: David Hildenbrand david@redhat.com
Signed-off-by: David Hildenbrand david@redhat.com
Reviewed-by: Yang Shi shy828301@gmail.com
Acked-by: Michal Hocko mhocko@suse.com
Reviewed-by: Miaohe Lin linmiaohe@huawei.com
Reviewed-by: Oscar Salvador osalvador@suse.de
Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Cc: Matthew Wilcox willy@infradead.org
Cc: Muchun Song songmuchun@bytedance.com
Cc: stable@vger.kernel.org [5.10+]
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts:
	mm/huge_memory.c
[ Context conflicts. ]
Signed-off-by: Liu Shixin liushixin2@huawei.com
---
 mm/huge_memory.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb293d17a1049..e41be42456673 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2882,11 +2882,8 @@ static int split_huge_pages_set(void *data, u64 val)
 	for_each_populated_zone(zone) {
 		max_zone_pfn = zone_end_pfn(zone);
 		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
-			if (!pfn_valid(pfn))
-				continue;
-
-			page = pfn_to_page(pfn);
-			if (!get_page_unless_zero(page))
+			page = pfn_to_online_page(pfn);
+			if (!page || !get_page_unless_zero(page))
 				continue;
 
 			if (zone != page_zone(page))
From: Gavin Shan gshan@redhat.com
mainline inclusion
from mainline-v6.1-rc8
commit 829ae0f81ce093d674ff2256f66a714753e9ce32
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAR7B3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The issue was reported when removing memory through a virtio_mem device. A transparent huge page that has experienced a copy-on-write fault is wrongly regarded as pinned: page_mapcount() on the head page only reflects the mappings of that one subpage, while page_count() also includes the references taken for the PTE mappings of the other subpages. The transparent huge page therefore escapes isolation in isolate_migratepages_block(), so it can't be migrated and the corresponding memory block can't be put into the offline state.

Fix it by replacing page_mapcount() with total_mapcount(). With this, the transparent huge page can be isolated and migrated, and the memory block can be put into the offline state. Besides, the page's refcount is now increased a bit earlier, to prevent the page from being released while the check is executed. Because the check then runs with that extra reference held, it compares page_count(page) - 1 against total_mapcount(page).
Link: https://lkml.kernel.org/r/20221124095523.31061-1-gshan@redhat.com
Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
Signed-off-by: Gavin Shan gshan@redhat.com
Reported-by: Zhenyu Zhang zhenyzha@redhat.com
Tested-by: Zhenyu Zhang zhenyzha@redhat.com
Suggested-by: David Hildenbrand david@redhat.com
Acked-by: David Hildenbrand david@redhat.com
Cc: Alistair Popple apopple@nvidia.com
Cc: Hugh Dickins hughd@google.com
Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Cc: Matthew Wilcox willy@infradead.org
Cc: William Kucharski william.kucharski@oracle.com
Cc: Zi Yan ziy@nvidia.com
Cc: stable@vger.kernel.org [5.7+]
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts:
	mm/compaction.c
[ Context conflicts. ]
Signed-off-by: Liu Shixin liushixin2@huawei.com
---
 mm/compaction.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 94ab4e9d1eb64..fb339ceb5bde6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -965,29 +965,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
+		/*
+		 * Be careful not to clear PageLRU until after we're
+		 * sure the page is not being freed elsewhere -- the
+		 * page release code relies on it.
+		 */
+		if (unlikely(!get_page_unless_zero(page)))
+			goto isolate_fail;
+
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
 		 * so avoid taking lru_lock and isolating it unnecessarily in an
 		 * admittedly racy check.
 		 */
 		if (!page_mapping(page) &&
-		    page_count(page) > page_mapcount(page))
-			goto isolate_fail;
+		    (page_count(page) - 1) > total_mapcount(page))
+			goto isolate_fail_put;
 
 		/*
 		 * Only allow to migrate anonymous pages in GFP_NOFS context
 		 * because those do not depend on fs locks.
 		 */
 		if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
-			goto isolate_fail;
-
-		/*
-		 * Be careful not to clear PageLRU until after we're
-		 * sure the page is not being freed elsewhere -- the
-		 * page release code relies on it.
-		 */
-		if (unlikely(!get_page_unless_zero(page)))
-			goto isolate_fail;
+			goto isolate_fail_put;
 
 		if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
 			goto isolate_fail_put;
mainline inclusion
from mainline-v6.7-rc1
commit 80203f1ca086835100843f1474bd6dd4a48cc73b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAR7B3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem") fixed a kmemleak "overlaps existing" problem. However, the problem still exists when HAVE_BOOTMEM_INFO_NODE is disabled, because in that case free_bootmem_page() calls free_reserved_page() directly.
Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when HAVE_BOOTMEM_INFO_NODE is disabled.
Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com
Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin liushixin2@huawei.com
Acked-by: Muchun Song songmuchun@bytedance.com
Cc: Matthew Wilcox willy@infradead.org
Cc: Mike Kravetz mike.kravetz@oracle.com
Cc: Oscar Salvador osalvador@suse.de
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Liu Shixin liushixin2@huawei.com
---
 include/linux/bootmem_info.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
index 2bc8b1f69c93c..888eb660d3f6a 100644
--- a/include/linux/bootmem_info.h
+++ b/include/linux/bootmem_info.h
@@ -3,6 +3,7 @@
 #define __LINUX_BOOTMEM_INFO_H
 
 #include <linux/mm.h>
+#include <linux/kmemleak.h>
 
 /*
  * Types for free bootmem stored in page->lru.next. These have to be in
@@ -59,6 +60,7 @@ static inline void get_page_bootmem(unsigned long info, struct page *page,
 
 static inline void free_bootmem_page(struct page *page)
 {
+	kmemleak_free_part(page_to_virt(page), PAGE_SIZE);
 	free_reserved_page(page);
 }
 #endif
mainline inclusion
from mainline-v6.7-rc1
commit 80203f1ca086835100843f1474bd6dd4a48cc73b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAR7B3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Since memblock_alloc_range_nid() registers its allocations with kmemleak_alloc_phys() rather than kmemleak_alloc(), kmemleak_free_part_phys() should be used to delete the kmemleak object in free_bootmem_page(). In debug mode, the following warning is printed:
kmemleak: Partially freeing unknown object at 0xffff97345aff7000 (size 4096)
Link: https://lkml.kernel.org/r/20231018102952.3339837-3-liushixin2@huawei.com
Fixes: 028725e73375 ("bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page")
Signed-off-by: Liu Shixin liushixin2@huawei.com
Acked-by: Catalin Marinas catalin.marinas@arm.com
Cc: Kefeng Wang wangkefeng.wang@huawei.com
Cc: Patrick Wang patrick.wang.shcn@gmail.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Liu Shixin liushixin2@huawei.com
---
 include/linux/bootmem_info.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
index 888eb660d3f6a..7395db4f44b09 100644
--- a/include/linux/bootmem_info.h
+++ b/include/linux/bootmem_info.h
@@ -60,7 +60,7 @@ static inline void get_page_bootmem(unsigned long info, struct page *page,
 
 static inline void free_bootmem_page(struct page *page)
 {
-	kmemleak_free_part(page_to_virt(page), PAGE_SIZE);
+	kmemleak_free_part_phys(PFN_PHYS(page_to_pfn(page)), PAGE_SIZE);
 	free_reserved_page(page);
 }
 #endif
From: Yuan Can yuancan@huawei.com
mainline inclusion
from mainline-v6.7-rc1
commit 2eaa6c2abb9dd55041a05c20c451790c124d5cf0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAR7B3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Decreasing the number of hugetlb pages failed with the following message:

  sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
  CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace.part.6+0x84/0xe4
   show_stack+0x18/0x24
   dump_stack_lvl+0x48/0x60
   dump_stack+0x18/0x24
   warn_alloc+0x100/0x1bc
   __alloc_pages_slowpath.constprop.107+0xa40/0xad8
   __alloc_pages+0x244/0x2d0
   hugetlb_vmemmap_restore+0x104/0x1e4
   __update_and_free_hugetlb_folio+0x44/0x1f4
   update_and_free_hugetlb_folio+0x20/0x68
   update_and_free_pages_bulk+0x4c/0xac
   set_max_huge_pages+0x198/0x334
   nr_hugepages_store_common+0x118/0x178
   nr_hugepages_store+0x18/0x24
   kobj_attr_store+0x18/0x2c
   sysfs_kf_write+0x40/0x54
   kernfs_fop_write_iter+0x164/0x1dc
   vfs_write+0x3a8/0x460
   ksys_write+0x6c/0x100
   __arm64_sys_write+0x1c/0x28
   invoke_syscall+0x44/0x100
   el0_svc_common.constprop.1+0x6c/0xe4
   do_el0_svc+0x38/0x94
   el0_svc+0x28/0x74
   el0t_64_sync_handler+0xa0/0xc4
   el0t_64_sync+0x174/0x178
  Mem-Info:
  ...
The reason is that the hugetlb pages being released were allocated from movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages need to be allocated from the same node while the hugetlb pages are being released. With GFP_KERNEL and __GFP_THISNODE set, allocation from a movable node always fails, because GFP_KERNEL allocations cannot be served from ZONE_MOVABLE, which is all the memory a movable node has. Fix this problem by removing __GFP_THISNODE.
Link: https://lkml.kernel.org/r/20230905124503.24899-1-yuancan@huawei.com
Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Yuan Can yuancan@huawei.com
Reviewed-by: Muchun Song songmuchun@bytedance.com
Cc: Kefeng Wang wangkefeng.wang@huawei.com
Cc: Mike Kravetz mike.kravetz@oracle.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts:
	mm/hugetlb_vmemmap.c
[ The gfp_mask is still the parameter of vmemmap_remap_alloc(). ]
Signed-off-by: Liu Shixin liushixin2@huawei.com
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 7ec8560d267d7..460325ad729ea 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -254,7 +254,7 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	 * discarded vmemmap pages must be allocated and remapping.
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
-				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
+				  GFP_KERNEL | __GFP_NORETRY);
 	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);
From: Xueshi Hu xueshi.hu@smartx.com
mainline inclusion
from mainline-v6.7-rc1
commit b72b3c9c34c825c81d205241c5f822fc7835923f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAR7B3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In set_max_huge_pages(), the local variable "count" is used to record persistent_huge_pages(), but for per-node huge page allocation its semantics change to nr_huge_pages, which also counts surplus pages. When surplus huge pages exist and the interface under /sys/devices/system/node/node*/hugepages is used to change the huge page pool size, this difference can result in the allocation of an unexpected number of huge pages.
Steps to reproduce the bug:
Starting with (sizes in MB, so with 2048kB huge pages 200.00 corresponds to 100 pages):

                  Node 0  Node 1   Total
HugePages_Total     0.00    0.00    0.00
HugePages_Free      0.00    0.00    0.00
HugePages_Surp      0.00    0.00    0.00
Create 100 huge pages on Node 0 and consume them, then set Node 0's nr_hugepages to 0.
yields:
                  Node 0  Node 1   Total
HugePages_Total   200.00    0.00  200.00
HugePages_Free      0.00    0.00    0.00
HugePages_Surp    200.00    0.00  200.00
Write 100 to Node 1's nr_hugepages:

echo 100 > /sys/devices/system/node/node1/\
hugepages/hugepages-2048kB/nr_hugepages
gets:
                  Node 0  Node 1   Total
HugePages_Total   200.00  400.00  600.00
HugePages_Free      0.00  400.00  400.00
HugePages_Surp    200.00    0.00  200.00

The kernel is expected to create only 100 huge pages, but it creates 200.
Link: https://lkml.kernel.org/r/20230829033343.467779-1-xueshi.hu@smartx.com
Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Signed-off-by: Xueshi Hu xueshi.hu@smartx.com
Reviewed-by: Mike Kravetz mike.kravetz@oracle.com
Cc: Andi Kleen andi@firstfloor.org
Cc: Lee Schermerhorn lee.schermerhorn@hp.com
Cc: Mel Gorman mel@csn.ul.ie
Cc: Muchun Song muchun.song@linux.dev
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Liu Shixin liushixin2@huawei.com
---
 mm/hugetlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 54e2eefdf0b46..5f04adac38bb9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3048,7 +3048,9 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	if (nid != NUMA_NO_NODE) {
 		unsigned long old_count = count;
 
-		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
+		count += persistent_huge_pages(h) -
+			 (h->nr_huge_pages_node[nid] -
+			  h->surplus_huge_pages_node[nid]);
 		/*
 		 * User may have specified a large count value which caused the
 		 * above calculation to overflow.  In this case, they wanted
From: Kemeng Shi shikemeng@huaweicloud.com
mainline inclusion
from mainline-v6.7-rc1
commit 3da0272a4c7d0d37b47b28e87014f421296fc2be
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAR7B3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In strict mode, we should return 0 if there is any hole in the pageblock. Suppose we successfully isolate pages at the beginning of the pageblock and then, in the next page, read a bogus compound_order() that extends outside the pageblock. We will abort the search loop with blockpfn > end_pfn. Although blockpfn is then limited to end_pfn, strict mode treats this as a successful isolation, because blockpfn is not < end_pfn, and returns the partially isolated pages. isolate_freepages_range() may then unexpectedly succeed even though the isolated range contains a hole.
Link: https://lkml.kernel.org/r/20230901155141.249860-4-shikemeng@huaweicloud.com
Fixes: 9fcd6d2e052e ("mm, compaction: skip compound pages by order in free scanner")
Signed-off-by: Kemeng Shi shikemeng@huaweicloud.com
Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com
Acked-by: Mel Gorman mgorman@techsingularity.net
Cc: David Hildenbrand david@redhat.com
Cc: Matthew Wilcox (Oracle) willy@infradead.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts:
	mm/compaction.c
[ Conflict because MAX_ORDER is redefined in commit 23baf831a32c and
  context conflicts with commit 56d48d8dbefb and dc13292cccfd. ]
Signed-off-by: Liu Shixin liushixin2@huawei.com
---
 mm/compaction.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index fb339ceb5bde6..6035aa46c8ac8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -599,10 +599,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		if (PageCompound(page)) {
 			const unsigned int order = compound_order(page);
 
-			if (likely(order < MAX_ORDER)) {
+			if (blockpfn + (1UL << order) <= end_pfn) {
 				blockpfn += (1UL << order) - 1;
 				cursor += (1UL << order) - 1;
 			}
+
 			goto isolate_fail;
 		}
 
@@ -657,8 +658,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		spin_unlock_irqrestore(&cc->zone->lock, flags);
 
 	/*
-	 * There is a tiny chance that we have read bogus compound_order(),
-	 * so be careful to not go outside of the pageblock.
+	 * Be careful to not go outside of the pageblock.
 	 */
 	if (unlikely(blockpfn > end_pfn))
 		blockpfn = end_pfn;
FeedBack: The patch(es) which you have sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/11562
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/X...