Baolin Wang (1):
  fs: improve dump_mapping() robustness

Matthew Wilcox (Oracle) (2):
  mm: add __dump_folio()
  mm: improve dumping of mapcount and page_type

 fs/inode.c             |   3 +-
 include/linux/mm.h     |   7 +++
 include/linux/mmzone.h |   3 +
 mm/debug.c             | 129 +++++++++++++++++++++++------------
 4 files changed, 84 insertions(+), 58 deletions(-)
FeedBack: The patch(es) you have sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/10317
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/U...
From: Baolin Wang <baolin.wang@linux.alibaba.com>
mainline inclusion
from mainline-v6.9-rc1
commit 8b3d838139bcd1e552f1899191f734264ce2a1a5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IACHGW
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
We hit a kernel crash when running stress-ng testing: the system crashes while printing the dentry name in dump_mapping().
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
pc : dentry_name+0xd8/0x224
lr : pointer+0x22c/0x370
sp : ffff800025f134c0
......
Call trace:
 dentry_name+0xd8/0x224
 pointer+0x22c/0x370
 vsnprintf+0x1ec/0x730
 vscnprintf+0x2c/0x60
 vprintk_store+0x70/0x234
 vprintk_emit+0xe0/0x24c
 vprintk_default+0x3c/0x44
 vprintk_func+0x84/0x2d0
 printk+0x64/0x88
 __dump_page+0x52c/0x530
 dump_page+0x14/0x20
 set_migratetype_isolate+0x110/0x224
 start_isolate_page_range+0xc4/0x20c
 offline_pages+0x124/0x474
 memory_block_offline+0x44/0xf4
 memory_subsys_offline+0x3c/0x70
 device_offline+0xf0/0x120
 ......
The root cause is that one thread is doing page migration: between page unmap and page move, the target page's ->mapping field is used to save the 'anon_vma' pointer, and at this point the target page is locked with a refcount of 1.

Meanwhile, another stress-ng thread performing memory hotplug tries to offline the target page that is being migrated. It finds that the refcount of the target page is 1, which prevents the offline operation, and therefore proceeds to dump the page. However, page_mapping() of the target page may return an incorrect file mapping and crash the system in dump_mapping(), since the target page->mapping only holds the 'anon_vma' pointer without the PAGE_MAPPING_ANON flag set.
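To make the failure mode concrete, here is a standalone userspace sketch of the decode that goes wrong (the SK_* constants and sk_* names are simplified stand-ins for the kernel's PAGE_MAPPING_* convention, not kernel code):

#include <stdio.h>
#include <stdint.h>

/* simplified stand-ins for the kernel's PAGE_MAPPING_* bits */
#define SK_PAGE_MAPPING_ANON	0x1UL
#define SK_PAGE_MAPPING_FLAGS	0x3UL

/* decode ->mapping roughly the way the dump path does */
static void *sk_decode_mapping(uintptr_t word)
{
	if (word & SK_PAGE_MAPPING_ANON)
		return NULL;		/* anon/KSM: no file mapping to dump */
	return (void *)(word & ~SK_PAGE_MAPPING_FLAGS);
}

int main(void)
{
	uintptr_t anon_vma = 0x989117f55ea0UL;	/* pretend anon_vma address */

	/* normal anon page: the tag makes the decode return NULL safely */
	printf("tagged:   %p\n", sk_decode_mapping(anon_vma | SK_PAGE_MAPPING_ANON));
	/* mid-migration page: the bare anon_vma pointer is misread as a
	 * struct address_space, which dump_mapping() then dereferences */
	printf("untagged: %p\n", sk_decode_mapping(anon_vma));
	return 0;
}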
The page migration issue has been fixed by commit d1adb25df711 ("mm: migrate: fix getting incorrect page mapping during page migration"). In addition, Matthew suggested we should also improve dump_mapping()'s robustness to be resilient against such a kernel crash [1].
By checking the 'dentry.d_parent' and 'dentry.d_name.name' used by dentry_name(), I can see dump_mapping() output the invalid dentry instead of crashing the system when this issue is reproduced again:
[12211.189128] page:fffff7de047741c0 refcount:1 mapcount:0 mapping:ffff989117f55ea0 index:0x1 pfn:0x211dd07
[12211.189144] aops:0x0 ino:1 invalid dentry:74786574206e6870
[12211.189148] flags: 0x57ffffc0000001(locked|node=1|zone=2|lastcpupid=0x1fffff)
[12211.189150] page_type: 0xffffffff()
[12211.189153] raw: 0057ffffc0000001 0000000000000000 dead000000000122 ffff989117f55ea0
[12211.189154] raw: 0000000000000001 0000000000000001 00000001ffffffff 0000000000000000
[12211.189155] page dumped because: unmovable page
[1] https://lore.kernel.org/all/ZXxn%2F0oixJxxAnpF@casper.infradead.org/
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/937ab1f87328516821d39be672b6bc18861d9d3e.170539142...
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index ae1a6410b53d..2d8b8d353750 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -589,7 +589,8 @@ void dump_mapping(const struct address_space *mapping)
 	}
 
 	dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias);
-	if (get_kernel_nofault(dentry, dentry_ptr)) {
+	if (get_kernel_nofault(dentry, dentry_ptr) ||
+	    !dentry.d_parent || !dentry.d_name.name) {
 		pr_warn("aops:%ps ino:%lx invalid dentry:%px\n",
 				a_ops, ino, dentry_ptr);
 		return;
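For illustration, a rough userspace analogue of the change above (memcpy() stands in for the real get_kernel_nofault(), and the sk_dentry type is a made-up two-field stand-in for struct dentry): snapshot the suspect dentry first, then validate the fields dentry_name() would dereference before printing.

#include <stdio.h>
#include <string.h>

struct sk_dentry {
	struct sk_dentry *d_parent;
	const char *d_name;
};

static void sk_dump_dentry(const struct sk_dentry *dentry_ptr)
{
	struct sk_dentry dentry;

	/* snapshot first (kernel: get_kernel_nofault()), then sanity-check
	 * the fields the name printer would chase before using them */
	memcpy(&dentry, dentry_ptr, sizeof(dentry));
	if (!dentry.d_parent || !dentry.d_name) {
		printf("invalid dentry:%p\n", (void *)dentry_ptr);
		return;
	}
	printf("name:%s\n", dentry.d_name);
}

int main(void)
{
	struct sk_dentry bogus = { NULL, NULL };	/* raced/overwritten dentry */

	sk_dump_dentry(&bogus);	/* prints "invalid dentry:..." instead of crashing */
	return 0;
}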
From: "Matthew Wilcox (Oracle)" willy@infradead.org
mainline inclusion
from mainline-v6.9-rc1
commit fae7d834c43ccdb9fcecaf4d0f33145d884b3e5c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IACHGW
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Turn __dump_page() into a wrapper around __dump_folio(). Snapshot the page & folio into a stack variable so we don't hit BUG_ON() if an allocation is freed under us and what was a folio pointer becomes a pointer to a tail page.
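A compressed userspace sketch of the snapshot idea (the two-word sk_page is a stand-in for struct page, not the kernel's layout; the real code copies sizeof(struct page), or two struct pages for a large folio, and its consistency check compares the page index against folio_nr_pages()):

#include <stdio.h>
#include <string.h>

struct sk_page {
	unsigned long flags;
	unsigned long compound_head;	/* low bit set => tail page */
};

static void sk_dump_page(const struct sk_page *page)
{
	struct sk_page precise;
	int loops = 5;

again:
	/* decode only a point-in-time stack copy, never the live page */
	memcpy(&precise, page, sizeof(precise));
	/* if the copy is self-inconsistent (here: a "tail" marker whose head
	 * pointer is gone), the page changed under us; take a fresh snapshot,
	 * but only a bounded number of times */
	if ((precise.compound_head & 1) && !(precise.compound_head & ~1UL)) {
		if (loops-- > 0)
			goto again;
		precise.compound_head = 0;	/* give up; dump as order-0 */
	}
	printf("flags:%#lx head:%#lx\n", precise.flags, precise.compound_head);
}

int main(void)
{
	struct sk_page p = { 0x57ffffc0000001UL, 0 };

	sk_dump_page(&p);
	return 0;
}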
[willy@infradead.org: fix build issue]
  Link: https://lkml.kernel.org/r/ZeAKCyTn_xS3O9cE@casper.infradead.org
[willy@infradead.org: fix __dump_folio]
  Link: https://lkml.kernel.org/r/ZeJJegP8zM7S9GTy@casper.infradead.org
[willy@infradead.org: fix pointer confusion]
  Link: https://lkml.kernel.org/r/ZeYa00ixxC4k1ot-@casper.infradead.org
[akpm@linux-foundation.org: s/printk/pr_warn/]
Link: https://lkml.kernel.org/r/20240227192337.757313-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 include/linux/mm.h     |   7 +++
 include/linux/mmzone.h |   3 +
 mm/debug.c             | 128 +++++++++++++++++++++++------------
 3 files changed, 83 insertions(+), 55 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7d485ce6f94d..cd7d5f78477d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2084,6 +2084,13 @@ static inline long folio_nr_pages(struct folio *folio)
 #endif
 }
 
+/* Only hugetlbfs can allocate folios larger than MAX_ORDER */
+#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
+#define MAX_FOLIO_NR_PAGES	(1UL << PUD_ORDER)
+#else
+#define MAX_FOLIO_NR_PAGES	MAX_ORDER_NR_PAGES
+#endif
+
 /*
  * compound_nr() returns the number of pages in this potentially compound
  * page. compound_nr() can be called on a tail page, and is defined to
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1f1c4facbdd4..f8d0756cf4b2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -75,9 +75,12 @@ extern const char * const migratetype_names[MIGRATE_TYPES];
 #ifdef CONFIG_CMA
 #  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
 #  define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_CMA)
+#  define is_migrate_cma_folio(folio, pfn)	(MIGRATE_CMA ==		\
+	get_pfnblock_flags_mask(&folio->page, pfn, MIGRATETYPE_MASK))
 #else
 #  define is_migrate_cma(migratetype) false
 #  define is_migrate_cma_page(_page) false
+#  define is_migrate_cma_folio(folio, pfn) false
 #endif
 
 static inline bool is_migrate_movable(int mt)
diff --git a/mm/debug.c b/mm/debug.c
index ee533a5ceb79..12fed592c955 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -51,84 +51,102 @@ const struct trace_print_flags vmaflag_names[] = {
 	{0, NULL}
 };
 
-static void __dump_page(struct page *page)
+static void __dump_folio(struct folio *folio, struct page *page,
+		unsigned long pfn, unsigned long idx)
 {
-	struct folio *folio = page_folio(page);
-	struct page *head = &folio->page;
-	struct address_space *mapping;
-	bool compound = PageCompound(page);
-	/*
-	 * Accessing the pageblock without the zone lock. It could change to
-	 * "isolate" again in the meantime, but since we are just dumping the
-	 * state for debugging, it should be fine to accept a bit of
-	 * inaccuracy here due to racing.
-	 */
-	bool page_cma = is_migrate_cma_page(page);
-	int mapcount;
+	struct address_space *mapping = folio_mapping(folio);
+	int mapcount = 0;
 	char *type = "";
 
-	if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
-		/*
-		 * Corrupt page, so we cannot call page_mapping. Instead, do a
-		 * safe subset of the steps that page_mapping() does. Caution:
-		 * this will be misleading for tail pages, PageSwapCache pages,
-		 * and potentially other situations. (See the page_mapping()
-		 * implementation for what's missing here.)
-		 */
-		unsigned long tmp = (unsigned long)page->mapping;
-
-		if (tmp & PAGE_MAPPING_ANON)
-			mapping = NULL;
-		else
-			mapping = (void *)(tmp & ~PAGE_MAPPING_FLAGS);
-		head = page;
-		folio = (struct folio *)page;
-		compound = false;
-	} else {
-		mapping = page_mapping(page);
-	}
-
 	/*
-	 * Avoid VM_BUG_ON() in page_mapcount().
-	 * page->_mapcount space in struct page is used by sl[aou]b pages to
-	 * encode own info.
+	 * page->_mapcount space in struct page is used by slab pages to
+	 * encode own info, and we must avoid calling page_folio() again.
 	 */
-	mapcount = PageSlab(head) ? 0 : page_mapcount(page);
-
-	pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
-			page, page_ref_count(head), mapcount, mapping,
-			page_to_pgoff(page), page_to_pfn(page));
-	if (compound) {
-		pr_warn("head:%p order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
-				head, compound_order(head),
+	if (!folio_test_slab(folio)) {
+		mapcount = atomic_read(&page->_mapcount) + 1;
+		if (folio_test_large(folio))
+			mapcount += folio_entire_mapcount(folio);
+	}
+
+	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
+			folio_ref_count(folio), mapcount, mapping,
+			folio->index + idx, pfn);
+	if (folio_test_large(folio)) {
+		pr_warn("head: order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
+				folio_order(folio),
 				folio_entire_mapcount(folio),
 				folio_nr_pages_mapped(folio),
 				atomic_read(&folio->_pincount));
 	}
 
 #ifdef CONFIG_MEMCG
-	if (head->memcg_data)
-		pr_warn("memcg:%lx\n", head->memcg_data);
+	if (folio->memcg_data)
+		pr_warn("memcg:%lx\n", folio->memcg_data);
 #endif
-	if (PageKsm(page))
+	if (folio_test_ksm(folio))
 		type = "ksm ";
-	else if (PageAnon(page))
+	else if (folio_test_anon(folio))
 		type = "anon ";
 	else if (mapping)
 		dump_mapping(mapping);
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
-	pr_warn("%sflags: %pGp%s\n", type, &head->flags,
-		page_cma ? " CMA" : "");
-	pr_warn("page_type: %pGt\n", &head->page_type);
+	/*
+	 * Accessing the pageblock without the zone lock. It could change to
+	 * "isolate" again in the meantime, but since we are just dumping the
+	 * state for debugging, it should be fine to accept a bit of
+	 * inaccuracy here due to racing.
+	 */
+	pr_warn("%sflags: %pGp%s\n", type, &folio->flags,
+		is_migrate_cma_folio(folio, pfn) ? " CMA" : "");
+	pr_warn("page_type: %pGt\n", &folio->page.page_type);
 
 	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
 			sizeof(unsigned long), page,
 			sizeof(struct page), false);
-	if (head != page)
+	if (folio_test_large(folio))
 		print_hex_dump(KERN_WARNING, "head: ", DUMP_PREFIX_NONE, 32,
-				sizeof(unsigned long), head,
-				sizeof(struct page), false);
+				sizeof(unsigned long), folio,
+				2 * sizeof(struct page), false);
+}
+
+static void __dump_page(const struct page *page)
+{
+	struct folio *foliop, folio;
+	struct page precise;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long idx, nr_pages = 1;
+	int loops = 5;
+
+again:
+	memcpy(&precise, page, sizeof(*page));
+	foliop = page_folio(&precise);
+	if (foliop == (struct folio *)&precise) {
+		idx = 0;
+		if (!folio_test_large(foliop))
+			goto dump;
+		foliop = (struct folio *)page;
+	} else {
+		idx = folio_page_idx(foliop, page);
+	}
+
+	if (idx < MAX_FOLIO_NR_PAGES) {
+		memcpy(&folio, foliop, 2 * sizeof(struct page));
+		nr_pages = folio_nr_pages(&folio);
+		foliop = &folio;
+	}
+
+	if (idx > nr_pages) {
+		if (loops-- > 0)
+			goto again;
+		pr_warn("page does not match folio\n");
+		precise.compound_head &= ~1UL;
+		foliop = (struct folio *)&precise;
+		idx = 0;
+	}
+
+dump:
+	__dump_folio(foliop, &precise, pfn, idx);
 }
 
 void dump_page(struct page *page, const char *reason)
From: "Matthew Wilcox (Oracle)" willy@infradead.org
mainline inclusion
from mainline-v6.10-rc1
commit 8f790d0c7cfed047a1f7aad3fcddd7a979bf7232
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IACHGW
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
For pages that have a page_type, set the mapcount to 0, which will reduce confusion for people reading page dumps ("Why does this page have a mapcount of -128?"). Now that hugetlbfs is a page_type, read the entire_mapcount for any large folio; this is fine for all folios, as no user reuses the entire_mapcount field.
For pages which do not have a page type, do not print it to reduce clutter.
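As a worked example of the "-128": in the old page_type scheme the type word aliases _mapcount, starts at 0xffffffff, and setting a type such as PG_buddy clears that type's bit (the exact constants below reflect that scheme but should be treated as illustrative):

#include <stdio.h>

int main(void)
{
	unsigned int page_type = 0xffffffffu & ~0x00000080u;	/* PG_buddy set */
	int mapcount = (int)page_type + 1;	/* page_mapcount(): _mapcount + 1 */

	/* prints: page_type=0xffffff7f mapcount=-128 */
	printf("page_type=%#x mapcount=%d\n", page_type, mapcount);
	return 0;
}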
Link: https://lkml.kernel.org/r/20240321142448.1645400-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/debug.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/mm/debug.c b/mm/debug.c
index 12fed592c955..0dd516a6640e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -55,18 +55,14 @@ static void __dump_folio(struct folio *folio, struct page *page,
 		unsigned long pfn, unsigned long idx)
 {
 	struct address_space *mapping = folio_mapping(folio);
-	int mapcount = 0;
+	int mapcount = atomic_read(&page->_mapcount) + 1;
 	char *type = "";
 
-	/*
-	 * page->_mapcount space in struct page is used by slab pages to
-	 * encode own info, and we must avoid calling page_folio() again.
-	 */
-	if (!folio_test_slab(folio)) {
-		mapcount = atomic_read(&page->_mapcount) + 1;
-		if (folio_test_large(folio))
-			mapcount += folio_entire_mapcount(folio);
-	}
+	/* Open-code page_mapcount() to avoid looking up a stale folio */
+	if (mapcount < 0)
+		mapcount = 0;
+	if (folio_test_large(folio))
+		mapcount += folio_entire_mapcount(folio);
 
 	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
 			folio_ref_count(folio), mapcount, mapping,
@@ -99,7 +95,8 @@ static void __dump_folio(struct folio *folio, struct page *page,
 	 */
 	pr_warn("%sflags: %pGp%s\n", type, &folio->flags,
 		is_migrate_cma_folio(folio, pfn) ? " CMA" : "");
-	pr_warn("page_type: %pGt\n", &folio->page.page_type);
+	if (page_has_type(&folio->page))
+		pr_warn("page_type: %pGt\n", &folio->page.page_type);
 
 	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
 			sizeof(unsigned long), page,