From: Nicholas Piggin <npiggin@gmail.com>
mainline inclusion
from mainline-v5.18-rc4
commit 3b8000ae185cb068adbda5f966a3835053c85fd4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6LD0S
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Huge vmalloc higher-order backing pages were allocated with __GFP_COMP in order to allow the sub-pages to be refcounted by callers such as "remap_vmalloc_page" [sic] (i.e. remap_vmalloc_range()).
However, a similar problem exists for the other struct page fields that callers use: for example, fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts it but also uses ->lru, ->mapping and ->index.
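As an illustration only, that kind of driver pattern looks roughly like the sketch below. This is a hedged paraphrase, not the actual fbdev code: drv_fault() and the vm_private_data-based buffer lookup are invented for the example, but vmalloc_to_page(), get_page() and the struct page fields are the real interfaces involved.

	/* Illustrative sketch of a driver fault handler backed by vmalloc() memory. */
	static vm_fault_t drv_fault(struct vm_fault *vmf)
	{
		void *buf = vmf->vma->vm_private_data;	/* vmalloc()ed buffer (assumption) */
		struct page *page;

		page = vmalloc_to_page(buf + (vmf->pgoff << PAGE_SHIFT));
		if (!page)
			return VM_FAULT_SIGBUS;

		get_page(page);					/* refcount the sub-page   */
		page->mapping = vmf->vma->vm_file->f_mapping;	/* use sub-page ->mapping  */
		page->index = vmf->pgoff;			/* use sub-page ->index    */

		vmf->page = page;
		return 0;
	}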
This is not compatible with compound sub-pages, and can cause bad page state issues like
  BUG: Bad page state in process swapper/0  pfn:00743
  page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743
  flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff)
  raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: corrupted mapping in tail page
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810
  Call Trace:
    dump_stack_lvl+0x74/0xa8 (unreliable)
    bad_page+0x12c/0x170
    free_tail_pages_check+0xe8/0x190
    free_pcp_prepare+0x31c/0x4e0
    free_unref_page+0x40/0x1b0
    __vunmap+0x1d8/0x420
    ...
The correct approach is to use split high-order pages for the huge vmalloc backing. These allow callers to treat them in exactly the same way as individually allocated order-0 pages.
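Condensed, the allocation and freeing paths then behave roughly as in the sketch below (illustrative only, mirroring the diff that follows; node, gfp_mask, page_order and p are the same names the patch uses):

	/* Allocate one high-order block for the huge mapping ... */
	struct page *page = alloc_pages_node(node, gfp_mask, page_order);

	/* ... and immediately split it into independent order-0 pages. */
	if (page && page_order)
		split_page(page, page_order);

	/*
	 * Each page + p can now be refcounted and have ->mapping etc. used
	 * like an individually allocated page, and is freed on its own:
	 */
	for (p = 0; p < (1U << page_order); p++)
		__free_pages(page + p, 0);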
Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg....
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
conflicts: mm/vmalloc.c
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 mm/vmalloc.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dadbea29241d..a97d3011aa9a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2637,14 +2637,17 @@ static void __vunmap(const void *addr, int deallocate_pages)
 	vm_remove_mappings(area, deallocate_pages);
 
 	if (deallocate_pages) {
-		unsigned int page_order = vm_area_page_order(area);
 		int i;
 
-		for (i = 0; i < area->nr_pages; i += 1U << page_order) {
+		for (i = 0; i < area->nr_pages; i++) {
 			struct page *page = area->pages[i];
 
 			BUG_ON(!page);
-			__free_pages(page, page_order);
+			/*
+			 * High-order allocs for huge vmallocs are split, so
+			 * can be freed as an array of order-0 allocations
+			 */
+			__free_pages(page, 0);
 		}
 		atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
 
@@ -2924,8 +2927,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		struct page *page;
 		int p;
 
-		/* Compound pages required for remap_vmalloc_page */
-		page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
+		page = alloc_pages_node(node, gfp_mask, page_order);
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vfree() */
 			area->nr_pages = i;
@@ -2937,6 +2939,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			goto fail;
 		}
 
+		/*
+		 * Higher order allocations must be able to be treated as
+		 * indepdenent small pages by callers (as they can with
+		 * small-page vmallocs). Some drivers do their own refcounting
+		 * on vmalloc_to_page() pages, some use page->mapping,
+		 * page->lru, etc.
+		 */
+		if (page_order)
+			split_page(page, page_order);
+
 		for (p = 0; p < (1U << page_order); p++)
 			area->pages[i + p] = page + p;
 