From: Kefeng Wang <wangkefeng.wang@huawei.com>
mainline inclusion
from mainline-v5.16-rc1
commit d0fe47c64152a63ceed4b9f29ac56371407fa7b4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I57LS2
CVE: NA
backport: openEuler-22.03-LTS
--------------------------------
After commit f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free"), the check for freeing a non-slab page was replaced by VM_BUG_ON_PAGE, which is only evaluated when CONFIG_DEBUG_VM is enabled. Since that config may hurt performance, it is meant for debugging only.
Commit 0937502af7c9 ("slub: Add check for kfree() of non slab objects.") added this check, and it should be available in all configurations to catch invalid frees, which may point to deeper problems such as memory corruption, use-after-free or double free. So replace VM_BUG_ON_PAGE with WARN_ON_ONCE, and print the object address to help debug the issue.
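For illustration only (this is not part of the patch, and it assumes the SLUB allocator), a hypothetical test module such as the sketch below would now trigger the new warning, because the pointer passed to kfree() refers to a page-allocator page that is neither a slab object nor a compound page:

	#include <linux/module.h>
	#include <linux/gfp.h>
	#include <linux/slab.h>

	/* Hypothetical test module: deliberately kfree() a page-allocator page. */
	static int __init bad_kfree_init(void)
	{
		/* order-0 page: not a slab object and not a compound page */
		unsigned long addr = __get_free_page(GFP_KERNEL);

		if (!addr)
			return -ENOMEM;

		/* invalid free: reaches free_nonslab_page() and fires WARN_ON_ONCE() */
		kfree((void *)addr);
		return 0;
	}
	module_init(bad_kfree_init);

	MODULE_LICENSE("GPL");

With the old code, such a free would pass silently when CONFIG_DEBUG_VM is disabled, and BUG() the kernel when it is enabled.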
Link: https://lkml.kernel.org/r/20210930070214.61499-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 mm/slub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index 7a7b0bf82b8e..98452815a066 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3209,7 +3209,9 @@ static inline void free_nonslab_page(struct page *page, void *object)
 {
 	unsigned int order = compound_order(page);
 
-	VM_BUG_ON_PAGE(!PageCompound(page), page);
+	if (WARN_ON_ONCE(!PageCompound(page)))
+		pr_warn_once("object pointer: 0x%p\n", object);
+
 	kfree_hook(object);
 	mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B, -(PAGE_SIZE << order));
 	__free_pages(page, order);
From: Guo Mengqi <guomengqi3@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I54I59
CVE: NA
backport: openEuler-22.03-LTS
--------------------------------------------------
When hisi_oom_notifier_call() calls spg_overview_show(), it needs the global rwsem sp_group_sem, which may already be held by another process at the time of the OOM. This leads to a kernel hung task. In another place, taking sp_group_sem unnecessarily causes an ABBA deadlock.
[ 1934.549016] INFO: task klogd:2757 blocked for more than 120 seconds.
[ 1934.562408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1934.570231] klogd D 0 2757 2746 0x00000000
[ 1934.575707] Call trace:
[ 1934.578162]  __switch_to+0xe8/0x150
[ 1934.581648]  __schedule+0x250/0x558
[ 1934.585133]  schedule+0x30/0xf0
[ 1934.588267]  rwsem_down_read_failed+0x10c/0x188
[ 1934.592788]  down_read+0x60/0x68
[ 1934.596015]  spg_overview_show.part.31+0xc8/0xf8
[ 1934.600622]  spg_overview_show+0x2c/0x38
[ 1934.604543]  hisi_oom_notifier_call+0xe8/0x120
[ 1934.608975]  out_of_memory+0x7c/0x570
[ 1934.612631]  __alloc_pages_nodemask+0xcfc/0xd98
[ 1934.617158]  alloc_pages_current+0x88/0xf0
[ 1934.621246]  __page_cache_alloc+0x8c/0xd8
[ 1934.625247]  page_cache_alloc_inode+0x48/0x58
[ 1934.629595]  filemap_fault+0x360/0x8e0
[ 1934.633341]  ext4_filemap_fault+0x38/0x128
[ 1934.637431]  __do_fault+0x50/0x218
[ 1934.640822]  __handle_mm_fault+0x69c/0x9c8
[ 1934.644909]  handle_mm_fault+0xf8/0x200
[ 1934.648740]  do_page_fault+0x220/0x508
[ 1934.652477]  do_translation_fault+0xa8/0xbc
[ 1934.656652]  do_mem_abort+0x68/0x118
[ 1934.660216]  do_el0_ia_bp_hardening+0x6c/0xd8
[ 1934.664565]  el0_ia+0x20/0x24
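For clarity, the intended locking in spg_overview_show() after this change looks roughly as follows (a sketch based on the hunk below, with the statistics output elided). The idr walk only touches sp_spg_stat_idr, so the narrower sp_spg_stat_sem is sufficient and the OOM notifier path no longer depends on sp_group_sem:

	void spg_overview_show(struct seq_file *seq)
	{
		/* ... overall statistics are printed here ... */

		/*
		 * Only sp_spg_stat_idr is walked below, so take the semaphore
		 * that protects it rather than the global sp_group_sem, which
		 * another task may already hold while stuck in the OOM path.
		 */
		down_read(&sp_spg_stat_sem);
		idr_for_each(&sp_spg_stat_idr, idr_spg_stat_cb, seq);
		up_read(&sp_spg_stat_sem);

		if (seq != NULL)
			seq_puts(seq, "\n");
	}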
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 mm/share_pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index 3a37418378f6..cd45852919a1 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -4042,9 +4042,9 @@ void spg_overview_show(struct seq_file *seq)
 			atomic_read(&sp_overall_stat.spa_total_num));
 	}
 
-	down_read(&sp_group_sem);
+	down_read(&sp_spg_stat_sem);
 	idr_for_each(&sp_spg_stat_idr, idr_spg_stat_cb, seq);
-	up_read(&sp_group_sem);
+	up_read(&sp_spg_stat_sem);
 
 	if (seq != NULL)
 		seq_puts(seq, "\n");
From: Wang Wensheng <wangwensheng4@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I54I7W
CVE: NA
backport: openEuler-22.03-LTS
--------------------------------------------------
There is an ABBA deadlock between sp_group_add_task() and proc_stat_show().
PROCESS A:
sp_group_add_task()             acquire sp_group_sem write lock
  ->sp_init_proc_stat()         acquire sp_spg_stat_sem write lock

PROCESS B:
proc_stat_show()                acquire sp_spg_stat_sem read lock
  ->idr_proc_stat_cb()          acquire sp_group_sem read lock
Here we choose the simplest fix: take the sp_group_sem and sp_spg_stat_sem read locks in that order in proc_stat_show(), since this only affects the debug feature.
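The resulting lock ordering can be summarized as follows (a simplified sketch, not the literal code): both paths now take sp_group_sem before sp_spg_stat_sem, so neither can hold the second lock while waiting for the first.

	/* sp_group_add_task() path (unchanged ordering): */
	down_write(&sp_group_sem);		/* lock A */
	/* ... sp_init_proc_stat() ... */
	down_write(&sp_spg_stat_sem);		/* lock B, taken after A */

	/* proc_stat_show() path (after this patch): */
	down_read(&sp_group_sem);		/* lock A first, same order */
	down_read(&sp_spg_stat_sem);		/* lock B */
	idr_for_each(&sp_spg_stat_idr, idr_proc_stat_cb, seq);
	up_read(&sp_spg_stat_sem);
	up_read(&sp_group_sem);
	/* idr_proc_stat_cb() no longer takes sp_group_sem itself */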
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 mm/share_pool.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index cd45852919a1..85d175def6ae 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -4083,7 +4083,6 @@ static int idr_proc_stat_cb(int id, void *p, void *data)
 	long sp_res, sp_res_nsize, non_sp_res, non_sp_shm;
 
 	/* to prevent ABBA deadlock, first hold sp_group_sem */
-	down_read(&sp_group_sem);
 	mutex_lock(&spg_stat->lock);
 	hash_for_each(spg_stat->hash, i, spg_proc_stat, gnode) {
 		proc_stat = spg_proc_stat->proc_stat;
@@ -4112,7 +4111,6 @@ static int idr_proc_stat_cb(int id, void *p, void *data)
 		seq_putc(seq, '\n');
 	}
 	mutex_unlock(&spg_stat->lock);
-	up_read(&sp_group_sem);
 
 	return 0;
 }
@@ -4130,10 +4128,16 @@ static int proc_stat_show(struct seq_file *seq, void *offset)
 		   byte2kb(atomic64_read(&kthread_stat.alloc_size)),
 		   byte2kb(atomic64_read(&kthread_stat.k2u_size)));
 
-	/* pay attention to potential ABBA deadlock */
+	/*
+	 * This ugly code is just for fixing the ABBA deadlock against
+	 * sp_group_add_task.
+	 */
+	down_read(&sp_group_sem);
 	down_read(&sp_spg_stat_sem);
 	idr_for_each(&sp_spg_stat_idr, idr_proc_stat_cb, seq);
 	up_read(&sp_spg_stat_sem);
+	up_read(&sp_group_sem);
+
 	return 0;
 }
From: Zhang Jian <zhangjian210@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I54IL8
CVE: NA
backport: openEuler-22.03-LTS
-----------------------------
When a NUMA node id is passed to sp_alloc, sometimes the node id does not take effect. This is because, if a memory policy is set, it replaces the requested node with its preferred one. Fix this by mbind()ing the virtual address range to the desired NUMA node.
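The in-kernel sp_mbind() helper added below does for the share pool what a userspace caller would do with mbind(2). A minimal userspace analogy (illustration only; assumes libnuma's <numaif.h>, linking with -lnuma, a page-aligned addr, and that the requested node is online):

	#include <numaif.h>	/* mbind(), MPOL_BIND, MPOL_MF_STRICT */

	/* Bind [addr, addr + len) to a single NUMA node before it is populated. */
	static long bind_range_to_node(void *addr, unsigned long len, int node)
	{
		unsigned long nodemask[16] = { 0 };	/* room for up to 1024 nodes */

		nodemask[node / (8 * sizeof(unsigned long))] |=
				1UL << (node % (8 * sizeof(unsigned long)));

		/*
		 * MPOL_BIND overrides the task mempolicy for this range, so later
		 * page faults allocate from the requested node; MPOL_MF_STRICT
		 * makes the call fail if existing pages already live elsewhere.
		 */
		return mbind(addr, len, MPOL_BIND, nodemask,
			     sizeof(nodemask) * 8, MPOL_MF_STRICT);
	}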
Signed-off-by: Zhang Jian <zhangjian210@huawei.com>
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 include/linux/mempolicy.h | 10 ++++++
 mm/mempolicy.c            | 14 ++++++---
 mm/share_pool.c           | 66 +++++++++++++++++++++++++--------------
 3 files changed, 62 insertions(+), 28 deletions(-)
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 64ab4398ba90..133a8b5bdf9e 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -201,6 +201,9 @@ extern bool vma_migratable(struct vm_area_struct *vma);
 extern int mpol_misplaced(struct page *, struct vm_area_struct *, unsigned long);
 extern void mpol_put_task_policy(struct task_struct *);
 
+extern long __do_mbind(unsigned long start, unsigned long len,
+		unsigned short mode, unsigned short mode_flags,
+		nodemask_t *nmask, unsigned long flags, struct mm_struct *mm);
 #else
 
 struct mempolicy {};
@@ -301,6 +304,13 @@ static inline int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
 	return -1; /* no node preference */
 }
 
+static long __do_mbind(unsigned long start, unsigned long len,
+		unsigned short mode, unsigned short mode_flags,
+		nodemask_t *nmask, unsigned long flags, struct mm_struct *mm)
+{
+	return 0;
+}
+
 static inline void mpol_put_task_policy(struct task_struct *task)
 {
 }
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 5ce39dbc84e1..d2326f9a38a8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1308,11 +1308,10 @@ static struct page *new_page(struct page *page, unsigned long start)
 }
 #endif
 
-static long do_mbind(unsigned long start, unsigned long len,
-		     unsigned short mode, unsigned short mode_flags,
-		     nodemask_t *nmask, unsigned long flags)
+long __do_mbind(unsigned long start, unsigned long len,
+		unsigned short mode, unsigned short mode_flags,
+		nodemask_t *nmask, unsigned long flags, struct mm_struct *mm)
 {
-	struct mm_struct *mm = current->mm;
 	struct mempolicy *new;
 	unsigned long end;
 	int err;
@@ -1411,6 +1410,13 @@ static long do_mbind(unsigned long start, unsigned long len,
 	return err;
 }
 
+static long do_mbind(unsigned long start, unsigned long len,
+		     unsigned short mode, unsigned short mode_flags,
+		     nodemask_t *nmask, unsigned long flags)
+{
+	return __do_mbind(start, len, mode, mode_flags, nmask, flags, current->mm);
+}
+
 /*
  * User space interface with variable sized bitmaps for nodelists.
  */
diff --git a/mm/share_pool.c b/mm/share_pool.c
index 85d175def6ae..76088952d0a5 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -16,7 +16,6 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-
 #define pr_fmt(fmt) "share pool: " fmt
 
 #include <linux/share_pool.h>
 
@@ -2157,6 +2156,7 @@ struct sp_alloc_context {
 	bool need_fallocate;
 	struct timespec64 start;
 	struct timespec64 end;
+	bool have_mbind;
 };
 
 static void trace_sp_alloc_begin(struct sp_alloc_context *ac)
@@ -2298,6 +2298,7 @@ static int sp_alloc_prepare(unsigned long size, unsigned long sp_flags,
 	ac->sp_flags = sp_flags;
 	ac->state = ALLOC_NORMAL;
 	ac->need_fallocate = false;
+	ac->have_mbind = false;
 	return 0;
 }
 
@@ -2391,7 +2392,7 @@ static void sp_alloc_fallback(struct sp_area *spa, struct sp_alloc_context *ac)
 }
 
 static int sp_alloc_populate(struct mm_struct *mm, struct sp_area *spa,
-		struct sp_group_node *spg_node, struct sp_alloc_context *ac)
+		struct sp_alloc_context *ac)
 {
 	int ret = 0;
 	unsigned long sp_addr = spa->va_start;
@@ -2423,25 +2424,20 @@ static int sp_alloc_populate(struct mm_struct *mm, struct sp_area *spa,
 		if (ret)
 			sp_add_work_compact();
 	}
-	if (ret) {
-		if (spa->spg != spg_none)
-			sp_alloc_unmap(list_next_entry(spg_node, proc_node)->master->mm, spa, spg_node);
-		else
-			sp_munmap(mm, spa->va_start, spa->real_size);
-
-		if (unlikely(fatal_signal_pending(current)))
-			pr_warn_ratelimited("allocation failed, current thread is killed\n");
-		else
-			pr_warn_ratelimited("allocation failed due to mm populate failed(potential no enough memory when -12): %d\n",
-					    ret);
-		sp_fallocate(spa);	/* need this, otherwise memleak */
-		sp_alloc_fallback(spa, ac);
-	} else {
-		ac->need_fallocate = true;
-	}
 	return ret;
 }
 
+static long sp_mbind(struct mm_struct *mm, unsigned long start, unsigned long len,
+		unsigned long node)
+{
+	nodemask_t nmask;
+
+	nodes_clear(nmask);
+	node_set(node, nmask);
+	return __do_mbind(start, len, MPOL_BIND, MPOL_F_STATIC_NODES,
+			&nmask, MPOL_MF_STRICT, mm);
+}
+
 static int __sp_alloc_mmap_populate(struct mm_struct *mm, struct sp_area *spa,
 		struct sp_group_node *spg_node, struct sp_alloc_context *ac)
 {
@@ -2457,7 +2453,34 @@ static int __sp_alloc_mmap_populate(struct mm_struct *mm, struct sp_area *spa,
 		return ret;
 	}
 
-	ret = sp_alloc_populate(mm, spa, spg_node, ac);
+	if (!ac->have_mbind) {
+		ret = sp_mbind(mm, spa->va_start, spa->real_size, spa->node_id);
+		if (ret < 0) {
+			pr_err("cannot bind the memory range to specified node:%d, err:%d\n",
+				spa->node_id, ret);
+			goto err;
+		}
+		ac->have_mbind = true;
+	}
+
+	ret = sp_alloc_populate(mm, spa, ac);
+	if (ret) {
+err:
+		if (spa->spg != spg_none)
+			sp_alloc_unmap(list_next_entry(spg_node, proc_node)->master->mm, spa, spg_node);
+		else
+			sp_munmap(mm, spa->va_start, spa->real_size);
+
+		if (unlikely(fatal_signal_pending(current)))
+			pr_warn_ratelimited("allocation failed, current thread is killed\n");
+		else
+			pr_warn_ratelimited("allocation failed due to mm populate failed(potential no enough memory when -12): %d\n",
+					    ret);
+		sp_fallocate(spa);	/* need this, otherwise memleak */
+		sp_alloc_fallback(spa, ac);
+	} else
+		ac->need_fallocate = true;
+
 	return ret;
 }
 
@@ -2479,11 +2502,6 @@ static int sp_alloc_mmap_populate(struct sp_area *spa,
 		if (mmap_ret) {
 			if (ac->state != ALLOC_COREDUMP)
 				return mmap_ret;
-			if (ac->spg == spg_none) {
-				sp_alloc_unmap(mm, spa, spg_node);
-				pr_err("dvpp allocation failed due to coredump");
-				return mmap_ret;
-			}
 			ac->state = ALLOC_NORMAL;
 			continue;
 		}
From: Li Zhengyu <lizhengyu3@huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I59IQS
CVE: NA
--------------------------------
Use a regular (non-NMI) IPI to support CPU backtraces on arm64 when NMI is not supported. This has been tested on QEMU.
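The usual way to exercise this path is the sysrq backtrace trigger (echo l > /proc/sysrq-trigger). Equivalently, a hypothetical test module (illustration only, not part of this patch) can call the generic helper directly:

	#include <linux/module.h>
	#include <linux/nmi.h>

	/* Hypothetical test module: dump a backtrace on all online CPUs. */
	static int __init backtrace_test_init(void)
	{
		/*
		 * Ends up in arch_trigger_cpumask_backtrace(); with this patch
		 * the arm64 implementation falls back to a normal IPI when no
		 * NMI is available instead of returning false.
		 */
		if (!trigger_all_cpu_backtrace())
			pr_info("cpumask backtrace not supported here\n");
		return 0;
	}
	module_init(backtrace_test_init);

	MODULE_LICENSE("GPL");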
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 arch/arm64/kernel/ipi_nmi.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
index 3b105852fc17..2cf28e511b23 100644
--- a/arch/arm64/kernel/ipi_nmi.c
+++ b/arch/arm64/kernel/ipi_nmi.c
@@ -33,12 +33,24 @@ void arm64_send_nmi(cpumask_t *mask)
 	__ipi_send_mask(ipi_nmi_desc, mask);
 }
 
+static void ipi_cpu_backtrace(void *info)
+{
+	printk_safe_enter();
+	nmi_cpu_backtrace(get_irq_regs());
+	printk_safe_exit();
+}
+
+static void arm64_send_ipi(cpumask_t *mask)
+{
+	smp_call_function_many(mask, ipi_cpu_backtrace, NULL, false);
+}
+
 bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
 	if (!ipi_nmi_desc)
-		return false;
-
-	nmi_trigger_cpumask_backtrace(mask, exclude_self, arm64_send_nmi);
+		nmi_trigger_cpumask_backtrace(mask, exclude_self, arm64_send_ipi);
+	else
+		nmi_trigger_cpumask_backtrace(mask, exclude_self, arm64_send_nmi);
 
 	return true;
 }
From: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
mainline inclusion
from mainline-v5.19
commit 6f6f84aa215f7b6665ccbb937db50860f9ec2989
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I58J1U
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
KASAN reports a null-ptr-deref as follows:
BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd]
Write of size 8 at addr 000000000000005d by task a.out/852
CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 kasan_report+0xab/0x120
 ? nfsd_mkdir+0x71/0x1c0 [nfsd]
 ? nfsd_fill_super+0xc6/0xe0 [nfsd]
 nfsd_fill_super+0xc6/0xe0 [nfsd]
 ? nfsd_mkdir+0x1c0/0x1c0 [nfsd]
 get_tree_keyed+0x8e/0x100
 vfs_get_tree+0x41/0xf0
 __do_sys_fsconfig+0x590/0x670
 ? fscontext_read+0x180/0x180
 ? anon_inode_getfd+0x4f/0x70
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
This can be reproduced by running these operations concurrently:
1. fsopen(nfsd)/fsconfig
2. insmod/rmmod nfsd
Since the nfsd filesystem is registered before nfsd_net is allocated, a caller may look up the file_system_type and use nfsd_net before it has been allocated, leading to the null-ptr-deref.
So init_nfsd() should call register_filesystem() last.
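A rough userspace sketch of step 1 of the reproducer above (illustration only; assumes root, a libc that exposes SYS_fsopen/SYS_fsconfig, and <linux/mount.h> providing FSCONFIG_CMD_CREATE) would loop like this while a second shell repeatedly runs insmod/rmmod nfsd:

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/mount.h>	/* FSCONFIG_CMD_CREATE */

	int main(void)
	{
		for (;;) {
			/* races against insmod/rmmod nfsd in another shell */
			int fd = syscall(SYS_fsopen, "nfsd", 0);

			if (fd < 0)
				continue;	/* nfsd not registered right now */

			/* instantiate the superblock: ends up in nfsd_fill_super() */
			syscall(SYS_fsconfig, fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
			close(fd);
		}
		return 0;
	}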
Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Conflicts:
	fs/nfsd/nfsctl.c
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: zhangxiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 fs/nfsd/nfsctl.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index dedec4771ecc..5b09b82a4e59 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1540,20 +1540,20 @@ static int __init init_nfsd(void)
 	retval = create_proc_exports_entry();
 	if (retval)
 		goto out_free_lockd;
-	retval = register_filesystem(&nfsd_fs_type);
-	if (retval)
-		goto out_free_exports;
 	retval = register_pernet_subsys(&nfsd_net_ops);
 	if (retval < 0)
-		goto out_free_filesystem;
+		goto out_free_exports;
 	retval = register_cld_notifier();
+	if (retval)
+		goto out_free_subsys;
+	retval = register_filesystem(&nfsd_fs_type);
 	if (retval)
 		goto out_free_all;
 	return 0;
 out_free_all:
+	unregister_cld_notifier();
+out_free_subsys:
 	unregister_pernet_subsys(&nfsd_net_ops);
-out_free_filesystem:
-	unregister_filesystem(&nfsd_fs_type);
 out_free_exports:
 	remove_proc_entry("fs/nfs/exports", NULL);
 	remove_proc_entry("fs/nfs", NULL);
@@ -1570,6 +1570,7 @@ static int __init init_nfsd(void)
 
 static void __exit exit_nfsd(void)
 {
+	unregister_filesystem(&nfsd_fs_type);
 	unregister_cld_notifier();
 	unregister_pernet_subsys(&nfsd_net_ops);
 	nfsd_drc_slab_free();
@@ -1579,7 +1580,6 @@ static void __exit exit_nfsd(void)
 	nfsd_lockd_shutdown();
 	nfsd4_free_slabs();
 	nfsd4_exit_pnfs();
-	unregister_filesystem(&nfsd_fs_type);
 }
 
 MODULE_AUTHOR("Olaf Kirch <okir@monad.swb.de>");