From: Li Zetao lizetao1@huawei.com
mainline inclusion
from mainline-v6.1-rc5
commit d4af56c5c7c6781ca6ca8075e2cf5bc119ed33d1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9J6XR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
There is a memory leak reported by kmemleak:
unreferenced object 0xffff88817231ce40 (size 224):
  comm "mount.cifs", pid 19308, jiffies 4295917571 (age 405.880s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    60 c0 b2 00 81 88 ff ff 98 83 01 42 81 88 ff ff  `..........B....
  backtrace:
    [<ffffffff81936171>] __alloc_file+0x21/0x250
    [<ffffffff81937051>] alloc_empty_file+0x41/0xf0
    [<ffffffff81937159>] alloc_file+0x59/0x710
    [<ffffffff81937964>] alloc_file_pseudo+0x154/0x210
    [<ffffffff81741dbf>] __shmem_file_setup+0xff/0x2a0
    [<ffffffff817502cd>] shmem_zero_setup+0x8d/0x160
    [<ffffffff817cc1d5>] mmap_region+0x1075/0x19d0
    [<ffffffff817cd257>] do_mmap+0x727/0x1110
    [<ffffffff817518b2>] vm_mmap_pgoff+0x112/0x1e0
    [<ffffffff83adf955>] do_syscall_64+0x35/0x80
    [<ffffffff83c0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
The root cause was traced to an error handling path in mmap_region() when arch_validate_flags() or mas_preallocate() fails. In the shared anonymous mapping case, the vma is set up and mapped with a new shared anonymous file via shmem_zero_setup(), so on this error path the file resource needs to be released as well.

Fix it by calling fput(vma->vm_file) and unmap_region() when arch_validate_flags() or mas_preallocate() returns an error in the shared anonymous mapping case.
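For illustration, the leaky state is easy to set up from userspace: any shared anonymous mapping makes mmap_region() attach a shmem file before the failing checks run. A minimal sketch of such a mapping follows (a hypothetical reproducer scaffold, not part of the patch; the failure itself additionally needs arch_validate_flags() to reject the flags, e.g. arm64 MTE, or an ENOMEM from mas_preallocate()):

  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
          /*
           * MAP_SHARED | MAP_ANONYMOUS: mmap_region() backs the vma with
           * a new shmem file via shmem_zero_setup(). Before the fix, an
           * error from arch_validate_flags() or mas_preallocate() after
           * this point leaked that file (the kmemleak report above).
           */
          void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

          if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }
          munmap(p, 4096);
          return 0;
  }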
Link: https://lkml.kernel.org/r/20221028073717.1179380-1-lizetao1@huawei.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
conflicts: mm/mmap.c
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/mmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index bddd7f0f88b9..7a70757f4008 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1914,6 +1914,8 @@ static unsigned long __mmap_region(struct mm_struct *mm, struct file *file,
 		error = -EINVAL;
 		if (file)
 			goto close_and_free_vma;
+		else if (vma->vm_file)
+			goto unmap_and_free_vma;
 		else
 			goto free_vma;
 	}
@@ -1967,7 +1969,7 @@ static unsigned long __mmap_region(struct mm_struct *mm, struct file *file,
 
 	/* Undo any partial mapping done by a device driver. */
 	unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);
-	if (vm_flags & VM_SHARED)
+	if (file && (vm_flags & VM_SHARED))
 		mapping_unmap_writable(file->f_mapping);
 allow_write_and_free_vma:
 	if (vm_flags & VM_DENYWRITE)
From: Christophe Leroy christophe.leroy@csgroup.eu
mainline inclusion
from mainline-v6.0-rc1
commit 980bbf7ca72012d317617fcdbfabe8708e4cef29
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9J6ZB
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?9...
--------------------------------
mark_initmem_nx() calls either mmu_mark_initmem_nx() or set_memory_attr() depending on what v_block_mapped() returns for _sinittext.
But we can now handle text and data independently, so that text may be mapped by block even when data is mapped by pages.
On the 8xx for instance, at startup 32Mbytes of memory are pinned in TLB. So the pinned entries need to go away for sinittext.
In the next patch a BAT will be set to also cover sinittext on book3s/32. So it will also be necessary to call mmu_mark_initmem_nx() even when the data above sinittext is not mapped with BATs.
As this is highly dependent on the platform, call mmu_mark_initmem_nx() regardless of data block mapping. Then the platform will know what to do.
Modify the 8xx mmu_mark_initmem_nx() so that the inittext mapping is modified only when pagealloc debug and kfence are not active; otherwise inittext is mapped with standard pages. And don't do anything on kernel text, which is already mapped with PAGE_KERNEL_TEXT.
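For context, the guard used in the 8xx hunk below is the generic helper; a simplified sketch of its mainline definition (paraphrased here, assuming the include/linux/mm.h version) shows why these two debug features force standard pages:

  /*
   * Sketch of the helper used in the 8xx hunk below: if KFENCE is
   * built in or debug_pagealloc is enabled, page protections must be
   * changeable at page granularity, so init text cannot stay in a
   * block mapping.
   */
  static inline bool debug_pagealloc_enabled_or_kfence(void)
  {
          return IS_ENABLED(CONFIG_KFENCE) || debug_pagealloc_enabled();
  }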
Fixes: da1adea07576 ("powerpc/8xx: Allow STRICT_KERNEL_RWX with pinned TLB")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/db3fc14f3bfa6215b0786ef58a6e2bc1e1f964d7.165520280...
conflicts:
  arch/powerpc/mm/nohash/8xx.c
  arch/powerpc/mm/pgtable_32.c
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/powerpc/mm/nohash/8xx.c | 4 ++--
 arch/powerpc/mm/pgtable_32.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index cf35643a74cf..e745f4f62385 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -183,8 +183,8 @@ void mmu_mark_initmem_nx(void)
 	unsigned long boundary = strict_kernel_rwx_enabled() ? sinittext : etext8;
 	unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);
 
-	mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, false);
-	mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
+	if (!debug_pagealloc_enabled_or_kfence())
+		mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
 
 	if (IS_ENABLED(CONFIG_PIN_TLB_TEXT))
 		mmu_pin_tlb(block_mapped_ram, false);
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 14aa3a11de52..7959c353e674 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -184,9 +184,9 @@ void mark_initmem_nx(void)
 	unsigned long numpages = PFN_UP((unsigned long)_einittext) -
 				 PFN_DOWN((unsigned long)_sinittext);
 
-	if (v_block_mapped((unsigned long)_sinittext))
-		mmu_mark_initmem_nx();
-	else
+	mmu_mark_initmem_nx();
+
+	if (!v_block_mapped((unsigned long)_sinittext))
 		change_page_attr(page, numpages, PAGE_KERNEL);
 }
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/6549
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/W...
From: Dave Hansen dave.hansen@linux.intel.com
mainline inclusion
from mainline-v5.18-rc2
commit d39268ad24c0fd0665d0c5cf55a7c1a0ebf94766
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9J6ZO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?d...
--------------------------------
0day reported a regression on a microbenchmark which is intended to stress the TLB flushing path:
https://lore.kernel.org/all/20220317090415.GE735@xsang-OptiPlex-9020/
It pointed at a commit from Nadav which intended to remove retpoline overhead in the TLB flushing path by taking the 'cond'-ition in on_each_cpu_cond_mask(), pre-calculating it, and incorporating it into 'cpumask'. That allowed the code to use a bunch of earlier direct calls instead of later indirect calls that need a retpoline.
But, in practice, threads can go idle (and into lazy TLB mode where they don't need to flush their TLB) between the early and late calls. It works in this direction and not in the other because TLB-flushing threads tend to hold mmap_lock for write. Contention on that lock causes threads to _go_ idle right in this early/late window.
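To make that window concrete, here is a simplified sketch of the two shapes (paraphrased from the diff below, not the kernel code verbatim):

  /* Reverted open-coded variant: laziness is sampled early. */
  cpumask_clear(cond_cpumask);
  for_each_cpu(cpu, cpumask)
          if (tlb_is_not_lazy(cpu))               /* early check */
                  __cpumask_set_cpu(cpu, cond_cpumask);
  /*
   * A target CPU can go idle (lazy TLB) right here, e.g. while it
   * blocks on mmap_lock, yet it still receives a flush IPI it no
   * longer needs.
   */
  on_each_cpu_mask(cond_cpumask, flush_tlb_func, info, true);

  /*
   * Restored variant: the condition is re-evaluated per CPU just
   * before the IPI is sent, at the cost of an indirect (retpolined)
   * call.
   */
  on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func, info, 1, cpumask);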
There was not any performance data in the original commit specific to the retpoline overhead. I did a few tests on a system with retpolines:
https://lore.kernel.org/all/dd8be93c-ded6-b962-50d4-96b1c3afb2b7@intel.com/
which showed a possible small win. But, that small win pales in comparison with the bigger loss induced on non-retpoline systems.
Revert the patch that removed the retpolines. This was not a clean revert, but it was self-contained enough not to be too painful.
Fixes: 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Nadav Amit <namit@vmware.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/164874672286.389.7021457716635788197.tip-bot2@tip-...
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/x86/mm/tlb.c | 37 +++++--------------------------------
 1 file changed, 5 insertions(+), 32 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f4b162f273f5..9dc0e2e55fc5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -789,13 +789,11 @@ static void flush_tlb_func(void *info)
 			nr_invalidate);
 }
 
-static bool tlb_is_not_lazy(int cpu)
+static bool tlb_is_not_lazy(int cpu, void *data)
 {
 	return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
 }
 
-static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
-
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
 EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared);
 
@@ -824,36 +822,11 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	 * up on the new contents of what used to be page tables, while
 	 * doing a speculative memory access.
 	 */
-	if (info->freed_tables) {
+	if (info->freed_tables)
 		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
-	} else {
-		/*
-		 * Although we could have used on_each_cpu_cond_mask(),
-		 * open-coding it has performance advantages, as it eliminates
-		 * the need for indirect calls or retpolines. In addition, it
-		 * allows to use a designated cpumask for evaluating the
-		 * condition, instead of allocating one.
-		 *
-		 * This code works under the assumption that there are no nested
-		 * TLB flushes, an assumption that is already made in
-		 * flush_tlb_mm_range().
-		 *
-		 * cond_cpumask is logically a stack-local variable, but it is
-		 * more efficient to have it off the stack and not to allocate
-		 * it on demand. Preemption is disabled and this code is
-		 * non-reentrant.
-		 */
-		struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask);
-		int cpu;
-
-		cpumask_clear(cond_cpumask);
-
-		for_each_cpu(cpu, cpumask) {
-			if (tlb_is_not_lazy(cpu))
-				__cpumask_set_cpu(cpu, cond_cpumask);
-		}
-		on_each_cpu_mask(cond_cpumask, flush_tlb_func, (void *)info, true);
-	}
+	else
+		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+				      (void *)info, 1, cpumask);
 }
 
 void flush_tlb_multi(const struct cpumask *cpumask,
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/6547
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/6...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/6548
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/J...