From: Jann Horn jannh@google.com
mainline inclusion from mainline-v6.2-rc7 commit 023f47a8250c6bdb4aebe744db4bf7f73414028b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBE8BO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires it to be locked.
Page table traversal is allowed under any one of the mmap lock, the anon_vma lock (if the VMA is associated with an anon_vma), and the mapping lock (if the VMA is associated with a mapping); and so to be able to remove page tables, we must hold all three of them. retract_page_tables() bails out if an ->anon_vma is attached, but does this check before holding the mmap lock (as the comment above the check explains).
If we racily merged an existing ->anon_vma (shared with a child process) from a neighboring VMA, subsequent rmap traversals on pages belonging to the child will be able to see the page tables that we are concurrently removing while assuming that nothing else can access them.
Repeat the ->anon_vma check once we hold the mmap lock to ensure that there really is no concurrent page table access.
Hitting this bug causes a lockdep warning in collapse_and_free_pmd(), in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)". It can also lead to use-after-free access.
Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_P... Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Jann Horn jannh@google.com Reported-by: Zach O'Keefe zokeefe@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@intel.linux.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: David Hildenbrand david@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts: mm/khugepaged.c [Conflicts due to 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") is not merged.] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/khugepaged.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1d5a985ae4cd..a3f45bca187b 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1623,7 +1623,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * has higher cost too. It would also probably require locking * the anon_vma. */ - if (vma->anon_vma) + if (READ_ONCE(vma->anon_vma)) continue; addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); if (addr & ~HPAGE_PMD_MASK) @@ -1642,6 +1642,18 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * reverse order. Trylock is a way to avoid deadlock. */ if (mmap_write_trylock(mm)) { + /* + * Re-check whether we have an ->anon_vma, because + * collapse_and_free_pmd() requires that either no + * ->anon_vma exists or the anon_vma is locked. + * We already checked ->anon_vma above, but that check + * is racy because ->anon_vma can be populated under the + * mmap lock in read mode. + */ + if (vma->anon_vma) { + mmap_write_unlock(mm); + continue; + } if (!khugepaged_test_exit(mm)) { struct mmu_notifier_range range;
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/14330 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/J...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/14330 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/J...