Jinjiang Tu (1):
  mm/ksm: remove redundant code in ksm_fork

Lorenzo Stoakes (2):
  fork: do not invoke uffd on fork if error occurs
  fork: only invoke khugepaged, ksm hooks if no error

 fs/userfaultfd.c              | 28 ++++++++++++++++++++++++++++
 include/linux/ksm.h           | 19 ++++---------------
 include/linux/userfaultfd_k.h |  5 +++++
 kernel/fork.c                 | 12 ++++++------
 4 files changed, 43 insertions(+), 21 deletions(-)
mainline inclusion
from mainline-v6.10-rc1
commit 7edea4c6fdf23754c77582a0377791e1aa9d2700
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB23JR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Since commit 3c6f33b7273a ("mm: support fork/exec for prctl"), when a child process is forked, the MMF_VM_MERGE_ANY flag will be inherited in mm_init(). So, it's unnecessary to set the flag in ksm_fork().
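
To illustrate why the explicit set_bit() becomes redundant, here is a minimal user-space sketch, not kernel code. It assumes the inheritance works by masking the parent's flags when the child mm is initialised. The names sketch_mm, sketch_mm_init(), SKETCH_INIT_MASK and the bit positions are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real values live in the kernel headers. */
enum { SKETCH_MERGEABLE = 0, SKETCH_MERGE_ANY = 1 };

/* Assumption for this sketch: only MERGE_ANY is copied from parent to child. */
#define SKETCH_INIT_MASK (1UL << SKETCH_MERGE_ANY)

struct sketch_mm {
	unsigned long flags;
};

/* Models the init step: the child starts with the masked parent flags. */
static void sketch_mm_init(struct sketch_mm *mm, const struct sketch_mm *oldmm)
{
	mm->flags = oldmm->flags & SKETCH_INIT_MASK;
}

/* Returns true if the given bit is set in mm->flags. */
static bool sketch_test_bit(int bit, const struct sketch_mm *mm)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * 8));
	return (mm->flags >> bit) & 1UL;
}
```

Under these assumptions, a child initialised from a parent with the merge-any bit set already carries that bit before any fork hook runs, so a later set_bit() for the same flag would change nothing.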
Link: https://lkml.kernel.org/r/20240402024934.1093361-1-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 include/linux/ksm.h | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index c8144f9ca9d8..2a4088ce24a0 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -56,16 +56,8 @@ static inline long mm_ksm_zero_pages(struct mm_struct *mm)
 
 static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
-	int ret;
-
-	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags)) {
-		ret = __ksm_enter(mm);
-		if (ret)
-			return ret;
-	}
-
-	if (test_bit(MMF_VM_MERGE_ANY, &oldmm->flags))
-		set_bit(MMF_VM_MERGE_ANY, &mm->flags);
+	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
+		return __ksm_enter(mm);
 
 	return 0;
 }

From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
mainline inclusion
from mainline-v6.12-rc6
commit f64e67e5d3a45a4a04286c47afade4b518acd47b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB23JR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Patch series "fork: do not expose incomplete mm on fork".
During fork we may place the virtual memory address space into an inconsistent state before the fork operation is complete.
In addition, we may encounter an error during the fork operation that indicates that the virtual memory address space is invalidated.
As a result, we should not be exposing it in any way to external machinery that might interact with the mm or VMAs, machinery that is not designed to deal with incomplete state.
We specifically update the fork logic to defer khugepaged and ksm to the end of the operation and only to be invoked if no error arose, and disallow uffd from observing fork events should an error have occurred.
This patch (of 2):
Currently on fork we expose the virtual address space of a process to userland unconditionally if uffd is registered in VMAs, regardless of whether an error arose in the fork.
This is performed in dup_userfaultfd_complete() which is invoked unconditionally, and performs two duties - invoking registered handlers for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down userfaultfd_fork_ctx objects established in dup_userfaultfd().
This is problematic, because the virtual address space may not yet be correctly initialised if an error arose.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") makes this more pertinent as we may be in a state where entries in the maple tree are not yet consistent.
We address this by, on fork error, ensuring that we roll back state that we would otherwise expect to clean up through the event being handled by userland and perform the memory freeing duty otherwise performed by dup_userfaultfd_complete().
We do this by implementing a new function, dup_userfaultfd_fail(), which performs the same loop, only decrementing reference counts.
Note that we perform mmgrab() on the parent and child mm's, however userfaultfd_ctx_put() will mmdrop() this once the reference count drops to zero, so we will avoid memory leaks correctly here.
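
The split between the success and error paths can be modelled in user space. The following is a sketch under stated assumptions, not the kernel implementation: sketch_ctx, sketch_fork_ctx, sketch_dup() and sketch_finish() are hypothetical stand-ins, plain integers replace atomic counters, and the success path simply counts delivered events on the assumption that the event consumer drops the references later.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct userfaultfd_ctx. */
struct sketch_ctx {
	int refcount;
	int mmap_changing;
};

/* Hypothetical stand-in for struct userfaultfd_fork_ctx (singly linked). */
struct sketch_fork_ctx {
	struct sketch_ctx *orig;
	struct sketch_ctx *child;
	struct sketch_fork_ctx *next;
};

/* Queue a pending fork event and take the references that must later be dropped. */
static struct sketch_fork_ctx *sketch_dup(struct sketch_fork_ctx *head,
					  struct sketch_ctx *orig,
					  struct sketch_ctx *child)
{
	struct sketch_fork_ctx *fctx = malloc(sizeof(*fctx));

	if (!fctx)
		return head;	/* allocation failed: nothing queued, nothing taken */
	orig->refcount++;
	orig->mmap_changing++;
	child->refcount++;
	fctx->orig = orig;
	fctx->child = child;
	fctx->next = head;
	return fctx;
}

/*
 * Finish the fork and free the queue. Returns the number of events delivered.
 * On success (retval == 0) every queued event is delivered and the references
 * stay held for the consumer. On error nothing is delivered and the
 * references are dropped here instead.
 */
static int sketch_finish(struct sketch_fork_ctx *head, int retval)
{
	int delivered = 0;

	while (head) {
		struct sketch_fork_ctx *next = head->next;

		if (!retval) {
			delivered++;
		} else {
			head->orig->mmap_changing--;
			assert(head->orig->mmap_changing >= 0);
			head->orig->refcount--;
			head->child->refcount--;
		}
		free(head);
		head = next;
	}
	return delivered;
}
```

Calling sketch_finish() with a non-zero retval leaves the counters exactly where they were before sketch_dup(), which is the property the error path is meant to guarantee.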
Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.172901437...
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/userfaultfd.c              | 28 ++++++++++++++++++++++++++++
 include/linux/userfaultfd_k.h |  5 +++++
 kernel/fork.c                 |  5 ++++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e7cb9d70cad9..c64ad1fca4e4 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -777,6 +777,34 @@ void dup_userfaultfd_complete(struct list_head *fcs)
 	}
 }
 
+void dup_userfaultfd_fail(struct list_head *fcs)
+{
+	struct userfaultfd_fork_ctx *fctx, *n;
+
+	/*
+	 * An error has occurred on fork, we will tear memory down, but have
+	 * allocated memory for fctx's and raised reference counts for both the
+	 * original and child contexts (and on the mm for each as a result).
+	 *
+	 * These would ordinarily be taken care of by a user handling the event,
+	 * but we are no longer doing so, so manually clean up here.
+	 *
+	 * mm tear down will take care of cleaning up VMA contexts.
+	 */
+	list_for_each_entry_safe(fctx, n, fcs, list) {
+		struct userfaultfd_ctx *octx = fctx->orig;
+		struct userfaultfd_ctx *ctx = fctx->new;
+
+		atomic_dec(&octx->mmap_changing);
+		VM_BUG_ON(atomic_read(&octx->mmap_changing) < 0);
+		userfaultfd_ctx_put(octx);
+		userfaultfd_ctx_put(ctx);
+
+		list_del(&fctx->list);
+		kfree(fctx);
+	}
+}
+
 void mremap_userfaultfd_prep(struct vm_area_struct *vma,
 			     struct vm_userfaultfd_ctx *vm_ctx)
 {
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 9427d5fccf7b..aa8e3725f103 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -186,6 +186,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
+void dup_userfaultfd_fail(struct list_head *);
 
 extern void mremap_userfaultfd_prep(struct vm_area_struct *,
 				    struct vm_userfaultfd_ctx *);
@@ -261,6 +262,10 @@ static inline void dup_userfaultfd_complete(struct list_head *l)
 {
 }
 
+static inline void dup_userfaultfd_fail(struct list_head *l)
+{
+}
+
 static inline void mremap_userfaultfd_prep(struct vm_area_struct *vma,
 					   struct vm_userfaultfd_ctx *ctx)
 {
diff --git a/kernel/fork.c b/kernel/fork.c
index 97a89ab68a26..452fb0b4014e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -807,7 +807,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
 	mmap_write_unlock(oldmm);
-	dup_userfaultfd_complete(&uf);
+	if (!retval)
+		dup_userfaultfd_complete(&uf);
+	else
+		dup_userfaultfd_fail(&uf);
 fail_uprobe_end:
 	uprobe_end_dup_mmap();
 	return retval;

From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
mainline inclusion
from mainline-v6.12-rc6
commit 985da552a98e27096444508ce5d853244019111f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB23JR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
There is no reason to invoke these hooks early against an mm that is in an incomplete state.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") makes this more pertinent as we may be in a state where entries in the maple tree are not yet consistent.
Their placement early in dup_mmap() appears to have been meaningful only for early error checking. For that check to matter, a very small allocation (in practice 'too small to fail') would have to fail, which would only occur in the most dire circumstances, in which case the fork would fail or be OOM'd anyway.
Since both khugepaged and KSM tracking are there to provide optimisations to memory performance rather than critical functionality, it doesn't really matter all that much if, under such dire memory pressure, we fail to register an mm with these.
As a result, we follow the example of commit d2081b2bf819 ("mm: khugepaged: make khugepaged_enter() void function") and make ksm_fork() a void function also.
We only expose the mm to these functions once we are done with them and only if no error occurred in the fork operation.
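
The resulting ordering can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: sketch_mm, sketch_ksm_fork(), sketch_khugepaged_fork() and sketch_dup_mmap() are hypothetical names, and the copy_error parameter stands in for whatever the tree duplication and VMA copy loop would return.

```c
#include <assert.h>

/* Hypothetical model of the child mm's state during fork. */
struct sketch_mm {
	int tree_consistent;
	int ksm_registered;
	int khugepaged_registered;
};

/* Void hooks: each assumes it is only ever handed a fully built mm. */
static void sketch_ksm_fork(struct sketch_mm *mm)
{
	assert(mm->tree_consistent);
	mm->ksm_registered = 1;
}

static void sketch_khugepaged_fork(struct sketch_mm *mm)
{
	assert(mm->tree_consistent);
	mm->khugepaged_registered = 1;
}

/*
 * Models the reordered flow: do the copy first, then invoke the hooks at
 * the end, and only when no error occurred.
 */
static int sketch_dup_mmap(struct sketch_mm *mm, int copy_error)
{
	int retval = copy_error;	/* stand-in for the duplication work */

	mm->tree_consistent = (retval == 0);
	if (!retval) {
		sketch_ksm_fork(mm);
		sketch_khugepaged_fork(mm);
	}
	return retval;
}
```

Because the hooks return void, the only error sketch_dup_mmap() can report is the one from the copy itself, and a failed copy leaves both registration flags clear.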
Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.172901437...
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	include/linux/ksm.h
[Context conflicts.]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 include/linux/ksm.h | 9 +++------
 kernel/fork.c       | 7 ++-----
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 2a4088ce24a0..691c1f54254e 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -54,12 +54,10 @@ static inline long mm_ksm_zero_pages(struct mm_struct *mm)
 	return atomic_long_read(&mm->ksm_zero_pages);
 }
 
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
 	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
-		return __ksm_enter(mm);
-
-	return 0;
+		__ksm_enter(mm);
 }
 
 static inline void ksm_exit(struct mm_struct *mm)
@@ -103,9 +101,8 @@ static inline int ksm_disable(struct mm_struct *mm)
 	return 0;
 }
 
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
-	return 0;
 }
 
 static inline void ksm_exit(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index 452fb0b4014e..5c2c16ff75a8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -685,11 +685,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	mm->exec_vm = oldmm->exec_vm;
 	mm->stack_vm = oldmm->stack_vm;
 
-	retval = ksm_fork(mm, oldmm);
-	if (retval)
-		goto out;
-	khugepaged_fork(mm, oldmm);
-
 	/* Use __mt_dup() to efficiently build an identical maple tree. */
 	retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL);
 	if (unlikely(retval))
@@ -792,6 +787,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	vma_iter_free(&vmi);
 	if (!retval) {
 		mt_set_in_rcu(vmi.mas.tree);
+		ksm_fork(mm, oldmm);
+		khugepaged_fork(mm, oldmm);
 	} else if (mpnt) {
 		/*
 		 * The entire maple tree has already been duplicated. If the

Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/13215
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/7...