From: Ma Wupeng mawupeng1@huawei.com
Backport memory failure related patches to olk-5.10.
Miaohe Lin (2): mm/hwpoison: fix potential pte_unmap_unlock pte error mm, hwpoison: kill procs if unmap fails
mm/memory-failure.c | 39 ++++++++++++++++++--------------------- 1 file changed, 18 insertions(+), 21 deletions(-)
From: Miaohe Lin linmiaohe@huawei.com
mainline inclusion from mainline-v5.15-rc1 commit ea3732f7a1cf636284388988d1a1e56d5cba6044 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5LNL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If the first pte is equal to poisoned_pfn, i.e. check_hwpoisoned_entry() return 1, the wrong ptep - 1 would be passed to pte_unmap_unlock().
Link: https://lkml.kernel.org/r/20210814105131.48814-3-linmiaohe@huawei.com Fixes: ad9c59c24095 ("mm,hwpoison: send SIGBUS with error virutal address") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com --- mm/memory-failure.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index a5ab0b2f213b..b83482faf38a 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -623,7 +623,7 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr, { struct hwp_walk *hwp = (struct hwp_walk *)walk->private; int ret = 0; - pte_t *ptep; + pte_t *ptep, *mapped_pte; spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmdp, walk->vma); @@ -636,14 +636,15 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr, if (pmd_trans_unstable(pmdp)) goto out;
- ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp, addr, &ptl); + mapped_pte = ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp, + addr, &ptl); for (; addr != end; ptep++, addr += PAGE_SIZE) { ret = check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT, hwp->pfn, &hwp->tk); if (ret == 1) break; } - pte_unmap_unlock(ptep - 1, ptl); + pte_unmap_unlock(mapped_pte, ptl); out: cond_resched(); return ret;
From: Miaohe Lin linmiaohe@huawei.com
mainline inclusion from mainline-v6.1-rc1 commit 0792a4a6195a6d67a4ead2554da393cbd8dcdf5a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5LNL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If try_to_unmap() fails, the hwpoisoned page still resides in the address space of some processes. We should kill these processes or the hwpoisoned page might be consumed later. collect_procs() is always called to collect relevant processes now so they can be killed later if unmap fails.
[linmiaohe@huawei.com: v2] Link: https://lkml.kernel.org/r/20220823032346.4260-6-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220818130016.45313-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin linmiaohe@huawei.com Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts: mm/memory-failure.c [Ma Wupeng: context conflicts] Signed-off-by: Ma Wupeng mawupeng1@huawei.com --- mm/memory-failure.c | 32 ++++++++++++++------------------ 1 file changed, 14 insertions(+), 18 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index b83482faf38a..1cab0871c77d 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -6,16 +6,16 @@ * High level machine check handler. Handles pages reported by the * hardware as being corrupted usually due to a multi-bit ECC memory or cache * failure. - * + * * In addition there is a "soft offline" entry point that allows stop using * not-yet-corrupted-by-suspicious pages without killing anything. * * Handles page cache pages in various states. The tricky part - * here is that we can access any page asynchronously in respect to - * other VM users, because memory failures could happen anytime and - * anywhere. This could violate some of their assumptions. This is why - * this code has to be extremely careful. Generally it tries to use - * normal locking rules, as in get the standard locks, even if that means + * here is that we can access any page asynchronously in respect to + * other VM users, because memory failures could happen anytime and + * anywhere. This could violate some of their assumptions. This is why + * this code has to be extremely careful. Generally it tries to use + * normal locking rules, as in get the standard locks, even if that means * the error handling takes potentially a long time. * * It can be very tempting to add handling for obscure cases here. @@ -25,12 +25,12 @@ * https://git.kernel.org/cgit/utils/cpu/mce/mce-test.git/ * - The case actually shows up as a frequent (top 10) page state in * tools/vm/page-types when running a real workload. - * + * * There are several operations here with exponential complexity because - * of unsuitable VM data structures. For example the operation to map back - * from RMAP chains to processes has to walk the complete process list and + * of unsuitable VM data structures. For example the operation to map back + * from RMAP chains to processes has to walk the complete process list and * has non linear complexity with the number. But since memory corruptions - * are rare we hope to get away with this. This avoids impacting the core + * are rare we hope to get away with this. This avoids impacting the core * VM. */ #include <linux/kernel.h> @@ -1184,7 +1184,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, struct address_space *mapping; LIST_HEAD(tokill); bool unmap_success = true; - int kill = 1, forcekill; + int forcekill; struct page *hpage = *hpagep; bool mlocked = PageMlocked(hpage);
@@ -1227,7 +1227,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, if (page_mkclean(hpage)) { SetPageDirty(hpage); } else { - kill = 0; ttu |= TTU_IGNORE_HWPOISON; pr_info("Memory failure: %#lx: corrupted page was clean: dropped without side effects\n", pfn); @@ -1238,12 +1237,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, * First collect all the processes that have the page * mapped in dirty form. This has to be done before try_to_unmap, * because ttu takes the rmap data structures down. - * - * Error handling: We ignore errors here because - * there's nothing that can be done. */ - if (kill) - collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED); + collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
if (!PageHuge(hpage)) { unmap_success = try_to_unmap(hpage, ttu); @@ -1290,7 +1285,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, * use a more force-full uncatchable kill to prevent * any accesses to the poisoned memory. */ - forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL); + forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) || + !unmap_success; kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
return unmap_success;
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/13363 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/R...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/13363 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/R...