From: Ma Wupeng mawupeng1@huawei.com
This series enhances machine check safe (MCS) support during page ejection as follows:
When a page is soft-offlined, its contents are copied to a new page in kernel space. If the original page contains an uncorrectable error (UCE), consuming it during the copy leads to a kernel panic.
To solve this problem, use machine check safe to catch the error by replacing copy_page with copy_mc_to_kernel. If the UCE is consumed on this path, SIGBUS is sent to the user task instead of panicking the kernel.
Changelog since v1:
- remove unused ret in patch #4
- add a bugfix for page eject
Jiaqi Yan (1):
  mm/hwpoison: introduce copy_mc_highpage

Ma Wupeng (6):
  mm: page_eject: Return right value during removal
  mm/hwpoison: arm64: introduce copy_mc_highpage
  mm/hwpoison: introduce copy_mc_highpages
  mm/hwpoison: add migrate_page_mc_extra()
  mm: Update PF_COREDUMP_MCS to PF_MCS
  mm: page_eject: Add mc support during offline page
 arch/arm64/include/asm/page.h |  1 +
 arch/arm64/mm/copypage.c      | 13 ++++++
 drivers/ras/page_eject.c      | 10 ++++-
 fs/coredump.c                 |  4 +-
 include/linux/highmem.h       | 52 ++++++++++++++++++++++++
 include/linux/sched.h         |  2 +-
 lib/iov_iter.c                |  2 +-
 mm/migrate.c                  | 76 ++++++++++++++++++++++++++++++-----
 8 files changed, 145 insertions(+), 15 deletions(-)
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8K5CO
--------------------------------
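page_eject_remove_pfn_locked() returns an invalid pointer when no matching pfn is found, because the list_for_each_entry_safe() cursor is returned after the loop ends. Track the match in a separate pointer and return NULL when nothing is found.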
Fixes: ecbd5d7cb9c7 ("mm: page_eject: Introuduce page ejection")
Signed-off-by: Ma Wupeng mawupeng1@huawei.com
---
 drivers/ras/page_eject.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/ras/page_eject.c b/drivers/ras/page_eject.c
index 1d609384b692..13db543edfbc 100644
--- a/drivers/ras/page_eject.c
+++ b/drivers/ras/page_eject.c
@@ -20,18 +20,19 @@ struct ejected_pfn {

 static struct ejected_pfn *page_eject_remove_pfn_locked(unsigned long pfn)
 {
-	struct ejected_pfn *item = NULL, *next;
+	struct ejected_pfn *item, *next, *ret = NULL;

 	mutex_lock(&eject_page_mutex);
 	list_for_each_entry_safe(item, next, &eject_page_list, list) {
 		if (pfn == item->pfn) {
 			list_del(&item->list);
+			ret = item;
 			break;
 		}
 	}
 	mutex_unlock(&eject_page_mutex);

-	return item;
+	return ret;
 }

 static void page_eject_add_pfn_locked(struct ejected_pfn *item)
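The bug fixed by the patch above can be reproduced outside the kernel. The following is a minimal user-space sketch, assuming a hand-rolled circular list in place of <linux/list.h>; list_init, list_add_tail, entry_of, lookup_buggy and lookup_fixed are hypothetical names introduced only for illustration, and the locking and list_del from the real function are left out.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel's circular list; not <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

struct ejected_pfn {
	struct list_head list;	/* first member: entry and node share an address */
	unsigned long pfn;
};

/* Hypothetical helper playing the role of container_of() for this struct. */
#define entry_of(node) \
	((struct ejected_pfn *)((char *)(node) - offsetof(struct ejected_pfn, list)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Pre-fix shape: the loop cursor itself is returned. */
static struct ejected_pfn *lookup_buggy(struct list_head *head, unsigned long pfn)
{
	struct ejected_pfn *item;

	for (item = entry_of(head->next); &item->list != head;
	     item = entry_of(item->list.next)) {
		if (item->pfn == pfn)
			break;
	}
	return item;	/* aliases the list head when nothing matched */
}

/* Post-fix shape: a separate result stays NULL unless a match is found. */
static struct ejected_pfn *lookup_fixed(struct list_head *head, unsigned long pfn)
{
	struct ejected_pfn *item, *ret = NULL;

	for (item = entry_of(head->next); &item->list != head;
	     item = entry_of(item->list.next)) {
		if (item->pfn == pfn) {
			ret = item;
			break;
		}
	}
	return ret;
}
```

With a list holding a single entry for pfn 10, looking up pfn 99 through lookup_buggy yields a non-NULL pointer that aliases the list head, while lookup_fixed yields NULL, which is what a caller checking for "not found" would expect.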
From: Jiaqi Yan jiaqiyan@google.com
mainline inclusion
from mainline-v6.4-rc1
commit 6efc7afb5cc98488410d44695685d003d832534d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8K5CO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Similar to how copy_mc_user_highpage is implemented for copy_user_highpage on #MC supported architecture, introduce the #MC handled version of copy_highpage.
This helper has immediate usage when khugepaged wants to copy file-backed memory pages and tolerate #MC.
Link: https://lkml.kernel.org/r/20230329151121.949896-3-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan jiaqiyan@google.com
Reviewed-by: Yang Shi shy828301@gmail.com
Cc: David Stevens stevensd@chromium.org
Cc: Hugh Dickins hughd@google.com
Cc: Kefeng Wang wangkefeng.wang@huawei.com
Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Cc: "Kirill A. Shutemov" kirill@shutemov.name
Cc: Miaohe Lin linmiaohe@huawei.com
Cc: Naoya Horiguchi naoya.horiguchi@nec.com
Cc: Oscar Salvador osalvador@suse.de
Cc: Tong Tiangen tongtiangen@huawei.com
Cc: Tony Luck tony.luck@intel.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
(wupeng: backport copy_mc_highpage for this patch)
Signed-off-by: Ma Wupeng mawupeng1@huawei.com
---
 include/linux/highmem.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index cc5fe6c620ad..366198ebba71 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -396,4 +396,32 @@ static inline void memcpy_to_page(struct page *page, size_t offset,
 	kunmap_atomic(to);
 }

+#ifdef copy_mc_to_kernel
+/*
+ * If architecture supports machine check exception handling, define the
+ * #MC versions of copy_user_highpage and copy_highpage. They copy a memory
+ * page with #MC in source page (@from) handled, and return the number
+ * of bytes not copied if there was a #MC, otherwise 0 for success.
+ */
+static inline int copy_mc_highpage(struct page *to, struct page *from)
+{
+	char *vfrom, *vto;
+	int ret;
+
+	vfrom = kmap_atomic(from);
+	vto = kmap_atomic(to);
+	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
+	kunmap_atomic(vto);
+	kunmap_atomic(vfrom);
+
+	return ret;
+}
+#else
+static inline int copy_mc_highpage(struct page *to, struct page *from)
+{
+	copy_highpage(to, from);
+	return 0;
+}
+#endif
+
 #endif /* _LINUX_HIGHMEM_H */
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8K5CO
--------------------------------
Introduce an arm64 implementation of copy_mc_highpage which also copies the MTE tags through do_mte once the page data has been copied successfully.
Signed-off-by: Ma Wupeng mawupeng1@huawei.com
---
 arch/arm64/include/asm/page.h |  1 +
 arch/arm64/mm/copypage.c      | 13 +++++++++++++
 include/linux/highmem.h       |  2 ++
 3 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 4d3ba27b96cb..09b898a3e57c 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -31,6 +31,7 @@ void copy_highpage(struct page *to, struct page *from);
 #ifdef CONFIG_ARCH_HAS_COPY_MC
 extern void copy_page_mc(void *to, const void *from);
 void copy_highpage_mc(struct page *to, struct page *from);
+int copy_mc_highpage(struct page *to, struct page *from);
 #define __HAVE_ARCH_COPY_HIGHPAGE_MC

 void copy_user_highpage_mc(struct page *to, struct page *from,
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 7c1705c5e9d5..0696820d72ab 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -61,4 +61,17 @@ void copy_user_highpage_mc(struct page *to, struct page *from,
 	flush_dcache_page(to);
 }
 EXPORT_SYMBOL_GPL(copy_user_highpage_mc);
+
+int copy_mc_highpage(struct page *to, struct page *from)
+{
+	void *kto = page_address(to);
+	void *kfrom = page_address(from);
+	int ret;
+
+	ret = copy_mc_to_kernel(kto, kfrom, PAGE_SIZE);
+	if (!ret)
+		do_mte(to, from, kto, kfrom, true);
+
+	return ret;
+}
 #endif
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 366198ebba71..94592d7630f4 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -397,6 +397,7 @@ static inline void memcpy_to_page(struct page *page, size_t offset,
 }

 #ifdef copy_mc_to_kernel
+#ifndef __HAVE_ARCH_COPY_USER_HIGHPAGE_MC
 /*
  * If architecture supports machine check exception handling, define the
  * #MC versions of copy_user_highpage and copy_highpage. They copy a memory
@@ -416,6 +417,7 @@ static inline int copy_mc_highpage(struct page *to, struct page *from)

 	return ret;
 }
+#endif
 #else
 static inline int copy_mc_highpage(struct page *to, struct page *from)
 {
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8K5CO
--------------------------------
Commit 6efc7afb5cc9 ("mm/hwpoison: introduce copy_mc_highpage") brings mc support to single-page copies through copy_mc_highpage. Huge page copies, however, need a multi-page variant. Introduce copy_mc_highpages to support mc during huge page copy.
Signed-off-by: Ma Wupeng mawupeng1@huawei.com
---
 include/linux/highmem.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 94592d7630f4..ebfee2b672d3 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -418,12 +418,34 @@ static inline int copy_mc_highpage(struct page *to, struct page *from)
 	return ret;
 }
 #endif
+
+/* Return -EFAULT if there was a #MC during copy, otherwise 0 for success. */
+static inline int copy_mc_highpages(struct page *to, struct page *from, int nr_pages)
+{
+	int ret = 0;
+	int i;
+
+	for (i = 0; i < nr_pages; i++) {
+		cond_resched();
+		ret = copy_mc_highpage(to + i, from + i);
+		if (ret)
+			return -EFAULT;
+	}
+
+	return ret;
+}
 #else
 static inline int copy_mc_highpage(struct page *to, struct page *from)
 {
 	copy_highpage(to, from);
 	return 0;
 }
+
+static inline int copy_mc_highpages(struct page *to, struct page *from, int nr_pages)
+{
+	copy_highpages(to, from, nr_pages);
+	return 0;
+}
 #endif

 #endif /* _LINUX_HIGHMEM_H */
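The return-value convention used by the patch above (bytes not copied from the single-page helper, -EFAULT from the multi-page helper) can be modelled in user space. This is only a sketch under stated assumptions: struct page here is a plain byte array, poisoned_page is a hypothetical variable that marks which source page "contains a UCE", and model_copy_mc_page and model_copy_mc_pages are illustrative names, not kernel functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* User-space stand-in for a page; not the kernel's struct page. */
struct page {
	unsigned char data[PAGE_SIZE];
};

/* Hypothetical: the source page treated as containing a UCE, or NULL. */
static const struct page *poisoned_page;

/* Models a #MC-safe single-page copy: returns bytes NOT copied, 0 on success. */
static size_t model_copy_mc_page(struct page *to, const struct page *from)
{
	if (from == poisoned_page)
		return PAGE_SIZE;	/* nothing copied from the bad page */
	memcpy(to->data, from->data, PAGE_SIZE);
	return 0;
}

/* Models the multi-page loop: stop at the first failing page with -EFAULT. */
static int model_copy_mc_pages(struct page *to, const struct page *from,
			       int nr_pages)
{
	for (int i = 0; i < nr_pages; i++) {
		if (model_copy_mc_page(to + i, from + i))
			return -EFAULT;
	}
	return 0;
}
```

Note that pages before the failing one have already been copied when -EFAULT is returned, so a caller should treat the whole destination as invalid on error.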
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8K5CO
--------------------------------
During page migration, the page is copied in kernel space. If the original page contains a UCE, consuming it during the copy leads to a kernel panic.
To solve this problem, use machine check safe to catch the error by replacing copy_page with copy_mc_to_kernel. If the UCE is consumed on this path, SIGBUS is sent to the user task instead of panicking the kernel.
Add a new parameter, mc, to the huge page copy helpers. If mc is set, copy_mc_highpage/copy_mc_highpages is called instead of copy_highpage/copy_highpages during the memory copy.
In the existing path, migrate_page_move_mapping() is done before the page copy, so rolling back after a failed copy is hard because of race conditions. migrate_page_mc_extra() therefore copies the page first and moves the mapping only after the copy has succeeded.
Signed-off-by: Ma Wupeng mawupeng1@huawei.com
---
 mm/migrate.c | 70 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 61 insertions(+), 9 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 646918708922..dca35d8ba464 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -548,24 +548,33 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
  * arithmetic will work across the entire page. We need something more
  * specialized.
  */
-static void __copy_gigantic_page(struct page *dst, struct page *src,
-				int nr_pages)
+static int __copy_gigantic_page(struct page *dst, struct page *src,
+				int nr_pages, bool mc)
 {
-	int i;
+	int i, ret = 0;
 	struct page *dst_base = dst;
 	struct page *src_base = src;

 	for (i = 0; i < nr_pages; ) {
 		cond_resched();
-		copy_highpage(dst, src);
+
+		if (mc) {
+			ret = copy_mc_highpage(dst, src);
+			if (ret)
+				return -EFAULT;
+		} else {
+			copy_highpage(dst, src);
+		}

 		i++;
 		dst = mem_map_next(dst, dst_base, i);
 		src = mem_map_next(src, src_base, i);
 	}
+
+	return ret;
 }

-static void copy_huge_page(struct page *dst, struct page *src)
+static int __copy_huge_page(struct page *dst, struct page *src, bool mc)
 {
 	int nr_pages;

@@ -574,17 +583,29 @@ static void copy_huge_page(struct page *dst, struct page *src)
 		struct hstate *h = page_hstate(src);
 		nr_pages = pages_per_huge_page(h);

-		if (unlikely(nr_pages > MAX_ORDER_NR_PAGES)) {
-			__copy_gigantic_page(dst, src, nr_pages);
-			return;
-		}
+		if (unlikely(nr_pages > MAX_ORDER_NR_PAGES))
+			return __copy_gigantic_page(dst, src, nr_pages, mc);
 	} else {
 		/* thp page */
 		BUG_ON(!PageTransHuge(src));
 		nr_pages = thp_nr_pages(src);
 	}

+	if (mc)
+		return copy_mc_highpages(dst, src, nr_pages);
 	copy_highpages(dst, src, nr_pages);
+	return 0;
+}
+
+static int copy_huge_page(struct page *dst, struct page *src)
+{
+	return __copy_huge_page(dst, src, false);
+}
+
+static int copy_mc_huge_page(struct page *dst, struct page *src)
+{
+	return __copy_huge_page(dst, src, true);
 }

 /*
@@ -674,6 +695,37 @@ void migrate_page_copy(struct page *newpage, struct page *page)
 }
 EXPORT_SYMBOL(migrate_page_copy);

+static int migrate_page_copy_mc(struct page *newpage, struct page *page)
+{
+	int rc;
+
+	if (PageHuge(page) || PageTransHuge(page))
+		rc = copy_mc_huge_page(newpage, page);
+	else
+		rc = copy_mc_highpage(newpage, page);
+
+	return rc;
+}
+
+static int migrate_page_mc_extra(struct address_space *mapping,
+				 struct page *newpage, struct page *page,
+				 enum migrate_mode mode, int extra_count)
+{
+	int rc;
+
+	rc = migrate_page_copy_mc(newpage, page);
+	if (rc)
+		return rc;
+
+	rc = migrate_page_move_mapping(mapping, newpage, page, extra_count);
+	if (rc != MIGRATEPAGE_SUCCESS)
+		return rc;
+
+	migrate_page_states(newpage, page);
+
+	return rc;
+}
+
 /************************************************************
  *                    Migration functions
  ***********************************************************/
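The ordering chosen in migrate_page_mc_extra() above, copy first and move the mapping second, can be illustrated with a small state model. This is a sketch only: struct migration and the model_* functions are hypothetical and merely record which steps ran; they do not reflect how the kernel tracks migration state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of one migration attempt; none of this is kernel API. */
struct migration {
	bool src_poisoned;	/* reading the source page would raise #MC */
	bool copied;		/* page contents reached the new page */
	bool mapping_moved;	/* mapping now points at the new page */
};

/* Models the #MC-safe copy step: fails cleanly on a poisoned source. */
static int model_copy_mc(struct migration *m)
{
	if (m->src_poisoned)
		return -EFAULT;
	m->copied = true;
	return 0;
}

/* Models the step that is hard to roll back once it has happened. */
static int model_move_mapping(struct migration *m)
{
	m->mapping_moved = true;
	return 0;
}

/* Copy first; only touch the mapping once the copy is known to be good. */
static int model_migrate_mc(struct migration *m)
{
	int rc = model_copy_mc(m);

	if (rc)
		return rc;	/* mapping untouched, so there is nothing to undo */
	return model_move_mapping(m);
}
```

For a poisoned source, model_migrate_mc returns -EFAULT with mapping_moved still false, which is the property the commit message relies on to avoid a rollback.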
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8K5CO
--------------------------------
Rename PF_COREDUMP_MCS to PF_MCS so that the flag can indicate machine check safe support for specific functions other than coredump.
Signed-off-by: Ma Wupeng mawupeng1@huawei.com
---
 fs/coredump.c         | 4 ++--
 include/linux/sched.h | 2 +-
 lib/iov_iter.c        | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 6c0d0a42fda9..535c3fdc1598 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -907,9 +907,9 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		if (page) {
 			void *kaddr = kmap(page);

-			current->flags |= PF_COREDUMP_MCS;
+			current->flags |= PF_MCS;
 			stop = !dump_emit(cprm, kaddr, PAGE_SIZE);
-			current->flags &= ~PF_COREDUMP_MCS;
+			current->flags &= ~PF_MCS;
 			kunmap(page);
 			put_page(page);
 		} else {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 579e47c22980..8ccbca99ace1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1661,7 +1661,7 @@ extern struct pid *cad_pid;
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
 #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
-#define PF_COREDUMP_MCS		0x01000000	/* Task coredump support machine check safe */
+#define PF_MCS			0x01000000	/* Mc is support for specific function(eg. coredump) for this task */
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000	/* Early kill for mce process policy */
 #define PF_MEMALLOC_NOCMA	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 0a4b7aa47097..ce8c225237f5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -752,7 +752,7 @@ EXPORT_SYMBOL_GPL(_copy_mc_to_iter);

 static void *memcpy_iter(void *to, const void *from, __kernel_size_t size)
 {
-	if (IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) && current->flags & PF_COREDUMP_MCS)
+	if (IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) && current->flags & PF_MCS)
 		return (void *)copy_mc_to_kernel(to, from, size);
 	else
 		return memcpy(to, from, size);
From: Ma Wupeng mawupeng1@huawei.com
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8K5CO
--------------------------------
When a page is offlined during page ejection, it is copied in kernel space by page migration. Set PF_MCS in the task flags around soft_offline_page() so that migrate_page_extra() calls migrate_page_mc_extra(), which supports mc.
Signed-off-by: Ma Wupeng mawupeng1@huawei.com
---
 drivers/ras/page_eject.c | 5 +++++
 mm/migrate.c             | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/ras/page_eject.c b/drivers/ras/page_eject.c
index 13db543edfbc..2b5745dab89a 100644
--- a/drivers/ras/page_eject.c
+++ b/drivers/ras/page_eject.c
@@ -77,8 +77,13 @@ static int page_eject_offline_page(unsigned long pfn)
 	 * if soft_offline_page return 0 because PageHWPoison, this pfn
 	 * will add to list and this add will be removed during online
 	 * since it is poisoned.
+	 *
+	 * Update task flag with PF_MCS to enable mc support during page
+	 * migration.
 	 */
+	current->flags |= PF_MCS;
 	ret = soft_offline_page(pfn, 0);
+	current->flags &= ~PF_MCS;
 	if (ret) {
 		pr_err("page fail to be offlined, soft_offline_page failed(%d), pfn=%#lx\n",
 		       ret, pfn);
diff --git a/mm/migrate.c b/mm/migrate.c
index dca35d8ba464..9d40b1264a8b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -738,6 +738,12 @@ int migrate_page_extra(struct address_space *mapping,

 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */

+	if (unlikely(IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) &&
+		     (current->flags & PF_MCS) &&
+		     (mode != MIGRATE_SYNC_NO_COPY)))
+		return migrate_page_mc_extra(mapping, newpage, page, mode,
+					     extra_count);
+
 	rc = migrate_page_move_mapping(mapping, newpage, page, extra_count);

 	if (rc != MIGRATEPAGE_SUCCESS)
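The flag handling in the patch above follows a set, call, clear pattern: the caller marks the task, a function further down the call chain checks the mark to pick the mc path, and the caller clears it afterwards. A minimal user-space sketch of that pattern follows. The PF_MCS value is taken from the series; struct task, current_task, last_copy_used_mc and the model_* functions are hypothetical stand-ins and not kernel API.

```c
#include <stdbool.h>

#define PF_MCS 0x01000000u	/* value taken from the series above */

/* Hypothetical stand-in for task_struct and the kernel's `current`. */
struct task {
	unsigned int flags;
};

static struct task current_task;
static bool last_copy_used_mc;	/* records which path the callee took */

/* Models the callee: picks the mc path only when the flag is set. */
static int model_migrate_page(void)
{
	last_copy_used_mc = (current_task.flags & PF_MCS) != 0;
	return 0;
}

/* Models the caller: the flag is set only for the duration of the call. */
static int model_offline_page(void)
{
	int ret;

	current_task.flags |= PF_MCS;
	ret = model_migrate_page();
	current_task.flags &= ~PF_MCS;
	return ret;
}
```

Calling model_migrate_page directly leaves last_copy_used_mc false, while going through model_offline_page selects the mc path and leaves the flag cleared on return, so unrelated migrations by the same task are not affected.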
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/3214
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/J...