Folio migration is widely used in the kernel: memory compaction, memory hotplug, soft offlining of pages, NUMA balancing, memory demotion/promotion, etc. However, if a poisoned source folio is accessed while it is being migrated, the kernel panics.

There is a mechanism in the kernel to recover from uncorrectable memory errors, ARCH_HAS_COPY_MC (eg, Machine Check Safe Memory Copy on x86), which is already used in NVDIMM and core-mm paths (eg, CoW, khugepaged, coredump, KSM copy); see the copy_mc_to_{user,kernel} and copy_mc_{user_}highpage callers.

This series adds recovery from a poisoned folio copy to the widely used folio migration path. Note that since folio migration makes no guarantee of success, we can choose to make it tolerant of memory failures: folio_mc_copy(), a #MC-safe version of folio_copy(), returns an error once a poisoned source folio is accessed, so the migration fails cleanly and panics like the one shown below are avoided. (A condensed sketch of the resulting flow follows the diffstat.)
CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
pc : copy_page+0x10/0xc0
lr : copy_highpage+0x38/0x50
...
Call trace:
 copy_page+0x10/0xc0
 folio_copy+0x78/0x90
 migrate_folio_extra+0x54/0xa0
 move_to_new_folio+0xd8/0x1f0
 migrate_folio_move+0xb8/0x300
 migrate_pages_batch+0x528/0x788
 migrate_pages_sync+0x8c/0x258
 migrate_pages+0x440/0x528
 soft_offline_in_use_page+0x2ec/0x3c0
 soft_offline_page+0x238/0x310
 soft_offline_page_store+0x6c/0xc0
 dev_attr_store+0x20/0x40
 sysfs_kf_write+0x4c/0x68
 kernfs_fop_write_iter+0x130/0x1c8
 new_sync_write+0xa4/0x138
 vfs_write+0x238/0x2d8
 ksys_write+0x74/0x110
Kefeng Wang (8):
  mm: migrate: simplify __buffer_migrate_folio()
  mm: migrate_device: unify migrate folio for MIGRATE_SYNC_NO_COPY
  mm: migrate: remove migrate_folio_extra()
  mm: move memory_failure_queue() into copy_mc_[user]_highpage()
  mm: add folio_mc_copy()
  mm: migrate: split folio_migrate_mapping()
  mm: migrate: support poisoned recover from migrate folio
  fs: hugetlbfs: support poisoned recover from hugetlbfs_migrate_folio()

 fs/hugetlbfs/inode.c    |   5 +-
 include/linux/highmem.h |   6 +++
 include/linux/migrate.h |   2 -
 include/linux/mm.h      |   1 +
 mm/ksm.c                |   1 -
 mm/memory.c             |  13 ++---
 mm/migrate.c            | 102 ++++++++++++++++++++--------------------
 mm/migrate_device.c     |  16 ++++---
 mm/util.c               |  17 +++++++
 9 files changed, 88 insertions(+), 75 deletions(-)
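For orientation, here is the flow the series converges on in __migrate_folio(), condensed from the diffs below (comments added; a sketch rather than the literal final code):

	static int __migrate_folio(struct address_space *mapping, struct folio *dst,
				   struct folio *src, void *src_private,
				   enum migrate_mode mode)
	{
		int rc, expected_count = folio_expected_refs(mapping, src);

		/* A doomed migration should never touch the data. */
		if (folio_ref_count(src) != expected_count)
			return -EAGAIN;

		/* #MC-safe copy first: a poisoned source folio fails the
		 * migration with -EHWPOISON instead of panicking ... */
		rc = folio_mc_copy(dst, src);
		if (unlikely(rc))
			return rc;

		/* ... and only then is the mapping switched over, since
		 * there is no turning back once this succeeds. */
		rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
		if (rc != MIGRATEPAGE_SUCCESS)
			return rc;

		if (src_private)
			folio_attach_private(dst, folio_detach_private(src));

		folio_migrate_flags(dst, src);
		return MIGRATEPAGE_SUCCESS;
	}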
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit 01878f10f8e01e6ca1040ccc19b76e10ff7678ad
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "mm: cleanup MIGRATE_SYNC_NO_COPY mode".
Commit 2916ecc0f9d4 ("mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY") introduced the MIGRATE_SYNC_NO_COPY mode to allow offloading the data copy to a device DMA engine. It is only used in __migrate_device_pages() to decide whether or not to copy the old page, and HMM is its only user. An easier approach is to call folio_migrate_mapping() and folio_migrate_flags() there directly, which lets us remove the MIGRATE_SYNC_NO_COPY mode entirely.
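In code, the replacement pattern for the device-migration path boils down to the sketch below; the helper name migrate_device_folio_metadata() is made up for illustration, the real change lands inline in __migrate_device_pages() (see patch 2):

	/* Sketch: metadata-only migration without MIGRATE_SYNC_NO_COPY.
	 * The device DMA engine copies the data; the kernel only moves
	 * the mapping entry and transfers the folio flags. */
	static int migrate_device_folio_metadata(struct address_space *mapping,
						 struct folio *dst,
						 struct folio *src,
						 int extra_cnt)
	{
		int r;

		/* Replace src with dst in the mapping, refcounts permitting. */
		r = folio_migrate_mapping(mapping, dst, src, extra_cnt);
		if (r != MIGRATEPAGE_SUCCESS)
			return r;

		/* Transfer dirty/uptodate/etc.; no folio_migrate_copy() here. */
		folio_migrate_flags(dst, src);
		return MIGRATEPAGE_SUCCESS;
	}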
This patch (of 5):
Use the filemap_migrate_folio() helper to simplify __buffer_migrate_folio().
Link: https://lkml.kernel.org/r/20240524052843.182275-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20240524052843.182275-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	mm/migrate.c
[Conflicts due to context "spin_unlock(&mapping->i_private_lock)" inconsistency]

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/migrate.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index f2f3f3cf3fe2..7bef491330e6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -792,24 +792,16 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		}
 	}

-	rc = folio_migrate_mapping(mapping, dst, src, 0);
+	rc = filemap_migrate_folio(mapping, dst, src, mode);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		goto unlock_buffers;

-	folio_attach_private(dst, folio_detach_private(src));
-
 	bh = head;
 	do {
 		folio_set_bh(bh, dst, bh_offset(bh));
 		bh = bh->b_this_page;
 	} while (bh != head);

-	if (mode != MIGRATE_SYNC_NO_COPY)
-		folio_migrate_copy(dst, src);
-	else
-		folio_migrate_flags(dst, src);
-
-	rc = MIGRATEPAGE_SUCCESS;
 unlock_buffers:
 	if (check_refs)
 		spin_unlock(&mapping->private_lock);
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit 15b0c79cfadad6f84ad773b9e4bd95e8a93a0846
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
__migrate_device_pages() never copies the page itself, so it passes MIGRATE_SYNC_NO_COPY into migrate_folio()/migrate_folio_extra(). An easier approach is to call folio_migrate_mapping()/folio_migrate_flags() directly, which unifies and simplifies the device-page migration and also removes the only caller of MIGRATE_SYNC_NO_COPY.
Link: https://lkml.kernel.org/r/20240524052843.182275-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	mm/migrate_device.c
[Conflicts due to folio conversion in __migrate_device_pages()]

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/migrate_device.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6998768a7297..58636163731a 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -695,7 +695,7 @@ static void __migrate_device_pages(unsigned long *src_pfns,
 		struct page *newpage = migrate_pfn_to_page(dst_pfns[i]);
 		struct page *page = migrate_pfn_to_page(src_pfns[i]);
 		struct address_space *mapping;
-		int r;
+		int r, extra_cnt = 0;

 		if (!newpage) {
 			src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
@@ -758,15 +758,17 @@ static void __migrate_device_pages(unsigned long *src_pfns,
 			continue;
 		}

+		BUG_ON(folio_test_writeback(page_folio(page)));
+
 		if (migrate && migrate->fault_page == page)
-			r = migrate_folio_extra(mapping, page_folio(newpage),
-						page_folio(page),
-						MIGRATE_SYNC_NO_COPY, 1);
-		else
-			r = migrate_folio(mapping, page_folio(newpage),
-					  page_folio(page), MIGRATE_SYNC_NO_COPY);
+			extra_cnt = 1;
+		r = folio_migrate_mapping(mapping, page_folio(newpage),
+					  page_folio(page), extra_cnt);
 		if (r != MIGRATEPAGE_SUCCESS)
 			src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
+		else
+			folio_migrate_flags(page_folio(newpage),
+					    page_folio(page));
 	}

 	if (notified)
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit 940d6683c79950b21b3762124eabfa9b2f6fee96
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
migrate_folio_extra() is now only called within migrate.c, so make it a static function (renamed __migrate_folio()) and give it a new src_private argument, allowing it to be shared by migrate_folio() and filemap_migrate_folio() and simplifying the code a bit.
Link: https://lkml.kernel.org/r/20240524052843.182275-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 include/linux/migrate.h |  2 --
 mm/migrate.c            | 33 +++++++++++----------------------
 2 files changed, 11 insertions(+), 24 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 2ce13e8a309b..517f70b70620 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -63,8 +63,6 @@ extern const char *migrate_reason_names[MR_TYPES];
 #ifdef CONFIG_MIGRATION

 void putback_movable_pages(struct list_head *l);
-int migrate_folio_extra(struct address_space *mapping, struct folio *dst,
-		struct folio *src, enum migrate_mode mode, int extra_count);
 int migrate_folio(struct address_space *mapping, struct folio *dst,
 		struct folio *src, enum migrate_mode mode);
 int migrate_pages(struct list_head *l, new_folio_t new, free_folio_t free,
diff --git a/mm/migrate.c b/mm/migrate.c
index 7bef491330e6..00530c0b9405 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -673,18 +673,19 @@ EXPORT_SYMBOL(folio_migrate_copy);
  *                    Migration functions
  ***********************************************************/

-int migrate_folio_extra(struct address_space *mapping, struct folio *dst,
-		struct folio *src, enum migrate_mode mode, int extra_count)
+static int __migrate_folio(struct address_space *mapping, struct folio *dst,
+			   struct folio *src, void *src_private,
+			   enum migrate_mode mode)
 {
 	int rc;

-	BUG_ON(folio_test_writeback(src));	/* Writeback must be complete */
-
-	rc = folio_migrate_mapping(mapping, dst, src, extra_count);
-
+	rc = folio_migrate_mapping(mapping, dst, src, 0);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;

+	if (src_private)
+		folio_attach_private(dst, folio_detach_private(src));
+
 	if (mode != MIGRATE_SYNC_NO_COPY)
 		folio_migrate_copy(dst, src);
 	else
@@ -705,9 +706,10 @@ int migrate_folio_extra(struct address_space *mapping, struct folio *dst,
  * Folios are locked upon entry and exit.
  */
 int migrate_folio(struct address_space *mapping, struct folio *dst,
-		struct folio *src, enum migrate_mode mode)
+		  struct folio *src, enum migrate_mode mode)
 {
-	return migrate_folio_extra(mapping, dst, src, mode, 0);
+	BUG_ON(folio_test_writeback(src));	/* Writeback must be complete */
+	return __migrate_folio(mapping, dst, src, NULL, mode);
 }
 EXPORT_SYMBOL(migrate_folio);

@@ -861,20 +863,7 @@ EXPORT_SYMBOL_GPL(buffer_migrate_folio_norefs);
 int filemap_migrate_folio(struct address_space *mapping,
 		struct folio *dst, struct folio *src, enum migrate_mode mode)
 {
-	int ret;
-
-	ret = folio_migrate_mapping(mapping, dst, src, 0);
-	if (ret != MIGRATEPAGE_SUCCESS)
-		return ret;
-
-	if (folio_get_private(src))
-		folio_attach_private(dst, folio_detach_private(src));
-
-	if (mode != MIGRATE_SYNC_NO_COPY)
-		folio_migrate_copy(dst, src);
-	else
-		folio_migrate_flags(dst, src);
-	return MIGRATEPAGE_SUCCESS;
+	return __migrate_folio(mapping, dst, src, folio_get_private(src), mode);
 }
 EXPORT_SYMBOL_GPL(filemap_migrate_folio);
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit 28bdacbcb36d093e23734acccecd139f5fc05f67
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "mm: migrate: support poison recover from migrate folio", v5.
Folio migration is widely used in the kernel: memory compaction, memory hotplug, soft offlining of pages, NUMA balancing, memory demotion/promotion, etc. However, if a poisoned source folio is accessed while it is being migrated, the kernel panics.

There is a mechanism in the kernel to recover from uncorrectable memory errors, ARCH_HAS_COPY_MC (eg, Machine Check Safe Memory Copy on x86), which is already used in NVDIMM and core-mm paths (eg, CoW, khugepaged, coredump, KSM copy); see the copy_mc_to_{user,kernel} and copy_mc_{user_}highpage callers.

This series adds recovery from a poisoned folio copy to the widely used folio migration path. Note that since folio migration makes no guarantee of success, we can choose to make it tolerant of memory failures: folio_mc_copy(), a #MC-safe version of folio_copy(), returns an error once a poisoned source folio is accessed, so the migration fails cleanly and panics like the one shown below are avoided.
CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
pc : copy_page+0x10/0xc0
lr : copy_highpage+0x38/0x50
...
Call trace:
 copy_page+0x10/0xc0
 folio_copy+0x78/0x90
 migrate_folio_extra+0x54/0xa0
 move_to_new_folio+0xd8/0x1f0
 migrate_folio_move+0xb8/0x300
 migrate_pages_batch+0x528/0x788
 migrate_pages_sync+0x8c/0x258
 migrate_pages+0x440/0x528
 soft_offline_in_use_page+0x2ec/0x3c0
 soft_offline_page+0x238/0x310
 soft_offline_page_store+0x6c/0xc0
 dev_attr_store+0x20/0x40
 sysfs_kf_write+0x4c/0x68
 kernfs_fop_write_iter+0x130/0x1c8
 new_sync_write+0xa4/0x138
 vfs_write+0x238/0x2d8
 ksys_write+0x74/0x110
This patch (of 5):
There is a memory_failure_queue() call after each copy_mc_[user]_highpage() caller (see eg, the CoW and KSM page-copy paths); it marks the source page as hardware-poisoned and unmaps it from other tasks. The upcoming poison recovery in folio migration needs to do the same thing, so move memory_failure_queue() into copy_mc_[user]_highpage() instead of duplicating it in each user. This also improves the handling of poisoned pages in khugepaged.
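In caller terms, the contract after this patch looks like the sketch below; the wrapper copy_user_page_mc() is illustrative, not part of the patch:

	/* After this patch a caller only checks the return value: on #MC,
	 * copy_mc_user_highpage() has already queued the memory-failure
	 * work for the poisoned source page, so no explicit
	 * memory_failure_queue(page_to_pfn(src), 0) is needed here. */
	static int copy_user_page_mc(struct page *dst, struct page *src,
				     unsigned long addr,
				     struct vm_area_struct *vma)
	{
		if (copy_mc_user_highpage(dst, src, addr, vma))
			return -EHWPOISON;
		return 0;
	}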
Link: https://lkml.kernel.org/r/20240626085328.608006-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20240626085328.608006-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	mm/memory.c
	include/linux/highmem.h
[Conflicts due to:
 1) mm/memory.c: folio conversion in copy_subpage.
 2) include/linux/highmem.h: return value in copy_mc_<user>_highpage.]

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 include/linux/highmem.h |  6 ++++++
 mm/ksm.c                |  1 -
 mm/memory.c             | 13 ++++---------
 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 254e1f69a6f5..967fd264ddd8 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -353,6 +353,9 @@ static inline int copy_mc_user_highpage(struct page *to, struct page *from,
 	kunmap_local(vto);
 	kunmap_local(vfrom);

+	if (ret)
+		memory_failure_queue(page_to_pfn(from), 0);
+
 	return ret ? -EFAULT : 0;
 }
 #endif
@@ -371,6 +374,9 @@ static inline int copy_mc_highpage(struct page *to, struct page *from)
 	kunmap_local(vto);
 	kunmap_local(vfrom);

+	if (ret)
+		memory_failure_queue(page_to_pfn(from), 0);
+
 	return ret ? -EFAULT : 0;
 }
 #endif
diff --git a/mm/ksm.c b/mm/ksm.c
index de0de7ba1d6b..71f72570db4e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2828,7 +2828,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
 		if (copy_mc_user_highpage(folio_page(new_folio, 0), page,
 					  addr, vma)) {
 			folio_put(new_folio);
-			memory_failure_queue(folio_pfn(folio), 0);
 			return ERR_PTR(-EHWPOISON);
 		}
 		folio_set_dirty(new_folio);
diff --git a/mm/memory.c b/mm/memory.c
index 2771c10454e1..3b0cb6f19b8f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2983,10 +2983,8 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;

 	if (likely(src)) {
-		if (copy_mc_user_highpage(dst, src, addr, vma)) {
-			memory_failure_queue(page_to_pfn(src), 0);
+		if (copy_mc_user_highpage(dst, src, addr, vma))
 			return -EHWPOISON;
-		}
 		return 0;
 	}

@@ -6508,10 +6506,8 @@ static int copy_user_gigantic_page(struct folio *dst, struct folio *src,

 		cond_resched();
 		if (copy_mc_user_highpage(dst_page, src_page,
-					  addr + i*PAGE_SIZE, vma)) {
-			memory_failure_queue(page_to_pfn(src_page), 0);
+					  addr + i*PAGE_SIZE, vma))
 			return -EHWPOISON;
-		}
 	}
 	return 0;
 }
@@ -6527,10 +6523,9 @@ static int copy_subpage(unsigned long addr, int idx, void *arg)
 	struct copy_subpage_arg *copy_arg = arg;

 	if (copy_mc_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
-				  addr, copy_arg->vma)) {
-		memory_failure_queue(page_to_pfn(copy_arg->src + idx), 0);
+				  addr, copy_arg->vma))
 		return -EHWPOISON;
-	}
+
 	return 0;
 }
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit 02f4ee5a144cef6b26421cb42cca64bb4138d459
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add a #MC variant of folio_copy(), folio_mc_copy(), which uses copy_mc_highpage() so that a machine check during the folio copy is handled gracefully; it will be used in the folio migration paths in the following patches.
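A caller-side sketch of the intended semantics (the function below is hypothetical; the real implementation is in the diff that follows):

	/* folio_mc_copy() behaves like folio_copy(), but returns -EHWPOISON
	 * instead of panicking when the source folio is hardware-poisoned. */
	static int copy_folio_for_migration(struct folio *dst, struct folio *src)
	{
		int rc = folio_mc_copy(dst, src);

		if (unlikely(rc))
			return rc;	/* fail the migration, keep the kernel alive */
		return 0;
	}

Note the cond_resched() between pages in the implementation below: a large folio can span many pages, and the copy loop should not hog the CPU.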
Link: https://lkml.kernel.org/r/20240626085328.608006-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	mm/util.c
[Conflicts because folio_copy() is exported by commit 4093602d6bb ("nilfs2: convert nilfs_copy_page() to nilfs_copy_folio()")]

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 include/linux/mm.h |  1 +
 mm/util.c          | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd7d5f78477d..9423d2bc5f44 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1318,6 +1318,7 @@ void put_pages_list(struct list_head *pages);

 void split_page(struct page *page, unsigned int order);
 void folio_copy(struct folio *dst, struct folio *src);
+int folio_mc_copy(struct folio *dst, struct folio *src);

 unsigned long nr_free_buffer_pages(void);

diff --git a/mm/util.c b/mm/util.c
index e41ac8a58eb5..8a677e1287a1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -815,6 +815,23 @@ void folio_copy(struct folio *dst, struct folio *src)
 	}
 }

+int folio_mc_copy(struct folio *dst, struct folio *src)
+{
+	long nr = folio_nr_pages(src);
+	long i = 0;
+
+	for (;;) {
+		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
+			return -EHWPOISON;
+		if (++i == nr)
+			break;
+		cond_resched();
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(folio_mc_copy);
+
 int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
 int sysctl_overcommit_ratio __read_mostly = 50;
 unsigned long sysctl_overcommit_kbytes __read_mostly;
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit 528815392f873f0af8c6cdc279c89bd0154cbf6a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Split folio_migrate_mapping() into __folio_migrate_mapping() plus a folio_migrate_mapping() wrapper: the folio refcount check moves out into the wrapper, where it covers both the !mapping and the mapping cases, and the comments in folio_migrate_mapping() are updated from page to folio.
No functional change intended.
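The resulting shape, condensed from the diff below: the exported folio_migrate_mapping() becomes a thin wrapper that performs the refcount check, so a later caller can do the check itself and call __folio_migrate_mapping() directly:

	int folio_migrate_mapping(struct address_space *mapping,
			struct folio *newfolio, struct folio *folio, int extra_count)
	{
		int expected_count = folio_expected_refs(mapping, folio) + extra_count;

		/* The check now covers both the !mapping and mapping cases. */
		if (folio_ref_count(folio) != expected_count)
			return -EAGAIN;

		return __folio_migrate_mapping(mapping, newfolio, folio,
					       expected_count);
	}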
Link: https://lkml.kernel.org/r/20240626085328.608006-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/migrate.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 00530c0b9405..f716005e15e6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -391,28 +391,23 @@ static int folio_expected_refs(struct address_space *mapping,
 }

 /*
- * Replace the page in the mapping.
+ * Replace the folio in the mapping.
  *
  * The number of remaining references must be:
- * 1 for anonymous pages without a mapping
- * 2 for pages with a mapping
- * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+ * 1 for anonymous folios without a mapping
+ * 2 for folios with a mapping
+ * 3 for folios with a mapping and PagePrivate/PagePrivate2 set.
  */
-int folio_migrate_mapping(struct address_space *mapping,
-		struct folio *newfolio, struct folio *folio, int extra_count)
+static int __folio_migrate_mapping(struct address_space *mapping,
+		struct folio *newfolio, struct folio *folio, int expected_count)
 {
 	XA_STATE(xas, &mapping->i_pages, folio_index(folio));
 	struct zone *oldzone, *newzone;
 	int dirty;
-	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
 	long entries, i;

 	if (!mapping) {
-		/* Anonymous page without mapping */
-		if (folio_ref_count(folio) != expected_count)
-			return -EAGAIN;
-
 		/* Take off deferred split queue while frozen and memcg set */
 		if (folio_test_large(folio) &&
 		    folio_test_large_rmappable(folio)) {
@@ -462,7 +457,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 		entries = 1;
 	}

-	/* Move dirty while page refs frozen and newpage not yet exposed */
+	/* Move dirty while folio refs frozen and newfolio not yet exposed */
 	dirty = folio_test_dirty(folio);
 	if (dirty) {
 		folio_clear_dirty(folio);
@@ -476,7 +471,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	}

 	/*
-	 * Drop cache reference from old page by unfreezing
+	 * Drop cache reference from old folio by unfreezing
 	 * to one less reference.
 	 * We know this isn't the last reference.
 	 */
@@ -492,11 +487,11 @@ int folio_migrate_mapping(struct address_space *mapping,

 	/*
 	 * If moved to a different zone then also account
-	 * the page for that zone. Other VM counters will be
+	 * the folio for that zone. Other VM counters will be
 	 * taken care of when we establish references to the
-	 * new page and drop references to the old page.
+	 * new folio and drop references to the old folio.
 	 *
-	 * Note that anonymous pages are accounted for
+	 * Note that anonymous folios are accounted for
 	 * via NR_FILE_PAGES and NR_ANON_MAPPED if they
 	 * are mapped to swap space.
 	 */
@@ -536,6 +531,17 @@ int folio_migrate_mapping(struct address_space *mapping,

 	return MIGRATEPAGE_SUCCESS;
 }
+
+int folio_migrate_mapping(struct address_space *mapping,
+		struct folio *newfolio, struct folio *folio, int extra_count)
+{
+	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
+
+	if (folio_ref_count(folio) != expected_count)
+		return -EAGAIN;
+
+	return __folio_migrate_mapping(mapping, newfolio, folio, expected_count);
+}
 EXPORT_SYMBOL(folio_migrate_mapping);

 /*
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit 060913999d7a9e50c283fdb15253fc27974ddadc
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Folio migration is widely used in the kernel: memory compaction, memory hotplug, soft offlining of pages, NUMA balancing, memory demotion/promotion, etc. However, if a poisoned source folio is accessed while it is being migrated, the kernel panics.

There is a mechanism in the kernel to recover from uncorrectable memory errors, ARCH_HAS_COPY_MC, which is already used in other core-mm paths, eg, CoW, khugepaged, coredump, KSM copy; see the copy_mc_to_{user,kernel} and copy_mc_{user_}highpage callers.

To support recovery from a poisoned folio copy during folio migration, we choose to make migration tolerant of memory failures and return an error instead; since folio migration makes no guarantee of success anyway, this avoids the panic shown below.
CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
pc : copy_page+0x10/0xc0
lr : copy_highpage+0x38/0x50
...
Call trace:
 copy_page+0x10/0xc0
 folio_copy+0x78/0x90
 migrate_folio_extra+0x54/0xa0
 move_to_new_folio+0xd8/0x1f0
 migrate_folio_move+0xb8/0x300
 migrate_pages_batch+0x528/0x788
 migrate_pages_sync+0x8c/0x258
 migrate_pages+0x440/0x528
 soft_offline_in_use_page+0x2ec/0x3c0
 soft_offline_page+0x238/0x310
 soft_offline_page_store+0x6c/0xc0
 dev_attr_store+0x20/0x40
 sysfs_kf_write+0x4c/0x68
 kernfs_fop_write_iter+0x130/0x1c8
 new_sync_write+0xa4/0x138
 vfs_write+0x238/0x2d8
 ksys_write+0x74/0x110
Note that the folio copy is moved to the beginning of __migrate_folio(), which simplifies the error handling since there is no turning back once folio_migrate_mapping() succeeds. The downside is that the folio would be copied even when folio_migrate_mapping() later fails; as an optimization, we check that the source folio has no extra references before doing the copy.
Link: https://lkml.kernel.org/r/20240626085328.608006-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	mm/migrate.c
[Conflicts due to MIGRATE_SYNC_NO_COPY mode]

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/migrate.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index f716005e15e6..e58f4fb73b09 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -683,19 +683,24 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 			   struct folio *src, void *src_private,
 			   enum migrate_mode mode)
 {
-	int rc;
+	int rc, expected_count = folio_expected_refs(mapping, src);
+
+	/* Check whether src does not have extra refs before we do more work */
+	if (folio_ref_count(src) != expected_count)
+		return -EAGAIN;

-	rc = folio_migrate_mapping(mapping, dst, src, 0);
+	rc = folio_mc_copy(dst, src);
+	if (unlikely(rc))
+		return rc;
+
+	rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;

 	if (src_private)
 		folio_attach_private(dst, folio_detach_private(src));

-	if (mode != MIGRATE_SYNC_NO_COPY)
-		folio_migrate_copy(dst, src);
-	else
-		folio_migrate_flags(dst, src);
+	folio_migrate_flags(dst, src);
 	return MIGRATEPAGE_SUCCESS;
 }
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.11-rc1
commit f00b295b9b61bb332b4f5951f479ab3aaeada5b8
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAROKE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Similar to __migrate_folio(), use folio_mc_copy() in the HugeTLB folio migration path to avoid a panic when copying from a poisoned folio.
Link: https://lkml.kernel.org/r/20240626085328.608006-6-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	fs/hugetlbfs/inode.c
[Conflicts due to MIGRATE_SYNC_NO_COPY mode]

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 fs/hugetlbfs/inode.c |  5 +----
 mm/migrate.c         | 10 ++++++++--
 2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c4f3c5d631f8..8386d78eabe5 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1156,10 +1156,7 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
 		hugetlb_set_folio_subpool(src, NULL);
 	}

-	if (mode != MIGRATE_SYNC_NO_COPY)
-		folio_migrate_copy(dst, src);
-	else
-		folio_migrate_flags(dst, src);
+	folio_migrate_flags(dst, src);

 	return MIGRATEPAGE_SUCCESS;
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index e58f4fb73b09..efcca5330568 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -552,10 +552,16 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 				   struct folio *dst, struct folio *src)
 {
 	XA_STATE(xas, &mapping->i_pages, folio_index(src));
-	int expected_count;
+	int rc, expected_count = folio_expected_refs(mapping, src);
+
+	if (folio_ref_count(src) != expected_count)
+		return -EAGAIN;
+
+	rc = folio_mc_copy(dst, src);
+	if (unlikely(rc))
+		return rc;

 	xas_lock_irq(&xas);
-	expected_count = folio_expected_refs(mapping, src);
 	if (!folio_ref_freeze(src, expected_count)) {
 		xas_unlock_irq(&xas);
 		return -EAGAIN;
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/11569
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/I...