From: Xia Fukun <xiafukun@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I697JG
CVE: NA
--------------------------------
In kobject_get_path(), if kobj->name is changed between the calls to get_kobj_path_length() and fill_kobj_path() and the name becomes longer, then fill_kobj_path() writes out of bounds.
The problem is currently observed during ixgbe probe: in ixgbe_mii_bus_init(), if the length of netdev->dev.kobj.name becomes longer, an out-of-bounds write occurs.
cpu0                                     cpu1
ixgbe_probe
 register_netdev(netdev)
  netdev_register_kobject
   device_add
    kobject_uevent  // Sending ADD events
                                         systemd-udevd  // rename netdev
                                          dev_change_name
                                           device_rename
                                            kobject_rename
 ixgbe_mii_bus_init                          |
  mdiobus_register                           |
   __mdiobus_register                        |
    device_register                          |
     device_add                              |
      kobject_uevent                         |
       kobject_get_path                      |
        len = get_kobj_path_length // old name
        path = kzalloc(len, gfp_mask);       |
                                             kobj->name = name;
                                             /* name length becomes
                                              * longer
                                              */
        fill_kobj_path  /* kobj path length is
                         * longer than path,
                         * resulting in out of
                         * bounds when filling path
                         */
This is the kasan report:
==================================================================
BUG: KASAN: slab-out-of-bounds in fill_kobj_path+0x50/0xc0
Write of size 7 at addr ff1100090573d1fd by task kworker/28:1/673

Workqueue: events work_for_cpu_fn
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x48
 print_address_description.constprop.0+0x86/0x1e7
 print_report+0x36/0x4f
 kasan_report+0xad/0x130
 kasan_check_range+0x35/0x1c0
 memcpy+0x39/0x60
 fill_kobj_path+0x50/0xc0
 kobject_get_path+0x5a/0xc0
 kobject_uevent_env+0x140/0x460
 device_add+0x5c7/0x910
 __mdiobus_register+0x14e/0x490
 ixgbe_probe.cold+0x441/0x574 [ixgbe]
 local_pci_probe+0x78/0xc0
 work_for_cpu_fn+0x26/0x40
 process_one_work+0x3b6/0x6a0
 worker_thread+0x368/0x520
 kthread+0x165/0x1a0
 ret_from_fork+0x1f/0x30
This reproducer triggers that bug:
while :
do
        rmmod ixgbe
        sleep 0.5
        modprobe ixgbe
        sleep 0.5
done
Fix this by having fill_kobj_path() check, while filling the path, whether the buffer is still large enough. If a kobject name has grown in the meantime, return an error and let kobject_get_path() recompute the length and retry.
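For context, a typical kobject_get_path() caller looks roughly like the sketch below (illustrative only, not part of this patch; the helper name is made up). The returned buffer is sized from the kobject names at the time of the length calculation, which is why a concurrent rename can make the later fill step overrun it:

        /* Hypothetical caller, shown only to illustrate the API contract. */
        static void example_report_path(struct kobject *kobj)
        {
                char *path = kobject_get_path(kobj, GFP_KERNEL);

                if (!path)
                        return;
                pr_info("kobject path: %s\n", path);
                kfree(path);    /* caller owns and must free the buffer */
        }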
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Xia Fukun <xiafukun@huawei.com>
Reviewed-by: songping yu <yusongping@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 lib/kobject.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index c82b88b2e860..d724a5dbcf9b 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -135,7 +135,7 @@ static int get_kobj_path_length(struct kobject *kobj)
 	return length;
 }
 
-static void fill_kobj_path(struct kobject *kobj, char *path, int length)
+static int fill_kobj_path(struct kobject *kobj, char *path, int length)
 {
 	struct kobject *parent;
 
@@ -144,12 +144,16 @@ static void fill_kobj_path(struct kobject *kobj, char *path, int length)
 		int cur = strlen(kobject_name(parent));
 		/* back up enough to print this name with '/' */
 		length -= cur;
+		if (length <= 0)
+			return -EINVAL;
 		memcpy(path + length, kobject_name(parent), cur);
 		*(path + --length) = '/';
 	}
 
 	pr_debug("kobject: '%s' (%p): %s: path = '%s'\n", kobject_name(kobj),
 		 kobj, __func__, path);
+
+	return 0;
 }
 
 /**
@@ -165,13 +169,17 @@ char *kobject_get_path(struct kobject *kobj, gfp_t gfp_mask)
 	char *path;
 	int len;
 
+retry:
 	len = get_kobj_path_length(kobj);
 	if (len == 0)
 		return NULL;
 	path = kzalloc(len, gfp_mask);
 	if (!path)
 		return NULL;
-	fill_kobj_path(kobj, path, len);
+	if (fill_kobj_path(kobj, path, len)) {
+		kfree(path);
+		goto retry;
+	}
 
 	return path;
 }
From: Oscar Salvador <osalvador@suse.de>
mainline inclusion
from mainline-v5.11-rc1
commit 1e8aaedb182d6ddffc894b832e4962629907b3e0
category: bugfix
bugzilla: 188200, https://gitee.com/openeuler/kernel/issues/I68OOI
CVE: NA
--------------------------------
madvise_inject_error() uses get_user_pages_fast() to translate the specified address to a page. After [1], we drop the extra reference count for the memory_failure() path. That commit says that memory_failure() wanted to keep the pin in order to take the page out of circulation.
The truth is that we need to keep the page pinned, otherwise the page might be re-used after the put_page() and we can end up messing with someone else's memory.
E.g:
CPU0 process X                     CPU1
 madvise_inject_error
  get_user_pages
  put_page
                                   page gets reclaimed
                                   process Y allocates the page
  memory_failure
   // We mess with process Y memory
madvise() is meant to operate on the calling process's own address space, so messing with pages that do not belong to us is the wrong thing to do. To avoid that, keep the page pinned for memory_failure() as well.
Pages for DAX mappings will release this extra refcount in memory_failure_dev_pagemap.
[1] ("23e7b5c2e271: mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de
Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/madvise.c
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 mm/madvise.c        | 10 +---------
 mm/memory-failure.c |  6 ++++++
 2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index e187cd74e925..a3b8b7ecc930 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -669,15 +669,7 @@ static int madvise_inject_error(int behavior,
 		} else {
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
-			/*
-			 * Drop the page reference taken by
-			 * get_user_pages_fast(). In the absence of
-			 * MF_COUNT_INCREASED the memory_failure() routine is
-			 * responsible for pinning the page to prevent it
-			 * from being released back to the page allocator.
-			 */
-			put_page(page);
-			ret = memory_failure(pfn, 0);
+			ret = memory_failure(pfn, MF_COUNT_INCREASED);
 		}
 
 		if (ret)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1c29e4eac520..dd110d3c82db 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1252,6 +1252,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	int rc = -EBUSY;
 	loff_t start;
 
+	if (flags & MF_COUNT_INCREASED)
+		/*
+		 * Drop the extra refcount in case we come from madvise().
+		 */
+		put_page(page);
+
 	/*
 	 * Prevent the inode from being freed while we are interrogating
 	 * the address_space, typically this would be handled by
From: Miaohe Lin <linmiaohe@huawei.com>
mainline inclusion
from mainline-v5.18-rc1
commit 75ee64b3c9a9695726056e9ec527e11dbf286500
category: bugfix
bugzilla: 188200, https://gitee.com/openeuler/kernel/issues/I68OOI
CVE: NA
--------------------------------
We only intend to deal with a non-compound page after splitting the THP in memory_failure(). However, the page could have become part of a compound page again due to a race window. If this happens, retry once in the hope of handling the page in the next round. Also remove the unneeded orig_head: it is always equal to hpage, so we can use hpage directly and drop the redundant variable.
Link: https://lkml.kernel.org/r/20220218090118.1105-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 mm/memory-failure.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index dd110d3c82db..11ae0dacaae7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1345,7 +1345,6 @@ int memory_failure(unsigned long pfn, int flags)
 {
 	struct page *p;
 	struct page *hpage;
-	struct page *orig_head;
 	struct dev_pagemap *pgmap;
 	int res = 0;
 	unsigned long page_flags;
@@ -1383,7 +1382,7 @@ int memory_failure(unsigned long pfn, int flags)
 		goto unlock_mutex;
 	}
 
-	orig_head = hpage = compound_head(p);
+	hpage = compound_head(p);
 	num_poisoned_pages_inc();
 
 	/*
@@ -1443,10 +1442,21 @@ int memory_failure(unsigned long pfn, int flags)
 	lock_page(p);
 
 	/*
-	 * The page could have changed compound pages during the locking.
-	 * If this happens just bail out.
+	 * We're only intended to deal with the non-Compound page here.
+	 * However, the page could have changed compound pages due to
+	 * race window. If this happens, we could try again to hopefully
+	 * handle the page next round.
 	 */
-	if (PageCompound(p) && compound_head(p) != orig_head) {
+	if (PageCompound(p)) {
+		if (retry) {
+			if (TestClearPageHWPoison(p))
+				num_poisoned_pages_dec();
+			unlock_page(p);
+			put_page(p);
+			flags &= ~MF_COUNT_INCREASED;
+			retry = false;
+			goto try_again;
+		}
 		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
 		res = -EBUSY;
 		goto unlock_page;
From: Naoya Horiguchi <naoya.horiguchi@nec.com>
mainline inclusion
from mainline-v5.19-rc1
commit f361e2462e8cccdd9231aa3274690705a2ea35a2
category: bugfix
bugzilla: 188200, https://gitee.com/openeuler/kernel/issues/I68OOI
CVE: NA
--------------------------------
In the already-hwpoisoned case, memory_failure() is supposed to release the page refcount taken for error handling before returning. But currently the refcount is not released when it is called with MF_COUNT_INCREASED, which leaves the page refcount inconsistent. This should be rare and non-critical, but it can be inconvenient in testing (unpoison doesn't work).
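The refcount contract restored here can be sketched as follows (simplified, hypothetical helper, not taken from the kernel sources): a caller passing MF_COUNT_INCREASED hands its page reference to memory_failure(), which must drop it on every return path, including the already-poisoned one:

        /* Hypothetical injection helper, for illustration only. */
        static int example_inject(struct page *page)
        {
                unsigned long pfn = page_to_pfn(page);

                /*
                 * The caller already holds a reference on @page (e.g. from
                 * get_user_pages_fast()); MF_COUNT_INCREASED transfers
                 * ownership of that reference to memory_failure(), which
                 * is expected to drop it on every return path.
                 */
                return memory_failure(pfn, MF_COUNT_INCREASED);
        }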
Link: https://lkml.kernel.org/r/20220408135323.1559401-3-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Suggested-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 mm/memory-failure.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 11ae0dacaae7..55c175f57223 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1379,6 +1379,8 @@ int memory_failure(unsigned long pfn, int flags)
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
 		res = -EHWPOISON;
+		if (flags & MF_COUNT_INCREASED)
+			put_page(p);
 		goto unlock_mutex;
 	}