From: Oscar Salvador <osalvador@suse.de>
mainline inclusion
from mainline-v5.11-rc1
commit 1e8aaedb182d6ddffc894b832e4962629907b3e0
category: bugfix
bugzilla: 188200, https://gitee.com/openeuler/kernel/issues/I68OOI
CVE: NA
--------------------------------
madvise_inject_error() uses get_user_pages_fast() to translate the address we specified to a page. After [1], we drop the extra reference count for the memory_failure() path. That commit says that memory_failure() wanted to keep the pin in order to take the page out of circulation.
The truth is that we need to keep the page pinned, otherwise the page might be re-used after the put_page() and we can end up messing with someone else's memory.
E.g:

CPU0
process X                               CPU1
 madvise_inject_error
  get_user_pages
   put_page
                                        page gets reclaimed
                                        process Y allocates the page
  memory_failure
   // We mess with process Y memory
madvise() is meant to operate on the caller's own address space, so messing with pages that do not belong to us seems the wrong thing to do. To avoid that, let us keep the page pinned for memory_failure() as well.
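
The race above can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct toy_page and the toy_*() helpers are hypothetical stand-ins that reduce a page to a reference count and an owner tag, and "reclaim" is modelled as dropping the mapping's reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a refcount and an owner tag. */
struct toy_page {
	int refcount;	/* 0 means free, i.e. available to the allocator */
	char owner;	/* 'X', 'Y', or 0 when free */
};

static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->owner = 0;	/* last reference gone: page is freed */
}

/* Another process can only take the page once it is free. */
static bool toy_try_alloc(struct toy_page *p, char new_owner)
{
	if (p->refcount != 0)
		return false;
	p->refcount = 1;
	p->owner = new_owner;
	return true;
}

/*
 * Replays the CPU0/CPU1 interleaving and returns the owner of the page
 * at the point where memory_failure() would act on it.
 */
static char toy_inject_error(bool keep_pin)
{
	/* The page starts out mapped by process X (one reference). */
	struct toy_page page = { .refcount = 1, .owner = 'X' };

	toy_get_page(&page);		/* models the get_user_pages_fast() pin */
	if (!keep_pin)
		toy_put_page(&page);	/* old behaviour: pin dropped early */

	/* CPU1: the page is reclaimed from X, then Y tries to allocate it. */
	toy_put_page(&page);
	(void)toy_try_alloc(&page, 'Y');

	/* CPU0: memory_failure() would now hit whoever owns the page. */
	return page.owner;
}
```

In this model, toy_inject_error(false) returns 'Y' (the page was freed and reused before the error was handled), while toy_inject_error(true) returns 'X' because the pin keeps the refcount above zero and the allocation attempt fails.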
Pages for DAX mappings will release this extra refcount in memory_failure_dev_pagemap.
[1] ("23e7b5c2e271: mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de
Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/madvise.c
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 mm/madvise.c        | 10 +---------
 mm/memory-failure.c |  6 ++++++
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index e187cd74e925..a3b8b7ecc930 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -669,15 +669,7 @@ static int madvise_inject_error(int behavior,
 		} else {
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				 pfn, start);
-			/*
-			 * Drop the page reference taken by
-			 * get_user_pages_fast(). In the absence of
-			 * MF_COUNT_INCREASED the memory_failure() routine is
-			 * responsible for pinning the page to prevent it
-			 * from being released back to the page allocator.
-			 */
-			put_page(page);
-			ret = memory_failure(pfn, 0);
+			ret = memory_failure(pfn, MF_COUNT_INCREASED);
 		}
 
 		if (ret)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1c29e4eac520..dd110d3c82db 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1252,6 +1252,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	int rc = -EBUSY;
 	loff_t start;
 
+	if (flags & MF_COUNT_INCREASED)
+		/*
+		 * Drop the extra refcount in case we come from madvise().
+		 */
+		put_page(page);
+
 	/*
 	 * Prevent the inode from being freed while we are interrogating
 	 * the address_space, typically this would be handled by