From: Ard Biesheuvel <ardb@kernel.org>

mainline inclusion
from mainline-v5.11-rc1
commit 97d6786e0669daa5c2f2d07a057f574e849dfd3e
category: bugfix
bugzilla: 175103
CVE: NA
-----------------------------------------------
As a hardening measure, we currently randomize the placement of physical memory inside the linear region when KASLR is in effect. Since the random offset at which to place the available physical memory inside the linear region is chosen early at boot, it is based on the memblock description of memory, which does not cover hotplug memory. The consequence of this is that the randomization offset may be chosen such that any hotplugged memory located above memblock_end_of_DRAM() that appears later is pushed off the end of the linear region, where it cannot be accessed.
So let's limit this randomization of the linear region to ensure that this can no longer happen, by using the CPU's addressable PA range instead. As it is guaranteed that no hotpluggable memory will appear that falls outside of that range, we can safely put this PA range sized window anywhere in the linear region.
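To illustrate the arithmetic, here is a minimal user-space sketch of the new window sizing (not kernel code; LINEAR_REGION_SIZE, CPU_PA_BITS and MEMSTART_ALIGN are stand-in constants for linear_region_size, the decoded ID_AA64MMFR0_EL1.PARange and ARM64_MEMSTART_ALIGN, and the seed value is arbitrary):

	#include <stdint.h>
	#include <stdio.h>

	#define LINEAR_REGION_SIZE	(1ULL << 48)	/* assumed linear map VA span */
	#define CPU_PA_BITS		44		/* assumed PARange decode */
	#define MEMSTART_ALIGN		(1ULL << 26)	/* stand-in alignment */

	int main(void)
	{
		uint16_t seed = 0xabcd;		/* stand-in for memstart_offset_seed */
		int64_t memstart_addr = 0;	/* offset applied to the linear map */

		/*
		 * Size the random window by the CPU's addressable PA range,
		 * not by memblock_end_of_DRAM(), so memory hotplugged later
		 * can never fall off the end of the linear region.
		 */
		int64_t range = (int64_t)(LINEAR_REGION_SIZE - (1ULL << CPU_PA_BITS));

		if (seed > 0 && range >= (int64_t)MEMSTART_ALIGN) {
			range /= MEMSTART_ALIGN;
			memstart_addr -= (int64_t)(MEMSTART_ALIGN *
					 (((uint64_t)range * seed) >> 16));
		}
		printf("linear map shifted down by %#llx bytes\n",
		       (unsigned long long)(-memstart_addr));
		return 0;
	}

The point is that the window size now depends only on the CPU's PA range, which is fixed at boot, rather than on the memblock view of DRAM, which is not.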
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20201014081857.3288-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: wangkefeng 00584194 <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/arm64/mm/init.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 554824ce0f286..5449ae2d26bee 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -683,15 +683,18 @@ void __init arm64_memblock_init(void)

 	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
 		extern u16 memstart_offset_seed;
-		u64 range = linear_region_size -
-			    (memblock_end_of_DRAM() - memblock_start_of_DRAM());
+		u64 mmfr0 = read_cpuid(ID_AA64MMFR0_EL1);
+		int parange = cpuid_feature_extract_unsigned_field(
+					mmfr0, ID_AA64MMFR0_PARANGE_SHIFT);
+		s64 range = linear_region_size -
+			    BIT(id_aa64mmfr0_parange_to_phys_shift(parange));

 		/*
 		 * If the size of the linear region exceeds, by a sufficient
-		 * margin, the size of the region that the available physical
-		 * memory spans, randomize the linear region as well.
+		 * margin, the size of the region that the physical memory can
+		 * span, randomize the linear region as well.
 		 */
-		if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
+		if (memstart_offset_seed > 0 && range >= (s64)ARM64_MEMSTART_ALIGN) {
 			range /= ARM64_MEMSTART_ALIGN;
 			memstart_addr -= ARM64_MEMSTART_ALIGN *
 					 ((range * memstart_offset_seed) >> 16);
From: Tony Luck <tony.luck@intel.com>

mainline inclusion
from mainline-v5.13
commit 171936ddaf97e6f4e1264f4128bb5cf15691339c
category: bugfix
bugzilla: 175120
CVE: NA
-------------------------------------------------
Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5.
I wrote this patchset to materialize what I think is the currently allowable solution mentioned in the previous discussion [1]. I simply borrowed Tony's mutex patch and Aili's return code patch, then queued another one to find the error virtual address in a best-effort manner. I know that this is not a perfect solution, but it should work for some typical cases.
[1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machin...
This patch (of 2):
There can be races when multiple CPUs consume poison from the same page. The first into memory_failure() atomically sets the HWPoison page flag and begins hunting for tasks that map this page. Eventually it invalidates those mappings and may send a SIGBUS to the affected tasks.
But while all that work is going on, other CPUs see a "success" return code from memory_failure() and so they believe the error has been handled and continue executing.
Fix by wrapping most of the internal parts of memory_failure() in a mutex.
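For illustration, here is a user-space sketch of the serialization pattern, using pthreads in place of the kernel mutex API (handle_failure() and page_poisoned are hypothetical stand-ins for memory_failure() and the per-page HWPoison flag):

	#include <pthread.h>
	#include <stdbool.h>
	#include <stdio.h>

	static bool page_poisoned;	/* stand-in for the HWPoison page flag */

	static int handle_failure(unsigned long pfn)
	{
		/* function-local static mutex, as in the patch below */
		static pthread_mutex_t mf_mutex = PTHREAD_MUTEX_INITIALIZER;
		int res = 0;

		pthread_mutex_lock(&mf_mutex);
		if (page_poisoned) {
			/* another CPU already handled this page */
			printf("pfn %#lx: already poisoned\n", pfn);
			res = -1;
			goto unlock;
		}
		page_poisoned = true;
		/* ... hunt for mappings, invalidate them, signal tasks ... */
	unlock:
		pthread_mutex_unlock(&mf_mutex);
		return res;
	}

	int main(void)
	{
		handle_failure(0x1234);	/* first caller does the work */
		handle_failure(0x1234);	/* second caller sees the result */
		return 0;
	}

With the mutex, a second caller can only observe the "already poisoned" state after the first caller has finished handling the page, never while the handling is still in flight.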
[akpm@linux-foundation.org: make mf_mutex local to memory_failure()]
Link: https://lkml.kernel.org/r/20210521030156.2612074-1-nao.horiguchi@gmail.com
Link: https://lkml.kernel.org/r/20210521030156.2612074-2-nao.horiguchi@gmail.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Aili Yao <yaoaili@kingsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jue Wang <juew@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/memory-failure.c

Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/memory-failure.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145262227bb08..54e93c679cec1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1256,8 +1256,9 @@ int memory_failure(unsigned long pfn, int flags)
 	struct page *hpage;
 	struct page *orig_head;
 	struct dev_pagemap *pgmap;
-	int res;
+	int res = 0;
 	unsigned long page_flags;
+	static DEFINE_MUTEX(mf_mutex);

 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -1275,12 +1276,17 @@ int memory_failure(unsigned long pfn, int flags)
 		return -ENXIO;
 	}

-	if (PageHuge(p))
-		return memory_failure_hugetlb(pfn, flags);
+	mutex_lock(&mf_mutex);
+
+	if (PageHuge(p)) {
+		res = memory_failure_hugetlb(pfn, flags);
+		goto unlock_mutex;
+	}
+
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
-		return 0;
+		goto unlock_mutex;
 	}

 	orig_head = hpage = compound_head(p);
@@ -1300,11 +1306,11 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
 		if (is_free_buddy_page(p)) {
 			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-			return 0;
 		} else {
 			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
-			return -EBUSY;
+			res = -EBUSY;
 		}
+		goto unlock_mutex;
 	}

 	if (PageTransHuge(hpage)) {
@@ -1320,7 +1326,8 @@ int memory_failure(unsigned long pfn, int flags)
 			if (TestClearPageHWPoison(p))
 				num_poisoned_pages_dec();
 			put_hwpoison_page(p);
-			return -EBUSY;
+			res = -EBUSY;
+			goto unlock_mutex;
 		}
 		unlock_page(p);
 		VM_BUG_ON_PAGE(!page_count(p), p);
@@ -1342,7 +1349,7 @@ int memory_failure(unsigned long pfn, int flags)
 			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
 		else
 			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
-		return 0;
+		goto unlock_mutex;
 	}

 	lock_page(p);
@@ -1354,7 +1361,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageCompound(p) && compound_head(p) != orig_head) {
 		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}

 	/*
@@ -1377,14 +1384,14 @@ int memory_failure(unsigned long pfn, int flags)
 			num_poisoned_pages_dec();
 		unlock_page(p);
 		put_hwpoison_page(p);
-		return 0;
+		goto unlock_mutex;
 	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unlock_page(p);
 		put_hwpoison_page(p);
-		return 0;
+		goto unlock_mutex;
 	}

 	/*
@@ -1411,7 +1418,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!hwpoison_user_mappings(p, pfn, flags, &hpage)) {
 		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}

 	/*
@@ -1420,13 +1427,15 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
 		action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}

 identify_page_state:
 	res = identify_page_state(pfn, p, page_flags);
-out:
+unlock_page:
 	unlock_page(p);
+unlock_mutex:
+	mutex_unlock(&mf_mutex);
 	return res;
 }
 EXPORT_SYMBOL_GPL(memory_failure);
From: Aili Yao <yaoaili@kingsoft.com>

mainline inclusion
from mainline-v5.13
commit 47af12bae17f99b5e77f8651cb7f3e1877610acf
category: bugfix
bugzilla: 175120
CVE: NA
-------------------------------------------------
When memory_failure() is called with MF_ACTION_REQUIRED on a page that has already been hwpoisoned, memory_failure() can fail to send SIGBUS to the affected process, which results in an infinite loop of MCEs.

Currently memory_failure() returns 0 if it is called for an already hwpoisoned page, so the caller, kill_me_maybe(), can return without sending SIGBUS to the current process. An action required MCE is raised when the current process accesses the broken memory, so the absence of SIGBUS means that the current process continues to run and will access the error page again soon, running into the MCE loop.
This issue can arise for example in the following scenarios:
- Two or more threads access the poisoned page concurrently. If local MCE is enabled, the MCE handler handles the MCE events independently, so there is a race among the events, and the second or later threads fall into the situation in question.

- If there was a preceding memory error event and memory_failure() for that event failed to unmap the error page for some reason, the subsequent memory access to the error page triggers the MCE loop situation.
To fix the issue, make memory_failure() return an error code when the error page has already been hwpoisoned. This allows the memory error handler to control how it sends signals to userspace, and ensures that any process touching a hwpoisoned page gets a SIGBUS even in the "already hwpoisoned" path of memory_failure(), as is done in the page fault path.
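As a sketch of the caller-side logic this enables (simplified and hypothetical: memory_failure_stub(), send_sigbus() and kill_me_maybe_sketch() are stand-ins for illustration, not the literal x86 MCE handler code):

	#include <stdio.h>

	#define EHWPOISON 133	/* Linux errno: memory page has hardware error */

	/* Stub: pretend the page was already marked hwpoisoned. */
	static int memory_failure_stub(unsigned long pfn, int flags)
	{
		(void)flags;
		printf("pfn %#lx: already hardware poisoned\n", pfn);
		return -EHWPOISON;
	}

	static void send_sigbus(unsigned long pfn)
	{
		printf("sending SIGBUS for pfn %#lx\n", pfn);
	}

	static void kill_me_maybe_sketch(unsigned long pfn, int flags)
	{
		int ret = memory_failure_stub(pfn, flags);

		if (ret == 0)
			return;		/* handled; tasks already signalled */
		if (ret == -EHWPOISON)
			send_sigbus(pfn);	/* already poisoned: still signal current */
		/* other errors: escalate, e.g. panic for unrecoverable cases */
	}

	int main(void)
	{
		kill_me_maybe_sketch(0x1234, 0);
		return 0;
	}

The distinct -EHWPOISON return lets the handler distinguish "already handled, but this task still needs a signal" from genuine success, which a bare 0 could not express.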
Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jue Wang <juew@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/memory-failure.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 54e93c679cec1..77f29dd9e5c9d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1094,7 +1094,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}

 	num_poisoned_pages_inc();
@@ -1286,6 +1286,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
+		res = -EHWPOISON;
 		goto unlock_mutex;
 	}