From: Oscar Salvador osalvador@suse.de
mainline inclusion from mainline-v5.11-rc1 commit 17e395b60f5b3dea204fcae60c7b38e84a00d87a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5E2IG CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Patch series "HWpoison: further fixes and cleanups", v5.
This patchset includes some more fixes and a cleanup.
Patch#2 and patch#3 are both fixes for taking a HWpoison page off a buddy freelist, since having them there has proved to be bad (see [1] and patch#2's commit log). Patch#3 does the same for hugetlb pages.
[1] https://lkml.org/lkml/2020/9/22/565
This patch (of 4):
A page with 0-refcount and !PageBuddy could perfectly be a pcppage. Currently, we bail out with an error if we encounter such a page, meaning that we do not handle pcppages from either the hard-offline or the soft-offline path.
Fix this by draining the pcplists whenever we find this kind of page and retrying the check. It might be that the pcplists have been spilled into the buddy allocator, in which case we can handle the page.
Link: https://lkml.kernel.org/r/20201013144447.6706-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20201013144447.6706-2-osalvador@suse.de Signed-off-by: Oscar Salvador osalvador@suse.de Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/memory-failure.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 63bacfcca122..de36b4aa4b6f 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -947,13 +947,13 @@ static int page_action(struct page_state *ps, struct page *p, }
/** - * get_hwpoison_page() - Get refcount for memory error handling: + * __get_hwpoison_page() - Get refcount for memory error handling: * @page: raw error page (hit by memory error) * * Return: return 0 if failed to grab the refcount, otherwise true (some * non-zero value.) */ -static int get_hwpoison_page(struct page *page) +static int __get_hwpoison_page(struct page *page) { struct page *head = compound_head(page);
@@ -983,6 +983,26 @@ static int get_hwpoison_page(struct page *page) return 0; }
+static int get_hwpoison_page(struct page *p) +{ + int ret; + bool drained = false; + +retry: + ret = __get_hwpoison_page(p); + if (!ret && !is_free_buddy_page(p) && !page_count(p) && !drained) { + /* + * The page might be in a pcplist, so try to drain those + * and see if we are lucky. + */ + drain_all_pages(page_zone(p)); + drained = true; + goto retry; + } + + return ret; +} + /* * Do all that is necessary to remove user space mappings. Unmap * the pages and send SIGBUS to the processes if the data was dirty.
From: Oscar Salvador osalvador@suse.de
mainline inclusion from mainline-v5.11-rc1 commit a8b2c2ce89d4e01062de69b89cafad97cd0fc01b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5E2IG CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
The crux of the matter is that historically we left poisoned pages in the buddy system because we have some checks in place when allocating a page that act as a gatekeeper for poisoned pages. Unfortunately, we do have other users (e.g.: compaction [1]) that scan buddy freelists and try to get a page from there without checking whether the page is HWPoison.
As I stated already, I think it is fundamentally wrong to keep HWPoison pages within the buddy system, checks in place or not.
Let us fix this the same way we did for soft_offline [2], taking the page off the buddy freelist so it is completely unreachable.
Note that this is fairly simple to trigger, as we only need to poison free buddy pages (madvise MADV_HWPOISON) and then run some sort of memory stress test.
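A hedged userspace sketch of that kind of injection, not the author's exact reproducer (MADV_HWPOISON requires CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE; the mapping size and the follow-up stress step are assumptions):

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	memset(p, 0, page);			/* fault the page in */
	if (madvise(p, page, MADV_HWPOISON))	/* inject a memory error on it */
		return 1;
	/* Then run a memory stress workload so compaction scans the freelists. */
	return 0;
}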
Just for reference, I put a dump_page() in compaction_alloc() to trigger on HWPoison pages:
page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1d5db flags: 0xfffffc0800000(hwpoison) raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 0000000000000000 raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: compaction_alloc
CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G E 5.9.0-rc2-mm1-1-default+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x6d/0x8b compaction_alloc+0xb2/0xc0 migrate_pages+0x2a6/0x12a0 compact_zone+0x5eb/0x11c0 proactive_compact_node+0x89/0xf0 kcompactd+0x2d0/0x3a0 kthread+0x118/0x130 ret_from_fork+0x22/0x30
After that, if e.g. a process faults in the page, it will get killed unexpectedly. Fix it by containing the page immediately.
Besides that, two more changes can be noticed:
* MF_DELAYED no longer suits, as we are fixing the issue by containing the page immediately, so we no longer rely on the allocation-time checks to stop a HWPoison page from being handed over again unless it is unpoisoned; that fixes the situation. Because of that, let us use MF_RECOVERED from now on.
* The second block that handles PageBuddy pages is no longer needed: we call shake_page() and then check whether the page is Buddy because shake_page() calls drain_all_pages(), which sends pcp pages back to the buddy freelists, giving us a chance to handle free pages. Currently, get_hwpoison_page() already calls drain_all_pages(), and we call get_hwpoison_page() right before coming here, so we should be on the safe side.
[1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u [2] https://patchwork.kernel.org/cover/11792607/
[osalvador@suse.de: take the poisoned subpage off the buddy freelists] Link: https://lkml.kernel.org/r/20201013144447.6706-4-osalvador@suse.de
Link: https://lkml.kernel.org/r/20201013144447.6706-3-osalvador@suse.de Signed-off-by: Oscar Salvador osalvador@suse.de Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/memory-failure.c Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/memory-failure.c | 45 ++++++++++++++++++++++++++++++--------------- 1 file changed, 30 insertions(+), 15 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index de36b4aa4b6f..0519f20d2b57 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -810,7 +810,7 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn) */ static int me_huge_page(struct page *p, unsigned long pfn) { - int res = 0; + int res; struct page *hpage = compound_head(p); struct address_space *mapping;
@@ -821,6 +821,7 @@ static int me_huge_page(struct page *p, unsigned long pfn) if (mapping) { res = truncate_error_page(hpage, pfn, mapping); } else { + res = MF_FAILED; unlock_page(hpage); /* * migration entry prevents later access on error anonymous @@ -829,8 +830,10 @@ static int me_huge_page(struct page *p, unsigned long pfn) */ if (PageAnon(hpage)) put_page(hpage); - dissolve_free_huge_page(p); - res = MF_RECOVERED; + if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) { + page_ref_inc(p); + res = MF_RECOVERED; + } lock_page(hpage); }
@@ -1201,9 +1204,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) return 0; } unlock_page(head); - dissolve_free_huge_page(p); - action_result(pfn, MF_MSG_FREE_HUGE, MF_DELAYED); - return 0; + res = MF_FAILED; + if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) { + page_ref_inc(p); + res = MF_RECOVERED; + } + action_result(pfn, MF_MSG_FREE_HUGE, res); + return res == MF_RECOVERED ? 0 : -EBUSY; }
lock_page(head); @@ -1358,6 +1365,7 @@ int memory_failure(unsigned long pfn, int flags) int res = 0; unsigned long page_flags; static DEFINE_MUTEX(mf_mutex); + bool retry = true;
if (!sysctl_memory_failure_recovery) panic("Memory failure on page %lx", pfn); @@ -1377,6 +1385,7 @@ int memory_failure(unsigned long pfn, int flags)
mutex_lock(&mf_mutex);
+try_again: if (PageHuge(p)) { res = memory_failure_hugetlb(pfn, flags); goto unlock_mutex; @@ -1405,7 +1414,21 @@ int memory_failure(unsigned long pfn, int flags) */ if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) { if (is_free_buddy_page(p)) { - action_result(pfn, MF_MSG_BUDDY, MF_DELAYED); + if (take_page_off_buddy(p)) { + page_ref_inc(p); + res = MF_RECOVERED; + } else { + /* We lost the race, try again */ + if (retry) { + ClearPageHWPoison(p); + num_poisoned_pages_dec(); + retry = false; + goto try_again; + } + res = MF_FAILED; + } + action_result(pfn, MF_MSG_BUDDY, res); + res = res == MF_RECOVERED ? 0 : -EBUSY; } else { action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED); res = -EBUSY; @@ -1431,14 +1454,6 @@ int memory_failure(unsigned long pfn, int flags) * walked by the page reclaim code, however that's not a big loss. */ shake_page(p, 0); - /* shake_page could have turned it free. */ - if (!PageLRU(p) && is_free_buddy_page(p)) { - if (flags & MF_COUNT_INCREASED) - action_result(pfn, MF_MSG_BUDDY, MF_DELAYED); - else - action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED); - goto unlock_mutex; - }
lock_page(p);
From: Oscar Salvador osalvador@suse.de
mainline inclusion from mainline-v5.11-rc1 commit 32409cba3f66810626c1c15b728c31968d6bfa92 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5E2IG CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
The memory_failure and soft_offline paths now drain the pcplists by calling get_hwpoison_page().
memory_failure flags the page as HWPoison beforehand, so that page can no longer go into a pcplist, and soft_offline_page only flags a page as HWPoison if 1) we took the page off a buddy freelist, 2) the page was in use and we migrated it, or 3) the page was clean pagecache.
Because of that, a page can no longer be both poisoned and on a pcplist.
Link: https://lkml.kernel.org/r/20201013144447.6706-5-osalvador@suse.de Signed-off-by: Oscar Salvador osalvador@suse.de Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/madvise.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c index 24abc79f8914..a9bcd16b5d95 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -877,7 +877,6 @@ static long madvise_remove(struct vm_area_struct *vma, static int madvise_inject_error(int behavior, unsigned long start, unsigned long end) { - struct zone *zone; unsigned long size;
if (!capable(CAP_SYS_ADMIN)) @@ -915,10 +914,6 @@ static int madvise_inject_error(int behavior, return ret; }
- /* Ensure that all poisoned pages are removed from per-cpu lists */ - for_each_populated_zone(zone) - drain_all_pages(zone); - return 0; } #endif
From: Tianchen Ding dtcccc@linux.alibaba.com
mainline inclusion from mainline-v5.18-rc1 commit 698361bca2d59fd29d46c757163854454df477f1 category: feature bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "provide the flexibility to enable KFENCE", v3.
If CONFIG_CONTIG_ALLOC is not supported, we fall back to alloc_pages_exact(). Allocating pages this way is limited by MAX_ORDER (default 11), so allocating the KFENCE pool after system startup is not supported with a large KFENCE_NUM_OBJECTS.
When handling failures in kfence_init_pool_late(), we pair free_pages_exact() with alloc_pages_exact() for consistency, though it actually does the same as free_contig_range().
This patch (of 2):
Once KFENCE is disabled by: echo 0 > /sys/module/kfence/parameters/sample_interval it cannot be re-enabled until the next reboot.
Allow re-enabling it by writing a positive number to sample_interval.
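For example (the interval value is illustrative; any positive number works): echo 100 > /sys/module/kfence/parameters/sample_interval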
Link: https://lkml.kernel.org/r/20220307074516.6920-1-dtcccc@linux.alibaba.com Link: https://lkml.kernel.org/r/20220307074516.6920-2-dtcccc@linux.alibaba.com Signed-off-by: Tianchen Ding dtcccc@linux.alibaba.com Reviewed-by: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/core.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index fcc79594020c..0979fa62b58d 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -38,14 +38,17 @@ #define KFENCE_WARN_ON(cond) \ ({ \ const bool __cond = WARN_ON(cond); \ - if (unlikely(__cond)) \ + if (unlikely(__cond)) { \ WRITE_ONCE(kfence_enabled, false); \ + disabled_by_warn = true; \ + } \ __cond; \ })
/* === Data ================================================================= */
static bool kfence_enabled __read_mostly; +static bool disabled_by_warn __read_mostly;
unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL; EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */ @@ -55,6 +58,7 @@ EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */ #endif #define MODULE_PARAM_PREFIX "kfence."
+static int kfence_enable_late(void); static int param_set_sample_interval(const char *val, const struct kernel_param *kp) { unsigned long num; @@ -65,10 +69,11 @@ static int param_set_sample_interval(const char *val, const struct kernel_param
if (!num) /* Using 0 to indicate KFENCE is disabled. */ WRITE_ONCE(kfence_enabled, false); - else if (!READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING) - return -EINVAL; /* Cannot (re-)enable KFENCE on-the-fly. */
*((unsigned long *)kp->arg) = num; + + if (num && !READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING) + return disabled_by_warn ? -EINVAL : kfence_enable_late(); return 0; }
@@ -943,6 +948,16 @@ void __init kfence_init(void) (void *)(__kfence_pool + KFENCE_POOL_SIZE)); }
+static int kfence_enable_late(void) +{ + if (!__kfence_pool) + return -EINVAL; + + WRITE_ONCE(kfence_enabled, true); + queue_delayed_work(system_unbound_wq, &kfence_timer, 0); + return 0; +} + void kfence_shutdown_cache(struct kmem_cache *s) { unsigned long flags;
From: Liu Shixin liushixin2@huawei.com
hulk inclusion category: feature bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7
--------------------------------
Since re-enabling KFENCE is now supported, make it compatible with dynamically configured objects.
Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/core.c | 108 ++++++++++++++++++++++++++++++----------------- 1 file changed, 69 insertions(+), 39 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index ba5e19b60efc..5d8be34daba5 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -633,6 +633,53 @@ static void rcu_guarded_free(struct rcu_head *h) kfence_guarded_free((void *)meta->addr, meta, false); }
+#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS +static int __ref kfence_dynamic_init(void) +{ + metadata_size = sizeof(struct kfence_metadata) * KFENCE_NR_OBJECTS; + if (system_state < SYSTEM_RUNNING) + kfence_metadata = memblock_alloc(metadata_size, PAGE_SIZE); + else + kfence_metadata = kzalloc(metadata_size, GFP_KERNEL); + if (!kfence_metadata) + return -ENOMEM; + + covered_size = sizeof(atomic_t) * ALLOC_COVERED_SIZE; + if (system_state < SYSTEM_RUNNING) + alloc_covered = memblock_alloc(covered_size, PAGE_SIZE); + else + alloc_covered = kzalloc(covered_size, GFP_KERNEL); + if (!alloc_covered) { + if (system_state < SYSTEM_RUNNING) + memblock_free(__pa(kfence_metadata), metadata_size); + else + kfree(kfence_metadata); + kfence_metadata = NULL; + + return -ENOMEM; + } + + return 0; +} + +static void __ref kfence_dynamic_destroy(void) +{ + if (system_state < SYSTEM_RUNNING) { + memblock_free(__pa(alloc_covered), covered_size); + memblock_free(__pa(kfence_metadata), metadata_size); + } else { + kfree(alloc_covered); + kfree(kfence_metadata); + } + alloc_covered = NULL; + kfence_metadata = NULL; +} + +#else +static int __init kfence_dynamic_init(void) { return 0; } +static void __init kfence_dynamic_destroy(void) { } +#endif + /* * Initialization of the KFENCE pool after its allocation. * Returns 0 on success; otherwise returns the address up to @@ -730,6 +777,7 @@ static bool __init kfence_init_pool_early(void) */ memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool)); __kfence_pool = NULL; + kfence_dynamic_destroy(); return false; }
@@ -750,6 +798,7 @@ static bool kfence_init_pool_late(void) free_pages_exact((void *)addr, free_size); #endif __kfence_pool = NULL; + kfence_dynamic_destroy(); return false; }
@@ -793,9 +842,14 @@ static void *next_object(struct seq_file *seq, void *v, loff_t *pos)
static int show_object(struct seq_file *seq, void *v) { - struct kfence_metadata *meta = &kfence_metadata[(long)v - 1]; + struct kfence_metadata *meta; unsigned long flags;
+ if (!kfence_metadata_valid()) + return 0; + + meta = &kfence_metadata[(long)v - 1]; + raw_spin_lock_irqsave(&meta->lock, flags); kfence_print_object(seq, meta); raw_spin_unlock_irqrestore(&meta->lock, flags); @@ -830,8 +884,7 @@ static int __init kfence_debugfs_init(void) debugfs_create_file("stats", 0444, kfence_dir, NULL, &stats_fops);
/* Variable kfence_metadata may fail to allocate. */ - if (kfence_metadata_valid()) - debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops); + debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops);
return 0; } @@ -892,40 +945,6 @@ static void toggle_allocation_gate(struct work_struct *work) } static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);
-#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS -static int __init kfence_dynamic_init(void) -{ - metadata_size = sizeof(struct kfence_metadata) * KFENCE_NR_OBJECTS; - kfence_metadata = memblock_alloc(metadata_size, PAGE_SIZE); - if (!kfence_metadata) { - pr_err("failed to allocate metadata\n"); - return -ENOMEM; - } - - covered_size = sizeof(atomic_t) * ALLOC_COVERED_SIZE; - alloc_covered = memblock_alloc(covered_size, PAGE_SIZE); - if (!alloc_covered) { - memblock_free(__pa(kfence_metadata), metadata_size); - kfence_metadata = NULL; - pr_err("failed to allocate covered\n"); - return -ENOMEM; - } - - return 0; -} - -static void __init kfence_dynamic_destroy(void) -{ - memblock_free(__pa(alloc_covered), covered_size); - alloc_covered = NULL; - memblock_free(__pa(kfence_metadata), metadata_size); - kfence_metadata = NULL; -} -#else -static int __init kfence_dynamic_init(void) { return 0; } -static void __init kfence_dynamic_destroy(void) { } -#endif - /* === Public interface ===================================================== */ void __init kfence_early_alloc_pool(void) { @@ -991,12 +1010,21 @@ void __init kfence_init(void) static int kfence_init_late(void) { const unsigned long nr_pages = KFENCE_POOL_SIZE / PAGE_SIZE; + #ifdef CONFIG_CONTIG_ALLOC struct page *pages; +#endif + + if (kfence_dynamic_init()) + return -ENOMEM;
+#ifdef CONFIG_CONTIG_ALLOC pages = alloc_contig_pages(nr_pages, GFP_KERNEL, first_online_node, NULL); - if (!pages) + if (!pages) { + kfence_dynamic_destroy(); return -ENOMEM; + } + __kfence_pool = page_to_virt(pages); #else if (nr_pages > MAX_ORDER_NR_PAGES) { @@ -1004,8 +1032,10 @@ static int kfence_init_late(void) return -EINVAL; } __kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, GFP_KERNEL); - if (!__kfence_pool) + if (!__kfence_pool) { + kfence_dynamic_destroy(); return -ENOMEM; + } #endif
if (!kfence_init_pool_late()) {
From: Liu Shixin liushixin2@huawei.com
hulk inclusion category: feature bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7
--------------------------------
KFENCE requires the linear map to be mapped at page granularity, and this must be done very early during boot. To save page-table memory, arm64 maps only the pages in the KFENCE pool itself at page granularity. Thus, the KFENCE pool cannot be allocated from the buddy system.
To keep KFENCE flexible, overload sample_interval to control whether enabling KFENCE after system startup (re-enabling) is supported. Once sample_interval is set to -1 on arm64, memory for the KFENCE pool will be allocated from early memory regardless of whether KFENCE is enabled or not.
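As a usage sketch: boot with kfence.sample_interval=-1 so the pool is reserved from early memory while KFENCE stays disabled, then re-enable it later by writing a positive interval to /sys/module/kfence/parameters/sample_interval.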
Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- Documentation/dev-tools/kfence.rst | 11 +++++++ mm/kfence/core.c | 49 ++++++++++++++++++++++++------ 2 files changed, 51 insertions(+), 9 deletions(-)
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst index 5d194615aed0..2e26d2998722 100644 --- a/Documentation/dev-tools/kfence.rst +++ b/Documentation/dev-tools/kfence.rst @@ -61,6 +61,17 @@ The total memory dedicated to the KFENCE memory pool can be computed as:: Using the default config, and assuming a page size of 4 KiB, results in dedicating 2 MiB to the KFENCE memory pool.
+KFENCE allow re-enabling after system startup, but ifndef CONFIG_CONTIG_ALLOC +and KFENCE_NUM_OBJECTS exceeds MAX_ORDER, alloc KFENCE pool after system startup +is not supported. + +For arm64, re-enabling KFENCE is kind of conflict with map the ages in KFENCE +pool itself at page granularity. For the flexibility, scale sample_interval to +control whether arm64 supported to enable kfence after system startup. +Once this is set to -1 in boot parameter, kfence_pool will be allocated from +early memory no matter kfence is enabled or not. Otherwise, re-enabling is not +supported on arm64. + Note: On architectures that support huge pages, KFENCE will ensure that the pool is using pages of size ``PAGE_SIZE``. This will result in additional page tables being allocated. diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 5d8be34daba5..9f40323953a7 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -49,6 +49,7 @@
static bool kfence_enabled __read_mostly; static bool disabled_by_warn __read_mostly; +static bool re_enabling __read_mostly;
unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL; EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */ @@ -61,16 +62,24 @@ EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */ static int kfence_enable_late(void); static int param_set_sample_interval(const char *val, const struct kernel_param *kp) { - unsigned long num; - int ret = kstrtoul(val, 0, &num); + long num; + int ret = kstrtol(val, 0, &num);
if (ret < 0) return ret;
+ if (num < -1) + return -ERANGE; + /* + * For architecture that don't require early allocation, always support + * re-enabling. So only need to set num to 0 if num < 0. + */ + num = max_t(long, 0, num); + if (!num) /* Using 0 to indicate KFENCE is disabled. */ WRITE_ONCE(kfence_enabled, false);
- *((unsigned long *)kp->arg) = num; + *((unsigned long *)kp->arg) = (unsigned long)num;
if (num && !READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING) return disabled_by_warn ? -EINVAL : kfence_enable_late(); @@ -94,11 +103,22 @@ module_param_cb(sample_interval, &sample_interval_param_ops, &kfence_sample_inte #ifdef CONFIG_ARM64 static int __init parse_sample_interval(char *str) { - unsigned long num; + long num;
- if (kstrtoul(str, 0, &num) < 0) + if (kstrtol(str, 0, &num) < 0) + return 0; + + if (num < -1) return 0; - kfence_sample_interval = num; + + /* Using -1 to indicate re-enabling is supported */ + if (num == -1) { + re_enabling = true; + pr_err("re-enabling is supported\n"); + } + num = max_t(long, 0, num); + + kfence_sample_interval = (unsigned long)num; return 0; } early_param("kfence.sample_interval", parse_sample_interval); @@ -948,7 +968,7 @@ static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate); /* === Public interface ===================================================== */ void __init kfence_early_alloc_pool(void) { - if (!kfence_sample_interval) + if (!kfence_sample_interval && !re_enabling) return;
__kfence_pool = memblock_alloc_raw(KFENCE_POOL_SIZE, PAGE_SIZE); @@ -961,7 +981,7 @@ void __init kfence_early_alloc_pool(void)
void __init kfence_alloc_pool(void) { - if (!kfence_sample_interval) + if (!kfence_sample_interval && !__kfence_pool) return;
if (kfence_dynamic_init()) { @@ -996,7 +1016,7 @@ void __init kfence_init(void) stack_hash_seed = (u32)random_get_entropy();
/* Setting kfence_sample_interval to 0 on boot disables KFENCE. */ - if (!kfence_sample_interval) + if (!kfence_sample_interval && !__kfence_pool) return;
if (!kfence_init_pool_early()) { @@ -1005,6 +1025,9 @@ void __init kfence_init(void) }
kfence_init_enable(); + + if (!kfence_sample_interval) + WRITE_ONCE(kfence_enabled, false); }
static int kfence_init_late(void) @@ -1015,6 +1038,14 @@ static int kfence_init_late(void) struct page *pages; #endif
+ /* + * For kfence re_enabling on ARM64, kfence_pool should be allocated + * at startup instead of here. So just return -EINVAL here which means + * re_enabling is not supported. + */ + if (IS_ENABLED(CONFIG_ARM64)) + return -EINVAL; + if (kfence_dynamic_init()) return -ENOMEM;
From: Peng Liu liupeng256@huawei.com
mainline inclusion from mainline-v5.18-rc1 commit adf505457032c11b79b5a7c277c62ff5d61b17c2 category: bugfix bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "kunit: fix a UAF bug and do some optimization", v2.
This series fixes a UAF (use-after-free) that occurs when running the kfence test case test_gfpzero, which is time-costly. The UAF bug can be easily triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535. Furthermore, some optimization of the kunit tests has been done.
This patch (of 3):
Kunit creates a new thread to run the actual test case, and the main process waits for the test thread to complete, up to a timeout. The variable "struct kunit test" is local to the function kunit_try_catch_run and is used by the test-case thread. kunit_try_catch_run frees "struct kunit test" when kunit times out, but the actual test case is still running, so a UAF bug is triggered.
The above problem has been observed both on a physical machine and on a qemu platform when running the kfence kunit tests. It can be triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535; under this setting, the test case test_gfpzero takes hours and kunit times out. The panic log follows, with a sketch of a reproducing configuration after it.
BUG: unable to handle page fault for address: ffffffff82d882e9
Call Trace: kunit_log_append+0x58/0xd0 ... test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test] test_gfpzero.cold+0x61/0x8ab [kfence_test] kunit_try_run_case+0x4c/0x70 kunit_generic_run_threadfn_adapter+0x11/0x20 kthread+0x166/0x190 ret_from_fork+0x22/0x30 Kernel panic - not syncing: Fatal exception Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
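A hedged sketch of the kind of configuration that reproduces this (CONFIG_KFENCE_NUM_OBJECTS comes from the report above; the other option names are the usual KUnit/KFENCE Kconfig symbols and the exact values are illustrative):
CONFIG_KUNIT=y
CONFIG_KFENCE=y
CONFIG_KFENCE_NUM_OBJECTS=65535
CONFIG_KFENCE_KUNIT_TEST=m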
To solve this problem, the test-case thread should be stopped when the kunit framework times out. The stop signal is sent in kunit_try_catch_run, and test_gfpzero handles it.
Link: https://lkml.kernel.org/r/20220309083753.1561921-1-liupeng256@huawei.com Link: https://lkml.kernel.org/r/20220309083753.1561921-2-liupeng256@huawei.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Marco Elver elver@google.com Reviewed-by: Brendan Higgins brendanhiggins@google.com Tested-by: Brendan Higgins brendanhiggins@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Wang Kefeng wangkefeng.wang@huawei.com Cc: Daniel Latypov dlatypov@google.com Cc: David Gow davidgow@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/kfence/kfence_test.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- lib/kunit/try-catch.c | 1 + mm/kfence/kfence_test.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c index 0dd434e40487..8e3fdbdef9a1 100644 --- a/lib/kunit/try-catch.c +++ b/lib/kunit/try-catch.c @@ -78,6 +78,7 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context) if (time_remaining == 0) { kunit_err(test, "try timed out\n"); try_catch->try_result = -ETIMEDOUT; + kthread_stop(task_struct); }
exit_code = try_catch->try_result; diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index c9952fc8d596..4c9a9a47c8e3 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -621,7 +621,7 @@ static void test_gfpzero(struct kunit *test) break; test_free(buf2);
- if (i == KFENCE_NR_OBJECTS) { + if (kthread_should_stop() || (i == KFENCE_NR_OBJECTS)) { kunit_warn(test, "giving up ... cannot get same object back\n"); return; }
From: Peng Liu liupeng256@huawei.com
mainline inclusion from mainline-v5.18-rc1 commit 3cb1c9620eeeb67c614c0732a35861b0b1efdc53 category: bugfix bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When CONFIG_KFENCE_NUM_OBJECTS is set to a big number, the kfence kunit test case test_gfpzero eats up nearly all of the CPU's resources, and an RCU stall is reported, as in the following log cut from a physical server.
rcu: INFO: rcu_sched self-detected stall on CPU rcu: 68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002 softirq=592/592 fqs=7500 (t=15004 jiffies g=10677 q=20019) Task dump for CPU 68: task:kunit_try_catch state:R running task stack: 0 pid: 9728 ppid: 2 flags:0x0000020a Call trace: dump_backtrace+0x0/0x1e4 show_stack+0x20/0x2c sched_show_task+0x148/0x170 ... rcu_sched_clock_irq+0x70/0x180 update_process_times+0x68/0xb0 tick_sched_handle+0x38/0x74 ... gic_handle_irq+0x78/0x2c0 el1_irq+0xb8/0x140 kfree+0xd8/0x53c test_alloc+0x264/0x310 [kfence_test] test_gfpzero+0xf4/0x840 [kfence_test] kunit_try_run_case+0x48/0x20c kunit_generic_run_threadfn_adapter+0x28/0x34 kthread+0x108/0x13c ret_from_fork+0x10/0x18
To avoid rcu_stall and unacceptable latency, a schedule point is added to test_gfpzero.
Link: https://lkml.kernel.org/r/20220309083753.1561921-4-liupeng256@huawei.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Marco Elver elver@google.com Tested-by: Brendan Higgins brendanhiggins@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Wang Kefeng wangkefeng.wang@huawei.com Cc: Daniel Latypov dlatypov@google.com Cc: David Gow davidgow@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/kfence_test.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index 4c9a9a47c8e3..0acbc7365412 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -625,6 +625,7 @@ static void test_gfpzero(struct kunit *test) kunit_warn(test, "giving up ... cannot get same object back\n"); return; } + cond_resched(); }
for (i = 0; i < size; i++)
From: huangshaobo huangshaobo6@huawei.com
mainline inclusion from mainline-v5.19-rc1 commit 3c81b3bb0a33e2b555edb8d7eb99a7ae4f17d8bb category: bugfix bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Out-of-bounds accesses that aren't caught by a guard page will result in corruption of canary memory. In pathological cases, where an object has certain alignment requirements, an out-of-bounds access might never be caught by the guard page. Such corruptions, however, are only detected on kfree() normally. If the bug causes the kernel to panic before kfree(), KFENCE has no opportunity to report the issue. Such corruptions may also indicate failing memory or other faults.
To provide some more information in such cases, add the option to check canary bytes on panic. This might help narrow the search for the panic cause; but, since only the allocation stack trace is available, such reports are difficult to use to diagnose an issue on their own. In most cases such reports are not actionable, so this is an opt-in feature (disabled by default).
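Since check_on_panic is read-only at runtime (permissions 0444), it would typically be enabled on the kernel command line; a usage sketch: kfence.check_on_panic=1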
[akpm@linux-foundation.org: add __read_mostly, per Marco] Link: https://lkml.kernel.org/r/20220425022456.44300-1-huangshaobo6@huawei.com Signed-off-by: huangshaobo huangshaobo6@huawei.com Suggested-by: chenzefeng chenzefeng2@huawei.com Reviewed-by: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Xiaoming Ni nixiaoming@huawei.com Cc: Wangbing wangbing6@huawei.com Cc: Jubin Zhong zhongjubin@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/kfence/core.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/core.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 9f40323953a7..7af797305fdd 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -14,6 +14,7 @@ #include <linux/irq_work.h> #include <linux/jhash.h> #include <linux/kcsan-checks.h> +#include <linux/kernel.h> #include <linux/kfence.h> #include <linux/kmemleak.h> #include <linux/list.h> @@ -21,6 +22,7 @@ #include <linux/log2.h> #include <linux/memblock.h> #include <linux/moduleparam.h> +#include <linux/notifier.h> #include <linux/random.h> #include <linux/rcupdate.h> #include <linux/sched/clock.h> @@ -128,6 +130,10 @@ early_param("kfence.sample_interval", parse_sample_interval); static unsigned long kfence_skip_covered_thresh __read_mostly = 75; module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644);
+/* If true, check all canary bytes on panic. */ +static bool kfence_check_on_panic __read_mostly; +module_param_named(check_on_panic, kfence_check_on_panic, bool, 0444); + /* The pool of pages used for guard pages and objects. */ char *__kfence_pool __read_mostly; EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */ @@ -911,6 +917,31 @@ static int __init kfence_debugfs_init(void)
late_initcall(kfence_debugfs_init);
+/* === Panic Notifier ====================================================== */ + +static void kfence_check_all_canary(void) +{ + int i; + + for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { + struct kfence_metadata *meta = &kfence_metadata[i]; + + if (meta->state == KFENCE_OBJECT_ALLOCATED) + for_each_canary(meta, check_canary_byte); + } +} + +static int kfence_check_canary_callback(struct notifier_block *nb, + unsigned long reason, void *arg) +{ + kfence_check_all_canary(); + return NOTIFY_OK; +} + +static struct notifier_block kfence_check_canary_notifier = { + .notifier_call = kfence_check_canary_callback, +}; + /* === Allocation Gate Timer ================================================ */
#ifdef CONFIG_KFENCE_STATIC_KEYS @@ -1004,6 +1035,10 @@ static void kfence_init_enable(void) { if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS)) static_branch_enable(&kfence_allocation_key); + + if (kfence_check_on_panic) + atomic_notifier_chain_register(&panic_notifier_list, &kfence_check_canary_notifier); + WRITE_ONCE(kfence_enabled, true); queue_delayed_work(system_unbound_wq, &kfence_timer, 0); pr_info("initialized - using %lu bytes for %lu objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
From: Jackie Liu liuyun01@kylinos.cn
mainline inclusion from mainline-v5.19-rc1 commit 83d7d04f9d2ef354858b2a8444aee38e41ec1699 category: bugfix bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
By printing this information, we can give a friendly prompt about KFENCE status changes via dmesg and have them recorded by syslog.
Also, set kfence_enabled to false only when needed.
Link: https://lkml.kernel.org/r/20220518073105.3160335-1-liu.yun@linux.dev Signed-off-by: Jackie Liu liuyun01@kylinos.cn Co-developed-by: Marco Elver elver@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Marco Elver elver@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/kfence/core.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 7af797305fdd..d330b6c930c8 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -78,8 +78,11 @@ static int param_set_sample_interval(const char *val, const struct kernel_param */ num = max_t(long, 0, num);
- if (!num) /* Using 0 to indicate KFENCE is disabled. */ + /* Using 0 to indicate KFENCE is disabled. */ + if (!num && READ_ONCE(kfence_enabled)) { + pr_info("disabled\n"); WRITE_ONCE(kfence_enabled, false); + }
*((unsigned long *)kp->arg) = (unsigned long)num;
@@ -1120,6 +1123,7 @@ static int kfence_enable_late(void)
WRITE_ONCE(kfence_enabled, true); queue_delayed_work(system_unbound_wq, &kfence_timer, 0); + pr_info("re-enabled\n"); return 0; }
From: Muchun Song songmuchun@bytedance.com
mainline inclusion from mainline-v5.18-rc1 commit 8f0b36497303487d5a32c75789c77859cc2ee895 category: bugfix bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If a kfence object is allocated to be used for an objects vector, that slot of the pool ends up being occupied permanently, since the vector is never freed. The solutions could be (1) freeing the vector when the kfence object is freed or (2) allocating all vectors statically.
Since the memory consumption of object vectors is low, it is better to choose (2) to fix the issue, and it also reduces the overhead of allocating vectors in the future.
Link: https://lkml.kernel.org/r/20220328132843.16624-1-songmuchun@bytedance.com Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Muchun Song songmuchun@bytedance.com Reviewed-by: Marco Elver elver@google.com Reviewed-by: Roman Gushchin roman.gushchin@linux.dev Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Xiongchun Duan duanxiongchun@bytedance.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/kfence/core.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/core.c | 7 +++++++ mm/kfence/kfence.h | 3 +++ 2 files changed, 10 insertions(+)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index d330b6c930c8..213bfac21a64 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -742,6 +742,10 @@ static unsigned long kfence_init_pool(void) return addr;
__SetPageSlab(&pages[i]); +#ifdef CONFIG_MEMCG + pages[i].memcg_data = (unsigned long)&kfence_metadata[i / 2 - 1].objcg | + MEMCG_DATA_OBJCGS; +#endif }
/* @@ -1281,6 +1285,9 @@ void __kfence_free(void *addr) { struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
+#ifdef CONFIG_MEMCG + KFENCE_WARN_ON(meta->objcg); +#endif /* * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing * the object, as the object page may be recycled for other-typed diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index e5f8f8577911..867e7982adb5 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -89,6 +89,9 @@ struct kfence_metadata { struct kfence_track free_track; /* For updating alloc_covered on frees. */ u32 alloc_stack_hash; +#ifdef CONFIG_MEMCG + struct obj_cgroup *objcg; +#endif };
#ifdef CONFIG_KFENCE_DYNAMIC_OBJECTS
From: Hyeonggon Yoo 42.hyeyoo@gmail.com
mainline inclusion from mainline-v5.18-rc7 commit 2839b0999c20c9f6bf353849c69370e121e2fa1a category: bugfix bugzilla: 187071, https://gitee.com/openeuler/kernel/issues/I5DLA7 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When kfence fails to initialize the kfence pool, it frees the pool. But it does not reset memcg_data and the PG_slab flag.
Below is a BUG report caused by this. Let's fix it by resetting memcg_data and the PG_slab flag before freeing.
[ 0.089149] BUG: Bad page state in process swapper/0 pfn:3d8e06 [ 0.089149] page:ffffea46cf638180 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3d8e06 [ 0.089150] memcg:ffffffff94a475d1 [ 0.089150] flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff) [ 0.089151] raw: 0017ffffc0000200 ffffea46cf638188 ffffea46cf638188 0000000000000000 [ 0.089152] raw: 0000000000000000 0000000000000000 00000000ffffffff ffffffff94a475d1 [ 0.089152] page dumped because: page still charged to cgroup [ 0.089153] Modules linked in: [ 0.089153] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B W 5.18.0-rc1+ #965 [ 0.089154] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 0.089154] Call Trace: [ 0.089155] <TASK> [ 0.089155] dump_stack_lvl+0x49/0x5f [ 0.089157] dump_stack+0x10/0x12 [ 0.089158] bad_page.cold+0x63/0x94 [ 0.089159] check_free_page_bad+0x66/0x70 [ 0.089160] __free_pages_ok+0x423/0x530 [ 0.089161] __free_pages_core+0x8e/0xa0 [ 0.089162] memblock_free_pages+0x10/0x12 [ 0.089164] memblock_free_late+0x8f/0xb9 [ 0.089165] kfence_init+0x68/0x92 [ 0.089166] start_kernel+0x789/0x992 [ 0.089167] x86_64_start_reservations+0x24/0x26 [ 0.089168] x86_64_start_kernel+0xa9/0xaf [ 0.089170] secondary_startup_64_no_verify+0xd5/0xdb [ 0.089171] </TASK>
Link: https://lkml.kernel.org/r/YnPG3pQrqfcgOlVa@hyeyoo Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Fixes: 8f0b36497303 ("mm: kfence: fix objcgs vector allocation") Signed-off-by: Hyeonggon Yoo 42.hyeyoo@gmail.com Reviewed-by: Marco Elver elver@google.com Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/kfence/core.c Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/kfence/core.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 213bfac21a64..f67418a30282 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -792,6 +792,7 @@ static unsigned long kfence_init_pool(void) static bool __init kfence_init_pool_early(void) { unsigned long addr; + char *p;
if (!__kfence_pool) return false; @@ -808,6 +809,16 @@ static bool __init kfence_init_pool_early(void) * fails for the first page, and therefore expect addr==__kfence_pool in * most failure cases. */ + for (p = (char *)addr; p < __kfence_pool + KFENCE_POOL_SIZE; p += PAGE_SIZE) { + struct page *page = virt_to_page(p); + + if (!page) + continue; +#ifdef CONFIG_MEMCG + page->memcg_data = 0; +#endif + __ClearPageSlab(page); + } memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool)); __kfence_pool = NULL; kfence_dynamic_destroy();
From: luhuaxin luhuaxin1@huawei.com
euleros inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETJZ CVE: NA
--------
openEuler openssl now supports SM2 certificates. The key type should be set to EVP_PKEY_SM2 before use.
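A hypothetical invocation, assuming an SM2 key/certificate pair (the file names are placeholders) and the SM3 digest: scripts/sign-file sm3 sm2_key.pem sm2_cert.pem module.ko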
Signed-off-by: luhuaxin luhuaxin1@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- scripts/sign-file.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+)
diff --git a/scripts/sign-file.c b/scripts/sign-file.c index fbd34b8e8f57..acc9e5f2eb04 100644 --- a/scripts/sign-file.c +++ b/scripts/sign-file.c @@ -206,6 +206,28 @@ static X509 *read_x509(const char *x509_name) return x509; }
+#if defined(EVP_PKEY_SM2) +static int pkey_is_sm2(EVP_PKEY *pkey) +{ + EC_KEY *eckey = NULL; + + const EC_GROUP *group = NULL; + + if (pkey == NULL || EVP_PKEY_id(pkey) != EVP_PKEY_EC) + return 0; + + eckey = EVP_PKEY_get0_EC_KEY(pkey); + if (eckey == NULL) + return 0; + + group = EC_KEY_get0_group(eckey); + if (group == NULL) + return 0; + + return EC_GROUP_get_curve_name(group) == NID_sm2; +} +#endif + int main(int argc, char **argv) { struct module_signature sig_info = { .id_type = PKEY_ID_PKCS7 }; @@ -220,6 +242,10 @@ int main(int argc, char **argv) unsigned int use_signed_attrs; const EVP_MD *digest_algo; EVP_PKEY *private_key; +#if defined(EVP_PKEY_SM2) + EVP_PKEY *public_key; +#endif + #ifndef USE_PKCS7 CMS_ContentInfo *cms = NULL; unsigned int use_keyid = 0; @@ -303,6 +329,16 @@ int main(int argc, char **argv) digest_algo = EVP_get_digestbyname(hash_algo); ERR(!digest_algo, "EVP_get_digestbyname");
+#if defined(EVP_PKEY_SM2) + if (pkey_is_sm2(private_key)) + EVP_PKEY_set_alias_type(private_key, EVP_PKEY_SM2); + + public_key = X509_get0_pubkey(x509); + ERR(!public_key, "X509_get0_pubkey"); + if (pkey_is_sm2(public_key)) + EVP_PKEY_set_alias_type(public_key, EVP_PKEY_SM2); +#endif + #ifndef USE_PKCS7 /* Load the signature message from the digest buffer. */ cms = CMS_sign(NULL, NULL, NULL, NULL,
From: Li Huafei lihuafei1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
When klp_mem_prepare() fails, the resources that were already requested are not freed. We'd better clean up each newly requested resource on the error return path.
Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/livepatch/core.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 87ed93df7a98..4d79543d9155 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -1408,33 +1408,34 @@ static void func_node_free(struct klp_func *func) } }
-static int klp_mem_prepare(struct klp_patch *patch) +static void klp_mem_recycle(struct klp_patch *patch) { struct klp_object *obj; struct klp_func *func;
klp_for_each_object(patch, obj) { klp_for_each_func(obj, func) { - func->func_node = func_node_alloc(func); - if (func->func_node == NULL) { - pr_err("alloc func_node failed\n"); - return -ENOMEM; - } + func_node_free(func); } } - return 0; }
-static void klp_mem_recycle(struct klp_patch *patch) +static int klp_mem_prepare(struct klp_patch *patch) { struct klp_object *obj; struct klp_func *func;
klp_for_each_object(patch, obj) { klp_for_each_func(obj, func) { - func_node_free(func); + func->func_node = func_node_alloc(func); + if (func->func_node == NULL) { + klp_mem_recycle(patch); + pr_err("alloc func_node failed\n"); + return -ENOMEM; + } } } + return 0; }
static int __klp_disable_patch(struct klp_patch *patch) @@ -1697,8 +1698,11 @@ static int __klp_enable_patch(struct klp_patch *patch)
arch_klp_code_modify_prepare(); ret = klp_mem_prepare(patch); - if (ret == 0) - ret = stop_machine(klp_try_enable_patch, &patch_data, cpu_online_mask); + if (ret) { + arch_klp_code_modify_post_process(); + return ret; + } + ret = stop_machine(klp_try_enable_patch, &patch_data, cpu_online_mask); arch_klp_code_modify_post_process(); if (ret) { klp_mem_recycle(patch);
From: Li Huafei lihuafei1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Currently, arch_klp_code_modify_{prepare, post_process} is implemented only on the x86 architecture. It is used to hold the 'text_mutex' lock before entering stop_machine and modifying the code, and to release the lock after exiting stop_machine. klp_mem_prepare() needs to hold 'text_mutex' only while saving the old instruction code on x86, to ensure that it saves valid instructions.
Move klp_mem_prepare() before arch_klp_code_modify_prepare() and take the lock separately around the instruction-saving step to narrow the 'text_mutex' critical section.
Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/x86/kernel/livepatch.c | 11 +++++++++-- kernel/livepatch/core.c | 6 ++---- 2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index fe34183826d3..0a2ba5c8ba4e 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -386,8 +386,15 @@ void arch_klp_code_modify_post_process(void)
long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func) { - return copy_from_kernel_nofault(arch_data->old_code, - old_func, JMP_E9_INSN_SIZE); + long ret; + + /* Prevent text modification */ + mutex_lock(&text_mutex); + ret = copy_from_kernel_nofault(arch_data->old_code, + old_func, JMP_E9_INSN_SIZE); + mutex_unlock(&text_mutex); + + return ret; }
int arch_klp_patch_func(struct klp_func *func) diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 4d79543d9155..957f16f6c6c4 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -1696,12 +1696,10 @@ static int __klp_enable_patch(struct klp_patch *patch) } #endif
- arch_klp_code_modify_prepare(); ret = klp_mem_prepare(patch); - if (ret) { - arch_klp_code_modify_post_process(); + if (ret) return ret; - } + arch_klp_code_modify_prepare(); ret = stop_machine(klp_try_enable_patch, &patch_data, cpu_online_mask); arch_klp_code_modify_post_process(); if (ret) {
From: Li Huafei lihuafei1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Delete the duplicate code of klp_compare_address() in each arch.
Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm/kernel/livepatch.c | 10 ---------- arch/arm64/kernel/livepatch.c | 10 ---------- arch/powerpc/kernel/livepatch_32.c | 10 ---------- arch/powerpc/kernel/livepatch_64.c | 10 ---------- arch/x86/kernel/livepatch.c | 11 ----------- include/linux/livepatch.h | 12 ++++++++++++ 6 files changed, 12 insertions(+), 51 deletions(-)
diff --git a/arch/arm/kernel/livepatch.c b/arch/arm/kernel/livepatch.c index da88113d14e9..3162de4aec70 100644 --- a/arch/arm/kernel/livepatch.c +++ b/arch/arm/kernel/livepatch.c @@ -86,16 +86,6 @@ static inline unsigned long klp_size_to_check(unsigned long func_size, return size; }
-static inline int klp_compare_address(unsigned long pc, unsigned long func_addr, - const char *func_name, unsigned long check_size) -{ - if (pc >= func_addr && pc < func_addr + check_size) { - pr_err("func %s is in use!\n", func_name); - return -EBUSY; - } - return 0; -} - static bool check_jump_insn(unsigned long func_addr) { unsigned long i; diff --git a/arch/arm64/kernel/livepatch.c b/arch/arm64/kernel/livepatch.c index 4bc35725af36..d629ad409721 100644 --- a/arch/arm64/kernel/livepatch.c +++ b/arch/arm64/kernel/livepatch.c @@ -80,16 +80,6 @@ static inline unsigned long klp_size_to_check(unsigned long func_size, return size; }
-static inline int klp_compare_address(unsigned long pc, unsigned long func_addr, - const char *func_name, unsigned long check_size) -{ - if (pc >= func_addr && pc < func_addr + check_size) { - pr_err("func %s is in use!\n", func_name); - return -EBUSY; - } - return 0; -} - static bool check_jump_insn(unsigned long func_addr) { unsigned long i; diff --git a/arch/powerpc/kernel/livepatch_32.c b/arch/powerpc/kernel/livepatch_32.c index a3cf41af073e..4ce4bd07eaaf 100644 --- a/arch/powerpc/kernel/livepatch_32.c +++ b/arch/powerpc/kernel/livepatch_32.c @@ -83,16 +83,6 @@ static inline unsigned long klp_size_to_check(unsigned long func_size, return size; }
-static inline int klp_compare_address(unsigned long pc, unsigned long func_addr, - const char *func_name, unsigned long check_size) -{ - if (pc >= func_addr && pc < func_addr + check_size) { - pr_err("func %s is in use!\n", func_name); - return -EBUSY; - } - return 0; -} - static bool check_jump_insn(unsigned long func_addr) { unsigned long i; diff --git a/arch/powerpc/kernel/livepatch_64.c b/arch/powerpc/kernel/livepatch_64.c index 0098ad48f918..acc6d94a5c91 100644 --- a/arch/powerpc/kernel/livepatch_64.c +++ b/arch/powerpc/kernel/livepatch_64.c @@ -89,16 +89,6 @@ static inline unsigned long klp_size_to_check(unsigned long func_size, return size; }
-static inline int klp_compare_address(unsigned long pc, unsigned long func_addr, - const char *func_name, unsigned long check_size) -{ - if (pc >= func_addr && pc < func_addr + check_size) { - pr_err("func %s is in use!\n", func_name); - return -EBUSY; - } - return 0; -} - static bool check_jump_insn(unsigned long func_addr) { unsigned long i; diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index 0a2ba5c8ba4e..824b538d2861 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -66,17 +66,6 @@ static inline unsigned long klp_size_to_check(unsigned long func_size, return size; }
-static inline int klp_compare_address(unsigned long stack_addr, - unsigned long func_addr, const char *func_name, - unsigned long check_size) -{ - if (stack_addr >= func_addr && stack_addr < func_addr + check_size) { - pr_err("func %s is in use!\n", func_name); - return -EBUSY; - } - return 0; -} - static bool check_jump_insn(unsigned long func_addr) { int len = JMP_E9_INSN_SIZE; diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 5f88e6429484..c12781f7397b 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -234,6 +234,18 @@ struct klp_func_node { struct klp_func_node *klp_find_func_node(const void *old_func); void klp_add_func_node(struct klp_func_node *func_node); void klp_del_func_node(struct klp_func_node *func_node); + +static inline +int klp_compare_address(unsigned long pc, unsigned long func_addr, + const char *func_name, unsigned long check_size) +{ + if (pc >= func_addr && pc < func_addr + check_size) { + pr_err("func %s is in use!\n", func_name); + return -EBUSY; + } + return 0; +} + #endif
int klp_apply_section_relocs(struct module *pmod, Elf_Shdr *sechdrs,
From: Li Huafei lihuafei1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
klp_find_func_node() is used to traverse the klp_func_list linked list. Currently, klp_find_func_node() is called only while the klp_mutex lock is held. In a subsequent submission, we need to access the klp_func_list linked list from the exception-handling path, where we cannot hold the klp_mutex lock.
Change the traversal of klp_func_list to use the RCU interface and perform RCU synchronization when deleting nodes.
Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/livepatch/core.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 957f16f6c6c4..47d8661ee5e4 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -1273,11 +1273,16 @@ int __weak klp_check_calltrace(struct klp_patch *patch, int enable)
static LIST_HEAD(klp_func_list);
+/* + * The caller must ensure that the klp_mutex lock is held or is in the rcu read + * critical area. + */ struct klp_func_node *klp_find_func_node(const void *old_func) { struct klp_func_node *func_node;
- list_for_each_entry(func_node, &klp_func_list, node) { + list_for_each_entry_rcu(func_node, &klp_func_list, node, + lockdep_is_held(&klp_mutex)) { if (func_node->old_func == old_func) return func_node; } @@ -1403,6 +1408,7 @@ static void func_node_free(struct klp_func *func) func->func_node = NULL; if (list_empty(&func_node->func_stack)) { klp_del_func_node(func_node); + synchronize_rcu(); arch_klp_mem_free(func_node); } }
From: Li Huafei lihuafei1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Commit 86e35fae15bb ("livepatch: checks only if the replaced instruction is on the stack") optimizes the stack check. However, for extremely hot functions the replaced instruction may still be on some task's stack, so there is room for further optimization.
By inserting a breakpoint exception instruction at the entry of the old function being patched, we can divert calls from the old function to the new one. During the stack check, only tasks that entered the old function before the breakpoint was inserted then need to be considered, which increases the probability that the stack check passes.
If the stack check fails, we sleep for a while and try again, giving tasks that entered the old function a chance to run out of the instruction replacement area.
We first try to enable the patch through the normal process, that is, without inserting breakpoints. If that first attempt fails and the force flag KLP_STACK_OPTIMIZE is set for every function in the patch, we fall back to the breakpoint exception optimization.
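For orientation, a condensed sketch of the resulting enable flow, simplified from the code below; error handling, the arch_klp_code_modify_{prepare,post_process} calls and the rollback handling on the final retry are omitted:

    /* First attempt: the normal stop_machine based enable, no breakpoints. */
    ret = stop_machine(klp_try_enable_patch, &patch_data, cpu_online_mask);
    if (ret != -EAGAIN)
            return ret;

    /* Fall back only if every function in the patch allows it. */
    if (!klp_use_breakpoint(patch))
            return ret;

    /* Divert new callers of the old functions, then retry with sleeps. */
    ret = klp_add_breakpoint(patch);
    if (ret)
            return ret;

    for (i = 0; i < KLP_RETRY_COUNT; i++) {
            ret = stop_machine(klp_try_enable_patch, &patch_data,
                               cpu_online_mask);
            if (ret != -EAGAIN)
                    break;
            msleep(KLP_RETRY_INTERVAL);
    }

    /* On success the breakpoint became the jump; on failure restore it. */
    klp_breakpoint_post_process(patch, !!ret);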
Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm/kernel/livepatch.c | 2 +- arch/arm64/kernel/livepatch.c | 2 +- arch/powerpc/kernel/livepatch_32.c | 2 +- arch/powerpc/kernel/livepatch_64.c | 2 +- arch/x86/kernel/livepatch.c | 2 +- include/linux/livepatch.h | 14 +- kernel/livepatch/core.c | 275 +++++++++++++++++++++++++++-- kernel/livepatch/core.h | 14 ++ kernel/livepatch/patch.c | 26 ++- kernel/livepatch/patch.h | 4 + 10 files changed, 322 insertions(+), 21 deletions(-)
diff --git a/arch/arm/kernel/livepatch.c b/arch/arm/kernel/livepatch.c index 3162de4aec70..338222846b81 100644 --- a/arch/arm/kernel/livepatch.c +++ b/arch/arm/kernel/livepatch.c @@ -137,7 +137,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, for (obj = patch->objs; obj->funcs; obj++) { for (func = obj->funcs; func->old_name; func++) { if (enable) { - if (func->force == KLP_ENFORCEMENT) + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* * When enable, checking the currently diff --git a/arch/arm64/kernel/livepatch.c b/arch/arm64/kernel/livepatch.c index d629ad409721..3b1b4db58d52 100644 --- a/arch/arm64/kernel/livepatch.c +++ b/arch/arm64/kernel/livepatch.c @@ -131,7 +131,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, for (obj = patch->objs; obj->funcs; obj++) { for (func = obj->funcs; func->old_name; func++) { if (enable) { - if (func->force == KLP_ENFORCEMENT) + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* * When enable, checking the currently diff --git a/arch/powerpc/kernel/livepatch_32.c b/arch/powerpc/kernel/livepatch_32.c index 4ce4bd07eaaf..8478d496a991 100644 --- a/arch/powerpc/kernel/livepatch_32.c +++ b/arch/powerpc/kernel/livepatch_32.c @@ -134,7 +134,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, for (obj = patch->objs; obj->funcs; obj++) { for (func = obj->funcs; func->old_name; func++) { if (enable) { - if (func->force == KLP_ENFORCEMENT) + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* * When enable, checking the currently diff --git a/arch/powerpc/kernel/livepatch_64.c b/arch/powerpc/kernel/livepatch_64.c index acc6d94a5c91..b313917242ee 100644 --- a/arch/powerpc/kernel/livepatch_64.c +++ b/arch/powerpc/kernel/livepatch_64.c @@ -143,7 +143,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable,
/* Check func address in stack */ if (enable) { - if (func->force == KLP_ENFORCEMENT) + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* * When enable, checking the currently diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index 824b538d2861..5763876457d1 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -126,7 +126,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable,
/* Check func address in stack */ if (enable) { - if (func->force == KLP_ENFORCEMENT) + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* * When enable, checking the currently diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index c12781f7397b..602e944dfc9e 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -229,19 +229,29 @@ struct klp_func_node { struct list_head func_stack; void *old_func; struct arch_klp_data arch_data; + /* + * Used in breakpoint exception handling functions. + * If 'brk_func' is NULL, no breakpoint is inserted into the entry of + * the old function. + * If it is not NULL, the value is the new function that will jump to + * when the breakpoint exception is triggered. + */ + void *brk_func; };
struct klp_func_node *klp_find_func_node(const void *old_func); void klp_add_func_node(struct klp_func_node *func_node); void klp_del_func_node(struct klp_func_node *func_node); +void *klp_get_brk_func(void *addr);
static inline int klp_compare_address(unsigned long pc, unsigned long func_addr, const char *func_name, unsigned long check_size) { if (pc >= func_addr && pc < func_addr + check_size) { - pr_err("func %s is in use!\n", func_name); - return -EBUSY; + pr_warn("func %s is in use!\n", func_name); + /* Return -EAGAIN for next retry */ + return -EAGAIN; } return 0; } diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 47d8661ee5e4..a682a8638e01 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -31,6 +31,7 @@ #include "state.h" #include "transition.h" #elif defined(CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY) +#include <linux/delay.h> #include <linux/stop_machine.h> #endif
@@ -57,6 +58,7 @@ static struct kobject *klp_root_kobj; struct patch_data { struct klp_patch *patch; atomic_t cpu_count; + bool rollback; }; #endif
@@ -1300,6 +1302,37 @@ void klp_del_func_node(struct klp_func_node *func_node) list_del_rcu(&func_node->node); }
+/* + * Called from the breakpoint exception handler function. + */ +void *klp_get_brk_func(void *addr) +{ + struct klp_func_node *func_node; + void *brk_func = NULL; + + if (!addr) + return NULL; + + rcu_read_lock(); + + func_node = klp_find_func_node(addr); + if (!func_node) + goto unlock; + + /* + * Corresponds to smp_wmb() in {add, remove}_breakpoint(). If the + * current breakpoint exception belongs to us, we have observed the + * breakpoint instruction, so brk_func must be observed. + */ + smp_rmb(); + + brk_func = func_node->brk_func; + +unlock: + rcu_read_unlock(); + return brk_func; +} + /* * This function is called from stop_machine() context. */ @@ -1370,6 +1403,25 @@ long __weak arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_fu return -ENOSYS; }
+int __weak arch_klp_check_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + return 0; +} + +int __weak arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + return -ENOTSUPP; +} + +void __weak arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ +} + +void __weak arch_klp_set_brk_func(struct klp_func_node *func_node, void *new_func) +{ + func_node->brk_func = new_func; +} + static struct klp_func_node *func_node_alloc(struct klp_func *func) { long ret; @@ -1444,6 +1496,110 @@ static int klp_mem_prepare(struct klp_patch *patch) return 0; }
+static void remove_breakpoint(struct klp_func *func, bool restore) +{ + + struct klp_func_node *func_node = klp_find_func_node(func->old_func); + struct arch_klp_data *arch_data = &func_node->arch_data; + + if (!func_node->brk_func) + return; + + if (restore) + arch_klp_remove_breakpoint(arch_data, func->old_func); + + /* Wait for all breakpoint exception handler functions to exit. */ + synchronize_rcu(); + + /* 'brk_func' cannot be set to NULL before the breakpoint is removed. */ + smp_wmb(); + + arch_klp_set_brk_func(func_node, NULL); +} + +static void __klp_breakpoint_post_process(struct klp_patch *patch, bool restore) +{ + struct klp_object *obj; + struct klp_func *func; + + klp_for_each_object(patch, obj) { + klp_for_each_func(obj, func) { + remove_breakpoint(func, restore); + } + } +} + +static int add_breakpoint(struct klp_func *func) +{ + struct klp_func_node *func_node = klp_find_func_node(func->old_func); + struct arch_klp_data *arch_data = &func_node->arch_data; + int ret; + + if (WARN_ON_ONCE(func_node->brk_func)) + return -EINVAL; + + ret = arch_klp_check_breakpoint(arch_data, func->old_func); + if (ret) + return ret; + + arch_klp_set_brk_func(func_node, func->new_func); + + /* + * When entering an exception, we must see 'brk_func' or the kernel + * will not be able to handle the breakpoint exception we are about + * to insert. + */ + smp_wmb(); + + ret = arch_klp_add_breakpoint(arch_data, func->old_func); + if (ret) + arch_klp_set_brk_func(func_node, NULL); + + return ret; +} + +static int klp_add_breakpoint(struct klp_patch *patch) +{ + struct klp_object *obj; + struct klp_func *func; + int ret; + + /* + * Ensure that the module is not uninstalled before the breakpoint is + * removed. After the breakpoint is removed, it can be ensured that the + * new function will not be jumped through the handler function of the + * breakpoint. + */ + if (!try_module_get(patch->mod)) + return -ENODEV; + + arch_klp_code_modify_prepare(); + + klp_for_each_object(patch, obj) { + klp_for_each_func(obj, func) { + ret = add_breakpoint(func); + if (ret) { + __klp_breakpoint_post_process(patch, true); + arch_klp_code_modify_post_process(); + module_put(patch->mod); + return ret; + } + } + } + + arch_klp_code_modify_post_process(); + + return 0; +} + +static void klp_breakpoint_post_process(struct klp_patch *patch, bool restore) +{ + arch_klp_code_modify_prepare(); + __klp_breakpoint_post_process(patch, restore); + arch_klp_code_modify_post_process(); + module_put(patch->mod); +} + static int __klp_disable_patch(struct klp_patch *patch) { int ret; @@ -1614,7 +1770,7 @@ EXPORT_SYMBOL_GPL(klp_enable_patch); /* * This function is called from stop_machine() context. */ -static int enable_patch(struct klp_patch *patch) +static int enable_patch(struct klp_patch *patch, bool rollback) { struct klp_object *obj; int ret; @@ -1622,19 +1778,21 @@ static int enable_patch(struct klp_patch *patch) pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n"); add_taint(TAINT_LIVEPATCH, LOCKDEP_STILL_OK);
- if (!try_module_get(patch->mod)) - return -ENODEV; + if (!patch->enabled) { + if (!try_module_get(patch->mod)) + return -ENODEV;
- patch->enabled = true; + patch->enabled = true;
- pr_notice("enabling patch '%s'\n", patch->mod->name); + pr_notice("enabling patch '%s'\n", patch->mod->name); + }
klp_for_each_object(patch, obj) { if (!klp_is_object_loaded(obj)) continue;
- ret = klp_patch_object(obj); - if (ret) { + ret = klp_patch_object(obj, rollback); + if (ret && klp_need_rollback(ret, rollback)) { pr_warn("failed to patch object '%s'\n", klp_is_module(obj) ? obj->name : "vmlinux"); goto disable; @@ -1666,7 +1824,7 @@ int klp_try_enable_patch(void *data) atomic_inc(&pd->cpu_count); return ret; } - ret = enable_patch(patch); + ret = enable_patch(patch, pd->rollback); if (ret) { atomic_inc(&pd->cpu_count); return ret; @@ -1682,12 +1840,89 @@ int klp_try_enable_patch(void *data) return ret; }
+/* + * When the stop_machine is used to enable the patch, if the patch fails to be + * enabled because the stack check fails, a certain number of retries are + * allowed. The maximum number of retries is KLP_RETRY_COUNT. + * + * Sleeps for KLP_RETRY_INTERVAL milliseconds before each retry to give tasks + * that fail the stack check a chance to run out of the instruction replacement + * area. + */ +#define KLP_RETRY_COUNT 5 +#define KLP_RETRY_INTERVAL 100 + +static bool klp_use_breakpoint(struct klp_patch *patch) +{ + struct klp_object *obj; + struct klp_func *func; + + klp_for_each_object(patch, obj) { + klp_for_each_func(obj, func) { + if (func->force != KLP_STACK_OPTIMIZE) + return false; + } + } + + return true; +} + +static int klp_breakpoint_optimize(struct klp_patch *patch) +{ + int ret; + int i; + int cnt = 0; + + ret = klp_add_breakpoint(patch); + if (ret) { + pr_err("failed to add breakpoints, ret=%d\n", ret); + return ret; + } + + for (i = 0; i < KLP_RETRY_COUNT; i++) { + struct patch_data patch_data = { + .patch = patch, + .cpu_count = ATOMIC_INIT(0), + .rollback = false, + }; + + if (i == KLP_RETRY_COUNT - 1) + patch_data.rollback = true; + + cnt++; + + arch_klp_code_modify_prepare(); + ret = stop_machine(klp_try_enable_patch, &patch_data, + cpu_online_mask); + arch_klp_code_modify_post_process(); + if (!ret || ret != -EAGAIN) + break; + + pr_notice("try again in %d ms.\n", KLP_RETRY_INTERVAL); + + msleep(KLP_RETRY_INTERVAL); + } + pr_notice("patching %s, tried %d times, ret=%d.\n", + ret ? "failed" : "success", cnt, ret); + + /* + * If the patch is enabled successfully, the breakpoint instruction + * has been replaced with the jump instruction. However, if the patch + * fails to be enabled, we need to delete the previously inserted + * breakpoint to restore the instruction at the old function entry. + */ + klp_breakpoint_post_process(patch, !!ret); + + return ret; +} + static int __klp_enable_patch(struct klp_patch *patch) { int ret; struct patch_data patch_data = { .patch = patch, .cpu_count = ATOMIC_INIT(0), + .rollback = true, };
if (WARN_ON(patch->enabled)) @@ -1705,14 +1940,26 @@ static int __klp_enable_patch(struct klp_patch *patch) ret = klp_mem_prepare(patch); if (ret) return ret; + arch_klp_code_modify_prepare(); - ret = stop_machine(klp_try_enable_patch, &patch_data, cpu_online_mask); + ret = stop_machine(klp_try_enable_patch, &patch_data, + cpu_online_mask); arch_klp_code_modify_post_process(); - if (ret) { - klp_mem_recycle(patch); - return ret; + if (!ret) + goto move_patch_to_tail; + if (ret != -EAGAIN) + goto err_out; + + if (!klp_use_breakpoint(patch)) { + pr_debug("breakpoint exception optimization is not used.\n"); + goto err_out; }
+ ret = klp_breakpoint_optimize(patch); + if (ret) + goto err_out; + +move_patch_to_tail: #ifndef CONFIG_LIVEPATCH_STACK /* move the enabled patch to the list tail */ list_del(&patch->list); @@ -1720,6 +1967,10 @@ static int __klp_enable_patch(struct klp_patch *patch) #endif
return 0; + +err_out: + klp_mem_recycle(patch); + return ret; }
/** diff --git a/kernel/livepatch/core.h b/kernel/livepatch/core.h index 9bcd139eb7d6..911b6452e5be 100644 --- a/kernel/livepatch/core.h +++ b/kernel/livepatch/core.h @@ -57,4 +57,18 @@ static inline void klp_post_unpatch_callback(struct klp_object *obj) obj->callbacks.post_unpatch_enabled = false; } #endif /* CONFIG_LIVEPATCH_PER_TASK_CONSISTENCY */ + +#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +/* + * In the enable_patch() process, we do not need to roll back the patch + * immediately if the patch fails to enabled. In this way, the function that has + * been successfully patched does not need to be enabled repeatedly during + * retry. However, if it is the last retry (rollback == true) or not because of + * stack check failure (patch_err != -EAGAIN), rollback is required immediately. + */ +static inline bool klp_need_rollback(int patch_err, bool rollback) +{ + return patch_err != -EAGAIN || rollback; +} +#endif /* CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY */ #endif /* _LIVEPATCH_CORE_H */ diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c index 6515b8e99829..bea6c5d0af94 100644 --- a/kernel/livepatch/patch.c +++ b/kernel/livepatch/patch.c @@ -269,10 +269,10 @@ static inline int klp_patch_func(struct klp_func *func) { int ret = 0;
+ if (func->patched) + return 0; if (WARN_ON(!func->old_func)) return -EINVAL; - if (WARN_ON(func->patched)) - return -EINVAL; if (WARN_ON(!func->func_node)) return -EINVAL;
@@ -306,6 +306,27 @@ void klp_unpatch_object(struct klp_object *obj) __klp_unpatch_object(obj, false); }
+#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +int klp_patch_object(struct klp_object *obj, bool rollback) +{ + struct klp_func *func; + int ret; + + if (obj->patched) + return 0; + + klp_for_each_func(obj, func) { + ret = klp_patch_func(func); + if (ret && klp_need_rollback(ret, rollback)) { + klp_unpatch_object(obj); + return ret; + } + } + obj->patched = true; + + return 0; +} +#else int klp_patch_object(struct klp_object *obj) { struct klp_func *func; @@ -325,6 +346,7 @@ int klp_patch_object(struct klp_object *obj)
return 0; } +#endif
static void __klp_unpatch_objects(struct klp_patch *patch, bool nops_only) { diff --git a/kernel/livepatch/patch.h b/kernel/livepatch/patch.h index c9cde47f7e97..9566681660e4 100644 --- a/kernel/livepatch/patch.h +++ b/kernel/livepatch/patch.h @@ -29,7 +29,11 @@ struct klp_ops {
struct klp_ops *klp_find_ops(void *old_func);
+#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +int klp_patch_object(struct klp_object *obj, bool rollback); +#else int klp_patch_object(struct klp_object *obj); +#endif void klp_unpatch_object(struct klp_object *obj); void klp_unpatch_objects(struct klp_patch *patch); void klp_unpatch_objects_dynamic(struct klp_patch *patch);
From: Li Huafei lihuafei1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Implement the arch_klp_{check,add,remove}_breakpoint interfaces on x86 to support the breakpoint exception optimization.
Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/x86/include/asm/livepatch.h | 13 +++++ arch/x86/kernel/livepatch.c | 83 +++++++++++++++++++++++++++++++- arch/x86/kernel/traps.c | 10 ++++ 3 files changed, 104 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/livepatch.h b/arch/x86/include/asm/livepatch.h index e23c2da3c323..e0faa20d525d 100644 --- a/arch/x86/include/asm/livepatch.h +++ b/arch/x86/include/asm/livepatch.h @@ -37,9 +37,22 @@ int klp_check_calltrace(struct klp_patch *patch, int enable); #define JMP_E9_INSN_SIZE 5 struct arch_klp_data { unsigned char old_code[JMP_E9_INSN_SIZE]; +#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY + /* + * Saved opcode at the entry of the old func (which maybe replaced + * with breakpoint). + */ + unsigned char saved_opcode; +#endif };
long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func); +#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +int arch_klp_check_breakpoint(struct arch_klp_data *arch_data, void *old_func); +int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); +int klp_int3_handler(struct pt_regs *regs); +#endif
#endif
diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index 5763876457d1..b026efdfe346 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -31,6 +31,10 @@ #include <asm/nops.h> #include <asm/sections.h>
+#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +#include <linux/kprobes.h> +#endif + #ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY /* * The instruction set on x86 is CISC. @@ -352,6 +356,74 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) free_list(&check_funcs); return ret; } + +int arch_klp_check_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + int ret; + unsigned char opcode; + + ret = copy_from_kernel_nofault(&opcode, old_func, INT3_INSN_SIZE); + if (ret) + return ret; + + /* Another subsystem puts a breakpoint, reject patching at this time */ + if (opcode == INT3_INSN_OPCODE) + return -EBUSY; + + return 0; +} + +int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + unsigned char int3 = INT3_INSN_OPCODE; + int ret; + + ret = copy_from_kernel_nofault(&arch_data->saved_opcode, old_func, + INT3_INSN_SIZE); + if (ret) + return ret; + + text_poke(old_func, &int3, INT3_INSN_SIZE); + /* arch_klp_code_modify_post_process() will do text_poke_sync() */ + + return 0; +} + +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + unsigned char opcode; + int ret; + + ret = copy_from_kernel_nofault(&opcode, old_func, INT3_INSN_SIZE); + if (ret) { + pr_warn("%s: failed to read opcode, ret=%d\n", __func__, ret); + return; + } + + /* instruction have been recovered at arch_klp_unpatch_func() */ + if (opcode != INT3_INSN_OPCODE) + return; + + text_poke(old_func, &arch_data->saved_opcode, INT3_INSN_SIZE); + /* arch_klp_code_modify_post_process() will do text_poke_sync() */ +} + +int klp_int3_handler(struct pt_regs *regs) +{ + unsigned long addr = regs->ip - INT3_INSN_SIZE; + void *brk_func; + + if (user_mode(regs)) + return 0; + + brk_func = klp_get_brk_func((void *)addr); + if (!brk_func) + return 0; + + int3_emulate_jmp(regs, (unsigned long)brk_func); + return 1; +} +NOKPROBE_SYMBOL(klp_int3_handler); #endif
#ifdef CONFIG_LIVEPATCH_WO_FTRACE @@ -390,15 +462,22 @@ int arch_klp_patch_func(struct klp_func *func) { struct klp_func_node *func_node; unsigned long ip, new_addr; - void *new; + unsigned char *new;
func_node = func->func_node; ip = (unsigned long)func->old_func; list_add_rcu(&func->stack_node, &func_node->func_stack); new_addr = (unsigned long)func->new_func; /* replace the text with the new text */ - new = klp_jmp_code(ip, new_addr); + new = (unsigned char *)klp_jmp_code(ip, new_addr); +#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY + /* update jmp offset */ + text_poke((void *)(ip + 1), new + 1, JMP_E9_INSN_SIZE - 1); + /* update jmp opcode */ + text_poke((void *)ip, new, 1); +#else text_poke((void *)ip, new, JMP_E9_INSN_SIZE); +#endif
return 0; } diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 303970bba0f8..696ec85164e6 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -62,6 +62,10 @@ #include <asm/insn-eval.h> #include <asm/vdso.h>
+#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +#include <asm/livepatch.h> +#endif + #ifdef CONFIG_X86_64 #include <asm/x86_init.h> #include <asm/proto.h> @@ -654,6 +658,12 @@ static bool do_int3(struct pt_regs *regs) if (kprobe_int3_handler(regs)) return true; #endif + +#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY + if (klp_int3_handler(regs)) + return true; +#endif + res = notify_die(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, SIGTRAP);
return res == NOTIFY_STOP;
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
For the ARM architectures, the callback function that handles the BRK exception needs to be registered in advance. Therefore, provide an architecture-specific init interface.
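The mechanism is a weak no-op default in common code that an architecture overrides to register its breakpoint hook at init time; a minimal sketch of both sides, where the arm64 override shown lands in a later patch of this series:

    /* kernel/livepatch/core.c: weak default does nothing. */
    void __weak arch_klp_init(void)
    {
    }

    /* arch/arm64/kernel/livepatch.c: register the BRK hook once at boot. */
    void arch_klp_init(void)
    {
            register_kernel_break_hook(&klp_break_hook);
    }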
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/livepatch.h | 2 ++ kernel/livepatch/core.c | 7 +++++++ 2 files changed, 9 insertions(+)
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 602e944dfc9e..c9f3c1c12638 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -256,6 +256,8 @@ int klp_compare_address(unsigned long pc, unsigned long func_addr, return 0; }
+void arch_klp_init(void); + #endif
int klp_apply_section_relocs(struct module *pmod, Elf_Shdr *sechdrs, diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index a682a8638e01..ae116aac9b48 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -1403,6 +1403,10 @@ long __weak arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_fu return -ENOSYS; }
+void __weak arch_klp_init(void) +{ +} + int __weak arch_klp_check_breakpoint(struct arch_klp_data *arch_data, void *old_func) { return 0; @@ -2293,6 +2297,9 @@ static int __init klp_init(void) if (!klp_root_kobj) goto error_remove;
+#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY + arch_klp_init(); +#endif return 0;
error_remove:
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add breakpoint exception optimization support to improve the livepatch success rate on arm64.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/include/asm/brk-imm.h | 1 + arch/arm64/include/asm/debug-monitors.h | 2 ++ arch/arm64/include/asm/livepatch.h | 8 +++++ arch/arm64/kernel/livepatch.c | 41 +++++++++++++++++++++++++ 4 files changed, 52 insertions(+)
diff --git a/arch/arm64/include/asm/brk-imm.h b/arch/arm64/include/asm/brk-imm.h index ec7720dbe2c8..1ac8bc293ea2 100644 --- a/arch/arm64/include/asm/brk-imm.h +++ b/arch/arm64/include/asm/brk-imm.h @@ -21,6 +21,7 @@ #define KPROBES_BRK_IMM 0x004 #define UPROBES_BRK_IMM 0x005 #define KPROBES_BRK_SS_IMM 0x006 +#define KLP_BRK_IMM 0x007 #define FAULT_BRK_IMM 0x100 #define KGDB_DYN_DBG_BRK_IMM 0x400 #define KGDB_COMPILED_DBG_BRK_IMM 0x401 diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h index 657c921fd784..bc015465ecd2 100644 --- a/arch/arm64/include/asm/debug-monitors.h +++ b/arch/arm64/include/asm/debug-monitors.h @@ -56,6 +56,8 @@ #define BRK64_OPCODE_KPROBES_SS (AARCH64_BREAK_MON | (KPROBES_BRK_SS_IMM << 5)) /* uprobes BRK opcodes with ESR encoding */ #define BRK64_OPCODE_UPROBES (AARCH64_BREAK_MON | (UPROBES_BRK_IMM << 5)) +/* klp BRK opcodes with ESR encoding */ +#define BRK64_OPCODE_KLP (AARCH64_BREAK_MON | (KLP_BRK_IMM << 5))
/* AArch32 */ #define DBG_ESR_EVT_BKPT 0x4 diff --git a/arch/arm64/include/asm/livepatch.h b/arch/arm64/include/asm/livepatch.h index 7b9ea5dcea4d..b87dc35c2b3f 100644 --- a/arch/arm64/include/asm/livepatch.h +++ b/arch/arm64/include/asm/livepatch.h @@ -58,8 +58,16 @@ int klp_check_calltrace(struct klp_patch *patch, int enable);
struct arch_klp_data { u32 old_insns[LJMP_INSN_SIZE]; + + /* + * Saved opcode at the entry of the old func (which maybe replaced + * with breakpoint). + */ + u32 saved_opcode; };
+int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func);
#endif diff --git a/arch/arm64/kernel/livepatch.c b/arch/arm64/kernel/livepatch.c index 3b1b4db58d52..508a43ce18ca 100644 --- a/arch/arm64/kernel/livepatch.c +++ b/arch/arm64/kernel/livepatch.c @@ -30,6 +30,7 @@ #include <asm/insn.h> #include <asm-generic/sections.h> #include <asm/ptrace.h> +#include <asm/debug-monitors.h> #include <linux/ftrace.h> #include <linux/sched/debug.h> #include <linux/kallsyms.h> @@ -315,6 +316,46 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) free_list(&check_funcs); return ret; } + +int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + u32 insn = BRK64_OPCODE_KLP; + u32 *addr = (u32 *)old_func; + + arch_data->saved_opcode = le32_to_cpu(*addr); + aarch64_insn_patch_text(&old_func, &insn, 1); + return 0; +} + +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + aarch64_insn_patch_text(&old_func, &arch_data->saved_opcode, 1); +} + +static int klp_breakpoint_handler(struct pt_regs *regs, unsigned int esr) +{ + void *brk_func = NULL; + unsigned long addr = instruction_pointer(regs); + + brk_func = klp_get_brk_func((void *)addr); + if (!brk_func) { + pr_warn("Unrecoverable livepatch detected.\n"); + BUG(); + } + + instruction_pointer_set(regs, (unsigned long)brk_func); + return 0; +} + +static struct break_hook klp_break_hook = { + .imm = KLP_BRK_IMM, + .fn = klp_breakpoint_handler, +}; + +void arch_klp_init(void) +{ + register_kernel_break_hook(&klp_break_hook); +} #endif
long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func)
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add breakpoint exception optimization support to improve the livepatch success rate on arm.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm/include/asm/livepatch.h | 10 +++++++ arch/arm/kernel/livepatch.c | 45 ++++++++++++++++++++++++++++++++ 2 files changed, 55 insertions(+)
diff --git a/arch/arm/include/asm/livepatch.h b/arch/arm/include/asm/livepatch.h index befa1efbbcd1..70ce2dba7134 100644 --- a/arch/arm/include/asm/livepatch.h +++ b/arch/arm/include/asm/livepatch.h @@ -23,6 +23,8 @@
#include <linux/module.h>
+#define KLP_ARM_BREAKPOINT_INSTRUCTION 0xe7f001f9 + struct klp_patch; struct klp_func;
@@ -47,8 +49,16 @@ int klp_check_calltrace(struct klp_patch *patch, int enable);
struct arch_klp_data { u32 old_insns[LJMP_INSN_SIZE]; + + /* + * Saved opcode at the entry of the old func (which maybe replaced + * with breakpoint). + */ + u32 saved_opcode; };
+int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func);
#endif diff --git a/arch/arm/kernel/livepatch.c b/arch/arm/kernel/livepatch.c index 338222846b81..26241749cf08 100644 --- a/arch/arm/kernel/livepatch.c +++ b/arch/arm/kernel/livepatch.c @@ -28,6 +28,8 @@ #include <asm/stacktrace.h> #include <asm/cacheflush.h> #include <linux/slab.h> +#include <linux/ptrace.h> +#include <asm/traps.h> #include <asm/insn.h> #include <asm/patch.h>
@@ -317,6 +319,49 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) free_list(&check_funcs); return ret; } + +int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + u32 *addr = (u32 *)old_func; + + arch_data->saved_opcode = le32_to_cpu(*addr); + patch_text(old_func, KLP_ARM_BREAKPOINT_INSTRUCTION); + return 0; +} + +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + patch_text(old_func, arch_data->saved_opcode); +} + +static int klp_trap_handler(struct pt_regs *regs, unsigned int instr) +{ + void *brk_func = NULL; + unsigned long addr = regs->ARM_pc; + + brk_func = klp_get_brk_func((void *)addr); + if (!brk_func) { + pr_warn("Unrecoverable livepatch detected.\n"); + BUG(); + } + + regs->ARM_pc = (unsigned long)brk_func; + return 0; +} + +static struct undef_hook klp_arm_break_hook = { + .instr_mask = 0x0fffffff, + .instr_val = (KLP_ARM_BREAKPOINT_INSTRUCTION & 0x0fffffff), + .cpsr_mask = MODE_MASK, + .cpsr_val = SVC_MODE, + .fn = klp_trap_handler, +}; + +void arch_klp_init(void) +{ + register_undef_hook(&klp_arm_break_hook); +} + #endif
static inline bool offset_in_range(unsigned long pc, unsigned long addr,
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
On PPC64 a trampoline needs to be created before a breakpoint is added. Make livepatch_create_btramp a public function and drop the redundant input parameter "struct module *me".
Also fix an issue where the livepatch branch stub is not created when the address of the patched function is a branch function.
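With livepatch_create_btramp() public, a PPC64 caller can build the trampoline before the breakpoint is armed; a hedged sketch of such a call site follows, where prepare_brk_target() is a hypothetical helper and not part of this series:

    static void prepare_brk_target(struct ppc64_klp_btramp_entry *entry,
                                   void *new_func)
    {
            /*
             * Hypothetical illustration: build the trampoline first, so a
             * breakpoint handler can later redirect execution through it.
             */
            livepatch_create_btramp(entry, (unsigned long)new_func);
    }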
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/powerpc/include/asm/livepatch.h | 4 ++++ arch/powerpc/kernel/module_64.c | 26 ++++++++++++-------------- 2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h index fea12c6b915c..9ddddc35d21e 100644 --- a/arch/powerpc/include/asm/livepatch.h +++ b/arch/powerpc/include/asm/livepatch.h @@ -75,6 +75,10 @@ extern void livepatch_branch_stub_end(void); #ifdef PPC64_ELF_ABI_v1 extern void livepatch_branch_trampoline(void); extern void livepatch_branch_trampoline_end(void); +void livepatch_create_btramp(struct ppc64_klp_btramp_entry *entry, unsigned long addr); +#else +static inline void livepatch_create_btramp(struct ppc64_klp_btramp_entry *entry, + unsigned long addr) {} #endif /* PPC64_ELF_ABI_v1 */
int livepatch_create_branch(unsigned long pc, diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index 7a143ab7d433..ef093691f606 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -835,16 +835,15 @@ static int livepatch_create_bstub(struct ppc64_klp_bstub_entry *entry, return 0; }
- if (entry->magic != BRANCH_STUB_MAGIC) { - stub_start = ppc_function_entry((void *)livepatch_branch_stub); - stub_end = ppc_function_entry((void *)livepatch_branch_stub_end); - stub_size = stub_end - stub_start; - memcpy(entry->jump, (u32 *)stub_start, stub_size); - - entry->jump[0] |= PPC_HA(reladdr); - entry->jump[1] |= PPC_LO(reladdr); - entry->magic = BRANCH_STUB_MAGIC; - } + + stub_start = ppc_function_entry((void *)livepatch_branch_stub); + stub_end = ppc_function_entry((void *)livepatch_branch_stub_end); + stub_size = stub_end - stub_start; + memcpy(entry->jump, (u32 *)stub_start, stub_size); + + entry->jump[0] |= PPC_HA(reladdr); + entry->jump[1] |= PPC_LO(reladdr); + entry->magic = BRANCH_STUB_MAGIC; entry->trampoline = addr;
pr_debug("Create livepatch branch stub 0x%px with reladdr 0x%lx r2 0x%lx to trampoline 0x%lx\n", @@ -854,9 +853,8 @@ static int livepatch_create_bstub(struct ppc64_klp_bstub_entry *entry, }
#ifdef PPC64_ELF_ABI_v1 -static void livepatch_create_btramp(struct ppc64_klp_btramp_entry *entry, - unsigned long addr, - struct module *me) +void livepatch_create_btramp(struct ppc64_klp_btramp_entry *entry, + unsigned long addr) { unsigned long reladdr, tramp_start, tramp_end, tramp_size;
@@ -894,7 +892,7 @@ int livepatch_create_branch(unsigned long pc, { #ifdef PPC64_ELF_ABI_v1 /* Create trampoline to addr(new func) */ - livepatch_create_btramp((struct ppc64_klp_btramp_entry *)trampoline, addr, me); + livepatch_create_btramp((struct ppc64_klp_btramp_entry *)trampoline, addr); #else trampoline = addr; #endif
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add breakpoint exception optimization support to improve the livepatch success rate on ppc64/ppc32.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/powerpc/include/asm/livepatch.h | 20 +++++++++ arch/powerpc/kernel/Makefile | 2 +- arch/powerpc/kernel/entry_64.S | 36 +++++++++++++++ arch/powerpc/kernel/livepatch.c | 67 ++++++++++++++++++++++++++++ arch/powerpc/kernel/traps.c | 8 ++++ 5 files changed, 132 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/livepatch.c
diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h index 9ddddc35d21e..052a82078bef 100644 --- a/arch/powerpc/include/asm/livepatch.h +++ b/arch/powerpc/include/asm/livepatch.h @@ -75,6 +75,7 @@ extern void livepatch_branch_stub_end(void); #ifdef PPC64_ELF_ABI_v1 extern void livepatch_branch_trampoline(void); extern void livepatch_branch_trampoline_end(void); +extern void livepatch_brk_trampoline(void); void livepatch_create_btramp(struct ppc64_klp_btramp_entry *entry, unsigned long addr); #else static inline void livepatch_create_btramp(struct ppc64_klp_btramp_entry *entry, @@ -93,6 +94,12 @@ struct arch_klp_data { #else unsigned long trampoline; #endif /* PPC64_ELF_ABI_v1 */ + + /* + * Saved opcode at the entry of the old func (which maybe replaced + * with breakpoint). + */ + u32 saved_opcode; };
#elif defined(CONFIG_PPC32) @@ -101,10 +108,23 @@ struct arch_klp_data { #define LJMP_INSN_SIZE 4 struct arch_klp_data { u32 old_insns[LJMP_INSN_SIZE]; + + /* + * Saved opcode at the entry of the old func (which maybe replaced + * with breakpoint). + */ + u32 saved_opcode; };
#endif /* CONFIG_PPC64 */
+#ifdef PPC64_ELF_ABI_v1 +struct klp_func_node; +void arch_klp_set_brk_func(struct klp_func_node *func_node, void *new_func); +#endif +int klp_brk_handler(struct pt_regs *regs); +int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func);
#endif /* CONFIG_LIVEPATCH_FTRACE */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 4b6720e81632..32c617ba6901 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -95,7 +95,7 @@ obj-$(CONFIG_44x) += cpu_setup_44x.o obj-$(CONFIG_PPC_FSL_BOOK3E) += cpu_setup_fsl_booke.o obj-$(CONFIG_PPC_DOORBELL) += dbell.o obj-$(CONFIG_JUMP_LABEL) += jump_label.o -obj-$(CONFIG_LIVEPATCH_WO_FTRACE) += livepatch_$(BITS).o +obj-$(CONFIG_LIVEPATCH_WO_FTRACE) += livepatch.o livepatch_$(BITS).o
extra-$(CONFIG_PPC64) := head_64.o extra-$(CONFIG_PPC_BOOK3S_32) := head_book3s_32.o diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 71ff3a4f10a6..ad3281b092be 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -1068,5 +1068,41 @@ _GLOBAL(livepatch_branch_trampoline) blr _GLOBAL(livepatch_branch_trampoline_end) nop + +/* + * This function is the trampoline of livepatch brk handler. + * + * brk -> traps + * - klp_brk_handler + * - set R11 to new_func address + * - set NIP to livepatch_brk_trampoline address + * see arch/powerpc/kernel/livepatch.c + */ +_GLOBAL(livepatch_brk_trampoline) + mflr r0 + std r0, 16(r1) + std r2, 24(r1) + stdu r1, -STACK_FRAME_OVERHEAD(r1) + + /* Call NEW_FUNC */ + ld r12, 0(r11) /* load new func address to R12 */ +#ifdef PPC64_ELF_ABI_v1 + ld r2, 8(r11) /* set up new R2 */ +#endif + mtctr r12 /* load R12(new func address) to CTR */ + bctrl /* call new func */ + + /* + * Now we are returning from the patched function to the original + * caller A. We are free to use r11, r12 and we can use r2 until we + * restore it. + */ + addi r1, r1, STACK_FRAME_OVERHEAD + ld r2, 24(r1) + ld r0, 16(r1) + mtlr r0 + + /* Return to original caller of live patched function */ + blr #endif #endif /* CONFIG_LIVEPATCH_WO_FTRACE */ diff --git a/arch/powerpc/kernel/livepatch.c b/arch/powerpc/kernel/livepatch.c new file mode 100644 index 000000000000..b8afcc7b9939 --- /dev/null +++ b/arch/powerpc/kernel/livepatch.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * livepatch.c - powerpc-specific Kernel Live Patching Core + * + * Copyright (C) 2022 Huawei Technologies Co., Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see http://www.gnu.org/licenses/. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/module.h> +#include <linux/livepatch.h> +#include <asm/probes.h> +#include <asm/livepatch.h> +#include <asm/code-patching.h> + +int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)old_func); + + arch_data->saved_opcode = ppc_inst_val(insn); + patch_instruction((struct ppc_inst *)old_func, ppc_inst(BREAKPOINT_INSTRUCTION)); + return 0; +} + +void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func) +{ + patch_instruction((struct ppc_inst *)old_func, ppc_inst(arch_data->saved_opcode)); +} + +int klp_brk_handler(struct pt_regs *regs) +{ + void *brk_func = NULL; + unsigned long addr = regs->nip; + + if (user_mode(regs)) + return 0; + + brk_func = klp_get_brk_func((void *)addr); + if (!brk_func) + return 0; + +#ifdef PPC64_ELF_ABI_v1 + /* + * Only static trampoline can be used here to prevent + * resource release caused by rollback. 
+ */ + regs->gpr[PT_R11] = (unsigned long)brk_func; + regs->nip = ppc_function_entry((void *)livepatch_brk_trampoline); +#else + regs->nip = (unsigned long)brk_func; +#endif + + return 1; +} diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 069d451240fa..d2f6b2e30b6a 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -67,6 +67,9 @@ #include <asm/kprobes.h> #include <asm/stacktrace.h> #include <asm/nmi.h> +#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +#include <asm/livepatch.h> +#endif
#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE) int (*__debugger)(struct pt_regs *regs) __read_mostly; @@ -1491,6 +1494,11 @@ void program_check_exception(struct pt_regs *regs) if (kprobe_handler(regs)) goto bail;
+#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY + if (klp_brk_handler(regs)) + goto bail; +#endif + /* trap exception */ if (notify_die(DIE_BPT, "breakpoint", regs, 5, 5, SIGTRAP) == NOTIFY_STOP)
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Factor the calltrace check code out into do_check_calltrace so that it can be reused for the calltrace check of the klp module. No functional change.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/livepatch.c | 56 +++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/kernel/livepatch.c b/arch/arm64/kernel/livepatch.c index 508a43ce18ca..539d6534a220 100644 --- a/arch/arm64/kernel/livepatch.c +++ b/arch/arm64/kernel/livepatch.c @@ -255,23 +255,11 @@ static void free_list(struct klp_func_list **funcs) } }
-int klp_check_calltrace(struct klp_patch *patch, int enable) +static int do_check_calltrace(struct walk_stackframe_args *args, + bool (*fn)(void *, unsigned long)) { struct task_struct *g, *t; struct stackframe frame; - int ret = 0; - struct klp_func_list *check_funcs = NULL; - struct walk_stackframe_args args = { - .enable = enable, - .ret = 0 - }; - - ret = klp_check_activeness_func(patch, enable, &check_funcs); - if (ret) { - pr_err("collect active functions failed, ret=%d\n", ret); - goto out; - } - args.check_funcs = check_funcs;
for_each_process_thread(g, t) { /* @@ -284,7 +272,7 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) if (t == current) { /* current on this CPU */ frame.fp = (unsigned long)__builtin_frame_address(0); - frame.pc = (unsigned long)klp_check_calltrace; + frame.pc = (unsigned long)do_check_calltrace; } else if (strncmp(t->comm, "migration/", 10) == 0) { /* * current on other CPU @@ -293,25 +281,43 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) * task_comm here, because we can't get the * cpu_curr(task_cpu(t))). This assumes that no * other thread will pretend to be a stopper via - * task_comm. + * task_comm. */ continue; } else { frame.fp = thread_saved_fp(t); frame.pc = thread_saved_pc(t); } - if (check_funcs != NULL) { - start_backtrace(&frame, frame.fp, frame.pc); - walk_stackframe(t, &frame, klp_check_jump_func, &args); - if (args.ret) { - ret = args.ret; - pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); - show_stack(t, NULL, KERN_INFO); - goto out; - } + start_backtrace(&frame, frame.fp, frame.pc); + walk_stackframe(t, &frame, fn, args); + if (args->ret) { + pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); + show_stack(t, NULL, KERN_INFO); + return args->ret; } } + return 0; +} + +int klp_check_calltrace(struct klp_patch *patch, int enable) +{ + int ret = 0; + struct klp_func_list *check_funcs = NULL; + struct walk_stackframe_args args = { + .enable = enable, + .ret = 0 + }; + + ret = klp_check_activeness_func(patch, enable, &check_funcs); + if (ret) { + pr_err("collect active functions failed, ret=%d\n", ret); + goto out; + } + if (!check_funcs) + goto out;
+ args.check_funcs = check_funcs; + ret = do_check_calltrace(&args, klp_check_jump_func); out: free_list(&check_funcs); return ret;
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add arch_klp_module_check_calltrace to check whether the stack of any task falls within the code segment of the module.
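A hedged sketch of how the hook might be driven, for example before unloading a patched-over module; klp_module_going_safety_check() is a hypothetical wrapper and not part of this patch, but arch_klp_module_check_calltrace() does match the stop_machine() callback signature:

    static int klp_module_going_safety_check(struct module *mod)
    {
            /*
             * Returns 0 if no task's stack contains an address inside the
             * module's core text, -EBUSY otherwise.
             */
            return stop_machine(arch_klp_module_check_calltrace, mod,
                                cpu_online_mask);
    }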
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/include/asm/livepatch.h | 1 + arch/arm64/kernel/livepatch.c | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+)
diff --git a/arch/arm64/include/asm/livepatch.h b/arch/arm64/include/asm/livepatch.h index b87dc35c2b3f..bcb6c4081978 100644 --- a/arch/arm64/include/asm/livepatch.h +++ b/arch/arm64/include/asm/livepatch.h @@ -69,6 +69,7 @@ struct arch_klp_data { int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func); +int arch_klp_module_check_calltrace(void *data);
#endif
diff --git a/arch/arm64/kernel/livepatch.c b/arch/arm64/kernel/livepatch.c index 539d6534a220..cda56066d859 100644 --- a/arch/arm64/kernel/livepatch.c +++ b/arch/arm64/kernel/livepatch.c @@ -68,6 +68,7 @@ struct klp_func_list { struct walk_stackframe_args { int enable; struct klp_func_list *check_funcs; + struct module *mod; int ret; };
@@ -323,6 +324,28 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) return ret; }
+static bool check_module_calltrace(void *data, unsigned long pc) +{ + struct walk_stackframe_args *args = data; + + if (within_module_core(pc, args->mod)) { + pr_err("module %s is in use!\n", args->mod->name); + args->ret = -EBUSY; + return false; + } + return true; +} + +int arch_klp_module_check_calltrace(void *data) +{ + struct walk_stackframe_args args = { + .mod = (struct module *)data, + .ret = 0 + }; + + return do_check_calltrace(&args, check_module_calltrace); +} + int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func) { u32 insn = BRK64_OPCODE_KLP;
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Factor the calltrace check code out into do_check_calltrace so that it can be reused for the calltrace check of the module. No functional change.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm/kernel/livepatch.c | 52 +++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 22 deletions(-)
diff --git a/arch/arm/kernel/livepatch.c b/arch/arm/kernel/livepatch.c index 26241749cf08..8b78e05240a0 100644 --- a/arch/arm/kernel/livepatch.c +++ b/arch/arm/kernel/livepatch.c @@ -264,29 +264,18 @@ static void free_list(struct klp_func_list **funcs) } }
-int klp_check_calltrace(struct klp_patch *patch, int enable) +static int do_check_calltrace(struct walk_stackframe_args *args, + int (*fn)(struct stackframe *, void *)) { struct task_struct *g, *t; struct stackframe frame; - int ret = 0; - struct klp_func_list *check_funcs = NULL; - struct walk_stackframe_args args = { - .ret = 0 - }; - - ret = klp_check_activeness_func(patch, enable, &check_funcs); - if (ret) { - pr_err("collect active functions failed, ret=%d\n", ret); - goto out; - } - args.check_funcs = check_funcs;
for_each_process_thread(g, t) { if (t == current) { frame.fp = (unsigned long)__builtin_frame_address(0); frame.sp = current_stack_pointer; frame.lr = (unsigned long)__builtin_return_address(0); - frame.pc = (unsigned long)klp_check_calltrace; + frame.pc = (unsigned long)do_check_calltrace; } else if (strncmp(t->comm, "migration/", 10) == 0) { /* * current on other CPU @@ -304,16 +293,35 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) frame.lr = 0; /* recovered from the stack */ frame.pc = thread_saved_pc(t); } - if (check_funcs != NULL) { - walk_stackframe(&frame, klp_check_jump_func, &args); - if (args.ret) { - ret = args.ret; - pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); - show_stack(t, NULL, KERN_INFO); - goto out; - } + walk_stackframe(&frame, fn, args); + if (args->ret) { + pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); + show_stack(t, NULL, KERN_INFO); + return args->ret; } } + return 0; +} + +int klp_check_calltrace(struct klp_patch *patch, int enable) +{ + int ret = 0; + struct klp_func_list *check_funcs = NULL; + struct walk_stackframe_args args = { + .enable = enable, + .ret = 0 + }; + + ret = klp_check_activeness_func(patch, enable, &check_funcs); + if (ret) { + pr_err("collect active functions failed, ret=%d\n", ret); + goto out; + } + if (!check_funcs) + goto out; + + args.check_funcs = check_funcs; + ret = do_check_calltrace(&args, klp_check_jump_func);
out: free_list(&check_funcs);
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add arch_klp_module_check_calltrace to check whether the stack of any task falls within the code segment of the module.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm/include/asm/livepatch.h | 1 + arch/arm/kernel/livepatch.c | 22 ++++++++++++++++++++++ 2 files changed, 23 insertions(+)
diff --git a/arch/arm/include/asm/livepatch.h b/arch/arm/include/asm/livepatch.h index 70ce2dba7134..47d8b01618c7 100644 --- a/arch/arm/include/asm/livepatch.h +++ b/arch/arm/include/asm/livepatch.h @@ -60,6 +60,7 @@ struct arch_klp_data { int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func); +int arch_klp_module_check_calltrace(void *data);
#endif
diff --git a/arch/arm/kernel/livepatch.c b/arch/arm/kernel/livepatch.c index 8b78e05240a0..713ce67fa6e3 100644 --- a/arch/arm/kernel/livepatch.c +++ b/arch/arm/kernel/livepatch.c @@ -75,6 +75,7 @@ struct klp_func_list { struct walk_stackframe_args { int enable; struct klp_func_list *check_funcs; + struct module *mod; int ret; };
@@ -328,6 +329,27 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) return ret; }
+static int check_module_calltrace(struct stackframe *frame, void *data) +{ + struct walk_stackframe_args *args = data; + + if (within_module_core(frame->pc, args->mod)) { + pr_err("module %s is in use!\n", args->mod->name); + return (args->ret = -EBUSY); + } + return 0; +} + +int arch_klp_module_check_calltrace(void *data) +{ + struct walk_stackframe_args args = { + .mod = (struct module *)data, + .ret = 0 + }; + + return do_check_calltrace(&args, check_module_calltrace); +} + int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func) { u32 *addr = (u32 *)old_func;
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Factor the calltrace check code out into do_check_calltrace so that it can be reused for the calltrace check of the module. No functional change.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/powerpc/kernel/livepatch_32.c | 49 +++++++++++++++++------------- 1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/kernel/livepatch_32.c b/arch/powerpc/kernel/livepatch_32.c index 8478d496a991..4f42e6986a27 100644 --- a/arch/powerpc/kernel/livepatch_32.c +++ b/arch/powerpc/kernel/livepatch_32.c @@ -289,23 +289,12 @@ static void free_list(struct klp_func_list **funcs) } }
-int klp_check_calltrace(struct klp_patch *patch, int enable) +static int do_check_calltrace(struct walk_stackframe_args *args, + int (*fn)(struct stackframe *, void *)) { struct task_struct *g, *t; struct stackframe frame; unsigned long *stack; - int ret = 0; - struct klp_func_list *check_funcs = NULL; - struct walk_stackframe_args args = { - .ret = 0 - }; - - ret = klp_check_activeness_func(patch, enable, &check_funcs); - if (ret) { - pr_err("collect active functions failed, ret=%d\n", ret); - goto out; - } - args.check_funcs = check_funcs;
for_each_process_thread(g, t) { if (t == current) { @@ -344,16 +333,34 @@ int klp_check_calltrace(struct klp_patch *patch, int enable)
frame.sp = (unsigned long)stack; frame.pc = stack[STACK_FRAME_LR_SAVE]; - if (check_funcs != NULL) { - klp_walk_stackframe(&frame, klp_check_jump_func, t, &args); - if (args.ret) { - ret = args.ret; - pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); - show_stack(t, NULL, KERN_INFO); - goto out; - } + klp_walk_stackframe(&frame, fn, t, args); + if (args->ret) { + pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); + show_stack(t, NULL, KERN_INFO); + return args->ret; } } + return 0; +} + +int klp_check_calltrace(struct klp_patch *patch, int enable) +{ + int ret = 0; + struct klp_func_list *check_funcs = NULL; + struct walk_stackframe_args args = { + .ret = 0 + }; + + ret = klp_check_activeness_func(patch, enable, &check_funcs); + if (ret) { + pr_err("collect active functions failed, ret=%d\n", ret); + goto out; + } + if (!check_funcs) + goto out; + + args.check_funcs = check_funcs; + ret = do_check_calltrace(&args, klp_check_jump_func);
out: free_list(&check_funcs);
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add arch_klp_module_check_calltrace to check whether the stack of any task falls within the code segment of the module.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/powerpc/kernel/livepatch_32.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/arch/powerpc/kernel/livepatch_32.c b/arch/powerpc/kernel/livepatch_32.c index 4f42e6986a27..603f1d61cc23 100644 --- a/arch/powerpc/kernel/livepatch_32.c +++ b/arch/powerpc/kernel/livepatch_32.c @@ -70,6 +70,7 @@ struct stackframe { struct walk_stackframe_args { int enable; struct klp_func_list *check_funcs; + struct module *mod; int ret; };
@@ -366,6 +367,28 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) free_list(&check_funcs); return ret; } + +static int check_module_calltrace(struct stackframe *frame, void *data) +{ + struct walk_stackframe_args *args = data; + + if (within_module_core(frame->pc, args->mod)) { + pr_err("module %s is in use!\n", args->mod->name); + return (args->ret = -EBUSY); + } + return 0; +} + +int arch_klp_module_check_calltrace(void *data) +{ + struct walk_stackframe_args args = { + .mod = (struct module *)data, + .ret = 0 + }; + + return do_check_calltrace(&args, check_module_calltrace); +} + #endif
#ifdef CONFIG_LIVEPATCH_WO_FTRACE
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Factor the calltrace check code out into do_check_calltrace so that it can be reused for the calltrace check of the module. No functional change.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/powerpc/kernel/livepatch_64.c | 52 +++++++++++++++++------------- 1 file changed, 30 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/kernel/livepatch_64.c b/arch/powerpc/kernel/livepatch_64.c index b313917242ee..62c112c18e43 100644 --- a/arch/powerpc/kernel/livepatch_64.c +++ b/arch/powerpc/kernel/livepatch_64.c @@ -339,22 +339,12 @@ static void free_list(struct klp_func_list **funcs) } }
-int klp_check_calltrace(struct klp_patch *patch, int enable) +static int do_check_calltrace(struct walk_stackframe_args *args, + int (*fn)(struct stackframe *, void *)) { struct task_struct *g, *t; struct stackframe frame; unsigned long *stack; - int ret = 0; - struct klp_func_list *check_funcs = NULL; - struct walk_stackframe_args args; - - ret = klp_check_activeness_func(patch, enable, &check_funcs); - if (ret) { - pr_err("collect active functions failed, ret=%d\n", ret); - goto out; - } - args.check_funcs = check_funcs; - args.ret = 0;
for_each_process_thread(g, t) { if (t == current) { @@ -396,18 +386,36 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) frame.sp = (unsigned long)stack; frame.pc = stack[STACK_FRAME_LR_SAVE]; frame.nip = 0; - if (check_funcs != NULL) { - klp_walk_stackframe(&frame, klp_check_jump_func, t, &args); - if (args.ret) { - ret = args.ret; - pr_debug("%s FAILED when %s\n", __func__, - enable ? "enabling" : "disabling"); - pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); - show_stack(t, NULL, KERN_INFO); - goto out; - } + klp_walk_stackframe(&frame, fn, t, args); + if (args->ret) { + pr_debug("%s FAILED when %s\n", __func__, + args->enable ? "enabling" : "disabling"); + pr_info("PID: %d Comm: %.20s\n", t->pid, t->comm); + show_stack(t, NULL, KERN_INFO); + return args->ret; } } + return 0; +} + +int klp_check_calltrace(struct klp_patch *patch, int enable) +{ + int ret = 0; + struct klp_func_list *check_funcs = NULL; + struct walk_stackframe_args args; + + ret = klp_check_activeness_func(patch, enable, &check_funcs); + if (ret) { + pr_err("collect active functions failed, ret=%d\n", ret); + goto out; + } + if (!check_funcs) + goto out; + + args.check_funcs = check_funcs; + args.ret = 0; + args.enable = enable; + ret = do_check_calltrace(&args, klp_check_jump_func);
out: free_list(&check_funcs);
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add arch_klp_module_check_calltrace() to check whether the stack of any task falls within the code segment of the module.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/powerpc/include/asm/livepatch.h | 1 + arch/powerpc/kernel/livepatch_64.c | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+)
diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h index 052a82078bef..bafbfaba190f 100644 --- a/arch/powerpc/include/asm/livepatch.h +++ b/arch/powerpc/include/asm/livepatch.h @@ -126,6 +126,7 @@ int klp_brk_handler(struct pt_regs *regs); int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); long arch_klp_save_old_code(struct arch_klp_data *arch_data, void *old_func); +int arch_klp_module_check_calltrace(void *data);
#endif /* CONFIG_LIVEPATCH_FTRACE */
diff --git a/arch/powerpc/kernel/livepatch_64.c b/arch/powerpc/kernel/livepatch_64.c index 62c112c18e43..f008b3beb001 100644 --- a/arch/powerpc/kernel/livepatch_64.c +++ b/arch/powerpc/kernel/livepatch_64.c @@ -76,6 +76,7 @@ struct stackframe { struct walk_stackframe_args { int enable; struct klp_func_list *check_funcs; + struct module *mod; int ret; };
@@ -421,6 +422,28 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) free_list(&check_funcs); return ret; } + +static int check_module_calltrace(struct stackframe *frame, void *data) +{ + struct walk_stackframe_args *args = data; + + if (within_module_core(frame->pc, args->mod)) { + pr_err("module %s is in use!\n", args->mod->name); + return (args->ret = -EBUSY); + } + return 0; +} + +int arch_klp_module_check_calltrace(void *data) +{ + struct walk_stackframe_args args = { + .mod = (struct module *)data, + .ret = 0 + }; + + return do_check_calltrace(&args, check_module_calltrace); +} + #endif
#ifdef CONFIG_LIVEPATCH_WO_FTRACE
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Factor the calltrace check code out into a standalone do_check_calltrace() helper so it can be reused for the module calltrace check. No functional change.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/x86/kernel/livepatch.c | 45 ++++++++++++++++++++++++------------- 1 file changed, 30 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index b026efdfe346..a778b94afdef 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -246,8 +246,10 @@ static void klp_print_stack_trace(void *trace_ptr, int trace_len) #endif #define MAX_STACK_ENTRIES 100
-static bool check_func_list(struct klp_func_list *funcs, int *ret, unsigned long pc) +static bool check_func_list(void *data, int *ret, unsigned long pc) { + struct klp_func_list *funcs = (struct klp_func_list *)data; + while (funcs != NULL) { *ret = klp_compare_address(pc, funcs->func_addr, funcs->func_name, klp_size_to_check(funcs->func_size, funcs->force)); @@ -260,7 +262,7 @@ static bool check_func_list(struct klp_func_list *funcs, int *ret, unsigned long }
static int klp_check_stack(void *trace_ptr, int trace_len, - struct klp_func_list *check_funcs) + bool (*fn)(void *, int *, unsigned long), void *data) { #ifdef CONFIG_ARCH_STACKWALK unsigned long *trace = trace_ptr; @@ -277,7 +279,7 @@ static int klp_check_stack(void *trace_ptr, int trace_len, for (i = 0; i < trace->nr_entries; i++) { address = trace->entries[i]; #endif - if (!check_func_list(check_funcs, &ret, address)) { + if (!fn(data, &ret, address)) { #ifdef CONFIG_ARCH_STACKWALK klp_print_stack_trace(trace_ptr, trace_len); #else @@ -301,11 +303,10 @@ static void free_list(struct klp_func_list **funcs) } }
-int klp_check_calltrace(struct klp_patch *patch, int enable) +static int do_check_calltrace(bool (*fn)(void *, int *, unsigned long), void *data) { struct task_struct *g, *t; int ret = 0; - struct klp_func_list *check_funcs = NULL; static unsigned long trace_entries[MAX_STACK_ENTRIES]; #ifdef CONFIG_ARCH_STACKWALK int trace_len; @@ -313,11 +314,6 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) struct stack_trace trace; #endif
- ret = klp_check_activeness_func(patch, enable, &check_funcs); - if (ret) { - pr_err("collect active functions failed, ret=%d\n", ret); - goto out; - } for_each_process_thread(g, t) { if (!strncmp(t->comm, "migration/", 10)) continue; @@ -327,10 +323,10 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) if (ret < 0) { pr_err("%s:%d has an unreliable stack, ret=%d\n", t->comm, t->pid, ret); - goto out; + return ret; } trace_len = ret; - ret = klp_check_stack(trace_entries, trace_len, check_funcs); + ret = klp_check_stack(trace_entries, trace_len, fn, data); #else trace.skip = 0; trace.nr_entries = 0; @@ -341,17 +337,36 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) if (ret) { pr_err("%s: %s:%d has an unreliable stack, ret=%d\n", __func__, t->comm, t->pid, ret); - goto out; + return ret; } - ret = klp_check_stack(&trace, 0, check_funcs); + ret = klp_check_stack(&trace, 0, fn, data); #endif if (ret) { pr_err("%s:%d check stack failed, ret=%d\n", t->comm, t->pid, ret); - goto out; + return ret; } }
+ return 0; +} + +int klp_check_calltrace(struct klp_patch *patch, int enable) +{ + int ret = 0; + struct klp_func_list *check_funcs = NULL; + + ret = klp_check_activeness_func(patch, enable, &check_funcs); + if (ret) { + pr_err("collect active functions failed, ret=%d\n", ret); + goto out; + } + + if (!check_funcs) + goto out; + + ret = do_check_calltrace(check_func_list, (void *)check_funcs); + out: free_list(&check_funcs); return ret;
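The x86 variant uses a different callback shape: do_check_calltrace() feeds the callback one unwound address at a time rather than a stack frame, and the callback returns false to abort the scan. A hedged sketch of that contract (example_addr_check and its condition are hypothetical):

static bool example_addr_check(void *data, int *ret, unsigned long pc)
{
	/* Placeholder condition: real callbacks compare pc against the
	 * patched functions (check_func_list) or a module's code range
	 * (check_module_calltrace, added in the next patch). */
	if (pc == 0) {
		*ret = -EBUSY;
		return false;
	}

	return true;
}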
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add arch_klp_module_check_calltrace() to check whether the stack of any task falls within the code segment of the module.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/x86/include/asm/livepatch.h | 1 + arch/x86/kernel/livepatch.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+)
diff --git a/arch/x86/include/asm/livepatch.h b/arch/x86/include/asm/livepatch.h index e0faa20d525d..b510f935ec11 100644 --- a/arch/x86/include/asm/livepatch.h +++ b/arch/x86/include/asm/livepatch.h @@ -52,6 +52,7 @@ int arch_klp_check_breakpoint(struct arch_klp_data *arch_data, void *old_func); int arch_klp_add_breakpoint(struct arch_klp_data *arch_data, void *old_func); void arch_klp_remove_breakpoint(struct arch_klp_data *arch_data, void *old_func); int klp_int3_handler(struct pt_regs *regs); +int arch_klp_module_check_calltrace(void *data); #endif
#endif diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index a778b94afdef..d134169488b6 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -372,6 +372,23 @@ int klp_check_calltrace(struct klp_patch *patch, int enable) return ret; }
+static bool check_module_calltrace(void *data, int *ret, unsigned long pc) +{ + struct module *mod = (struct module *)data; + + if (within_module_core(pc, mod)) { + pr_err("module %s is in use!\n", mod->name); + *ret = -EBUSY; + return false; + } + return true; +} + +int arch_klp_module_check_calltrace(void *data) +{ + return do_check_calltrace(check_module_calltrace, data); +} + int arch_klp_check_breakpoint(struct arch_klp_data *arch_data, void *old_func) { int ret;
From: Yang Jihong yangjihong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5CJ7X
--------------------------------
Add klp_module_delete_safety_check() to run a calltrace check during module deletion, so that the module's resources are not released while its code is still live on some task's stack.
Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/livepatch.h | 1 + kernel/livepatch/core.c | 36 ++++++++++++++++++++++++++++++++++++ kernel/module.c | 9 +++++++++ 3 files changed, 46 insertions(+)
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index c9f3c1c12638..9301f8e9bb90 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -257,6 +257,7 @@ int klp_compare_address(unsigned long pc, unsigned long func_addr, }
void arch_klp_init(void); +int klp_module_delete_safety_check(struct module *mod);
#endif
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index ae116aac9b48..780a825cee8b 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -1426,6 +1426,11 @@ void __weak arch_klp_set_brk_func(struct klp_func_node *func_node, void *new_fun func_node->brk_func = new_func; }
+int __weak arch_klp_module_check_calltrace(void *data) +{ + return 0; +} + static struct klp_func_node *func_node_alloc(struct klp_func *func) { long ret; @@ -2093,6 +2098,37 @@ int klp_unregister_patch(struct klp_patch *patch) } EXPORT_SYMBOL_GPL(klp_unregister_patch);
+/** + * klp_module_delete_safety_check() - safety check in livepatch scenario when delete a module + * @mod: Module to be deleted + * + * Module refcnt ensures that there is no rare case between enable_patch and delete_module: + * 1. safety_check -> try_enable_patch -> try_release_module_ref: + * try_enable_patch would increase module refcnt, which cause try_release_module_ref fails. + * 2. safety_check -> try_release_module_ref -> try_enable_patch: + * after release module ref, try_enable_patch would fail because try_module_get fails. + * So the problem that release resources unsafely when enable livepatch after safety_check is + * passed during module deletion does not exist, complex synchronization protection is not + * required. + + * Return: 0 on success, otherwise error + */ +int klp_module_delete_safety_check(struct module *mod) +{ + int ret; + + if (!mod || !is_livepatch_module(mod)) + return 0; + + ret = stop_machine(arch_klp_module_check_calltrace, (void *)mod, NULL); + if (ret) { + pr_debug("failed to check klp module calltrace: %d\n", ret); + return ret; + } + + return 0; +} + #endif /* #ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY */ /* * This function unpatches objects from the replaced livepatches. diff --git a/kernel/module.c b/kernel/module.c index 1acdfba63716..5fdfa29a0738 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -57,6 +57,9 @@ #include <linux/bsearch.h> #include <linux/dynamic_debug.h> #include <linux/audit.h> +#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY +#include <linux/livepatch.h> +#endif #include <uapi/linux/module.h> #include "module-internal.h"
@@ -1027,6 +1030,12 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user, } }
+#ifdef CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY + ret = klp_module_delete_safety_check(mod); + if (ret != 0) + goto out; +#endif + /* Stop the machine so refcounts can't move and disable module. */ ret = try_stop_module(mod, flags, &forced); if (ret != 0)
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit d802057c7c553ad426520a053da9f9fe08e2c35a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5F4TK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When booting with maxcpus=<small number>, interrupt controllers such as the GICv3 ITS may not be able to satisfy the affinity of some managed interrupts, as some of the HW resources are simply not available.
The same thing happens when loading a driver using managed interrupts while CPUs are offline.
In order to deal with this, do not try to activate such an interrupt if there is no online CPU capable of handling it. Instead, place it in the shutdown state; once a capable CPU comes online, the interrupt will be activated.
Reported-by: John Garry john.garry@huawei.com Reported-by: David Decotigny ddecotig@google.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: John Garry john.garry@huawei.com Link: https://lore.kernel.org/r/20220405185040.206297-2-maz@kernel.org
conflict: kernel/irq/msi.c
Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/irq/msi.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index d217acc9f71b..77722ebdf6f5 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -456,6 +456,21 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, irqd_clr_can_reserve(irq_data); if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK) irqd_set_msi_nomask_quirk(irq_data); + + /* + * If the interrupt is managed but no CPU is available to + * service it, shut it down until better times. Note that + * we only do this on the !RESERVE path as x86 (the only + * architecture using this flag) deals with this in a + * different way by using a catch-all vector. + */ + if ((info->flags & MSI_FLAG_ACTIVATE_EARLY) && + irqd_affinity_is_managed(irq_data) && + !cpumask_intersects(irq_data_get_affinity_mask(irq_data), + cpu_online_mask)) { + irqd_set_managed_shutdown(irq_data); + return 0; + } } ret = irq_domain_activate_irq(irq_data, can_reserve); if (ret)
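Stripped to its essence, the hunk above adds one activation gate. A hedged restatement as a predicate (msi_can_activate_early is a made-up name; the helpers are the same kernel APIs used in the diff):

static bool msi_can_activate_early(struct irq_data *irqd)
{
	/* A managed interrupt may be activated early only if its affinity
	 * mask still contains at least one online CPU; otherwise it is left
	 * in managed shutdown until such a CPU comes up. */
	return !irqd_affinity_is_managed(irqd) ||
	       cpumask_intersects(irq_data_get_affinity_mask(irqd),
				  cpu_online_mask);
}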
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 33de0aa4bae982ed6f7c777f86b5af3e627ac937 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5F4TK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When booting with maxcpus=<small number> (or even loading a driver while most CPUs are offline), it is pretty easy to observe managed affinities containing a mix of online and offline CPUs being passed to the irqchip driver.
This means that the irqchip cannot trust the affinity passed down from the core code, which is a bit annoying and requires (at least in theory) all drivers to implement some sort of affinity narrowing.
In order to address this, always limit the cpumask to the set of online CPUs.
Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20220405185040.206297-3-maz@kernel.org
conflict: kernel/irq/manage.c
Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/irq/manage.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index d3033e1f9d87..7ed0e452bbc7 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -223,11 +223,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, { struct irq_desc *desc = irq_data_to_desc(data); struct irq_chip *chip = irq_data_get_irq_chip(data); + const struct cpumask *prog_mask; int ret;
+ static DEFINE_RAW_SPINLOCK(tmp_mask_lock); + static struct cpumask tmp_mask; + if (!chip || !chip->irq_set_affinity) return -EINVAL;
+ raw_spin_lock(&tmp_mask_lock); /* * If this is a managed interrupt and housekeeping is enabled on * it check whether the requested affinity mask intersects with @@ -249,24 +254,28 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, */ if (irqd_affinity_is_managed(data) && housekeeping_enabled(HK_FLAG_MANAGED_IRQ)) { - const struct cpumask *hk_mask, *prog_mask; - - static DEFINE_RAW_SPINLOCK(tmp_mask_lock); - static struct cpumask tmp_mask; + const struct cpumask *hk_mask;
hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
- raw_spin_lock(&tmp_mask_lock); cpumask_and(&tmp_mask, mask, hk_mask); if (!cpumask_intersects(&tmp_mask, cpu_online_mask)) prog_mask = mask; else prog_mask = &tmp_mask; - ret = chip->irq_set_affinity(data, prog_mask, force); - raw_spin_unlock(&tmp_mask_lock); } else { - ret = chip->irq_set_affinity(data, mask, force); + prog_mask = mask; } + + /* Make sure we only provide online CPUs to the irqchip */ + cpumask_and(&tmp_mask, prog_mask, cpu_online_mask); + if (!cpumask_empty(&tmp_mask)) + ret = chip->irq_set_affinity(data, &tmp_mask, force); + else + ret = -EINVAL; + + raw_spin_unlock(&tmp_mask_lock); + switch (ret) { case IRQ_SET_MASK_OK: case IRQ_SET_MASK_OK_DONE:
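A note on the design choice: the scratch cpumask stays static and guarded by a raw spinlock (now hoisted out of the housekeeping branch), presumably because a struct cpumask can be too large to place on the stack with large NR_CPUS configurations. A hedged sketch of the resulting flow (apply_online_affinity is a made-up wrapper; the follow-up patch below further adjusts this behaviour for the 'force' case):

static int apply_online_affinity(struct irq_data *data,
				 const struct cpumask *requested, bool force)
{
	static DEFINE_RAW_SPINLOCK(scratch_lock);
	static struct cpumask scratch;
	struct irq_chip *chip = irq_data_get_irq_chip(data);
	int ret;

	raw_spin_lock(&scratch_lock);
	/* Only ever hand online CPUs to the irqchip. */
	cpumask_and(&scratch, requested, cpu_online_mask);
	if (!cpumask_empty(&scratch))
		ret = chip->irq_set_affinity(data, &scratch, force);
	else
		ret = -EINVAL;
	raw_spin_unlock(&scratch_lock);

	return ret;
}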
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 3f893a5962d31c0164efdbf6174ed0784f1d7603 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5F4TK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Now that the core code has been fixed to always give us an affinity that only includes online CPUs, directly use this affinity when computing a target CPU.
Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20220405185040.206297-4-maz@kernel.org Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/irqchip/irq-gic-v3-its.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 54662b36dd6a..81271fd8954f 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1632,7 +1632,7 @@ static int its_select_cpu(struct irq_data *d,
cpu = cpumask_pick_least_loaded(d, tmpmask); } else { - cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask); + cpumask_copy(tmpmask, aff_mask);
/* If we cannot cross sockets, limit the search to that node */ if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit c48c8b829d2b966a6649827426bcdba082ccf922 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5F4TK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Although setting the affinity of an interrupt to a set of CPUs that doesn't contain any online CPU is generally frowned upon, there are a few limited cases where such an affinity is set from a CPUHP notifier, setting the affinity to a CPU that isn't online yet.
The saving grace is that this is always done using the 'force' attribute, which hints that the affinity setting may lie outside of the online CPU mask and that the callsite sets this flag knowing that the underlying interrupt controller can handle it.
This restores the expected behaviour on Marek's system.
Fixes: 33de0aa4bae9 ("genirq: Always limit the affinity to online CPUs") Reported-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Marek Szyprowski m.szyprowski@samsung.com Link: https://lore.kernel.org/r/4b7fc13c-887b-a664-26e8-45aed13f048a@samsung.com Link: https://lore.kernel.org/r/20220414140011.541725-1-maz@kernel.org Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/irq/manage.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 7ed0e452bbc7..8d3d49c0483e 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -267,10 +267,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, prog_mask = mask; }
- /* Make sure we only provide online CPUs to the irqchip */ + /* + * Make sure we only provide online CPUs to the irqchip, + * unless we are being asked to force the affinity (in which + * case we do as we are told). + */ cpumask_and(&tmp_mask, prog_mask, cpu_online_mask); - if (!cpumask_empty(&tmp_mask)) + if (!force && !cpumask_empty(&tmp_mask)) ret = chip->irq_set_affinity(data, &tmp_mask, force); + else if (force) + ret = chip->irq_set_affinity(data, mask, force); else ret = -EINVAL;