Backport kfence feature from mainline.
Alexander Potapenko (10): mm: add Kernel Electric-Fence infrastructure x86, kfence: enable KFENCE for x86 mm, kfence: insert KFENCE hooks for SLAB mm, kfence: insert KFENCE hooks for SLUB kfence, kasan: make KFENCE compatible with KASAN tracing: add error_report_end trace point kfence: use error_report_end tracepoint kasan: use error_report_end tracepoint kfence: move the size check to the beginning of __kfence_alloc() kfence: skip all GFP_ZONEMASK allocations
Christophe Leroy (2): powerpc/32s: Always map kernel text and rodata with BATs powerpc: Enable KFENCE for PPC32
Hyeonggon Yoo (1): mm, slub: change run-time assertion in kmalloc_index() to compile-time
Jisheng Zhang (1): arm64: mm: don't use CON and BLK mapping if KFENCE is enabled
Kefeng Wang (3): ARM: mm: Provide set_memory_valid() ARM: mm: Provide is_write_fault() ARM: Support KFENCE for ARM
Marco Elver (22): arm64, kfence: enable KFENCE for ARM64 kfence: use pt_regs to generate stack trace on faults kfence, Documentation: add KFENCE documentation kfence: add test suite MAINTAINERS: add entry for KFENCE kfence: report sensitive information based on no_hash_pointers kfence: fix printk format for ptrdiff_t kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations kfence: fix reports if constant function prefixes exist kfence: make compatible with kmemleak kfence, x86: fix preemptible warning on KPTI-enabled systems kfence: zero guard page after out-of-bounds access kfence: await for allocation using wait_event kfence: maximize allocation wait timeout duration kfence: use power-efficient work queue to run delayed work kfence: use TASK_IDLE when awaiting allocation kfence: unconditionally use unbound work queue kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE kfence, x86: only define helpers if !MODULE lib/vsprintf: do not show no_hash_pointers message multiple times kfence: test: fail fast if disabled at boot kfence: show cpu and timestamp in alloc/free info
Stephen Boyd (1): slub: force on no_hash_pointers when slub_debug is enabled
Sven Schnelle (1): kfence: add function to mask address bits
Timur Tabi (3): lib: use KSTM_MODULE_GLOBALS macro in kselftest drivers kselftest: add support for skipped tests lib/vsprintf: no_hash_pointers prints all addresses as unhashed
Vlastimil Babka (1): printk: clarify the documentation for plain pointer printing
Weizhao Ouyang (1): kfence: defer kfence_test_init to ensure that kunit debugfs is created
.../admin-guide/kernel-parameters.txt | 15 + Documentation/core-api/printk-formats.rst | 26 +- Documentation/dev-tools/index.rst | 1 + Documentation/dev-tools/kfence.rst | 306 ++++++ MAINTAINERS | 12 + arch/arm/Kconfig | 1 + arch/arm/include/asm/kfence.h | 53 ++ arch/arm/include/asm/set_memory.h | 5 + arch/arm/mm/fault.c | 16 +- arch/arm/mm/pageattr.c | 41 +- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/kfence.h | 22 + arch/arm64/mm/fault.c | 4 + arch/arm64/mm/mmu.c | 11 +- arch/powerpc/Kconfig | 13 +- arch/powerpc/include/asm/kfence.h | 33 + arch/powerpc/kernel/head_book3s_32.S | 4 +- arch/powerpc/mm/book3s32/mmu.c | 8 +- arch/powerpc/mm/fault.c | 7 +- arch/powerpc/mm/init_32.c | 3 + arch/powerpc/mm/mmu_decl.h | 5 + arch/powerpc/mm/nohash/8xx.c | 4 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/kfence.h | 73 ++ arch/x86/mm/fault.c | 6 + include/linux/kernel.h | 2 + include/linux/kfence.h | 223 +++++ include/linux/slab.h | 17 +- include/linux/slab_def.h | 3 + include/linux/slub_def.h | 3 + include/trace/events/error_report.h | 74 ++ init/main.c | 3 + kernel/trace/Makefile | 1 + kernel/trace/error_report-traces.c | 11 + lib/Kconfig.debug | 1 + lib/Kconfig.kfence | 83 ++ lib/test_bitmap.c | 3 +- lib/test_printf.c | 12 +- lib/vsprintf.c | 46 +- mm/Makefile | 1 + mm/kasan/common.c | 18 + mm/kasan/generic.c | 3 +- mm/kasan/report.c | 8 +- mm/kfence/Makefile | 6 + mm/kfence/core.c | 891 ++++++++++++++++++ mm/kfence/kfence.h | 108 +++ mm/kfence/kfence_test.c | 873 +++++++++++++++++ mm/kfence/report.c | 274 ++++++ mm/kmemleak.c | 3 +- mm/slab.c | 41 +- mm/slab_common.c | 12 +- mm/slub.c | 80 +- tools/testing/selftests/kselftest_module.h | 18 +- 53 files changed, 3405 insertions(+), 84 deletions(-) create mode 100644 Documentation/dev-tools/kfence.rst create mode 100644 arch/arm/include/asm/kfence.h create mode 100644 arch/arm64/include/asm/kfence.h create mode 100644 arch/powerpc/include/asm/kfence.h create mode 100644 arch/x86/include/asm/kfence.h create mode 100644 include/linux/kfence.h create mode 100644 include/trace/events/error_report.h create mode 100644 kernel/trace/error_report-traces.c create mode 100644 lib/Kconfig.kfence create mode 100644 mm/kfence/Makefile create mode 100644 mm/kfence/core.c create mode 100644 mm/kfence/kfence.h create mode 100644 mm/kfence/kfence_test.c create mode 100644 mm/kfence/report.c
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit 0ce20dd840897b12ae70869c69f1ba34d6d16965 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. This series enables KFENCE for the x86 and arm64 architectures, and adds KFENCE hooks to the SLAB and SLUB allocators.
KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error.
Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, the next allocation through the main allocator (SLAB or SLUB) returns a guarded allocation from the KFENCE object pool. At this point, the timer is reset, and the next allocation is set up after the expiration of the interval.
To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE.
The KFENCE memory pool is of fixed size, and if the pool is exhausted no further KFENCE allocations occur. The default config is conservative with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB pages).
We have verified by running synthetic benchmarks (sysbench I/O, hackbench) and production server-workload benchmarks that a kernel with KFENCE (using sample intervals 100-500ms) is performance-neutral compared to a non-KFENCE baseline kernel.
KFENCE is inspired by GWP-ASan [1], a userspace tool with similar properties. The name "KFENCE" is a homage to the Electric Fence Malloc Debugger [2].
For more details, see Documentation/dev-tools/kfence.rst added in the series -- also viewable here:
https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tool...
[1] http://llvm.org/docs/GwpAsan.html [2] https://linux.die.net/man/3/efence
This patch (of 9):
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors.
KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. To detect out-of-bounds writes to memory within the object's page itself, KFENCE also uses pattern-based redzones. The following figure illustrates the page layout:
---+-----------+-----------+-----------+-----------+-----------+--- | xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx | | xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx | | x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x | | xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx | | xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx | | xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx | ---+-----------+-----------+-----------+-----------+-----------+---
Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, a guarded allocation from the KFENCE object pool is returned to the main allocator (SLAB or SLUB). At this point, the timer is reset, and the next allocation is set up after the expiration of the interval.
To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. To date, we have verified by running synthetic benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE is performance-neutral compared to the non-KFENCE baseline.
For more details, see Documentation/dev-tools/kfence.rst (added later in the series).
[elver@google.com: fix parameter description for kfence_object_start()] Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com [elver@google.com: avoid stalling work queue task without allocations] Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJi... Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com [elver@google.com: fix potential deadlock due to wake_up()] Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com [elver@google.com: add option to use KFENCE without static keys] Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com [elver@google.com: add missing copyright and description headers] Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com Signed-off-by: Marco Elver elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Reviewed-by: SeongJae Park sjpark@amazon.de Co-developed-by: Marco Elver elver@google.com Reviewed-by: Jann Horn jannh@google.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: Ingo Molnar mingo@redhat.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Joern Engel joern@purestorage.com Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: init/main.c [Peng Liu: cherry-pick from 0ce20dd840897b12ae70869c69f1ba34d6d16965] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- include/linux/kfence.h | 216 +++++++++++ init/main.c | 3 + lib/Kconfig.debug | 1 + lib/Kconfig.kfence | 67 ++++ mm/Makefile | 1 + mm/kfence/Makefile | 3 + mm/kfence/core.c | 840 +++++++++++++++++++++++++++++++++++++++++ mm/kfence/kfence.h | 113 ++++++ mm/kfence/report.c | 240 ++++++++++++ 9 files changed, 1484 insertions(+) create mode 100644 include/linux/kfence.h create mode 100644 lib/Kconfig.kfence create mode 100644 mm/kfence/Makefile create mode 100644 mm/kfence/core.c create mode 100644 mm/kfence/kfence.h create mode 100644 mm/kfence/report.c
diff --git a/include/linux/kfence.h b/include/linux/kfence.h new file mode 100644 index 000000000000..81f3911cb298 --- /dev/null +++ b/include/linux/kfence.h @@ -0,0 +1,216 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Kernel Electric-Fence (KFENCE). Public interface for allocator and fault + * handler integration. For more info see Documentation/dev-tools/kfence.rst. + * + * Copyright (C) 2020, Google LLC. + */ + +#ifndef _LINUX_KFENCE_H +#define _LINUX_KFENCE_H + +#include <linux/mm.h> +#include <linux/types.h> + +#ifdef CONFIG_KFENCE + +/* + * We allocate an even number of pages, as it simplifies calculations to map + * address to metadata indices; effectively, the very first page serves as an + * extended guard page, but otherwise has no special purpose. + */ +#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE) +extern char *__kfence_pool; + +#ifdef CONFIG_KFENCE_STATIC_KEYS +#include <linux/static_key.h> +DECLARE_STATIC_KEY_FALSE(kfence_allocation_key); +#else +#include <linux/atomic.h> +extern atomic_t kfence_allocation_gate; +#endif + +/** + * is_kfence_address() - check if an address belongs to KFENCE pool + * @addr: address to check + * + * Return: true or false depending on whether the address is within the KFENCE + * object range. + * + * KFENCE objects live in a separate page range and are not to be intermixed + * with regular heap objects (e.g. KFENCE objects must never be added to the + * allocator freelists). Failing to do so may and will result in heap + * corruptions, therefore is_kfence_address() must be used to check whether + * an object requires specific handling. + * + * Note: This function may be used in fast-paths, and is performance critical. + * Future changes should take this into account; for instance, we want to avoid + * introducing another load and therefore need to keep KFENCE_POOL_SIZE a + * constant (until immediate patching support is added to the kernel). + */ +static __always_inline bool is_kfence_address(const void *addr) +{ + /* + * The non-NULL check is required in case the __kfence_pool pointer was + * never initialized; keep it in the slow-path after the range-check. + */ + return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && addr); +} + +/** + * kfence_alloc_pool() - allocate the KFENCE pool via memblock + */ +void __init kfence_alloc_pool(void); + +/** + * kfence_init() - perform KFENCE initialization at boot time + * + * Requires that kfence_alloc_pool() was called before. This sets up the + * allocation gate timer, and requires that workqueues are available. + */ +void __init kfence_init(void); + +/** + * kfence_shutdown_cache() - handle shutdown_cache() for KFENCE objects + * @s: cache being shut down + * + * Before shutting down a cache, one must ensure there are no remaining objects + * allocated from it. Because KFENCE objects are not referenced from the cache + * directly, we need to check them here. + * + * Note that shutdown_cache() is internal to SL*B, and kmem_cache_destroy() does + * not return if allocated objects still exist: it prints an error message and + * simply aborts destruction of a cache, leaking memory. + * + * If the only such objects are KFENCE objects, we will not leak the entire + * cache, but instead try to provide more useful debug info by making allocated + * objects "zombie allocations". Objects may then still be used or freed (which + * is handled gracefully), but usage will result in showing KFENCE error reports + * which include stack traces to the user of the object, the original allocation + * site, and caller to shutdown_cache(). + */ +void kfence_shutdown_cache(struct kmem_cache *s); + +/* + * Allocate a KFENCE object. Allocators must not call this function directly, + * use kfence_alloc() instead. + */ +void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags); + +/** + * kfence_alloc() - allocate a KFENCE object with a low probability + * @s: struct kmem_cache with object requirements + * @size: exact size of the object to allocate (can be less than @s->size + * e.g. for kmalloc caches) + * @flags: GFP flags + * + * Return: + * * NULL - must proceed with allocating as usual, + * * non-NULL - pointer to a KFENCE object. + * + * kfence_alloc() should be inserted into the heap allocation fast path, + * allowing it to transparently return KFENCE-allocated objects with a low + * probability using a static branch (the probability is controlled by the + * kfence.sample_interval boot parameter). + */ +static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) +{ +#ifdef CONFIG_KFENCE_STATIC_KEYS + if (static_branch_unlikely(&kfence_allocation_key)) +#else + if (unlikely(!atomic_read(&kfence_allocation_gate))) +#endif + return __kfence_alloc(s, size, flags); + return NULL; +} + +/** + * kfence_ksize() - get actual amount of memory allocated for a KFENCE object + * @addr: pointer to a heap object + * + * Return: + * * 0 - not a KFENCE object, must call __ksize() instead, + * * non-0 - this many bytes can be accessed without causing a memory error. + * + * kfence_ksize() returns the number of bytes requested for a KFENCE object at + * allocation time. This number may be less than the object size of the + * corresponding struct kmem_cache. + */ +size_t kfence_ksize(const void *addr); + +/** + * kfence_object_start() - find the beginning of a KFENCE object + * @addr: address within a KFENCE-allocated object + * + * Return: address of the beginning of the object. + * + * SL[AU]B-allocated objects are laid out within a page one by one, so it is + * easy to calculate the beginning of an object given a pointer inside it and + * the object size. The same is not true for KFENCE, which places a single + * object at either end of the page. This helper function is used to find the + * beginning of a KFENCE-allocated object. + */ +void *kfence_object_start(const void *addr); + +/** + * __kfence_free() - release a KFENCE heap object to KFENCE pool + * @addr: object to be freed + * + * Requires: is_kfence_address(addr) + * + * Release a KFENCE object and mark it as freed. + */ +void __kfence_free(void *addr); + +/** + * kfence_free() - try to release an arbitrary heap object to KFENCE pool + * @addr: object to be freed + * + * Return: + * * false - object doesn't belong to KFENCE pool and was ignored, + * * true - object was released to KFENCE pool. + * + * Release a KFENCE object and mark it as freed. May be called on any object, + * even non-KFENCE objects, to simplify integration of the hooks into the + * allocator's free codepath. The allocator must check the return value to + * determine if it was a KFENCE object or not. + */ +static __always_inline __must_check bool kfence_free(void *addr) +{ + if (!is_kfence_address(addr)) + return false; + __kfence_free(addr); + return true; +} + +/** + * kfence_handle_page_fault() - perform page fault handling for KFENCE pages + * @addr: faulting address + * + * Return: + * * false - address outside KFENCE pool, + * * true - page fault handled by KFENCE, no additional handling required. + * + * A page fault inside KFENCE pool indicates a memory error, such as an + * out-of-bounds access, a use-after-free or an invalid memory access. In these + * cases KFENCE prints an error message and marks the offending page as + * present, so that the kernel can proceed. + */ +bool __must_check kfence_handle_page_fault(unsigned long addr); + +#else /* CONFIG_KFENCE */ + +static inline bool is_kfence_address(const void *addr) { return false; } +static inline void kfence_alloc_pool(void) { } +static inline void kfence_init(void) { } +static inline void kfence_shutdown_cache(struct kmem_cache *s) { } +static inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { return NULL; } +static inline size_t kfence_ksize(const void *addr) { return 0; } +static inline void *kfence_object_start(const void *addr) { return NULL; } +static inline void __kfence_free(void *addr) { } +static inline bool __must_check kfence_free(void *addr) { return false; } +static inline bool __must_check kfence_handle_page_fault(unsigned long addr) { return false; } + +#endif + +#endif /* _LINUX_KFENCE_H */ diff --git a/init/main.c b/init/main.c index ff9bdc5c4394..646e20a8d1ff 100644 --- a/init/main.c +++ b/init/main.c @@ -40,6 +40,7 @@ #include <linux/security.h> #include <linux/smp.h> #include <linux/profile.h> +#include <linux/kfence.h> #include <linux/rcupdate.h> #include <linux/moduleparam.h> #include <linux/kallsyms.h> @@ -828,6 +829,7 @@ static void __init mm_init(void) */ page_ext_init_flatmem(); init_debug_pagealloc(); + kfence_alloc_pool(); report_meminit(); mem_init(); /* page_owner must be initialized after buddy is ready */ @@ -954,6 +956,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) hrtimers_init(); softirq_init(); timekeeping_init(); + kfence_init();
/* * For best initial stack canary entropy, prepare it after: diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index b9f0a3570596..bc98ac1cbb0f 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -878,6 +878,7 @@ config DEBUG_STACKOVERFLOW If in doubt, say "N".
source "lib/Kconfig.kasan" +source "lib/Kconfig.kfence"
endmenu # "Memory Debugging"
diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence new file mode 100644 index 000000000000..b88ac9d6b2e6 --- /dev/null +++ b/lib/Kconfig.kfence @@ -0,0 +1,67 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config HAVE_ARCH_KFENCE + bool + +menuconfig KFENCE + bool "KFENCE: low-overhead sampling-based memory safety error detector" + depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB) + select STACKTRACE + help + KFENCE is a low-overhead sampling-based detector of heap out-of-bounds + access, use-after-free, and invalid-free errors. KFENCE is designed + to have negligible cost to permit enabling it in production + environments. + + Note that, KFENCE is not a substitute for explicit testing with tools + such as KASAN. KFENCE can detect a subset of bugs that KASAN can + detect, albeit at very different performance profiles. If you can + afford to use KASAN, continue using KASAN, for example in test + environments. If your kernel targets production use, and cannot + enable KASAN due to its cost, consider using KFENCE. + +if KFENCE + +config KFENCE_STATIC_KEYS + bool "Use static keys to set up allocations" + default y + depends on JUMP_LABEL # To ensure performance, require jump labels + help + Use static keys (static branches) to set up KFENCE allocations. Using + static keys is normally recommended, because it avoids a dynamic + branch in the allocator's fast path. However, with very low sample + intervals, or on systems that do not support jump labels, a dynamic + branch may still be an acceptable performance trade-off. + +config KFENCE_SAMPLE_INTERVAL + int "Default sample interval in milliseconds" + default 100 + help + The KFENCE sample interval determines the frequency with which heap + allocations will be guarded by KFENCE. May be overridden via boot + parameter "kfence.sample_interval". + + Set this to 0 to disable KFENCE by default, in which case only + setting "kfence.sample_interval" to a non-zero value enables KFENCE. + +config KFENCE_NUM_OBJECTS + int "Number of guarded objects available" + range 1 65535 + default 255 + help + The number of guarded objects available. For each KFENCE object, 2 + pages are required; with one containing the object and two adjacent + ones used as guard pages. + +config KFENCE_STRESS_TEST_FAULTS + int "Stress testing of fault handling and error reporting" if EXPERT + default 0 + help + The inverse probability with which to randomly protect KFENCE object + pages, resulting in spurious use-after-frees. The main purpose of + this option is to stress test KFENCE with concurrent error reports + and allocations/frees. A value of 0 disables stress testing logic. + + Only for KFENCE testing; set to 0 if you are not a KFENCE developer. + +endif # KFENCE diff --git a/mm/Makefile b/mm/Makefile index b341ef0d3406..0ae832fe9df6 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -82,6 +82,7 @@ obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o obj-$(CONFIG_SLUB) += slub.o obj-$(CONFIG_KASAN) += kasan/ +obj-$(CONFIG_KFENCE) += kfence/ obj-$(CONFIG_FAILSLAB) += failslab.o obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o obj-$(CONFIG_MEMTEST) += memtest.o diff --git a/mm/kfence/Makefile b/mm/kfence/Makefile new file mode 100644 index 000000000000..d991e9a349f0 --- /dev/null +++ b/mm/kfence/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_KFENCE) := core.o report.o diff --git a/mm/kfence/core.c b/mm/kfence/core.c new file mode 100644 index 000000000000..d6a32c13336b --- /dev/null +++ b/mm/kfence/core.c @@ -0,0 +1,840 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * KFENCE guarded object allocator and fault handling. + * + * Copyright (C) 2020, Google LLC. + */ + +#define pr_fmt(fmt) "kfence: " fmt + +#include <linux/atomic.h> +#include <linux/bug.h> +#include <linux/debugfs.h> +#include <linux/kcsan-checks.h> +#include <linux/kfence.h> +#include <linux/list.h> +#include <linux/lockdep.h> +#include <linux/memblock.h> +#include <linux/moduleparam.h> +#include <linux/random.h> +#include <linux/rcupdate.h> +#include <linux/seq_file.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/string.h> + +#include <asm/kfence.h> + +#include "kfence.h" + +/* Disables KFENCE on the first warning assuming an irrecoverable error. */ +#define KFENCE_WARN_ON(cond) \ + ({ \ + const bool __cond = WARN_ON(cond); \ + if (unlikely(__cond)) \ + WRITE_ONCE(kfence_enabled, false); \ + __cond; \ + }) + +/* === Data ================================================================= */ + +static bool kfence_enabled __read_mostly; + +static unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL; + +#ifdef MODULE_PARAM_PREFIX +#undef MODULE_PARAM_PREFIX +#endif +#define MODULE_PARAM_PREFIX "kfence." + +static int param_set_sample_interval(const char *val, const struct kernel_param *kp) +{ + unsigned long num; + int ret = kstrtoul(val, 0, &num); + + if (ret < 0) + return ret; + + if (!num) /* Using 0 to indicate KFENCE is disabled. */ + WRITE_ONCE(kfence_enabled, false); + else if (!READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING) + return -EINVAL; /* Cannot (re-)enable KFENCE on-the-fly. */ + + *((unsigned long *)kp->arg) = num; + return 0; +} + +static int param_get_sample_interval(char *buffer, const struct kernel_param *kp) +{ + if (!READ_ONCE(kfence_enabled)) + return sprintf(buffer, "0\n"); + + return param_get_ulong(buffer, kp); +} + +static const struct kernel_param_ops sample_interval_param_ops = { + .set = param_set_sample_interval, + .get = param_get_sample_interval, +}; +module_param_cb(sample_interval, &sample_interval_param_ops, &kfence_sample_interval, 0600); + +/* The pool of pages used for guard pages and objects. */ +char *__kfence_pool __ro_after_init; +EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */ + +/* + * Per-object metadata, with one-to-one mapping of object metadata to + * backing pages (in __kfence_pool). + */ +static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0); +struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS]; + +/* Freelist with available objects. */ +static struct list_head kfence_freelist = LIST_HEAD_INIT(kfence_freelist); +static DEFINE_RAW_SPINLOCK(kfence_freelist_lock); /* Lock protecting freelist. */ + +#ifdef CONFIG_KFENCE_STATIC_KEYS +/* The static key to set up a KFENCE allocation. */ +DEFINE_STATIC_KEY_FALSE(kfence_allocation_key); +#endif + +/* Gates the allocation, ensuring only one succeeds in a given period. */ +atomic_t kfence_allocation_gate = ATOMIC_INIT(1); + +/* Statistics counters for debugfs. */ +enum kfence_counter_id { + KFENCE_COUNTER_ALLOCATED, + KFENCE_COUNTER_ALLOCS, + KFENCE_COUNTER_FREES, + KFENCE_COUNTER_ZOMBIES, + KFENCE_COUNTER_BUGS, + KFENCE_COUNTER_COUNT, +}; +static atomic_long_t counters[KFENCE_COUNTER_COUNT]; +static const char *const counter_names[] = { + [KFENCE_COUNTER_ALLOCATED] = "currently allocated", + [KFENCE_COUNTER_ALLOCS] = "total allocations", + [KFENCE_COUNTER_FREES] = "total frees", + [KFENCE_COUNTER_ZOMBIES] = "zombie allocations", + [KFENCE_COUNTER_BUGS] = "total bugs", +}; +static_assert(ARRAY_SIZE(counter_names) == KFENCE_COUNTER_COUNT); + +/* === Internals ============================================================ */ + +static bool kfence_protect(unsigned long addr) +{ + return !KFENCE_WARN_ON(!kfence_protect_page(ALIGN_DOWN(addr, PAGE_SIZE), true)); +} + +static bool kfence_unprotect(unsigned long addr) +{ + return !KFENCE_WARN_ON(!kfence_protect_page(ALIGN_DOWN(addr, PAGE_SIZE), false)); +} + +static inline struct kfence_metadata *addr_to_metadata(unsigned long addr) +{ + long index; + + /* The checks do not affect performance; only called from slow-paths. */ + + if (!is_kfence_address((void *)addr)) + return NULL; + + /* + * May be an invalid index if called with an address at the edge of + * __kfence_pool, in which case we would report an "invalid access" + * error. + */ + index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1; + if (index < 0 || index >= CONFIG_KFENCE_NUM_OBJECTS) + return NULL; + + return &kfence_metadata[index]; +} + +static inline unsigned long metadata_to_pageaddr(const struct kfence_metadata *meta) +{ + unsigned long offset = (meta - kfence_metadata + 1) * PAGE_SIZE * 2; + unsigned long pageaddr = (unsigned long)&__kfence_pool[offset]; + + /* The checks do not affect performance; only called from slow-paths. */ + + /* Only call with a pointer into kfence_metadata. */ + if (KFENCE_WARN_ON(meta < kfence_metadata || + meta >= kfence_metadata + CONFIG_KFENCE_NUM_OBJECTS)) + return 0; + + /* + * This metadata object only ever maps to 1 page; verify that the stored + * address is in the expected range. + */ + if (KFENCE_WARN_ON(ALIGN_DOWN(meta->addr, PAGE_SIZE) != pageaddr)) + return 0; + + return pageaddr; +} + +/* + * Update the object's metadata state, including updating the alloc/free stacks + * depending on the state transition. + */ +static noinline void metadata_update_state(struct kfence_metadata *meta, + enum kfence_object_state next) +{ + struct kfence_track *track = + next == KFENCE_OBJECT_FREED ? &meta->free_track : &meta->alloc_track; + + lockdep_assert_held(&meta->lock); + + /* + * Skip over 1 (this) functions; noinline ensures we do not accidentally + * skip over the caller by never inlining. + */ + track->num_stack_entries = stack_trace_save(track->stack_entries, KFENCE_STACK_DEPTH, 1); + track->pid = task_pid_nr(current); + + /* + * Pairs with READ_ONCE() in + * kfence_shutdown_cache(), + * kfence_handle_page_fault(). + */ + WRITE_ONCE(meta->state, next); +} + +/* Write canary byte to @addr. */ +static inline bool set_canary_byte(u8 *addr) +{ + *addr = KFENCE_CANARY_PATTERN(addr); + return true; +} + +/* Check canary byte at @addr. */ +static inline bool check_canary_byte(u8 *addr) +{ + if (likely(*addr == KFENCE_CANARY_PATTERN(addr))) + return true; + + atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); + kfence_report_error((unsigned long)addr, addr_to_metadata((unsigned long)addr), + KFENCE_ERROR_CORRUPTION); + return false; +} + +/* __always_inline this to ensure we won't do an indirect call to fn. */ +static __always_inline void for_each_canary(const struct kfence_metadata *meta, bool (*fn)(u8 *)) +{ + const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE); + unsigned long addr; + + lockdep_assert_held(&meta->lock); + + /* + * We'll iterate over each canary byte per-side until fn() returns + * false. However, we'll still iterate over the canary bytes to the + * right of the object even if there was an error in the canary bytes to + * the left of the object. Specifically, if check_canary_byte() + * generates an error, showing both sides might give more clues as to + * what the error is about when displaying which bytes were corrupted. + */ + + /* Apply to left of object. */ + for (addr = pageaddr; addr < meta->addr; addr++) { + if (!fn((u8 *)addr)) + break; + } + + /* Apply to right of object. */ + for (addr = meta->addr + meta->size; addr < pageaddr + PAGE_SIZE; addr++) { + if (!fn((u8 *)addr)) + break; + } +} + +static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp) +{ + struct kfence_metadata *meta = NULL; + unsigned long flags; + struct page *page; + void *addr; + + /* Try to obtain a free object. */ + raw_spin_lock_irqsave(&kfence_freelist_lock, flags); + if (!list_empty(&kfence_freelist)) { + meta = list_entry(kfence_freelist.next, struct kfence_metadata, list); + list_del_init(&meta->list); + } + raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags); + if (!meta) + return NULL; + + if (unlikely(!raw_spin_trylock_irqsave(&meta->lock, flags))) { + /* + * This is extremely unlikely -- we are reporting on a + * use-after-free, which locked meta->lock, and the reporting + * code via printk calls kmalloc() which ends up in + * kfence_alloc() and tries to grab the same object that we're + * reporting on. While it has never been observed, lockdep does + * report that there is a possibility of deadlock. Fix it by + * using trylock and bailing out gracefully. + */ + raw_spin_lock_irqsave(&kfence_freelist_lock, flags); + /* Put the object back on the freelist. */ + list_add_tail(&meta->list, &kfence_freelist); + raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags); + + return NULL; + } + + meta->addr = metadata_to_pageaddr(meta); + /* Unprotect if we're reusing this page. */ + if (meta->state == KFENCE_OBJECT_FREED) + kfence_unprotect(meta->addr); + + /* + * Note: for allocations made before RNG initialization, will always + * return zero. We still benefit from enabling KFENCE as early as + * possible, even when the RNG is not yet available, as this will allow + * KFENCE to detect bugs due to earlier allocations. The only downside + * is that the out-of-bounds accesses detected are deterministic for + * such allocations. + */ + if (prandom_u32_max(2)) { + /* Allocate on the "right" side, re-calculate address. */ + meta->addr += PAGE_SIZE - size; + meta->addr = ALIGN_DOWN(meta->addr, cache->align); + } + + addr = (void *)meta->addr; + + /* Update remaining metadata. */ + metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED); + /* Pairs with READ_ONCE() in kfence_shutdown_cache(). */ + WRITE_ONCE(meta->cache, cache); + meta->size = size; + for_each_canary(meta, set_canary_byte); + + /* Set required struct page fields. */ + page = virt_to_page(meta->addr); + page->slab_cache = cache; + + raw_spin_unlock_irqrestore(&meta->lock, flags); + + /* Memory initialization. */ + + /* + * We check slab_want_init_on_alloc() ourselves, rather than letting + * SL*B do the initialization, as otherwise we might overwrite KFENCE's + * redzone. + */ + if (unlikely(slab_want_init_on_alloc(gfp, cache))) + memzero_explicit(addr, size); + if (cache->ctor) + cache->ctor(addr); + + if (CONFIG_KFENCE_STRESS_TEST_FAULTS && !prandom_u32_max(CONFIG_KFENCE_STRESS_TEST_FAULTS)) + kfence_protect(meta->addr); /* Random "faults" by protecting the object. */ + + atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCATED]); + atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCS]); + + return addr; +} + +static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool zombie) +{ + struct kcsan_scoped_access assert_page_exclusive; + unsigned long flags; + + raw_spin_lock_irqsave(&meta->lock, flags); + + if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) { + /* Invalid or double-free, bail out. */ + atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); + kfence_report_error((unsigned long)addr, meta, KFENCE_ERROR_INVALID_FREE); + raw_spin_unlock_irqrestore(&meta->lock, flags); + return; + } + + /* Detect racy use-after-free, or incorrect reallocation of this page by KFENCE. */ + kcsan_begin_scoped_access((void *)ALIGN_DOWN((unsigned long)addr, PAGE_SIZE), PAGE_SIZE, + KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT, + &assert_page_exclusive); + + if (CONFIG_KFENCE_STRESS_TEST_FAULTS) + kfence_unprotect((unsigned long)addr); /* To check canary bytes. */ + + /* Restore page protection if there was an OOB access. */ + if (meta->unprotected_page) { + kfence_protect(meta->unprotected_page); + meta->unprotected_page = 0; + } + + /* Check canary bytes for memory corruption. */ + for_each_canary(meta, check_canary_byte); + + /* + * Clear memory if init-on-free is set. While we protect the page, the + * data is still there, and after a use-after-free is detected, we + * unprotect the page, so the data is still accessible. + */ + if (!zombie && unlikely(slab_want_init_on_free(meta->cache))) + memzero_explicit(addr, meta->size); + + /* Mark the object as freed. */ + metadata_update_state(meta, KFENCE_OBJECT_FREED); + + raw_spin_unlock_irqrestore(&meta->lock, flags); + + /* Protect to detect use-after-frees. */ + kfence_protect((unsigned long)addr); + + kcsan_end_scoped_access(&assert_page_exclusive); + if (!zombie) { + /* Add it to the tail of the freelist for reuse. */ + raw_spin_lock_irqsave(&kfence_freelist_lock, flags); + KFENCE_WARN_ON(!list_empty(&meta->list)); + list_add_tail(&meta->list, &kfence_freelist); + raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags); + + atomic_long_dec(&counters[KFENCE_COUNTER_ALLOCATED]); + atomic_long_inc(&counters[KFENCE_COUNTER_FREES]); + } else { + /* See kfence_shutdown_cache(). */ + atomic_long_inc(&counters[KFENCE_COUNTER_ZOMBIES]); + } +} + +static void rcu_guarded_free(struct rcu_head *h) +{ + struct kfence_metadata *meta = container_of(h, struct kfence_metadata, rcu_head); + + kfence_guarded_free((void *)meta->addr, meta, false); +} + +static bool __init kfence_init_pool(void) +{ + unsigned long addr = (unsigned long)__kfence_pool; + struct page *pages; + int i; + + if (!__kfence_pool) + return false; + + if (!arch_kfence_init_pool()) + goto err; + + pages = virt_to_page(addr); + + /* + * Set up object pages: they must have PG_slab set, to avoid freeing + * these as real pages. + * + * We also want to avoid inserting kfence_free() in the kfree() + * fast-path in SLUB, and therefore need to ensure kfree() correctly + * enters __slab_free() slow-path. + */ + for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) { + if (!i || (i % 2)) + continue; + + /* Verify we do not have a compound head page. */ + if (WARN_ON(compound_head(&pages[i]) != &pages[i])) + goto err; + + __SetPageSlab(&pages[i]); + } + + /* + * Protect the first 2 pages. The first page is mostly unnecessary, and + * merely serves as an extended guard page. However, adding one + * additional page in the beginning gives us an even number of pages, + * which simplifies the mapping of address to metadata index. + */ + for (i = 0; i < 2; i++) { + if (unlikely(!kfence_protect(addr))) + goto err; + + addr += PAGE_SIZE; + } + + for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { + struct kfence_metadata *meta = &kfence_metadata[i]; + + /* Initialize metadata. */ + INIT_LIST_HEAD(&meta->list); + raw_spin_lock_init(&meta->lock); + meta->state = KFENCE_OBJECT_UNUSED; + meta->addr = addr; /* Initialize for validation in metadata_to_pageaddr(). */ + list_add_tail(&meta->list, &kfence_freelist); + + /* Protect the right redzone. */ + if (unlikely(!kfence_protect(addr + PAGE_SIZE))) + goto err; + + addr += 2 * PAGE_SIZE; + } + + return true; + +err: + /* + * Only release unprotected pages, and do not try to go back and change + * page attributes due to risk of failing to do so as well. If changing + * page attributes for some pages fails, it is very likely that it also + * fails for the first page, and therefore expect addr==__kfence_pool in + * most failure cases. + */ + memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool)); + __kfence_pool = NULL; + return false; +} + +/* === DebugFS Interface ==================================================== */ + +static int stats_show(struct seq_file *seq, void *v) +{ + int i; + + seq_printf(seq, "enabled: %i\n", READ_ONCE(kfence_enabled)); + for (i = 0; i < KFENCE_COUNTER_COUNT; i++) + seq_printf(seq, "%s: %ld\n", counter_names[i], atomic_long_read(&counters[i])); + + return 0; +} +DEFINE_SHOW_ATTRIBUTE(stats); + +/* + * debugfs seq_file operations for /sys/kernel/debug/kfence/objects. + * start_object() and next_object() return the object index + 1, because NULL is used + * to stop iteration. + */ +static void *start_object(struct seq_file *seq, loff_t *pos) +{ + if (*pos < CONFIG_KFENCE_NUM_OBJECTS) + return (void *)((long)*pos + 1); + return NULL; +} + +static void stop_object(struct seq_file *seq, void *v) +{ +} + +static void *next_object(struct seq_file *seq, void *v, loff_t *pos) +{ + ++*pos; + if (*pos < CONFIG_KFENCE_NUM_OBJECTS) + return (void *)((long)*pos + 1); + return NULL; +} + +static int show_object(struct seq_file *seq, void *v) +{ + struct kfence_metadata *meta = &kfence_metadata[(long)v - 1]; + unsigned long flags; + + raw_spin_lock_irqsave(&meta->lock, flags); + kfence_print_object(seq, meta); + raw_spin_unlock_irqrestore(&meta->lock, flags); + seq_puts(seq, "---------------------------------\n"); + + return 0; +} + +static const struct seq_operations object_seqops = { + .start = start_object, + .next = next_object, + .stop = stop_object, + .show = show_object, +}; + +static int open_objects(struct inode *inode, struct file *file) +{ + return seq_open(file, &object_seqops); +} + +static const struct file_operations objects_fops = { + .open = open_objects, + .read = seq_read, + .llseek = seq_lseek, +}; + +static int __init kfence_debugfs_init(void) +{ + struct dentry *kfence_dir = debugfs_create_dir("kfence", NULL); + + debugfs_create_file("stats", 0444, kfence_dir, NULL, &stats_fops); + debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops); + return 0; +} + +late_initcall(kfence_debugfs_init); + +/* === Allocation Gate Timer ================================================ */ + +/* + * Set up delayed work, which will enable and disable the static key. We need to + * use a work queue (rather than a simple timer), since enabling and disabling a + * static key cannot be done from an interrupt. + * + * Note: Toggling a static branch currently causes IPIs, and here we'll end up + * with a total of 2 IPIs to all CPUs. If this ends up a problem in future (with + * more aggressive sampling intervals), we could get away with a variant that + * avoids IPIs, at the cost of not immediately capturing allocations if the + * instructions remain cached. + */ +static struct delayed_work kfence_timer; +static void toggle_allocation_gate(struct work_struct *work) +{ + if (!READ_ONCE(kfence_enabled)) + return; + + /* Enable static key, and await allocation to happen. */ + atomic_set(&kfence_allocation_gate, 0); +#ifdef CONFIG_KFENCE_STATIC_KEYS + static_branch_enable(&kfence_allocation_key); + /* + * Await an allocation. Timeout after 1 second, in case the kernel stops + * doing allocations, to avoid stalling this worker task for too long. + */ + { + unsigned long end_wait = jiffies + HZ; + + do { + set_current_state(TASK_UNINTERRUPTIBLE); + if (atomic_read(&kfence_allocation_gate) != 0) + break; + schedule_timeout(1); + } while (time_before(jiffies, end_wait)); + __set_current_state(TASK_RUNNING); + } + /* Disable static key and reset timer. */ + static_branch_disable(&kfence_allocation_key); +#endif + schedule_delayed_work(&kfence_timer, msecs_to_jiffies(kfence_sample_interval)); +} +static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate); + +/* === Public interface ===================================================== */ + +void __init kfence_alloc_pool(void) +{ + if (!kfence_sample_interval) + return; + + __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE); + + if (!__kfence_pool) + pr_err("failed to allocate pool\n"); +} + +void __init kfence_init(void) +{ + /* Setting kfence_sample_interval to 0 on boot disables KFENCE. */ + if (!kfence_sample_interval) + return; + + if (!kfence_init_pool()) { + pr_err("%s failed\n", __func__); + return; + } + + WRITE_ONCE(kfence_enabled, true); + schedule_delayed_work(&kfence_timer, 0); + pr_info("initialized - using %lu bytes for %d objects", KFENCE_POOL_SIZE, + CONFIG_KFENCE_NUM_OBJECTS); + if (IS_ENABLED(CONFIG_DEBUG_KERNEL)) + pr_cont(" at 0x%px-0x%px\n", (void *)__kfence_pool, + (void *)(__kfence_pool + KFENCE_POOL_SIZE)); + else + pr_cont("\n"); +} + +void kfence_shutdown_cache(struct kmem_cache *s) +{ + unsigned long flags; + struct kfence_metadata *meta; + int i; + + for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { + bool in_use; + + meta = &kfence_metadata[i]; + + /* + * If we observe some inconsistent cache and state pair where we + * should have returned false here, cache destruction is racing + * with either kmem_cache_alloc() or kmem_cache_free(). Taking + * the lock will not help, as different critical section + * serialization will have the same outcome. + */ + if (READ_ONCE(meta->cache) != s || + READ_ONCE(meta->state) != KFENCE_OBJECT_ALLOCATED) + continue; + + raw_spin_lock_irqsave(&meta->lock, flags); + in_use = meta->cache == s && meta->state == KFENCE_OBJECT_ALLOCATED; + raw_spin_unlock_irqrestore(&meta->lock, flags); + + if (in_use) { + /* + * This cache still has allocations, and we should not + * release them back into the freelist so they can still + * safely be used and retain the kernel's default + * behaviour of keeping the allocations alive (leak the + * cache); however, they effectively become "zombie + * allocations" as the KFENCE objects are the only ones + * still in use and the owning cache is being destroyed. + * + * We mark them freed, so that any subsequent use shows + * more useful error messages that will include stack + * traces of the user of the object, the original + * allocation, and caller to shutdown_cache(). + */ + kfence_guarded_free((void *)meta->addr, meta, /*zombie=*/true); + } + } + + for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { + meta = &kfence_metadata[i]; + + /* See above. */ + if (READ_ONCE(meta->cache) != s || READ_ONCE(meta->state) != KFENCE_OBJECT_FREED) + continue; + + raw_spin_lock_irqsave(&meta->lock, flags); + if (meta->cache == s && meta->state == KFENCE_OBJECT_FREED) + meta->cache = NULL; + raw_spin_unlock_irqrestore(&meta->lock, flags); + } +} + +void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) +{ + /* + * allocation_gate only needs to become non-zero, so it doesn't make + * sense to continue writing to it and pay the associated contention + * cost, in case we have a large number of concurrent allocations. + */ + if (atomic_read(&kfence_allocation_gate) || atomic_inc_return(&kfence_allocation_gate) > 1) + return NULL; + + if (!READ_ONCE(kfence_enabled)) + return NULL; + + if (size > PAGE_SIZE) + return NULL; + + return kfence_guarded_alloc(s, size, flags); +} + +size_t kfence_ksize(const void *addr) +{ + const struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr); + + /* + * Read locklessly -- if there is a race with __kfence_alloc(), this is + * either a use-after-free or invalid access. + */ + return meta ? meta->size : 0; +} + +void *kfence_object_start(const void *addr) +{ + const struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr); + + /* + * Read locklessly -- if there is a race with __kfence_alloc(), this is + * either a use-after-free or invalid access. + */ + return meta ? (void *)meta->addr : NULL; +} + +void __kfence_free(void *addr) +{ + struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr); + + /* + * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing + * the object, as the object page may be recycled for other-typed + * objects once it has been freed. meta->cache may be NULL if the cache + * was destroyed. + */ + if (unlikely(meta->cache && (meta->cache->flags & SLAB_TYPESAFE_BY_RCU))) + call_rcu(&meta->rcu_head, rcu_guarded_free); + else + kfence_guarded_free(addr, meta, false); +} + +bool kfence_handle_page_fault(unsigned long addr) +{ + const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE; + struct kfence_metadata *to_report = NULL; + enum kfence_error_type error_type; + unsigned long flags; + + if (!is_kfence_address((void *)addr)) + return false; + + if (!READ_ONCE(kfence_enabled)) /* If disabled at runtime ... */ + return kfence_unprotect(addr); /* ... unprotect and proceed. */ + + atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); + + if (page_index % 2) { + /* This is a redzone, report a buffer overflow. */ + struct kfence_metadata *meta; + int distance = 0; + + meta = addr_to_metadata(addr - PAGE_SIZE); + if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) { + to_report = meta; + /* Data race ok; distance calculation approximate. */ + distance = addr - data_race(meta->addr + meta->size); + } + + meta = addr_to_metadata(addr + PAGE_SIZE); + if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) { + /* Data race ok; distance calculation approximate. */ + if (!to_report || distance > data_race(meta->addr) - addr) + to_report = meta; + } + + if (!to_report) + goto out; + + raw_spin_lock_irqsave(&to_report->lock, flags); + to_report->unprotected_page = addr; + error_type = KFENCE_ERROR_OOB; + + /* + * If the object was freed before we took the look we can still + * report this as an OOB -- the report will simply show the + * stacktrace of the free as well. + */ + } else { + to_report = addr_to_metadata(addr); + if (!to_report) + goto out; + + raw_spin_lock_irqsave(&to_report->lock, flags); + error_type = KFENCE_ERROR_UAF; + /* + * We may race with __kfence_alloc(), and it is possible that a + * freed object may be reallocated. We simply report this as a + * use-after-free, with the stack trace showing the place where + * the object was re-allocated. + */ + } + +out: + if (to_report) { + kfence_report_error(addr, to_report, error_type); + raw_spin_unlock_irqrestore(&to_report->lock, flags); + } else { + /* This may be a UAF or OOB access, but we can't be sure. */ + kfence_report_error(addr, NULL, KFENCE_ERROR_INVALID); + } + + return kfence_unprotect(addr); /* Unprotect and let access proceed. */ +} diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h new file mode 100644 index 000000000000..1014060f9707 --- /dev/null +++ b/mm/kfence/kfence.h @@ -0,0 +1,113 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Kernel Electric-Fence (KFENCE). For more info please see + * Documentation/dev-tools/kfence.rst. + * + * Copyright (C) 2020, Google LLC. + */ + +#ifndef MM_KFENCE_KFENCE_H +#define MM_KFENCE_KFENCE_H + +#include <linux/mm.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/types.h> + +#include "../slab.h" /* for struct kmem_cache */ + +/* For non-debug builds, avoid leaking kernel pointers into dmesg. */ +#ifdef CONFIG_DEBUG_KERNEL +#define PTR_FMT "%px" +#else +#define PTR_FMT "%p" +#endif + +/* + * Get the canary byte pattern for @addr. Use a pattern that varies based on the + * lower 3 bits of the address, to detect memory corruptions with higher + * probability, where similar constants are used. + */ +#define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7)) + +/* Maximum stack depth for reports. */ +#define KFENCE_STACK_DEPTH 64 + +/* KFENCE object states. */ +enum kfence_object_state { + KFENCE_OBJECT_UNUSED, /* Object is unused. */ + KFENCE_OBJECT_ALLOCATED, /* Object is currently allocated. */ + KFENCE_OBJECT_FREED, /* Object was allocated, and then freed. */ +}; + +/* Alloc/free tracking information. */ +struct kfence_track { + pid_t pid; + int num_stack_entries; + unsigned long stack_entries[KFENCE_STACK_DEPTH]; +}; + +/* KFENCE metadata per guarded allocation. */ +struct kfence_metadata { + struct list_head list; /* Freelist node; access under kfence_freelist_lock. */ + struct rcu_head rcu_head; /* For delayed freeing. */ + + /* + * Lock protecting below data; to ensure consistency of the below data, + * since the following may execute concurrently: __kfence_alloc(), + * __kfence_free(), kfence_handle_page_fault(). However, note that we + * cannot grab the same metadata off the freelist twice, and multiple + * __kfence_alloc() cannot run concurrently on the same metadata. + */ + raw_spinlock_t lock; + + /* The current state of the object; see above. */ + enum kfence_object_state state; + + /* + * Allocated object address; cannot be calculated from size, because of + * alignment requirements. + * + * Invariant: ALIGN_DOWN(addr, PAGE_SIZE) is constant. + */ + unsigned long addr; + + /* + * The size of the original allocation. + */ + size_t size; + + /* + * The kmem_cache cache of the last allocation; NULL if never allocated + * or the cache has already been destroyed. + */ + struct kmem_cache *cache; + + /* + * In case of an invalid access, the page that was unprotected; we + * optimistically only store one address. + */ + unsigned long unprotected_page; + + /* Allocation and free stack information. */ + struct kfence_track alloc_track; + struct kfence_track free_track; +}; + +extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS]; + +/* KFENCE error types for report generation. */ +enum kfence_error_type { + KFENCE_ERROR_OOB, /* Detected a out-of-bounds access. */ + KFENCE_ERROR_UAF, /* Detected a use-after-free access. */ + KFENCE_ERROR_CORRUPTION, /* Detected a memory corruption on free. */ + KFENCE_ERROR_INVALID, /* Invalid access of unknown type. */ + KFENCE_ERROR_INVALID_FREE, /* Invalid free. */ +}; + +void kfence_report_error(unsigned long address, const struct kfence_metadata *meta, + enum kfence_error_type type); + +void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta); + +#endif /* MM_KFENCE_KFENCE_H */ diff --git a/mm/kfence/report.c b/mm/kfence/report.c new file mode 100644 index 000000000000..64f27c8d46a3 --- /dev/null +++ b/mm/kfence/report.c @@ -0,0 +1,240 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * KFENCE reporting. + * + * Copyright (C) 2020, Google LLC. + */ + +#include <stdarg.h> + +#include <linux/kernel.h> +#include <linux/lockdep.h> +#include <linux/printk.h> +#include <linux/seq_file.h> +#include <linux/stacktrace.h> +#include <linux/string.h> + +#include <asm/kfence.h> + +#include "kfence.h" + +/* Helper function to either print to a seq_file or to console. */ +__printf(2, 3) +static void seq_con_printf(struct seq_file *seq, const char *fmt, ...) +{ + va_list args; + + va_start(args, fmt); + if (seq) + seq_vprintf(seq, fmt, args); + else + vprintk(fmt, args); + va_end(args); +} + +/* + * Get the number of stack entries to skip to get out of MM internals. @type is + * optional, and if set to NULL, assumes an allocation or free stack. + */ +static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries, + const enum kfence_error_type *type) +{ + char buf[64]; + int skipnr, fallback = 0; + bool is_access_fault = false; + + if (type) { + /* Depending on error type, find different stack entries. */ + switch (*type) { + case KFENCE_ERROR_UAF: + case KFENCE_ERROR_OOB: + case KFENCE_ERROR_INVALID: + is_access_fault = true; + break; + case KFENCE_ERROR_CORRUPTION: + case KFENCE_ERROR_INVALID_FREE: + break; + } + } + + for (skipnr = 0; skipnr < num_entries; skipnr++) { + int len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skipnr]); + + if (is_access_fault) { + if (!strncmp(buf, KFENCE_SKIP_ARCH_FAULT_HANDLER, len)) + goto found; + } else { + if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") || + !strncmp(buf, "__slab_free", len)) { + /* + * In case of tail calls from any of the below + * to any of the above. + */ + fallback = skipnr + 1; + } + + /* Also the *_bulk() variants by only checking prefixes. */ + if (str_has_prefix(buf, "kfree") || + str_has_prefix(buf, "kmem_cache_free") || + str_has_prefix(buf, "__kmalloc") || + str_has_prefix(buf, "kmem_cache_alloc")) + goto found; + } + } + if (fallback < num_entries) + return fallback; +found: + skipnr++; + return skipnr < num_entries ? skipnr : 0; +} + +static void kfence_print_stack(struct seq_file *seq, const struct kfence_metadata *meta, + bool show_alloc) +{ + const struct kfence_track *track = show_alloc ? &meta->alloc_track : &meta->free_track; + + if (track->num_stack_entries) { + /* Skip allocation/free internals stack. */ + int i = get_stack_skipnr(track->stack_entries, track->num_stack_entries, NULL); + + /* stack_trace_seq_print() does not exist; open code our own. */ + for (; i < track->num_stack_entries; i++) + seq_con_printf(seq, " %pS\n", (void *)track->stack_entries[i]); + } else { + seq_con_printf(seq, " no %s stack\n", show_alloc ? "allocation" : "deallocation"); + } +} + +void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta) +{ + const int size = abs(meta->size); + const unsigned long start = meta->addr; + const struct kmem_cache *const cache = meta->cache; + + lockdep_assert_held(&meta->lock); + + if (meta->state == KFENCE_OBJECT_UNUSED) { + seq_con_printf(seq, "kfence-#%zd unused\n", meta - kfence_metadata); + return; + } + + seq_con_printf(seq, + "kfence-#%zd [0x" PTR_FMT "-0x" PTR_FMT + ", size=%d, cache=%s] allocated by task %d:\n", + meta - kfence_metadata, (void *)start, (void *)(start + size - 1), size, + (cache && cache->name) ? cache->name : "<destroyed>", meta->alloc_track.pid); + kfence_print_stack(seq, meta, true); + + if (meta->state == KFENCE_OBJECT_FREED) { + seq_con_printf(seq, "\nfreed by task %d:\n", meta->free_track.pid); + kfence_print_stack(seq, meta, false); + } +} + +/* + * Show bytes at @addr that are different from the expected canary values, up to + * @max_bytes. + */ +static void print_diff_canary(unsigned long address, size_t bytes_to_show, + const struct kfence_metadata *meta) +{ + const unsigned long show_until_addr = address + bytes_to_show; + const u8 *cur, *end; + + /* Do not show contents of object nor read into following guard page. */ + end = (const u8 *)(address < meta->addr ? min(show_until_addr, meta->addr) + : min(show_until_addr, PAGE_ALIGN(address))); + + pr_cont("["); + for (cur = (const u8 *)address; cur < end; cur++) { + if (*cur == KFENCE_CANARY_PATTERN(cur)) + pr_cont(" ."); + else if (IS_ENABLED(CONFIG_DEBUG_KERNEL)) + pr_cont(" 0x%02x", *cur); + else /* Do not leak kernel memory in non-debug builds. */ + pr_cont(" !"); + } + pr_cont(" ]"); +} + +void kfence_report_error(unsigned long address, const struct kfence_metadata *meta, + enum kfence_error_type type) +{ + unsigned long stack_entries[KFENCE_STACK_DEPTH] = { 0 }; + int num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1); + int skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type); + const ptrdiff_t object_index = meta ? meta - kfence_metadata : -1; + + /* Require non-NULL meta, except if KFENCE_ERROR_INVALID. */ + if (WARN_ON(type != KFENCE_ERROR_INVALID && !meta)) + return; + + if (meta) + lockdep_assert_held(&meta->lock); + /* + * Because we may generate reports in printk-unfriendly parts of the + * kernel, such as scheduler code, the use of printk() could deadlock. + * Until such time that all printing code here is safe in all parts of + * the kernel, accept the risk, and just get our message out (given the + * system might already behave unpredictably due to the memory error). + * As such, also disable lockdep to hide warnings, and avoid disabling + * lockdep for the rest of the kernel. + */ + lockdep_off(); + + pr_err("==================================================================\n"); + /* Print report header. */ + switch (type) { + case KFENCE_ERROR_OOB: { + const bool left_of_object = address < meta->addr; + + pr_err("BUG: KFENCE: out-of-bounds in %pS\n\n", (void *)stack_entries[skipnr]); + pr_err("Out-of-bounds access at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n", + (void *)address, + left_of_object ? meta->addr - address : address - meta->addr, + left_of_object ? "left" : "right", object_index); + break; + } + case KFENCE_ERROR_UAF: + pr_err("BUG: KFENCE: use-after-free in %pS\n\n", (void *)stack_entries[skipnr]); + pr_err("Use-after-free access at 0x" PTR_FMT " (in kfence-#%zd):\n", + (void *)address, object_index); + break; + case KFENCE_ERROR_CORRUPTION: + pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]); + pr_err("Corrupted memory at 0x" PTR_FMT " ", (void *)address); + print_diff_canary(address, 16, meta); + pr_cont(" (in kfence-#%zd):\n", object_index); + break; + case KFENCE_ERROR_INVALID: + pr_err("BUG: KFENCE: invalid access in %pS\n\n", (void *)stack_entries[skipnr]); + pr_err("Invalid access at 0x" PTR_FMT ":\n", (void *)address); + break; + case KFENCE_ERROR_INVALID_FREE: + pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]); + pr_err("Invalid free of 0x" PTR_FMT " (in kfence-#%zd):\n", (void *)address, + object_index); + break; + } + + /* Print stack trace and object info. */ + stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr, 0); + + if (meta) { + pr_err("\n"); + kfence_print_object(NULL, meta); + } + + /* Print report footer. */ + pr_err("\n"); + dump_stack_print_info(KERN_ERR); + pr_err("==================================================================\n"); + + lockdep_on(); + + if (panic_on_warn) + panic("panic_on_warn set ...\n"); + + /* We encountered a memory unsafety error, taint the kernel! */ + add_taint(TAINT_BAD_PAGE, LOCKDEP_STILL_OK); +}
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit 1dc0da6e9ec0f8d735756374697912cd50f402cf category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Add architecture specific implementation details for KFENCE and enable KFENCE for the x86 architecture. In particular, this implements the required interface in <asm/kfence.h> for setting up the pool and providing helper functions for protecting and unprotecting pages.
For x86, we need to ensure that the pool uses 4K pages, which is done using the set_memory_4k() helper function.
[elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.com Signed-off-by: Marco Elver elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Co-developed-by: Marco Elver elver@google.com Reviewed-by: Jann Horn jannh@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: SeongJae Park sjpark@amazon.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- arch/x86/Kconfig | 1 + arch/x86/include/asm/kfence.h | 70 +++++++++++++++++++++++++++++++++++ arch/x86/mm/fault.c | 5 +++ 3 files changed, 76 insertions(+) create mode 100644 arch/x86/include/asm/kfence.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index af7123b5b493..37a7b2c8fdd6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -148,6 +148,7 @@ config X86 select HAVE_ARCH_JUMP_LABEL_RELATIVE select HAVE_ARCH_KASAN if X86_64 select HAVE_ARCH_KASAN_VMALLOC if X86_64 + select HAVE_ARCH_KFENCE select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS if MMU select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h new file mode 100644 index 000000000000..a0659dbd93ea --- /dev/null +++ b/arch/x86/include/asm/kfence.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * x86 KFENCE support. + * + * Copyright (C) 2020, Google LLC. + */ + +#ifndef _ASM_X86_KFENCE_H +#define _ASM_X86_KFENCE_H + +#include <linux/bug.h> +#include <linux/kfence.h> + +#include <asm/pgalloc.h> +#include <asm/pgtable.h> +#include <asm/set_memory.h> +#include <asm/tlbflush.h> + +/* + * The page fault handler entry function, up to which the stack trace is + * truncated in reports. + */ +#define KFENCE_SKIP_ARCH_FAULT_HANDLER "asm_exc_page_fault" + +/* Force 4K pages for __kfence_pool. */ +static inline bool arch_kfence_init_pool(void) +{ + unsigned long addr; + + for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr); + addr += PAGE_SIZE) { + unsigned int level; + + if (!lookup_address(addr, &level)) + return false; + + if (level != PG_LEVEL_4K) + set_memory_4k(addr, 1); + } + + return true; +} + +/* Protect the given page and flush TLB. */ +static inline bool kfence_protect_page(unsigned long addr, bool protect) +{ + unsigned int level; + pte_t *pte = lookup_address(addr, &level); + + if (WARN_ON(!pte || level != PG_LEVEL_4K)) + return false; + + /* + * We need to avoid IPIs, as we may get KFENCE allocations or faults + * with interrupts disabled. Therefore, the below is best-effort, and + * does not flush TLBs on all CPUs. We can tolerate some inaccuracy; + * lazy fault handling takes care of faults after the page is PRESENT. + */ + + if (protect) + set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); + else + set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); + + /* Flush this CPU's TLB. */ + flush_tlb_one_kernel(addr); + return true; +} + +#endif /* _ASM_X86_KFENCE_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 9c1545c376e9..c64e44f0578e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -9,6 +9,7 @@ #include <linux/kdebug.h> /* oops_begin/end, ... */ #include <linux/extable.h> /* search_exception_tables */ #include <linux/memblock.h> /* max_low_pfn */ +#include <linux/kfence.h> /* kfence_handle_page_fault */ #include <linux/kprobes.h> /* NOKPROBE_SYMBOL, ... */ #include <linux/mmiotrace.h> /* kmmio_handler, ... */ #include <linux/perf_event.h> /* perf_sw_event */ @@ -732,6 +733,10 @@ no_context(struct pt_regs *regs, unsigned long error_code, if (IS_ENABLED(CONFIG_EFI)) efi_recover_from_page_fault(address);
+ /* Only not-present faults should be handled by KFENCE. */ + if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address)) + return; + oops: /* * Oops. The kernel tried to access some bad page. We'll have to
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc1 commit 840b239863449f27bf7522deb81e6746fbfbfeaf category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Add architecture specific implementation details for KFENCE and enable KFENCE for the arm64 architecture. In particular, this implements the required interface in <asm/kfence.h>.
KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the entire linear map to be mapped at page granularity. Doing so may result in extra memory allocated for page tables in case rodata=full is not set; however, currently CONFIG_RODATA_FULL_DEFAULT_ENABLED=y is the default, and the common case is therefore not affected by this change.
[elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-3-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-4-elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Co-developed-by: Alexander Potapenko glider@google.com Reviewed-by: Jann Horn jannh@google.com Reviewed-by: Mark Rutland mark.rutland@arm.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Paul E. McKenney paulmck@kernel.org Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: SeongJae Park sjpark@amazon.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: arch/arm64/Kconfig arch/arm64/mm/mmu.c [Peng Liu: cherry-pick from 840b239863449f27bf7522deb81e6746fbfbfeaf] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/kfence.h | 24 ++++++++++++++++++++++++ arch/arm64/mm/fault.c | 4 ++++ arch/arm64/mm/mmu.c | 8 +++++++- 4 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/kfence.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a2380374ef59..0b58ad078a34 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -138,6 +138,7 @@ config ARM64 select HAVE_ARCH_JUMP_LABEL_RELATIVE select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48) select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN + select HAVE_ARCH_KFENCE select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h new file mode 100644 index 000000000000..42a06f83850a --- /dev/null +++ b/arch/arm64/include/asm/kfence.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * arm64 KFENCE support. + * + * Copyright (C) 2020, Google LLC. + */ + +#ifndef __ASM_KFENCE_H +#define __ASM_KFENCE_H + +#include <asm/cacheflush.h> + +#define KFENCE_SKIP_ARCH_FAULT_HANDLER "el1_sync" + +static inline bool arch_kfence_init_pool(void) { return true; } + +static inline bool kfence_protect_page(unsigned long addr, bool protect) +{ + set_memory_valid(addr, 1, !protect); + + return true; +} + +#endif /* __ASM_KFENCE_H */ diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 795d224f184f..d1a8b523d28e 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -10,6 +10,7 @@ #include <linux/acpi.h> #include <linux/bitfield.h> #include <linux/extable.h> +#include <linux/kfence.h> #include <linux/signal.h> #include <linux/mm.h> #include <linux/hardirq.h> @@ -322,6 +323,9 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr, } else if (addr < PAGE_SIZE) { msg = "NULL pointer dereference"; } else { + if (kfence_handle_page_fault(addr)) + return; + msg = "paging request"; }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 1ec57fd87ef7..347c929fb3a7 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1452,7 +1452,13 @@ int arch_add_memory(int nid, u64 start, u64 size, return -EINVAL; }
- if (rodata_full || debug_pagealloc_enabled()) + + /* + * KFENCE requires linear map to be mapped at page granularity, so that + * it is possible to protect/unprotect single pages in the KFENCE pool. + */ + if (rodata_full || debug_pagealloc_enabled() || + IS_ENABLED(CONFIG_KFENCE)) flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc1 commit d438fabce7860df3cb9337776be6f90b59ced8ed category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Instead of removing the fault handling portion of the stack trace based on the fault handler's name, just use struct pt_regs directly.
Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it through to kfence_report_error() for out-of-bounds, use-after-free, or invalid access errors, where pt_regs is used to generate the stack trace.
If the kernel is a DEBUG_KERNEL, also show registers for more information.
Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Suggested-by: Mark Rutland mark.rutland@arm.com Acked-by: Mark Rutland mark.rutland@arm.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Jann Horn jannh@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- arch/arm64/include/asm/kfence.h | 2 -- arch/arm64/mm/fault.c | 2 +- arch/x86/include/asm/kfence.h | 6 ---- arch/x86/mm/fault.c | 2 +- include/linux/kfence.h | 5 +-- mm/kfence/core.c | 10 +++--- mm/kfence/kfence.h | 4 +-- mm/kfence/report.c | 63 +++++++++++++++++++-------------- 8 files changed, 48 insertions(+), 46 deletions(-)
diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h index 42a06f83850a..d061176d57ea 100644 --- a/arch/arm64/include/asm/kfence.h +++ b/arch/arm64/include/asm/kfence.h @@ -10,8 +10,6 @@
#include <asm/cacheflush.h>
-#define KFENCE_SKIP_ARCH_FAULT_HANDLER "el1_sync" - static inline bool arch_kfence_init_pool(void) { return true; }
static inline bool kfence_protect_page(unsigned long addr, bool protect) diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index d1a8b523d28e..a5fe9f7b423b 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -323,7 +323,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr, } else if (addr < PAGE_SIZE) { msg = "NULL pointer dereference"; } else { - if (kfence_handle_page_fault(addr)) + if (kfence_handle_page_fault(addr, regs)) return;
msg = "paging request"; diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h index a0659dbd93ea..97bbb4a9083a 100644 --- a/arch/x86/include/asm/kfence.h +++ b/arch/x86/include/asm/kfence.h @@ -16,12 +16,6 @@ #include <asm/set_memory.h> #include <asm/tlbflush.h>
-/* - * The page fault handler entry function, up to which the stack trace is - * truncated in reports. - */ -#define KFENCE_SKIP_ARCH_FAULT_HANDLER "asm_exc_page_fault" - /* Force 4K pages for __kfence_pool. */ static inline bool arch_kfence_init_pool(void) { diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index c64e44f0578e..b2688e8e2faf 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -734,7 +734,7 @@ no_context(struct pt_regs *regs, unsigned long error_code, efi_recover_from_page_fault(address);
/* Only not-present faults should be handled by KFENCE. */ - if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address)) + if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address, regs)) return;
oops: diff --git a/include/linux/kfence.h b/include/linux/kfence.h index 81f3911cb298..5a56bcf5606c 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -186,6 +186,7 @@ static __always_inline __must_check bool kfence_free(void *addr) /** * kfence_handle_page_fault() - perform page fault handling for KFENCE pages * @addr: faulting address + * @regs: current struct pt_regs (can be NULL, but shows full stack trace) * * Return: * * false - address outside KFENCE pool, @@ -196,7 +197,7 @@ static __always_inline __must_check bool kfence_free(void *addr) * cases KFENCE prints an error message and marks the offending page as * present, so that the kernel can proceed. */ -bool __must_check kfence_handle_page_fault(unsigned long addr); +bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs);
#else /* CONFIG_KFENCE */
@@ -209,7 +210,7 @@ static inline size_t kfence_ksize(const void *addr) { return 0; } static inline void *kfence_object_start(const void *addr) { return NULL; } static inline void __kfence_free(void *addr) { } static inline bool __must_check kfence_free(void *addr) { return false; } -static inline bool __must_check kfence_handle_page_fault(unsigned long addr) { return false; } +static inline bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs) { return false; }
#endif
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index d6a32c13336b..61c76670a7a9 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -216,7 +216,7 @@ static inline bool check_canary_byte(u8 *addr) return true;
atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); - kfence_report_error((unsigned long)addr, addr_to_metadata((unsigned long)addr), + kfence_report_error((unsigned long)addr, NULL, addr_to_metadata((unsigned long)addr), KFENCE_ERROR_CORRUPTION); return false; } @@ -351,7 +351,7 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) { /* Invalid or double-free, bail out. */ atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); - kfence_report_error((unsigned long)addr, meta, KFENCE_ERROR_INVALID_FREE); + kfence_report_error((unsigned long)addr, NULL, meta, KFENCE_ERROR_INVALID_FREE); raw_spin_unlock_irqrestore(&meta->lock, flags); return; } @@ -766,7 +766,7 @@ void __kfence_free(void *addr) kfence_guarded_free(addr, meta, false); }
-bool kfence_handle_page_fault(unsigned long addr) +bool kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs) { const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE; struct kfence_metadata *to_report = NULL; @@ -829,11 +829,11 @@ bool kfence_handle_page_fault(unsigned long addr)
out: if (to_report) { - kfence_report_error(addr, to_report, error_type); + kfence_report_error(addr, regs, to_report, error_type); raw_spin_unlock_irqrestore(&to_report->lock, flags); } else { /* This may be a UAF or OOB access, but we can't be sure. */ - kfence_report_error(addr, NULL, KFENCE_ERROR_INVALID); + kfence_report_error(addr, regs, NULL, KFENCE_ERROR_INVALID); }
return kfence_unprotect(addr); /* Unprotect and let access proceed. */ diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index 1014060f9707..0d83e628a97d 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -105,8 +105,8 @@ enum kfence_error_type { KFENCE_ERROR_INVALID_FREE, /* Invalid free. */ };
-void kfence_report_error(unsigned long address, const struct kfence_metadata *meta, - enum kfence_error_type type); +void kfence_report_error(unsigned long address, struct pt_regs *regs, + const struct kfence_metadata *meta, enum kfence_error_type type);
void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta);
diff --git a/mm/kfence/report.c b/mm/kfence/report.c index 64f27c8d46a3..4dbfa9a382e4 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -10,6 +10,7 @@ #include <linux/kernel.h> #include <linux/lockdep.h> #include <linux/printk.h> +#include <linux/sched/debug.h> #include <linux/seq_file.h> #include <linux/stacktrace.h> #include <linux/string.h> @@ -41,7 +42,6 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries { char buf[64]; int skipnr, fallback = 0; - bool is_access_fault = false;
if (type) { /* Depending on error type, find different stack entries. */ @@ -49,8 +49,12 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries case KFENCE_ERROR_UAF: case KFENCE_ERROR_OOB: case KFENCE_ERROR_INVALID: - is_access_fault = true; - break; + /* + * kfence_handle_page_fault() may be called with pt_regs + * set to NULL; in that case we'll simply show the full + * stack trace. + */ + return 0; case KFENCE_ERROR_CORRUPTION: case KFENCE_ERROR_INVALID_FREE: break; @@ -60,26 +64,21 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries for (skipnr = 0; skipnr < num_entries; skipnr++) { int len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skipnr]);
- if (is_access_fault) { - if (!strncmp(buf, KFENCE_SKIP_ARCH_FAULT_HANDLER, len)) - goto found; - } else { - if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") || - !strncmp(buf, "__slab_free", len)) { - /* - * In case of tail calls from any of the below - * to any of the above. - */ - fallback = skipnr + 1; - } - - /* Also the *_bulk() variants by only checking prefixes. */ - if (str_has_prefix(buf, "kfree") || - str_has_prefix(buf, "kmem_cache_free") || - str_has_prefix(buf, "__kmalloc") || - str_has_prefix(buf, "kmem_cache_alloc")) - goto found; + if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") || + !strncmp(buf, "__slab_free", len)) { + /* + * In case of tail calls from any of the below + * to any of the above. + */ + fallback = skipnr + 1; } + + /* Also the *_bulk() variants by only checking prefixes. */ + if (str_has_prefix(buf, "kfree") || + str_has_prefix(buf, "kmem_cache_free") || + str_has_prefix(buf, "__kmalloc") || + str_has_prefix(buf, "kmem_cache_alloc")) + goto found; } if (fallback < num_entries) return fallback; @@ -157,13 +156,20 @@ static void print_diff_canary(unsigned long address, size_t bytes_to_show, pr_cont(" ]"); }
-void kfence_report_error(unsigned long address, const struct kfence_metadata *meta, - enum kfence_error_type type) +void kfence_report_error(unsigned long address, struct pt_regs *regs, + const struct kfence_metadata *meta, enum kfence_error_type type) { unsigned long stack_entries[KFENCE_STACK_DEPTH] = { 0 }; - int num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1); - int skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type); const ptrdiff_t object_index = meta ? meta - kfence_metadata : -1; + int num_stack_entries; + int skipnr = 0; + + if (regs) { + num_stack_entries = stack_trace_save_regs(regs, stack_entries, KFENCE_STACK_DEPTH, 0); + } else { + num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1); + skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type); + }
/* Require non-NULL meta, except if KFENCE_ERROR_INVALID. */ if (WARN_ON(type != KFENCE_ERROR_INVALID && !meta)) @@ -227,7 +233,10 @@ void kfence_report_error(unsigned long address, const struct kfence_metadata *me
/* Print report footer. */ pr_err("\n"); - dump_stack_print_info(KERN_ERR); + if (IS_ENABLED(CONFIG_DEBUG_KERNEL) && regs) + show_regs(regs); + else + dump_stack_print_info(KERN_ERR); pr_err("==================================================================\n");
lockdep_on();
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit d3fb45f370d927224af35d22d34ea465884afec8 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Inserts KFENCE hooks into the SLAB allocator.
To pass the originally requested size to KFENCE, add an argument 'orig_size' to slab_alloc*(). The additional argument is required to preserve the requested original size for kmalloc() allocations, which uses size classes (e.g. an allocation of 272 bytes will return an object of size 512). Therefore, kmem_cache::size does not represent the kmalloc-caller's requested size, and we must introduce the argument 'orig_size' to propagate the originally requested size to KFENCE.
Without the originally requested size, we would not be able to detect out-of-bounds accesses for objects placed at the end of a KFENCE object page if that object is not equal to the kmalloc-size class it was bucketed into.
When KFENCE is disabled, there is no additional overhead, since slab_alloc*() functions are __always_inline.
Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com Signed-off-by: Marco Elver elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Co-developed-by: Marco Elver elver@google.com
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Jann Horn jannh@google.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: SeongJae Park sjpark@amazon.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/slab.c [Peng Liu: cherry-pick from d3fb45f370d927224af35d22d34ea465884afec8] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- include/linux/slab_def.h | 3 +++ mm/kfence/core.c | 2 ++ mm/slab.c | 39 ++++++++++++++++++++++++++++++--------- mm/slab_common.c | 5 ++++- 4 files changed, 39 insertions(+), 10 deletions(-)
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index 9eb430c163c2..3aa5e1e73ab6 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -2,6 +2,7 @@ #ifndef _LINUX_SLAB_DEF_H #define _LINUX_SLAB_DEF_H
+#include <linux/kfence.h> #include <linux/reciprocal_div.h>
/* @@ -114,6 +115,8 @@ static inline unsigned int obj_to_index(const struct kmem_cache *cache, static inline int objs_per_slab_page(const struct kmem_cache *cache, const struct page *page) { + if (is_kfence_address(page_address(page))) + return 1; return cache->num; }
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 61c76670a7a9..05c18aa11851 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -317,6 +317,8 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g /* Set required struct page fields. */ page = virt_to_page(meta->addr); page->slab_cache = cache; + if (IS_ENABLED(CONFIG_SLAB)) + page->s_mem = addr;
raw_spin_unlock_irqrestore(&meta->lock, flags);
diff --git a/mm/slab.c b/mm/slab.c index b2cc2cf7d8a3..0b4be897676a 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -100,6 +100,7 @@ #include <linux/seq_file.h> #include <linux/notifier.h> #include <linux/kallsyms.h> +#include <linux/kfence.h> #include <linux/cpu.h> #include <linux/sysctl.h> #include <linux/module.h> @@ -3207,7 +3208,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, }
static __always_inline void * -slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, +slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size, unsigned long caller) { unsigned long save_flags; @@ -3220,6 +3221,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, if (unlikely(!cachep)) return NULL;
+ ptr = kfence_alloc(cachep, orig_size, flags); + if (unlikely(ptr)) + goto out_hooks; + cache_alloc_debugcheck_before(cachep, flags); local_irq_save(save_flags);
@@ -3252,6 +3257,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr) memset(ptr, 0, cachep->object_size);
+out_hooks: slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr); return ptr; } @@ -3289,7 +3295,7 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags) #endif /* CONFIG_NUMA */
static __always_inline void * -slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller) +slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned long caller) { unsigned long save_flags; void *objp; @@ -3300,6 +3306,10 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller) if (unlikely(!cachep)) return NULL;
+ objp = kfence_alloc(cachep, orig_size, flags); + if (unlikely(objp)) + goto out; + cache_alloc_debugcheck_before(cachep, flags); local_irq_save(save_flags); objp = __do_cache_alloc(cachep, flags); @@ -3310,6 +3320,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller) if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp) memset(objp, 0, cachep->object_size);
+out: slab_post_alloc_hook(cachep, objcg, flags, 1, &objp); return objp; } @@ -3415,6 +3426,13 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac) static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp, unsigned long caller) { + if (is_kfence_address(objp)) { + kmemleak_free_recursive(objp, cachep->flags); + __kfence_free(objp); + return; + } + + /* Put the object into the quarantine, don't touch it for now. */ if (kasan_slab_free(cachep, objp, _RET_IP_)) return; @@ -3480,7 +3498,7 @@ void ___cache_free(struct kmem_cache *cachep, void *objp, */ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) { - void *ret = slab_alloc(cachep, flags, _RET_IP_); + void *ret = slab_alloc(cachep, flags, cachep->object_size, _RET_IP_);
trace_kmem_cache_alloc(_RET_IP_, ret, cachep->object_size, cachep->size, flags); @@ -3513,7 +3531,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
local_irq_disable(); for (i = 0; i < size; i++) { - void *objp = __do_cache_alloc(s, flags); + void *objp = kfence_alloc(s, s->object_size, flags) ?: __do_cache_alloc(s, flags);
if (unlikely(!objp)) goto error; @@ -3546,7 +3564,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size) { void *ret;
- ret = slab_alloc(cachep, flags, _RET_IP_); + ret = slab_alloc(cachep, flags, size, _RET_IP_);
ret = kasan_kmalloc(cachep, ret, size, flags); trace_kmalloc(_RET_IP_, ret, @@ -3572,7 +3590,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace); */ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid) { - void *ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_); + void *ret = slab_alloc_node(cachep, flags, nodeid, cachep->object_size, _RET_IP_);
trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep->object_size, cachep->size, @@ -3590,7 +3608,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep, { void *ret;
- ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_); + ret = slab_alloc_node(cachep, flags, nodeid, size, _RET_IP_);
ret = kasan_kmalloc(cachep, ret, size, flags); trace_kmalloc_node(_RET_IP_, ret, @@ -3651,7 +3669,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags, cachep = kmalloc_slab(size, flags); if (unlikely(ZERO_OR_NULL_PTR(cachep))) return cachep; - ret = slab_alloc(cachep, flags, caller); + ret = slab_alloc(cachep, flags, size, caller);
ret = kasan_kmalloc(cachep, ret, size, flags); trace_kmalloc(caller, ret, @@ -4150,7 +4168,10 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page, BUG_ON(objnr >= cachep->num);
/* Find offset within object. */ - offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep); + if (is_kfence_address(ptr)) + offset = ptr - kfence_object_start(ptr); + else + offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
/* Allow address range falling entirely within usercopy region. */ if (offset >= cachep->useroffset && diff --git a/mm/slab_common.c b/mm/slab_common.c index 79ae785f2c9f..89022933dfa6 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -12,6 +12,7 @@ #include <linux/memory.h> #include <linux/cache.h> #include <linux/compiler.h> +#include <linux/kfence.h> #include <linux/module.h> #include <linux/cpu.h> #include <linux/uaccess.h> @@ -428,6 +429,7 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work) rcu_barrier();
list_for_each_entry_safe(s, s2, &to_destroy, list) { + kfence_shutdown_cache(s); #ifdef SLAB_SUPPORTS_SYSFS sysfs_slab_release(s); #else @@ -453,6 +455,7 @@ static int shutdown_cache(struct kmem_cache *s) list_add_tail(&s->list, &slab_caches_to_rcu_destroy); schedule_work(&slab_caches_to_rcu_destroy_work); } else { + kfence_shutdown_cache(s); #ifdef SLAB_SUPPORTS_SYSFS sysfs_slab_unlink(s); sysfs_slab_release(s); @@ -1156,7 +1159,7 @@ size_t ksize(const void *objp) if (unlikely(ZERO_OR_NULL_PTR(objp)) || !__kasan_check_read(objp, 1)) return 0;
- size = __ksize(objp); + size = kfence_ksize(objp) ?: __ksize(objp); /* * We assume that ksize callers could use whole allocated area, * so we need to unpoison this area.
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit b89fb5ef0ce611b5db8eb9d3a5a7fcaab2cbe9e4 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Inserts KFENCE hooks into the SLUB allocator.
To pass the originally requested size to KFENCE, add an argument 'orig_size' to slab_alloc*(). The additional argument is required to preserve the requested original size for kmalloc() allocations, which uses size classes (e.g. an allocation of 272 bytes will return an object of size 512). Therefore, kmem_cache::size does not represent the kmalloc-caller's requested size, and we must introduce the argument 'orig_size' to propagate the originally requested size to KFENCE.
Without the originally requested size, we would not be able to detect out-of-bounds accesses for objects placed at the end of a KFENCE object page if that object is not equal to the kmalloc-size class it was bucketed into.
When KFENCE is disabled, there is no additional overhead, since slab_alloc*() functions are __always_inline.
Link: https://lkml.kernel.org/r/20201103175841.3495947-6-elver@google.com Signed-off-by: Marco Elver elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Reviewed-by: Jann Horn jannh@google.com Co-developed-by: Marco Elver elver@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: SeongJae Park sjpark@amazon.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- include/linux/slub_def.h | 3 ++ mm/kfence/core.c | 2 ++ mm/slub.c | 60 ++++++++++++++++++++++++++++++---------- 3 files changed, 51 insertions(+), 14 deletions(-)
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 1be0ed5befa1..dcde82a4434c 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -7,6 +7,7 @@ * * (C) 2007 SGI, Christoph Lameter */ +#include <linux/kfence.h> #include <linux/kobject.h> #include <linux/reciprocal_div.h>
@@ -185,6 +186,8 @@ static inline unsigned int __obj_to_index(const struct kmem_cache *cache, static inline unsigned int obj_to_index(const struct kmem_cache *cache, const struct page *page, void *obj) { + if (is_kfence_address(obj)) + return 0; return __obj_to_index(cache, page_address(page), obj); }
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 05c18aa11851..7692af715fdb 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -317,6 +317,8 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g /* Set required struct page fields. */ page = virt_to_page(meta->addr); page->slab_cache = cache; + if (IS_ENABLED(CONFIG_SLUB)) + page->objects = 1; if (IS_ENABLED(CONFIG_SLAB)) page->s_mem = addr;
diff --git a/mm/slub.c b/mm/slub.c index cf9e82282832..4d8ad3f9f8e0 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -28,6 +28,7 @@ #include <linux/ctype.h> #include <linux/debugobjects.h> #include <linux/kallsyms.h> +#include <linux/kfence.h> #include <linux/memory.h> #include <linux/math64.h> #include <linux/fault-inject.h> @@ -1560,6 +1561,11 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s, void *old_tail = *tail ? *tail : *head; int rsize;
+ if (is_kfence_address(next)) { + slab_free_hook(s, next); + return true; + } + /* Head and tail of the reconstructed freelist */ *head = NULL; *tail = NULL; @@ -2801,7 +2807,7 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s, * Otherwise we can simply pick the next object from the lockless free list. */ static __always_inline void *slab_alloc_node(struct kmem_cache *s, - gfp_t gfpflags, int node, unsigned long addr) + gfp_t gfpflags, int node, unsigned long addr, size_t orig_size) { void *object; struct kmem_cache_cpu *c; @@ -2812,6 +2818,11 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags); if (!s) return NULL; + + object = kfence_alloc(s, orig_size, gfpflags); + if (unlikely(object)) + goto out; + redo: /* * Must read kmem_cache cpu data via this cpu ptr. Preemption is @@ -2884,20 +2895,21 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object) memset(object, 0, s->object_size);
+out: slab_post_alloc_hook(s, objcg, gfpflags, 1, &object);
return object; }
static __always_inline void *slab_alloc(struct kmem_cache *s, - gfp_t gfpflags, unsigned long addr) + gfp_t gfpflags, unsigned long addr, size_t orig_size) { - return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr); + return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr, orig_size); }
void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags) { - void *ret = slab_alloc(s, gfpflags, _RET_IP_); + void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size);
trace_kmem_cache_alloc(_RET_IP_, ret, s->object_size, s->size, gfpflags); @@ -2909,7 +2921,7 @@ EXPORT_SYMBOL(kmem_cache_alloc); #ifdef CONFIG_TRACING void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size) { - void *ret = slab_alloc(s, gfpflags, _RET_IP_); + void *ret = slab_alloc(s, gfpflags, _RET_IP_, size); trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags); ret = kasan_kmalloc(s, ret, size, gfpflags); return ret; @@ -2920,7 +2932,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace); #ifdef CONFIG_NUMA void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node) { - void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_); + void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, s->object_size);
trace_kmem_cache_alloc_node(_RET_IP_, ret, s->object_size, s->size, gfpflags, node); @@ -2934,7 +2946,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, int node, size_t size) { - void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_); + void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret, size, s->size, gfpflags, node); @@ -2968,6 +2980,9 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
stat(s, FREE_SLOWPATH);
+ if (kfence_free(head)) + return; + if (kmem_cache_debug(s) && !free_debug_processing(s, page, head, tail, cnt, addr)) return; @@ -3220,6 +3235,13 @@ int build_detached_freelist(struct kmem_cache *s, size_t size, df->s = cache_from_obj(s, object); /* Support for memcg */ }
+ if (is_kfence_address(object)) { + slab_free_hook(df->s, object); + __kfence_free(object); + p[size] = NULL; /* mark object processed */ + return size; + } + /* Start new detached freelist */ df->page = page; set_freepointer(df->s, object, NULL); @@ -3295,8 +3317,14 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, c = this_cpu_ptr(s->cpu_slab);
for (i = 0; i < size; i++) { - void *object = c->freelist; + void *object = kfence_alloc(s, s->object_size, flags);
+ if (unlikely(object)) { + p[i] = object; + continue; + } + + object = c->freelist; if (unlikely(!object)) { /* * We may have removed an object from c->freelist using @@ -3961,7 +3989,7 @@ void *__kmalloc(size_t size, gfp_t flags) if (unlikely(ZERO_OR_NULL_PTR(s))) return s;
- ret = slab_alloc(s, flags, _RET_IP_); + ret = slab_alloc(s, flags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, flags);
@@ -4009,7 +4037,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) if (unlikely(ZERO_OR_NULL_PTR(s))) return s;
- ret = slab_alloc_node(s, flags, node, _RET_IP_); + ret = slab_alloc_node(s, flags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);
@@ -4035,6 +4063,7 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page, struct kmem_cache *s; unsigned int offset; size_t object_size; + bool is_kfence = is_kfence_address(ptr);
ptr = kasan_reset_tag(ptr);
@@ -4047,10 +4076,13 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page, to_user, 0, n);
/* Find offset within object. */ - offset = (ptr - page_address(page)) % s->size; + if (is_kfence) + offset = ptr - kfence_object_start(ptr); + else + offset = (ptr - page_address(page)) % s->size;
/* Adjust for redzone and reject if within the redzone. */ - if (kmem_cache_debug_flags(s, SLAB_RED_ZONE)) { + if (!is_kfence && kmem_cache_debug_flags(s, SLAB_RED_ZONE)) { if (offset < s->red_left_pad) usercopy_abort("SLUB object in left red zone", s->name, to_user, offset, n); @@ -4461,7 +4493,7 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller) if (unlikely(ZERO_OR_NULL_PTR(s))) return s;
- ret = slab_alloc(s, gfpflags, caller); + ret = slab_alloc(s, gfpflags, caller, size);
/* Honor the call site pointer we received. */ trace_kmalloc(caller, ret, size, s->size, gfpflags); @@ -4492,7 +4524,7 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags, if (unlikely(ZERO_OR_NULL_PTR(s))) return s;
- ret = slab_alloc_node(s, gfpflags, node, caller); + ret = slab_alloc_node(s, gfpflags, node, caller, size);
/* Honor the call site pointer we received. */ trace_kmalloc_node(caller, ret, size, s->size, gfpflags, node);
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit 2b8305260fb37fc20e13f71e13073304d0a031c8 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Make KFENCE compatible with KASAN. Currently this helps test KFENCE itself, where KASAN can catch potential corruptions to KFENCE state, or other corruptions that may be a result of freepointer corruptions in the main allocators.
[akpm@linux-foundation.org: merge fixup] [andreyknvl@google.com: untag addresses for KFENCE] Link: https://lkml.kernel.org/r/9dc196006921b191d25d10f6e611316db7da2efc.161194615...
Link: https://lkml.kernel.org/r/20201103175841.3495947-7-elver@google.com Signed-off-by: Marco Elver elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Signed-off-by: Andrey Konovalov andreyknvl@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Reviewed-by: Jann Horn jannh@google.com Co-developed-by: Marco Elver elver@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: SeongJae Park sjpark@amazon.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/kasan/kasan.h mm/kasan/shadow.c [Peng Liu: cherry-pick from 2b8305260fb37fc20e13f71e13073304d0a031c8] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- lib/Kconfig.kfence | 2 +- mm/kasan/common.c | 18 ++++++++++++++++++ mm/kasan/generic.c | 3 ++- 3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence index b88ac9d6b2e6..edfecb5d6165 100644 --- a/lib/Kconfig.kfence +++ b/lib/Kconfig.kfence @@ -5,7 +5,7 @@ config HAVE_ARCH_KFENCE
menuconfig KFENCE bool "KFENCE: low-overhead sampling-based memory safety error detector" - depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB) + depends on HAVE_ARCH_KFENCE && (SLAB || SLUB) select STACKTRACE help KFENCE is a low-overhead sampling-based detector of heap out-of-bounds diff --git a/mm/kasan/common.c b/mm/kasan/common.c index 950fd372a07e..6c8fa5aed54c 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -124,6 +124,10 @@ void kasan_poison_shadow(const void *address, size_t size, u8 value) */ address = reset_tag(address);
+ /* Skip KFENCE memory if called explicitly outside of sl*b. */ + if (is_kfence_address(address)) + return; + shadow_start = kasan_mem_to_shadow(address); shadow_end = kasan_mem_to_shadow(address + size);
@@ -141,6 +145,14 @@ void kasan_unpoison_shadow(const void *address, size_t size) */ address = reset_tag(address);
+ /* + * Skip KFENCE memory if called explicitly outside of sl*b. Also note + * that calls to ksize(), where size is not a multiple of machine-word + * size, would otherwise poison the invalid portion of the word. + */ + if (is_kfence_address(address)) + return; + kasan_poison_shadow(address, size, tag);
if (size & KASAN_SHADOW_MASK) { @@ -396,6 +408,9 @@ static bool __kasan_slab_free(struct kmem_cache *cache, void *object, tagged_object = object; object = reset_tag(object);
+ if (is_kfence_address(object)) + return false; + if (unlikely(nearest_obj(cache, virt_to_head_page(object), object) != object)) { kasan_report_invalid_free(tagged_object, ip); @@ -444,6 +459,9 @@ static void *__kasan_kmalloc(struct kmem_cache *cache, const void *object, if (unlikely(object == NULL)) return NULL;
+ if (is_kfence_address(kasan_reset_tag(object))) + return (void *)object; + redzone_start = round_up((unsigned long)(object + size), KASAN_SHADOW_SCALE_SIZE); redzone_end = round_up((unsigned long)object + cache->object_size, diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c index 2efc48444e77..c4c56ec8a472 100644 --- a/mm/kasan/generic.c +++ b/mm/kasan/generic.c @@ -21,6 +21,7 @@ #include <linux/init.h> #include <linux/kasan.h> #include <linux/kernel.h> +#include <linux/kfence.h> #include <linux/kmemleak.h> #include <linux/linkage.h> #include <linux/memblock.h> @@ -332,7 +333,7 @@ void kasan_record_aux_stack(void *addr) struct kasan_alloc_meta *alloc_info; void *object;
- if (!(page && PageSlab(page))) + if (is_kfence_address(addr) || !(page && PageSlab(page))) return;
cache = page->slab_cache;
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc1 commit 10efe55f883f2396a0024891ad1d7d5d040364b3 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
[elver@google.com: add missing copyright header to documentation] Link: https://lkml.kernel.org/r/20210118092159.145934-4-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-8-elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Co-developed-by: Alexander Potapenko glider@google.com Reviewed-by: Jann Horn jannh@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: SeongJae Park sjpark@amazon.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- Documentation/dev-tools/index.rst | 1 + Documentation/dev-tools/kfence.rst | 298 +++++++++++++++++++++++++++++ lib/Kconfig.kfence | 2 + 3 files changed, 301 insertions(+) create mode 100644 Documentation/dev-tools/kfence.rst
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst index f7809c7b1ba9..1b1cf4f5c9d9 100644 --- a/Documentation/dev-tools/index.rst +++ b/Documentation/dev-tools/index.rst @@ -22,6 +22,7 @@ whole; patches welcome! ubsan kmemleak kcsan + kfence gdb-kernel-debugging kgdb kselftest diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst new file mode 100644 index 000000000000..0e2fb6ef3016 --- /dev/null +++ b/Documentation/dev-tools/kfence.rst @@ -0,0 +1,298 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. Copyright (C) 2020, Google LLC. + +Kernel Electric-Fence (KFENCE) +============================== + +Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety +error detector. KFENCE detects heap out-of-bounds access, use-after-free, and +invalid-free errors. + +KFENCE is designed to be enabled in production kernels, and has near zero +performance overhead. Compared to KASAN, KFENCE trades performance for +precision. The main motivation behind KFENCE's design, is that with enough +total uptime KFENCE will detect bugs in code paths not typically exercised by +non-production test workloads. One way to quickly achieve a large enough total +uptime is when the tool is deployed across a large fleet of machines. + +Usage +----- + +To enable KFENCE, configure the kernel with:: + + CONFIG_KFENCE=y + +To build a kernel with KFENCE support, but disabled by default (to enable, set +``kfence.sample_interval`` to non-zero value), configure the kernel with:: + + CONFIG_KFENCE=y + CONFIG_KFENCE_SAMPLE_INTERVAL=0 + +KFENCE provides several other configuration options to customize behaviour (see +the respective help text in ``lib/Kconfig.kfence`` for more info). + +Tuning performance +~~~~~~~~~~~~~~~~~~ + +The most important parameter is KFENCE's sample interval, which can be set via +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The +sample interval determines the frequency with which heap allocations will be +guarded by KFENCE. The default is configurable via the Kconfig option +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0`` +disables KFENCE. + +The KFENCE memory pool is of fixed size, and if the pool is exhausted, no +further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default +255), the number of available guarded objects can be controlled. Each object +requires 2 pages, one for the object itself and the other one used as a guard +page; object pages are interleaved with guard pages, and every object page is +therefore surrounded by two guard pages. + +The total memory dedicated to the KFENCE memory pool can be computed as:: + + ( #objects + 1 ) * 2 * PAGE_SIZE + +Using the default config, and assuming a page size of 4 KiB, results in +dedicating 2 MiB to the KFENCE memory pool. + +Note: On architectures that support huge pages, KFENCE will ensure that the +pool is using pages of size ``PAGE_SIZE``. This will result in additional page +tables being allocated. + +Error reports +~~~~~~~~~~~~~ + +A typical out-of-bounds access looks like this:: + + ================================================================== + BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b + + Out-of-bounds access at 0xffffffffb672efff (1B left of kfence-#17): + test_out_of_bounds_read+0xa3/0x22b + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507: + test_alloc+0xf3/0x25b + test_out_of_bounds_read+0x98/0x22b + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + ================================================================== + +The header of the report provides a short summary of the function involved in +the access. It is followed by more detailed information about the access and +its origin. Note that, real kernel addresses are only shown for +``CONFIG_DEBUG_KERNEL=y`` builds. + +Use-after-free accesses are reported as:: + + ================================================================== + BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143 + + Use-after-free access at 0xffffffffb673dfe0 (in kfence-#24): + test_use_after_free_read+0xb3/0x143 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507: + test_alloc+0xf3/0x25b + test_use_after_free_read+0x76/0x143 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + freed by task 507: + test_use_after_free_read+0xa8/0x143 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + ================================================================== + +KFENCE also reports on invalid frees, such as double-frees:: + + ================================================================== + BUG: KFENCE: invalid free in test_double_free+0xdc/0x171 + + Invalid free of 0xffffffffb6741000: + test_double_free+0xdc/0x171 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507: + test_alloc+0xf3/0x25b + test_double_free+0x76/0x171 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + freed by task 507: + test_double_free+0xa8/0x171 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + ================================================================== + +KFENCE also uses pattern-based redzones on the other side of an object's guard +page, to detect out-of-bounds writes on the unprotected side of the object. +These are reported on frees:: + + ================================================================== + BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184 + + Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69): + test_kmalloc_aligned_oob_write+0xef/0x184 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507: + test_alloc+0xf3/0x25b + test_kmalloc_aligned_oob_write+0x57/0x184 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + ================================================================== + +For such errors, the address where the corruption occurred as well as the +invalidly written bytes (offset from the address) are shown; in this +representation, '.' denote untouched bytes. In the example above ``0xac`` is +the value written to the invalid address at offset 0, and the remaining '.' +denote that no following bytes have been touched. Note that, real values are +only shown for ``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information +disclosure for non-debug builds, '!' is used instead to denote invalidly +written bytes. + +And finally, KFENCE may also report on invalid accesses to any protected page +where it was not possible to determine an associated object, e.g. if adjacent +object pages had not yet been allocated:: + + ================================================================== + BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0 + + Invalid access at 0xffffffffb670b00a: + test_invalid_access+0x26/0xe0 + kunit_try_run_case+0x51/0x85 + kunit_generic_run_threadfn_adapter+0x16/0x30 + kthread+0x137/0x160 + ret_from_fork+0x22/0x30 + + CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + ================================================================== + +DebugFS interface +~~~~~~~~~~~~~~~~~ + +Some debugging information is exposed via debugfs: + +* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics. + +* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects + allocated via KFENCE, including those already freed but protected. + +Implementation Details +---------------------- + +Guarded allocations are set up based on the sample interval. After expiration +of the sample interval, the next allocation through the main allocator (SLAB or +SLUB) returns a guarded allocation from the KFENCE object pool (allocation +sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and +the next allocation is set up after the expiration of the interval. To "gate" a +KFENCE allocation through the main allocator's fast-path without overhead, +KFENCE relies on static branches via the static keys infrastructure. The static +branch is toggled to redirect the allocation to KFENCE. + +KFENCE objects each reside on a dedicated page, at either the left or right +page boundaries selected at random. The pages to the left and right of the +object page are "guard pages", whose attributes are changed to a protected +state, and cause page faults on any attempted access. Such page faults are then +intercepted by KFENCE, which handles the fault gracefully by reporting an +out-of-bounds access, and marking the page as accessible so that the faulting +code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead). + +To detect out-of-bounds writes to memory within the object's page itself, +KFENCE also uses pattern-based redzones. For each object page, a redzone is set +up for all non-object memory. For typical alignments, the redzone is only +required on the unguarded side of an object. Because KFENCE must honor the +cache's requested alignment, special alignments may result in unprotected gaps +on either side of an object, all of which are redzoned. + +The following figure illustrates the page layout:: + + ---+-----------+-----------+-----------+-----------+-----------+--- + | xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx | + | xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx | + | x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x | + | xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx | + | xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx | + | xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx | + ---+-----------+-----------+-----------+-----------+-----------+--- + +Upon deallocation of a KFENCE object, the object's page is again protected and +the object is marked as freed. Any further access to the object causes a fault +and KFENCE reports a use-after-free access. Freed objects are inserted at the +tail of KFENCE's freelist, so that the least recently freed objects are reused +first, and the chances of detecting use-after-frees of recently freed objects +is increased. + +Interface +--------- + +The following describes the functions which are used by allocators as well as +page handling code to set up and deal with KFENCE allocations. + +.. kernel-doc:: include/linux/kfence.h + :functions: is_kfence_address + kfence_shutdown_cache + kfence_alloc kfence_free __kfence_free + kfence_ksize kfence_object_start + kfence_handle_page_fault + +Related Tools +------------- + +In userspace, a similar approach is taken by `GWP-ASan +http://llvm.org/docs/GwpAsan.html`_. GWP-ASan also relies on guard pages and +a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is +directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another +similar but non-sampling approach, that also inspired the name "KFENCE", can be +found in the userspace `Electric Fence Malloc Debugger +https://linux.die.net/man/3/efence`_. + +In the kernel, several tools exist to debug memory access errors, and in +particular KASAN can detect all bug classes that KFENCE can detect. While KASAN +is more precise, relying on compiler instrumentation, this comes at a +performance cost. + +It is worth highlighting that KASAN and KFENCE are complementary, with +different target environments. For instance, KASAN is the better debugging-aid, +where test cases or reproducers exists: due to the lower chance to detect the +error, it would require more effort using KFENCE to debug. Deployments at scale +that cannot afford to enable KASAN, however, would benefit from using KFENCE to +discover bugs due to code paths not exercised by test cases or fuzzers. diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence index edfecb5d6165..605125ac2ae0 100644 --- a/lib/Kconfig.kfence +++ b/lib/Kconfig.kfence @@ -13,6 +13,8 @@ menuconfig KFENCE to have negligible cost to permit enabling it in production environments.
+ See file:Documentation/dev-tools/kfence.rst for more details. + Note that, KFENCE is not a substitute for explicit testing with tools such as KASAN. KFENCE can detect a subset of bugs that KASAN can detect, albeit at very different performance profiles. If you can
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc1 commit bc8fbc5f305aecf63423da91e5faf4c0ce40bf38 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Add KFENCE test suite, testing various error detection scenarios. Makes use of KUnit for test organization. Since KFENCE's interface to obtain error reports is via the console, the test verifies that KFENCE outputs expected reports to the console.
[elver@google.com: fix typo in test] Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com [elver@google.com: show access type in report] Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Co-developed-by: Alexander Potapenko glider@google.com Reviewed-by: Jann Horn jannh@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: SeongJae Park sjpark@amazon.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Yingjie Shang 1415317271@qq.com Reviewed-by: Bixuan Cui cuibixuan@huawei.com --- Documentation/dev-tools/kfence.rst | 12 +- arch/arm64/mm/fault.c | 2 +- arch/x86/mm/fault.c | 3 +- include/linux/kfence.h | 9 +- lib/Kconfig.kfence | 13 + mm/kfence/Makefile | 3 + mm/kfence/core.c | 11 +- mm/kfence/kfence.h | 2 +- mm/kfence/kfence_test.c | 858 +++++++++++++++++++++++++++++ mm/kfence/report.c | 27 +- 10 files changed, 915 insertions(+), 25 deletions(-) create mode 100644 mm/kfence/kfence_test.c
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst index 0e2fb6ef3016..58a0a5fa1ddc 100644 --- a/Documentation/dev-tools/kfence.rst +++ b/Documentation/dev-tools/kfence.rst @@ -65,9 +65,9 @@ Error reports A typical out-of-bounds access looks like this::
================================================================== - BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b + BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b
- Out-of-bounds access at 0xffffffffb672efff (1B left of kfence-#17): + Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17): test_out_of_bounds_read+0xa3/0x22b kunit_try_run_case+0x51/0x85 kunit_generic_run_threadfn_adapter+0x16/0x30 @@ -94,9 +94,9 @@ its origin. Note that, real kernel addresses are only shown for Use-after-free accesses are reported as::
================================================================== - BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143 + BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143
- Use-after-free access at 0xffffffffb673dfe0 (in kfence-#24): + Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24): test_use_after_free_read+0xb3/0x143 kunit_try_run_case+0x51/0x85 kunit_generic_run_threadfn_adapter+0x16/0x30 @@ -193,9 +193,9 @@ where it was not possible to determine an associated object, e.g. if adjacent object pages had not yet been allocated::
================================================================== - BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0 + BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0
- Invalid access at 0xffffffffb670b00a: + Invalid read at 0xffffffffb670b00a: test_invalid_access+0x26/0xe0 kunit_try_run_case+0x51/0x85 kunit_generic_run_threadfn_adapter+0x16/0x30 diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index a5fe9f7b423b..8f4ceab2fa35 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -323,7 +323,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr, } else if (addr < PAGE_SIZE) { msg = "NULL pointer dereference"; } else { - if (kfence_handle_page_fault(addr, regs)) + if (kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs)) return;
msg = "paging request"; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index b2688e8e2faf..7b707c9b661a 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -734,7 +734,8 @@ no_context(struct pt_regs *regs, unsigned long error_code, efi_recover_from_page_fault(address);
/* Only not-present faults should be handled by KFENCE. */ - if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address, regs)) + if (!(error_code & X86_PF_PROT) && + kfence_handle_page_fault(address, error_code & X86_PF_WRITE, regs)) return;
oops: diff --git a/include/linux/kfence.h b/include/linux/kfence.h index 5a56bcf5606c..a70d1ea03532 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -186,6 +186,7 @@ static __always_inline __must_check bool kfence_free(void *addr) /** * kfence_handle_page_fault() - perform page fault handling for KFENCE pages * @addr: faulting address + * @is_write: is access a write * @regs: current struct pt_regs (can be NULL, but shows full stack trace) * * Return: @@ -197,7 +198,7 @@ static __always_inline __must_check bool kfence_free(void *addr) * cases KFENCE prints an error message and marks the offending page as * present, so that the kernel can proceed. */ -bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs); +bool __must_check kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs);
#else /* CONFIG_KFENCE */
@@ -210,7 +211,11 @@ static inline size_t kfence_ksize(const void *addr) { return 0; } static inline void *kfence_object_start(const void *addr) { return NULL; } static inline void __kfence_free(void *addr) { } static inline bool __must_check kfence_free(void *addr) { return false; } -static inline bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs) { return false; } +static inline bool __must_check kfence_handle_page_fault(unsigned long addr, bool is_write, + struct pt_regs *regs) +{ + return false; +}
#endif
diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence index 605125ac2ae0..78f50ccb3b45 100644 --- a/lib/Kconfig.kfence +++ b/lib/Kconfig.kfence @@ -66,4 +66,17 @@ config KFENCE_STRESS_TEST_FAULTS
Only for KFENCE testing; set to 0 if you are not a KFENCE developer.
+config KFENCE_KUNIT_TEST + tristate "KFENCE integration test suite" if !KUNIT_ALL_TESTS + default KUNIT_ALL_TESTS + depends on TRACEPOINTS && KUNIT + help + Test suite for KFENCE, testing various error detection scenarios with + various allocation types, and checking that reports are correctly + output to console. + + Say Y here if you want the test to be built into the kernel and run + during boot; say M if you want the test to build as a module; say N + if you are unsure. + endif # KFENCE diff --git a/mm/kfence/Makefile b/mm/kfence/Makefile index d991e9a349f0..6872cd5e5390 100644 --- a/mm/kfence/Makefile +++ b/mm/kfence/Makefile @@ -1,3 +1,6 @@ # SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_KFENCE) := core.o report.o + +CFLAGS_kfence_test.o := -g -fno-omit-frame-pointer -fno-optimize-sibling-calls +obj-$(CONFIG_KFENCE_KUNIT_TEST) += kfence_test.o diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 7692af715fdb..cfe3d32ac5b7 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -216,7 +216,7 @@ static inline bool check_canary_byte(u8 *addr) return true;
atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); - kfence_report_error((unsigned long)addr, NULL, addr_to_metadata((unsigned long)addr), + kfence_report_error((unsigned long)addr, false, NULL, addr_to_metadata((unsigned long)addr), KFENCE_ERROR_CORRUPTION); return false; } @@ -355,7 +355,8 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) { /* Invalid or double-free, bail out. */ atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); - kfence_report_error((unsigned long)addr, NULL, meta, KFENCE_ERROR_INVALID_FREE); + kfence_report_error((unsigned long)addr, false, NULL, meta, + KFENCE_ERROR_INVALID_FREE); raw_spin_unlock_irqrestore(&meta->lock, flags); return; } @@ -770,7 +771,7 @@ void __kfence_free(void *addr) kfence_guarded_free(addr, meta, false); }
-bool kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs) +bool kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs) { const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE; struct kfence_metadata *to_report = NULL; @@ -833,11 +834,11 @@ bool kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs)
out: if (to_report) { - kfence_report_error(addr, regs, to_report, error_type); + kfence_report_error(addr, is_write, regs, to_report, error_type); raw_spin_unlock_irqrestore(&to_report->lock, flags); } else { /* This may be a UAF or OOB access, but we can't be sure. */ - kfence_report_error(addr, regs, NULL, KFENCE_ERROR_INVALID); + kfence_report_error(addr, is_write, regs, NULL, KFENCE_ERROR_INVALID); }
return kfence_unprotect(addr); /* Unprotect and let access proceed. */ diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index 0d83e628a97d..1accc840dbbe 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -105,7 +105,7 @@ enum kfence_error_type { KFENCE_ERROR_INVALID_FREE, /* Invalid free. */ };
-void kfence_report_error(unsigned long address, struct pt_regs *regs, +void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *regs, const struct kfence_metadata *meta, enum kfence_error_type type);
void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta); diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c new file mode 100644 index 000000000000..db1bb596acaf --- /dev/null +++ b/mm/kfence/kfence_test.c @@ -0,0 +1,858 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test cases for KFENCE memory safety error detector. Since the interface with + * which KFENCE's reports are obtained is via the console, this is the output we + * should verify. For each test case checks the presence (or absence) of + * generated reports. Relies on 'console' tracepoint to capture reports as they + * appear in the kernel log. + * + * Copyright (C) 2020, Google LLC. + * Author: Alexander Potapenko glider@google.com + * Marco Elver elver@google.com + */ + +#include <kunit/test.h> +#include <linux/jiffies.h> +#include <linux/kernel.h> +#include <linux/kfence.h> +#include <linux/mm.h> +#include <linux/random.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/string.h> +#include <linux/tracepoint.h> +#include <trace/events/printk.h> + +#include "kfence.h" + +/* Report as observed from console. */ +static struct { + spinlock_t lock; + int nlines; + char lines[2][256]; +} observed = { + .lock = __SPIN_LOCK_UNLOCKED(observed.lock), +}; + +/* Probe for console output: obtains observed lines of interest. */ +static void probe_console(void *ignore, const char *buf, size_t len) +{ + unsigned long flags; + int nlines; + + spin_lock_irqsave(&observed.lock, flags); + nlines = observed.nlines; + + if (strnstr(buf, "BUG: KFENCE: ", len) && strnstr(buf, "test_", len)) { + /* + * KFENCE report and related to the test. + * + * The provided @buf is not NUL-terminated; copy no more than + * @len bytes and let strscpy() add the missing NUL-terminator. + */ + strscpy(observed.lines[0], buf, min(len + 1, sizeof(observed.lines[0]))); + nlines = 1; + } else if (nlines == 1 && (strnstr(buf, "at 0x", len) || strnstr(buf, "of 0x", len))) { + strscpy(observed.lines[nlines++], buf, min(len + 1, sizeof(observed.lines[0]))); + } + + WRITE_ONCE(observed.nlines, nlines); /* Publish new nlines. */ + spin_unlock_irqrestore(&observed.lock, flags); +} + +/* Check if a report related to the test exists. */ +static bool report_available(void) +{ + return READ_ONCE(observed.nlines) == ARRAY_SIZE(observed.lines); +} + +/* Information we expect in a report. */ +struct expect_report { + enum kfence_error_type type; /* The type or error. */ + void *fn; /* Function pointer to expected function where access occurred. */ + char *addr; /* Address at which the bad access occurred. */ + bool is_write; /* Is access a write. */ +}; + +static const char *get_access_type(const struct expect_report *r) +{ + return r->is_write ? "write" : "read"; +} + +/* Check observed report matches information in @r. */ +static bool report_matches(const struct expect_report *r) +{ + bool ret = false; + unsigned long flags; + typeof(observed.lines) expect; + const char *end; + char *cur; + + /* Doubled-checked locking. */ + if (!report_available()) + return false; + + /* Generate expected report contents. */ + + /* Title */ + cur = expect[0]; + end = &expect[0][sizeof(expect[0]) - 1]; + switch (r->type) { + case KFENCE_ERROR_OOB: + cur += scnprintf(cur, end - cur, "BUG: KFENCE: out-of-bounds %s", + get_access_type(r)); + break; + case KFENCE_ERROR_UAF: + cur += scnprintf(cur, end - cur, "BUG: KFENCE: use-after-free %s", + get_access_type(r)); + break; + case KFENCE_ERROR_CORRUPTION: + cur += scnprintf(cur, end - cur, "BUG: KFENCE: memory corruption"); + break; + case KFENCE_ERROR_INVALID: + cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid %s", + get_access_type(r)); + break; + case KFENCE_ERROR_INVALID_FREE: + cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid free"); + break; + } + + scnprintf(cur, end - cur, " in %pS", r->fn); + /* The exact offset won't match, remove it; also strip module name. */ + cur = strchr(expect[0], '+'); + if (cur) + *cur = '\0'; + + /* Access information */ + cur = expect[1]; + end = &expect[1][sizeof(expect[1]) - 1]; + + switch (r->type) { + case KFENCE_ERROR_OOB: + cur += scnprintf(cur, end - cur, "Out-of-bounds %s at", get_access_type(r)); + break; + case KFENCE_ERROR_UAF: + cur += scnprintf(cur, end - cur, "Use-after-free %s at", get_access_type(r)); + break; + case KFENCE_ERROR_CORRUPTION: + cur += scnprintf(cur, end - cur, "Corrupted memory at"); + break; + case KFENCE_ERROR_INVALID: + cur += scnprintf(cur, end - cur, "Invalid %s at", get_access_type(r)); + break; + case KFENCE_ERROR_INVALID_FREE: + cur += scnprintf(cur, end - cur, "Invalid free of"); + break; + } + + cur += scnprintf(cur, end - cur, " 0x" PTR_FMT, (void *)r->addr); + + spin_lock_irqsave(&observed.lock, flags); + if (!report_available()) + goto out; /* A new report is being captured. */ + + /* Finally match expected output to what we actually observed. */ + ret = strstr(observed.lines[0], expect[0]) && strstr(observed.lines[1], expect[1]); +out: + spin_unlock_irqrestore(&observed.lock, flags); + return ret; +} + +/* ===== Test cases ===== */ + +#define TEST_PRIV_WANT_MEMCACHE ((void *)1) + +/* Cache used by tests; if NULL, allocate from kmalloc instead. */ +static struct kmem_cache *test_cache; + +static size_t setup_test_cache(struct kunit *test, size_t size, slab_flags_t flags, + void (*ctor)(void *)) +{ + if (test->priv != TEST_PRIV_WANT_MEMCACHE) + return size; + + kunit_info(test, "%s: size=%zu, ctor=%ps\n", __func__, size, ctor); + + /* + * Use SLAB_NOLEAKTRACE to prevent merging with existing caches. Any + * other flag in SLAB_NEVER_MERGE also works. Use SLAB_ACCOUNT to + * allocate via memcg, if enabled. + */ + flags |= SLAB_NOLEAKTRACE | SLAB_ACCOUNT; + test_cache = kmem_cache_create("test", size, 1, flags, ctor); + KUNIT_ASSERT_TRUE_MSG(test, test_cache, "could not create cache"); + + return size; +} + +static void test_cache_destroy(void) +{ + if (!test_cache) + return; + + kmem_cache_destroy(test_cache); + test_cache = NULL; +} + +static inline size_t kmalloc_cache_alignment(size_t size) +{ + return kmalloc_caches[kmalloc_type(GFP_KERNEL)][kmalloc_index(size)]->align; +} + +/* Must always inline to match stack trace against caller. */ +static __always_inline void test_free(void *ptr) +{ + if (test_cache) + kmem_cache_free(test_cache, ptr); + else + kfree(ptr); +} + +/* + * If this should be a KFENCE allocation, and on which side the allocation and + * the closest guard page should be. + */ +enum allocation_policy { + ALLOCATE_ANY, /* KFENCE, any side. */ + ALLOCATE_LEFT, /* KFENCE, left side of page. */ + ALLOCATE_RIGHT, /* KFENCE, right side of page. */ + ALLOCATE_NONE, /* No KFENCE allocation. */ +}; + +/* + * Try to get a guarded allocation from KFENCE. Uses either kmalloc() or the + * current test_cache if set up. + */ +static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocation_policy policy) +{ + void *alloc; + unsigned long timeout, resched_after; + const char *policy_name; + + switch (policy) { + case ALLOCATE_ANY: + policy_name = "any"; + break; + case ALLOCATE_LEFT: + policy_name = "left"; + break; + case ALLOCATE_RIGHT: + policy_name = "right"; + break; + case ALLOCATE_NONE: + policy_name = "none"; + break; + } + + kunit_info(test, "%s: size=%zu, gfp=%x, policy=%s, cache=%i\n", __func__, size, gfp, + policy_name, !!test_cache); + + /* + * 100x the sample interval should be more than enough to ensure we get + * a KFENCE allocation eventually. + */ + timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL); + /* + * Especially for non-preemption kernels, ensure the allocation-gate + * timer can catch up: after @resched_after, every failed allocation + * attempt yields, to ensure the allocation-gate timer is scheduled. + */ + resched_after = jiffies + msecs_to_jiffies(CONFIG_KFENCE_SAMPLE_INTERVAL); + do { + if (test_cache) + alloc = kmem_cache_alloc(test_cache, gfp); + else + alloc = kmalloc(size, gfp); + + if (is_kfence_address(alloc)) { + struct page *page = virt_to_head_page(alloc); + struct kmem_cache *s = test_cache ?: kmalloc_caches[kmalloc_type(GFP_KERNEL)][kmalloc_index(size)]; + + /* + * Verify that various helpers return the right values + * even for KFENCE objects; these are required so that + * memcg accounting works correctly. + */ + KUNIT_EXPECT_EQ(test, obj_to_index(s, page, alloc), 0U); + KUNIT_EXPECT_EQ(test, objs_per_slab_page(s, page), 1); + + if (policy == ALLOCATE_ANY) + return alloc; + if (policy == ALLOCATE_LEFT && IS_ALIGNED((unsigned long)alloc, PAGE_SIZE)) + return alloc; + if (policy == ALLOCATE_RIGHT && + !IS_ALIGNED((unsigned long)alloc, PAGE_SIZE)) + return alloc; + } else if (policy == ALLOCATE_NONE) + return alloc; + + test_free(alloc); + + if (time_after(jiffies, resched_after)) + cond_resched(); + } while (time_before(jiffies, timeout)); + + KUNIT_ASSERT_TRUE_MSG(test, false, "failed to allocate from KFENCE"); + return NULL; /* Unreachable. */ +} + +static void test_out_of_bounds_read(struct kunit *test) +{ + size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_OOB, + .fn = test_out_of_bounds_read, + .is_write = false, + }; + char *buf; + + setup_test_cache(test, size, 0, NULL); + + /* + * If we don't have our own cache, adjust based on alignment, so that we + * actually access guard pages on either side. + */ + if (!test_cache) + size = kmalloc_cache_alignment(size); + + /* Test both sides. */ + + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT); + expect.addr = buf - 1; + READ_ONCE(*expect.addr); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); + test_free(buf); + + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT); + expect.addr = buf + size; + READ_ONCE(*expect.addr); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); + test_free(buf); +} + +static void test_out_of_bounds_write(struct kunit *test) +{ + size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_OOB, + .fn = test_out_of_bounds_write, + .is_write = true, + }; + char *buf; + + setup_test_cache(test, size, 0, NULL); + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT); + expect.addr = buf - 1; + WRITE_ONCE(*expect.addr, 42); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); + test_free(buf); +} + +static void test_use_after_free_read(struct kunit *test) +{ + const size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_UAF, + .fn = test_use_after_free_read, + .is_write = false, + }; + + setup_test_cache(test, size, 0, NULL); + expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + test_free(expect.addr); + READ_ONCE(*expect.addr); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); +} + +static void test_double_free(struct kunit *test) +{ + const size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_INVALID_FREE, + .fn = test_double_free, + }; + + setup_test_cache(test, size, 0, NULL); + expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + test_free(expect.addr); + test_free(expect.addr); /* Double-free. */ + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); +} + +static void test_invalid_addr_free(struct kunit *test) +{ + const size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_INVALID_FREE, + .fn = test_invalid_addr_free, + }; + char *buf; + + setup_test_cache(test, size, 0, NULL); + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + expect.addr = buf + 1; /* Free on invalid address. */ + test_free(expect.addr); /* Invalid address free. */ + test_free(buf); /* No error. */ + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); +} + +static void test_corruption(struct kunit *test) +{ + size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_CORRUPTION, + .fn = test_corruption, + }; + char *buf; + + setup_test_cache(test, size, 0, NULL); + + /* Test both sides. */ + + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT); + expect.addr = buf + size; + WRITE_ONCE(*expect.addr, 42); + test_free(buf); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); + + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT); + expect.addr = buf - 1; + WRITE_ONCE(*expect.addr, 42); + test_free(buf); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); +} + +/* + * KFENCE is unable to detect an OOB if the allocation's alignment requirements + * leave a gap between the object and the guard page. Specifically, an + * allocation of e.g. 73 bytes is aligned on 8 and 128 bytes for SLUB or SLAB + * respectively. Therefore it is impossible for the allocated object to + * contiguously line up with the right guard page. + * + * However, we test that an access to memory beyond the gap results in KFENCE + * detecting an OOB access. + */ +static void test_kmalloc_aligned_oob_read(struct kunit *test) +{ + const size_t size = 73; + const size_t align = kmalloc_cache_alignment(size); + struct expect_report expect = { + .type = KFENCE_ERROR_OOB, + .fn = test_kmalloc_aligned_oob_read, + .is_write = false, + }; + char *buf; + + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT); + + /* + * The object is offset to the right, so there won't be an OOB to the + * left of it. + */ + READ_ONCE(*(buf - 1)); + KUNIT_EXPECT_FALSE(test, report_available()); + + /* + * @buf must be aligned on @align, therefore buf + size belongs to the + * same page -> no OOB. + */ + READ_ONCE(*(buf + size)); + KUNIT_EXPECT_FALSE(test, report_available()); + + /* Overflowing by @align bytes will result in an OOB. */ + expect.addr = buf + size + align; + READ_ONCE(*expect.addr); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); + + test_free(buf); +} + +static void test_kmalloc_aligned_oob_write(struct kunit *test) +{ + const size_t size = 73; + struct expect_report expect = { + .type = KFENCE_ERROR_CORRUPTION, + .fn = test_kmalloc_aligned_oob_write, + }; + char *buf; + + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT); + /* + * The object is offset to the right, so we won't get a page + * fault immediately after it. + */ + expect.addr = buf + size; + WRITE_ONCE(*expect.addr, READ_ONCE(*expect.addr) + 1); + KUNIT_EXPECT_FALSE(test, report_available()); + test_free(buf); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); +} + +/* Test cache shrinking and destroying with KFENCE. */ +static void test_shrink_memcache(struct kunit *test) +{ + const size_t size = 32; + void *buf; + + setup_test_cache(test, size, 0, NULL); + KUNIT_EXPECT_TRUE(test, test_cache); + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + kmem_cache_shrink(test_cache); + test_free(buf); + + KUNIT_EXPECT_FALSE(test, report_available()); +} + +static void ctor_set_x(void *obj) +{ + /* Every object has at least 8 bytes. */ + memset(obj, 'x', 8); +} + +/* Ensure that SL*B does not modify KFENCE objects on bulk free. */ +static void test_free_bulk(struct kunit *test) +{ + int iter; + + for (iter = 0; iter < 5; iter++) { + const size_t size = setup_test_cache(test, 8 + prandom_u32_max(300), 0, + (iter & 1) ? ctor_set_x : NULL); + void *objects[] = { + test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT), + test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE), + test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT), + test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE), + test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE), + }; + + kmem_cache_free_bulk(test_cache, ARRAY_SIZE(objects), objects); + KUNIT_ASSERT_FALSE(test, report_available()); + test_cache_destroy(); + } +} + +/* Test init-on-free works. */ +static void test_init_on_free(struct kunit *test) +{ + const size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_UAF, + .fn = test_init_on_free, + .is_write = false, + }; + int i; + + if (!IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON)) + return; + /* Assume it hasn't been disabled on command line. */ + + setup_test_cache(test, size, 0, NULL); + expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + for (i = 0; i < size; i++) + expect.addr[i] = i + 1; + test_free(expect.addr); + + for (i = 0; i < size; i++) { + /* + * This may fail if the page was recycled by KFENCE and then + * written to again -- this however, is near impossible with a + * default config. + */ + KUNIT_EXPECT_EQ(test, expect.addr[i], (char)0); + + if (!i) /* Only check first access to not fail test if page is ever re-protected. */ + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); + } +} + +/* Ensure that constructors work properly. */ +static void test_memcache_ctor(struct kunit *test) +{ + const size_t size = 32; + char *buf; + int i; + + setup_test_cache(test, size, 0, ctor_set_x); + buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + + for (i = 0; i < 8; i++) + KUNIT_EXPECT_EQ(test, buf[i], (char)'x'); + + test_free(buf); + + KUNIT_EXPECT_FALSE(test, report_available()); +} + +/* Test that memory is zeroed if requested. */ +static void test_gfpzero(struct kunit *test) +{ + const size_t size = PAGE_SIZE; /* PAGE_SIZE so we can use ALLOCATE_ANY. */ + char *buf1, *buf2; + int i; + + if (CONFIG_KFENCE_SAMPLE_INTERVAL > 100) { + kunit_warn(test, "skipping ... would take too long\n"); + return; + } + + setup_test_cache(test, size, 0, NULL); + buf1 = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + for (i = 0; i < size; i++) + buf1[i] = i + 1; + test_free(buf1); + + /* Try to get same address again -- this can take a while. */ + for (i = 0;; i++) { + buf2 = test_alloc(test, size, GFP_KERNEL | __GFP_ZERO, ALLOCATE_ANY); + if (buf1 == buf2) + break; + test_free(buf2); + + if (i == CONFIG_KFENCE_NUM_OBJECTS) { + kunit_warn(test, "giving up ... cannot get same object back\n"); + return; + } + } + + for (i = 0; i < size; i++) + KUNIT_EXPECT_EQ(test, buf2[i], (char)0); + + test_free(buf2); + + KUNIT_EXPECT_FALSE(test, report_available()); +} + +static void test_invalid_access(struct kunit *test) +{ + const struct expect_report expect = { + .type = KFENCE_ERROR_INVALID, + .fn = test_invalid_access, + .addr = &__kfence_pool[10], + .is_write = false, + }; + + READ_ONCE(__kfence_pool[10]); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); +} + +/* Test SLAB_TYPESAFE_BY_RCU works. */ +static void test_memcache_typesafe_by_rcu(struct kunit *test) +{ + const size_t size = 32; + struct expect_report expect = { + .type = KFENCE_ERROR_UAF, + .fn = test_memcache_typesafe_by_rcu, + .is_write = false, + }; + + setup_test_cache(test, size, SLAB_TYPESAFE_BY_RCU, NULL); + KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */ + + expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY); + *expect.addr = 42; + + rcu_read_lock(); + test_free(expect.addr); + KUNIT_EXPECT_EQ(test, *expect.addr, (char)42); + /* + * Up to this point, memory should not have been freed yet, and + * therefore there should be no KFENCE report from the above access. + */ + rcu_read_unlock(); + + /* Above access to @expect.addr should not have generated a report! */ + KUNIT_EXPECT_FALSE(test, report_available()); + + /* Only after rcu_barrier() is the memory guaranteed to be freed. */ + rcu_barrier(); + + /* Expect use-after-free. */ + KUNIT_EXPECT_EQ(test, *expect.addr, (char)42); + KUNIT_EXPECT_TRUE(test, report_matches(&expect)); +} + +/* Test krealloc(). */ +static void test_krealloc(struct kunit *test) +{ + const size_t size = 32; + const struct expect_report expect = { + .type = KFENCE_ERROR_UAF, + .fn = test_krealloc, + .addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY), + .is_write = false, + }; + char *buf = expect.addr; + int i; + + KUNIT_EXPECT_FALSE(test, test_cache); + KUNIT_EXPECT_EQ(test, ksize(buf), size); /* Precise size match after KFENCE alloc. */ + for (i = 0; i < size; i++) + buf[i] = i + 1; + + /* Check that we successfully change the size. */ + buf = krealloc(buf, size * 3, GFP_KERNEL); /* Grow. */ + /* Note: Might no longer be a KFENCE alloc. */ + KUNIT_EXPECT_GE(test, ksize(buf), size * 3); + for (i = 0; i < size; i++) + KUNIT_EXPECT_EQ(test, buf[i], (char)(i + 1)); + for (; i < size * 3; i++) /* Fill to extra bytes. */ + buf[i] = i + 1; + + buf = krealloc(buf, size * 2, GFP_KERNEL); /* Shrink. */ + KUNIT_EXPECT_GE(test, ksize(buf), size * 2); + for (i = 0; i < size * 2; i++) + KUNIT_EXPECT_EQ(test, buf[i], (char)(i + 1)); + + buf = krealloc(buf, 0, GFP_KERNEL); /* Free. */ + KUNIT_EXPECT_EQ(test, (unsigned long)buf, (unsigned long)ZERO_SIZE_PTR); + KUNIT_ASSERT_FALSE(test, report_available()); /* No reports yet! */ + + READ_ONCE(*expect.addr); /* Ensure krealloc() actually freed earlier KFENCE object. */ + KUNIT_ASSERT_TRUE(test, report_matches(&expect)); +} + +/* Test that some objects from a bulk allocation belong to KFENCE pool. */ +static void test_memcache_alloc_bulk(struct kunit *test) +{ + const size_t size = 32; + bool pass = false; + unsigned long timeout; + + setup_test_cache(test, size, 0, NULL); + KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */ + /* + * 100x the sample interval should be more than enough to ensure we get + * a KFENCE allocation eventually. + */ + timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL); + do { + void *objects[100]; + int i, num = kmem_cache_alloc_bulk(test_cache, GFP_ATOMIC, ARRAY_SIZE(objects), + objects); + if (!num) + continue; + for (i = 0; i < ARRAY_SIZE(objects); i++) { + if (is_kfence_address(objects[i])) { + pass = true; + break; + } + } + kmem_cache_free_bulk(test_cache, num, objects); + /* + * kmem_cache_alloc_bulk() disables interrupts, and calling it + * in a tight loop may not give KFENCE a chance to switch the + * static branch. Call cond_resched() to let KFENCE chime in. + */ + cond_resched(); + } while (!pass && time_before(jiffies, timeout)); + + KUNIT_EXPECT_TRUE(test, pass); + KUNIT_EXPECT_FALSE(test, report_available()); +} + +/* + * KUnit does not provide a way to provide arguments to tests, and we encode + * additional info in the name. Set up 2 tests per test case, one using the + * default allocator, and another using a custom memcache (suffix '-memcache'). + */ +#define KFENCE_KUNIT_CASE(test_name) \ + { .run_case = test_name, .name = #test_name }, \ + { .run_case = test_name, .name = #test_name "-memcache" } + +static struct kunit_case kfence_test_cases[] = { + KFENCE_KUNIT_CASE(test_out_of_bounds_read), + KFENCE_KUNIT_CASE(test_out_of_bounds_write), + KFENCE_KUNIT_CASE(test_use_after_free_read), + KFENCE_KUNIT_CASE(test_double_free), + KFENCE_KUNIT_CASE(test_invalid_addr_free), + KFENCE_KUNIT_CASE(test_corruption), + KFENCE_KUNIT_CASE(test_free_bulk), + KFENCE_KUNIT_CASE(test_init_on_free), + KUNIT_CASE(test_kmalloc_aligned_oob_read), + KUNIT_CASE(test_kmalloc_aligned_oob_write), + KUNIT_CASE(test_shrink_memcache), + KUNIT_CASE(test_memcache_ctor), + KUNIT_CASE(test_invalid_access), + KUNIT_CASE(test_gfpzero), + KUNIT_CASE(test_memcache_typesafe_by_rcu), + KUNIT_CASE(test_krealloc), + KUNIT_CASE(test_memcache_alloc_bulk), + {}, +}; + +/* ===== End test cases ===== */ + +static int test_init(struct kunit *test) +{ + unsigned long flags; + int i; + + spin_lock_irqsave(&observed.lock, flags); + for (i = 0; i < ARRAY_SIZE(observed.lines); i++) + observed.lines[i][0] = '\0'; + observed.nlines = 0; + spin_unlock_irqrestore(&observed.lock, flags); + + /* Any test with 'memcache' in its name will want a memcache. */ + if (strstr(test->name, "memcache")) + test->priv = TEST_PRIV_WANT_MEMCACHE; + else + test->priv = NULL; + + return 0; +} + +static void test_exit(struct kunit *test) +{ + test_cache_destroy(); +} + +static struct kunit_suite kfence_test_suite = { + .name = "kfence", + .test_cases = kfence_test_cases, + .init = test_init, + .exit = test_exit, +}; +static struct kunit_suite *kfence_test_suites[] = { &kfence_test_suite, NULL }; + +static void register_tracepoints(struct tracepoint *tp, void *ignore) +{ + check_trace_callback_type_console(probe_console); + if (!strcmp(tp->name, "console")) + WARN_ON(tracepoint_probe_register(tp, probe_console, NULL)); +} + +static void unregister_tracepoints(struct tracepoint *tp, void *ignore) +{ + if (!strcmp(tp->name, "console")) + tracepoint_probe_unregister(tp, probe_console, NULL); +} + +/* + * We only want to do tracepoints setup and teardown once, therefore we have to + * customize the init and exit functions and cannot rely on kunit_test_suite(). + */ +static int __init kfence_test_init(void) +{ + /* + * Because we want to be able to build the test as a module, we need to + * iterate through all known tracepoints, since the static registration + * won't work here. + */ + for_each_kernel_tracepoint(register_tracepoints, NULL); + return __kunit_test_suites_init(kfence_test_suites); +} + +static void kfence_test_exit(void) +{ + __kunit_test_suites_exit(kfence_test_suites); + for_each_kernel_tracepoint(unregister_tracepoints, NULL); + tracepoint_synchronize_unregister(); +} + +late_initcall(kfence_test_init); +module_exit(kfence_test_exit); + +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Alexander Potapenko glider@google.com, Marco Elver elver@google.com"); diff --git a/mm/kfence/report.c b/mm/kfence/report.c index 4dbfa9a382e4..901bd7ee83d8 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -156,7 +156,12 @@ static void print_diff_canary(unsigned long address, size_t bytes_to_show, pr_cont(" ]"); }
-void kfence_report_error(unsigned long address, struct pt_regs *regs, +static const char *get_access_type(bool is_write) +{ + return is_write ? "write" : "read"; +} + +void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *regs, const struct kfence_metadata *meta, enum kfence_error_type type) { unsigned long stack_entries[KFENCE_STACK_DEPTH] = { 0 }; @@ -194,17 +199,19 @@ void kfence_report_error(unsigned long address, struct pt_regs *regs, case KFENCE_ERROR_OOB: { const bool left_of_object = address < meta->addr;
- pr_err("BUG: KFENCE: out-of-bounds in %pS\n\n", (void *)stack_entries[skipnr]); - pr_err("Out-of-bounds access at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n", - (void *)address, + pr_err("BUG: KFENCE: out-of-bounds %s in %pS\n\n", get_access_type(is_write), + (void *)stack_entries[skipnr]); + pr_err("Out-of-bounds %s at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n", + get_access_type(is_write), (void *)address, left_of_object ? meta->addr - address : address - meta->addr, left_of_object ? "left" : "right", object_index); break; } case KFENCE_ERROR_UAF: - pr_err("BUG: KFENCE: use-after-free in %pS\n\n", (void *)stack_entries[skipnr]); - pr_err("Use-after-free access at 0x" PTR_FMT " (in kfence-#%zd):\n", - (void *)address, object_index); + pr_err("BUG: KFENCE: use-after-free %s in %pS\n\n", get_access_type(is_write), + (void *)stack_entries[skipnr]); + pr_err("Use-after-free %s at 0x" PTR_FMT " (in kfence-#%zd):\n", + get_access_type(is_write), (void *)address, object_index); break; case KFENCE_ERROR_CORRUPTION: pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]); @@ -213,8 +220,10 @@ void kfence_report_error(unsigned long address, struct pt_regs *regs, pr_cont(" (in kfence-#%zd):\n", object_index); break; case KFENCE_ERROR_INVALID: - pr_err("BUG: KFENCE: invalid access in %pS\n\n", (void *)stack_entries[skipnr]); - pr_err("Invalid access at 0x" PTR_FMT ":\n", (void *)address); + pr_err("BUG: KFENCE: invalid %s in %pS\n\n", get_access_type(is_write), + (void *)stack_entries[skipnr]); + pr_err("Invalid %s at 0x" PTR_FMT ":\n", get_access_type(is_write), + (void *)address); break; case KFENCE_ERROR_INVALID_FREE: pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]);
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc1 commit 0825c1d57f02e3fb228bbecad827956d4c796d3a category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Add entry for KFENCE maintainers.
Link: https://lkml.kernel.org/r/20201103175841.3495947-10-elver@google.com Signed-off-by: Alexander Potapenko glider@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Reviewed-by: SeongJae Park sjpark@amazon.de Co-developed-by: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christopher Lameter cl@linux.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Hillf Danton hdanton@sina.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Jann Horn jannh@google.com Cc: Joern Engel joern@purestorage.com Cc: Jonathan Corbet corbet@lwn.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Mark Rutland mark.rutland@arm.com Cc: Paul E. McKenney paulmck@kernel.org Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vlastimil Babka vbabka@suse.cz Cc: Will Deacon will@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- MAINTAINERS | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS index a12ecc71e1e7..20ea04bb5542 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9769,6 +9769,18 @@ F: include/linux/keyctl.h F: include/uapi/linux/keyctl.h F: security/keys/
+KFENCE +M: Alexander Potapenko glider@google.com +M: Marco Elver elver@google.com +R: Dmitry Vyukov dvyukov@google.com +L: kasan-dev@googlegroups.com +S: Maintained +F: Documentation/dev-tools/kfence.rst +F: arch/*/include/asm/kfence.h +F: include/linux/kfence.h +F: lib/Kconfig.kfence +F: mm/kfence/ + KFIFO M: Stefani Seibold stefani@seibold.net S: Maintained
From: Timur Tabi timur@kernel.org
mainline inclusion from mainline-v5.12-rc1 commit 4e89a78779647ca7ee2967551c599633fe9d3647 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Instead of defining the total/failed test counters manually, test drivers that are clients of kselftest should use the macro created for this purpose.
Signed-off-by: Timur Tabi timur@kernel.org Reviewed-by: Petr Mladek pmladek@suse.com Acked-by: Marco Elver elver@google.com Signed-off-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20210214161348.369023-2-timur@kernel.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- lib/test_bitmap.c | 3 +-- lib/test_printf.c | 4 ++-- 2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c index 4425a1dd4ef1..0ea0e8258f14 100644 --- a/lib/test_bitmap.c +++ b/lib/test_bitmap.c @@ -16,8 +16,7 @@
#include "../tools/testing/selftests/kselftest_module.h"
-static unsigned total_tests __initdata; -static unsigned failed_tests __initdata; +KSTM_MODULE_GLOBALS();
static char pbl_buffer[PAGE_SIZE] __initdata;
diff --git a/lib/test_printf.c b/lib/test_printf.c index 78f910200879..c557399bee9c 100644 --- a/lib/test_printf.c +++ b/lib/test_printf.c @@ -30,8 +30,8 @@ #define PAD_SIZE 16 #define FILL_CHAR '$'
-static unsigned total_tests __initdata; -static unsigned failed_tests __initdata; +KSTM_MODULE_GLOBALS(); + static char *test_buffer __initdata; static char *alloced_buffer __initdata;
From: Timur Tabi timur@kernel.org
mainline inclusion from mainline-v5.12-rc1 commit d9d4de2309cd1721421c6488f1bb5744d2c83a39 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Update the kselftest framework to allow client drivers to specify that some tests were skipped.
Signed-off-by: Timur Tabi timur@kernel.org Reviewed-by: Petr Mladek pmladek@suse.com Tested-by: Petr Mladek pmladek@suse.com Acked-by: Marco Elver elver@google.com Signed-off-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20210214161348.369023-3-timur@kernel.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- tools/testing/selftests/kselftest_module.h | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest_module.h b/tools/testing/selftests/kselftest_module.h index e8eafaf0941a..e2ea41de3f35 100644 --- a/tools/testing/selftests/kselftest_module.h +++ b/tools/testing/selftests/kselftest_module.h @@ -11,7 +11,8 @@
#define KSTM_MODULE_GLOBALS() \ static unsigned int total_tests __initdata; \ -static unsigned int failed_tests __initdata +static unsigned int failed_tests __initdata; \ +static unsigned int skipped_tests __initdata
#define KSTM_CHECK_ZERO(x) do { \ total_tests++; \ @@ -21,11 +22,16 @@ static unsigned int failed_tests __initdata } \ } while (0)
-static inline int kstm_report(unsigned int total_tests, unsigned int failed_tests) +static inline int kstm_report(unsigned int total_tests, unsigned int failed_tests, + unsigned int skipped_tests) { - if (failed_tests == 0) - pr_info("all %u tests passed\n", total_tests); - else + if (failed_tests == 0) { + if (skipped_tests) { + pr_info("skipped %u tests\n", skipped_tests); + pr_info("remaining %u tests passed\n", total_tests); + } else + pr_info("all %u tests passed\n", total_tests); + } else pr_warn("failed %u out of %u tests\n", failed_tests, total_tests);
return failed_tests ? -EINVAL : 0; @@ -36,7 +42,7 @@ static int __init __module##_init(void) \ { \ pr_info("loaded.\n"); \ selftest(); \ - return kstm_report(total_tests, failed_tests); \ + return kstm_report(total_tests, failed_tests, skipped_tests); \ } \ static void __exit __module##_exit(void) \ { \
From: Timur Tabi timur@kernel.org
mainline inclusion from mainline-v5.12-rc1 commit 5ead723a20e0447bc7db33dc3070b420e5f80aa6 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
If the no_hash_pointers command line parameter is set, then printk("%p") will print pointers as unhashed, which is useful for debugging purposes. This change applies to any function that uses vsprintf, such as print_hex_dump() and seq_buf_printf().
A large warning message is displayed if this option is enabled. Unhashed pointers expose kernel addresses, which can be a security risk.
Also update test_printf to skip the hashed pointer tests if the command-line option is set.
Signed-off-by: Timur Tabi timur@kernel.org Acked-by: Petr Mladek pmladek@suse.com Acked-by: Randy Dunlap rdunlap@infradead.org Acked-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Acked-by: Vlastimil Babka vbabka@suse.cz Acked-by: Marco Elver elver@google.com Signed-off-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20210214161348.369023-4-timur@kernel.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- .../admin-guide/kernel-parameters.txt | 15 ++++++++ lib/test_printf.c | 8 +++++ lib/vsprintf.c | 36 +++++++++++++++++-- 3 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index c94cc7e7df8c..3469cf42e15b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3352,6 +3352,21 @@ in certain environments such as networked servers or real-time systems.
+ no_hash_pointers + Force pointers printed to the console or buffers to be + unhashed. By default, when a pointer is printed via %p + format string, that pointer is "hashed", i.e. obscured + by hashing the pointer value. This is a security feature + that hides actual kernel addresses from unprivileged + users, but it also makes debugging the kernel more + difficult since unequal pointers can no longer be + compared. However, if this command-line option is + specified, then all normal pointers will have their true + value printed. Pointers printed via %pK may still be + hashed. This option should only be specified when + debugging the kernel. Please do not use on production + kernels. + nohibernate [HIBERNATION] Disable hibernation and resume.
nohz= [KNL] Boottime enable/disable dynamic ticks diff --git a/lib/test_printf.c b/lib/test_printf.c index c557399bee9c..298b79ce38b7 100644 --- a/lib/test_printf.c +++ b/lib/test_printf.c @@ -35,6 +35,8 @@ KSTM_MODULE_GLOBALS(); static char *test_buffer __initdata; static char *alloced_buffer __initdata;
+extern bool no_hash_pointers; + static int __printf(4, 0) __init do_test(int bufsize, const char *expect, int elen, const char *fmt, va_list ap) @@ -301,6 +303,12 @@ plain(void) { int err;
+ if (no_hash_pointers) { + pr_warn("skipping plain 'p' tests"); + skipped_tests += 2; + return; + } + err = plain_hash(); if (err) { pr_warn("plain 'p' does not appear to be hashed\n"); diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 3808a8707ed5..dc7d6a269e2f 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -2172,6 +2172,32 @@ char *fwnode_string(char *buf, char *end, struct fwnode_handle *fwnode, return widen_string(buf, buf - buf_start, end, spec); }
+/* Disable pointer hashing if requested */ +bool no_hash_pointers __ro_after_init; +EXPORT_SYMBOL_GPL(no_hash_pointers); + +static int __init no_hash_pointers_enable(char *str) +{ + no_hash_pointers = true; + + pr_warn("**********************************************************\n"); + pr_warn("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n"); + pr_warn("** **\n"); + pr_warn("** This system shows unhashed kernel memory addresses **\n"); + pr_warn("** via the console, logs, and other interfaces. This **\n"); + pr_warn("** might reduce the security of your system. **\n"); + pr_warn("** **\n"); + pr_warn("** If you see this message and you are not debugging **\n"); + pr_warn("** the kernel, report this immediately to your system **\n"); + pr_warn("** administrator! **\n"); + pr_warn("** **\n"); + pr_warn("** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **\n"); + pr_warn("**********************************************************\n"); + + return 0; +} +early_param("no_hash_pointers", no_hash_pointers_enable); + /* * Show a '%p' thing. A kernel extension is that the '%p' is followed * by an extra set of alphanumeric characters that are extended format @@ -2379,8 +2405,14 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, } }
- /* default is to _not_ leak addresses, hash before printing */ - return ptr_to_id(buf, end, ptr, spec); + /* + * default is to _not_ leak addresses, so hash before printing, + * unless no_hash_pointers is specified on the command line. + */ + if (unlikely(no_hash_pointers)) + return pointer_string(buf, end, ptr, spec); + else + return ptr_to_id(buf, end, ptr, spec); }
/*
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc1 commit 35beccf0926d42ee0d56e41979ec8cdf814c4769 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
We cannot rely on CONFIG_DEBUG_KERNEL to decide if we're running a "debug kernel" where we can safely show potentially sensitive information in the kernel log.
Instead, simply rely on the newly introduced "no_hash_pointers" to print unhashed kernel pointers, as well as decide if our reports can include other potentially sensitive information such as registers and corrupted bytes.
Link: https://lkml.kernel.org/r/20210223082043.1972742-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Cc: Timur Tabi timur@kernel.org Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Jann Horn jannh@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org [Peng Liu: dependent on 5ead723a20e0447bc7db33dc3070b420e5f80aa6] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- Documentation/dev-tools/kfence.rst | 8 ++++---- mm/kfence/core.c | 10 +++------- mm/kfence/kfence.h | 7 ------- mm/kfence/kfence_test.c | 2 +- mm/kfence/report.c | 18 ++++++++++-------- 5 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst index 58a0a5fa1ddc..fdf04e741ea5 100644 --- a/Documentation/dev-tools/kfence.rst +++ b/Documentation/dev-tools/kfence.rst @@ -88,8 +88,8 @@ A typical out-of-bounds access looks like this::
The header of the report provides a short summary of the function involved in the access. It is followed by more detailed information about the access and -its origin. Note that, real kernel addresses are only shown for -``CONFIG_DEBUG_KERNEL=y`` builds. +its origin. Note that, real kernel addresses are only shown when using the +kernel command line option ``no_hash_pointers``.
Use-after-free accesses are reported as::
@@ -184,8 +184,8 @@ invalidly written bytes (offset from the address) are shown; in this representation, '.' denote untouched bytes. In the example above ``0xac`` is the value written to the invalid address at offset 0, and the remaining '.' denote that no following bytes have been touched. Note that, real values are -only shown for ``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information -disclosure for non-debug builds, '!' is used instead to denote invalidly +only shown if the kernel was booted with ``no_hash_pointers``; to avoid +information disclosure otherwise, '!' is used instead to denote invalidly written bytes.
And finally, KFENCE may also report on invalid accesses to any protected page diff --git a/mm/kfence/core.c b/mm/kfence/core.c index cfe3d32ac5b7..3b8ec938470a 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -646,13 +646,9 @@ void __init kfence_init(void)
WRITE_ONCE(kfence_enabled, true); schedule_delayed_work(&kfence_timer, 0); - pr_info("initialized - using %lu bytes for %d objects", KFENCE_POOL_SIZE, - CONFIG_KFENCE_NUM_OBJECTS); - if (IS_ENABLED(CONFIG_DEBUG_KERNEL)) - pr_cont(" at 0x%px-0x%px\n", (void *)__kfence_pool, - (void *)(__kfence_pool + KFENCE_POOL_SIZE)); - else - pr_cont("\n"); + pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, + CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, + (void *)(__kfence_pool + KFENCE_POOL_SIZE)); }
void kfence_shutdown_cache(struct kmem_cache *s) diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index 1accc840dbbe..24065321ff8a 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -16,13 +16,6 @@
#include "../slab.h" /* for struct kmem_cache */
-/* For non-debug builds, avoid leaking kernel pointers into dmesg. */ -#ifdef CONFIG_DEBUG_KERNEL -#define PTR_FMT "%px" -#else -#define PTR_FMT "%p" -#endif - /* * Get the canary byte pattern for @addr. Use a pattern that varies based on the * lower 3 bits of the address, to detect memory corruptions with higher diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index db1bb596acaf..4acf4251ee04 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -146,7 +146,7 @@ static bool report_matches(const struct expect_report *r) break; }
- cur += scnprintf(cur, end - cur, " 0x" PTR_FMT, (void *)r->addr); + cur += scnprintf(cur, end - cur, " 0x%p", (void *)r->addr);
spin_lock_irqsave(&observed.lock, flags); if (!report_available()) diff --git a/mm/kfence/report.c b/mm/kfence/report.c index 901bd7ee83d8..4a424de44e2d 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -19,6 +19,8 @@
#include "kfence.h"
+extern bool no_hash_pointers; + /* Helper function to either print to a seq_file or to console. */ __printf(2, 3) static void seq_con_printf(struct seq_file *seq, const char *fmt, ...) @@ -118,7 +120,7 @@ void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *met }
seq_con_printf(seq, - "kfence-#%zd [0x" PTR_FMT "-0x" PTR_FMT + "kfence-#%zd [0x%p-0x%p" ", size=%d, cache=%s] allocated by task %d:\n", meta - kfence_metadata, (void *)start, (void *)(start + size - 1), size, (cache && cache->name) ? cache->name : "<destroyed>", meta->alloc_track.pid); @@ -148,7 +150,7 @@ static void print_diff_canary(unsigned long address, size_t bytes_to_show, for (cur = (const u8 *)address; cur < end; cur++) { if (*cur == KFENCE_CANARY_PATTERN(cur)) pr_cont(" ."); - else if (IS_ENABLED(CONFIG_DEBUG_KERNEL)) + else if (no_hash_pointers) pr_cont(" 0x%02x", *cur); else /* Do not leak kernel memory in non-debug builds. */ pr_cont(" !"); @@ -201,7 +203,7 @@ void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *r
pr_err("BUG: KFENCE: out-of-bounds %s in %pS\n\n", get_access_type(is_write), (void *)stack_entries[skipnr]); - pr_err("Out-of-bounds %s at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n", + pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%zd):\n", get_access_type(is_write), (void *)address, left_of_object ? meta->addr - address : address - meta->addr, left_of_object ? "left" : "right", object_index); @@ -210,24 +212,24 @@ void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *r case KFENCE_ERROR_UAF: pr_err("BUG: KFENCE: use-after-free %s in %pS\n\n", get_access_type(is_write), (void *)stack_entries[skipnr]); - pr_err("Use-after-free %s at 0x" PTR_FMT " (in kfence-#%zd):\n", + pr_err("Use-after-free %s at 0x%p (in kfence-#%zd):\n", get_access_type(is_write), (void *)address, object_index); break; case KFENCE_ERROR_CORRUPTION: pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]); - pr_err("Corrupted memory at 0x" PTR_FMT " ", (void *)address); + pr_err("Corrupted memory at 0x%p ", (void *)address); print_diff_canary(address, 16, meta); pr_cont(" (in kfence-#%zd):\n", object_index); break; case KFENCE_ERROR_INVALID: pr_err("BUG: KFENCE: invalid %s in %pS\n\n", get_access_type(is_write), (void *)stack_entries[skipnr]); - pr_err("Invalid %s at 0x" PTR_FMT ":\n", get_access_type(is_write), + pr_err("Invalid %s at 0x%p:\n", get_access_type(is_write), (void *)address); break; case KFENCE_ERROR_INVALID_FREE: pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]); - pr_err("Invalid free of 0x" PTR_FMT " (in kfence-#%zd):\n", (void *)address, + pr_err("Invalid free of 0x%p (in kfence-#%zd):\n", (void *)address, object_index); break; } @@ -242,7 +244,7 @@ void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *r
/* Print report footer. */ pr_err("\n"); - if (IS_ENABLED(CONFIG_DEBUG_KERNEL) && regs) + if (no_hash_pointers && regs) show_regs(regs); else dump_stack_print_info(KERN_ERR);
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit 9c0dee54eb91d48cca048bd7bd2c1f4a166e0252 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3.
This patchset adds a tracepoint, error_repor_end, that is to be used by KFENCE, KASAN, and potentially other bug detection tools, when they print an error report. One of the possible use cases is userspace collection of kernel error reports: interested parties can subscribe to the tracing event via tracefs, and get notified when an error report occurs.
This patch (of 3):
Introduce error_report_end tracepoint. It can be used in debugging tools like KASAN, KFENCE, etc. to provide extensions to the error reporting mechanisms (e.g. allow tests hook into error reporting, ease error report collection from production kernels). Another benefit would be making use of ftrace for debugging or benchmarking the tools themselves.
Should we need it, the tracepoint name leaves us with the possibility to introduce a complementary error_report_start tracepoint in the future.
Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Suggested-by: Marco Elver elver@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Petr Mladek pmladek@suse.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Sergey Senozhatsky sergey.senozhatsky@gmail.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- include/trace/events/error_report.h | 74 +++++++++++++++++++++++++++++ kernel/trace/Makefile | 1 + kernel/trace/error_report-traces.c | 11 +++++ 3 files changed, 86 insertions(+) create mode 100644 include/trace/events/error_report.h create mode 100644 kernel/trace/error_report-traces.c
diff --git a/include/trace/events/error_report.h b/include/trace/events/error_report.h new file mode 100644 index 000000000000..96f64bf218b2 --- /dev/null +++ b/include/trace/events/error_report.h @@ -0,0 +1,74 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Declarations for error reporting tracepoints. + * + * Copyright (C) 2021, Google LLC. + */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM error_report + +#if !defined(_TRACE_ERROR_REPORT_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_ERROR_REPORT_H + +#include <linux/tracepoint.h> + +#ifndef __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY +#define __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY + +enum error_detector { + ERROR_DETECTOR_KFENCE, + ERROR_DETECTOR_KASAN +}; + +#endif /* __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY */ + +#define error_detector_list \ + EM(ERROR_DETECTOR_KFENCE, "kfence") \ + EMe(ERROR_DETECTOR_KASAN, "kasan") +/* Always end the list with an EMe. */ + +#undef EM +#undef EMe + +#define EM(a, b) TRACE_DEFINE_ENUM(a); +#define EMe(a, b) TRACE_DEFINE_ENUM(a); + +error_detector_list + +#undef EM +#undef EMe + +#define EM(a, b) { a, b }, +#define EMe(a, b) { a, b } + +#define show_error_detector_list(val) \ + __print_symbolic(val, error_detector_list) + +DECLARE_EVENT_CLASS(error_report_template, + TP_PROTO(enum error_detector error_detector, unsigned long id), + TP_ARGS(error_detector, id), + TP_STRUCT__entry(__field(enum error_detector, error_detector) + __field(unsigned long, id)), + TP_fast_assign(__entry->error_detector = error_detector; + __entry->id = id;), + TP_printk("[%s] %lx", + show_error_detector_list(__entry->error_detector), + __entry->id)); + +/** + * error_report_end - called after printing the error report + * @error_detector: short string describing the error detection tool + * @id: pseudo-unique descriptor identifying the report + * (e.g. the memory access address) + * + * This event occurs right after a debugging tool finishes printing the error + * report. + */ +DEFINE_EVENT(error_report_template, error_report_end, + TP_PROTO(enum error_detector error_detector, unsigned long id), + TP_ARGS(error_detector, id)); + +#endif /* _TRACE_ERROR_REPORT_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index e153be351548..20bf3ada8c94 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_SYNTH_EVENTS) += trace_events_synth.o obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o +obj-$(CONFIG_TRACEPOINTS) += error_report-traces.o obj-$(CONFIG_TRACEPOINTS) += power-traces.o ifeq ($(CONFIG_PM),y) obj-$(CONFIG_TRACEPOINTS) += rpm-traces.o diff --git a/kernel/trace/error_report-traces.c b/kernel/trace/error_report-traces.c new file mode 100644 index 000000000000..f89792c25b11 --- /dev/null +++ b/kernel/trace/error_report-traces.c @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Error reporting trace points. + * + * Copyright (C) 2021, Google LLC. + */ + +#define CREATE_TRACE_POINTS +#include <trace/events/error_report.h> + +EXPORT_TRACEPOINT_SYMBOL_GPL(error_report_end);
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit f2b84d2e40eb1a17f72dc4a1da463ec8de649f19 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Make it possible to trace KFENCE error reporting. A good usecase is watching for trace events from the userspace to detect and process memory corruption reports from the kernel.
Link: https://lkml.kernel.org/r/20210121131915.1331302-3-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Suggested-by: Marco Elver elver@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Petr Mladek pmladek@suse.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Sergey Senozhatsky sergey.senozhatsky@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/report.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/mm/kfence/report.c b/mm/kfence/report.c index 4a424de44e2d..ab83d5a59bb1 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -14,6 +14,7 @@ #include <linux/seq_file.h> #include <linux/stacktrace.h> #include <linux/string.h> +#include <trace/events/error_report.h>
#include <asm/kfence.h>
@@ -248,6 +249,7 @@ void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *r show_regs(regs); else dump_stack_print_info(KERN_ERR); + trace_error_report_end(ERROR_DETECTOR_KFENCE, address); pr_err("==================================================================\n");
lockdep_on();
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.12-rc1 commit d3a61f745e0d089a2484740283a434deb6dd4eb5 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Make it possible to trace KASAN error reporting. A good usecase is watching for trace events from the userspace to detect and process memory corruption reports from the kernel.
Link: https://lkml.kernel.org/r/20210121131915.1331302-4-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Suggested-by: Marco Elver elver@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Petr Mladek pmladek@suse.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Sergey Senozhatsky sergey.senozhatsky@gmail.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/kasan/report.c [Peng Liu: dependent on db3de8f759c8(rename print_shadow_for_address to print_memory_metadata), adjust context] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kasan/report.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 00a53f1355ae..5ebad04b5c3b 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -30,6 +30,7 @@ #include <linux/module.h> #include <linux/sched/task_stack.h> #include <linux/uaccess.h> +#include <trace/events/error_report.h>
#include <asm/sections.h>
@@ -90,8 +91,9 @@ static void start_report(unsigned long *flags) pr_err("==================================================================\n"); }
-static void end_report(unsigned long *flags) +static void end_report(unsigned long *flags, unsigned long addr) { + trace_error_report_end(ERROR_DETECTOR_KASAN, addr); pr_err("==================================================================\n"); add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); spin_unlock_irqrestore(&report_lock, *flags); @@ -504,7 +506,7 @@ void kasan_report_invalid_free(void *object, unsigned long ip) print_address_description(object, tag); pr_err("\n"); print_shadow_for_address(object); - end_report(&flags); + end_report(&flags, (unsigned long)object); }
static void __kasan_report(unsigned long addr, size_t size, bool is_write, @@ -549,7 +551,7 @@ static void __kasan_report(unsigned long addr, size_t size, bool is_write, dump_stack(); }
- end_report(&flags); + end_report(&flags, addr); }
bool kasan_report(unsigned long addr, size_t size, bool is_write,
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc3 commit 702b16d724a61cb97461f403d7a2da29324471b3 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Use %td for ptrdiff_t.
Link: https://lkml.kernel.org/r/3abbe4c9-16ad-c168-a90f-087978ccd8f7@csgroup.eu Link: https://lkml.kernel.org/r/20210303121157.3430807-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Reported-by: Christophe Leroy christophe.leroy@csgroup.eu Reviewed-by: Alexander Potapenko glider@google.com Cc: Dmitriy Vyukov dvyukov@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Jann Horn jannh@google.com Cc: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/report.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/kfence/report.c b/mm/kfence/report.c index ab83d5a59bb1..519f037720f5 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -116,12 +116,12 @@ void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *met lockdep_assert_held(&meta->lock);
if (meta->state == KFENCE_OBJECT_UNUSED) { - seq_con_printf(seq, "kfence-#%zd unused\n", meta - kfence_metadata); + seq_con_printf(seq, "kfence-#%td unused\n", meta - kfence_metadata); return; }
seq_con_printf(seq, - "kfence-#%zd [0x%p-0x%p" + "kfence-#%td [0x%p-0x%p" ", size=%d, cache=%s] allocated by task %d:\n", meta - kfence_metadata, (void *)start, (void *)(start + size - 1), size, (cache && cache->name) ? cache->name : "<destroyed>", meta->alloc_track.pid); @@ -204,7 +204,7 @@ void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *r
pr_err("BUG: KFENCE: out-of-bounds %s in %pS\n\n", get_access_type(is_write), (void *)stack_entries[skipnr]); - pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%zd):\n", + pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%td):\n", get_access_type(is_write), (void *)address, left_of_object ? meta->addr - address : address - meta->addr, left_of_object ? "left" : "right", object_index); @@ -213,14 +213,14 @@ void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *r case KFENCE_ERROR_UAF: pr_err("BUG: KFENCE: use-after-free %s in %pS\n\n", get_access_type(is_write), (void *)stack_entries[skipnr]); - pr_err("Use-after-free %s at 0x%p (in kfence-#%zd):\n", + pr_err("Use-after-free %s at 0x%p (in kfence-#%td):\n", get_access_type(is_write), (void *)address, object_index); break; case KFENCE_ERROR_CORRUPTION: pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]); pr_err("Corrupted memory at 0x%p ", (void *)address); print_diff_canary(address, 16, meta); - pr_cont(" (in kfence-#%zd):\n", object_index); + pr_cont(" (in kfence-#%td):\n", object_index); break; case KFENCE_ERROR_INVALID: pr_err("BUG: KFENCE: invalid %s in %pS\n\n", get_access_type(is_write), @@ -230,7 +230,7 @@ void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *r break; case KFENCE_ERROR_INVALID_FREE: pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]); - pr_err("Invalid free of 0x%p (in kfence-#%zd):\n", (void *)address, + pr_err("Invalid free of 0x%p (in kfence-#%td):\n", (void *)address, object_index); break; }
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc3 commit df3ae2c9941d38106afd67d7816b58f6dc7405e8 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
cache_alloc_debugcheck_after() performs checks on an object, including adjusting the returned pointer. None of this should apply to KFENCE objects. While for non-bulk allocations, the checks are skipped when we allocate via KFENCE, for bulk allocations cache_alloc_debugcheck_after() is called via cache_alloc_debugcheck_after_bulk().
Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects.
Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Jann Horn jannh@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/slab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/slab.c b/mm/slab.c index 0b4be897676a..43104a23ebff 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2991,7 +2991,7 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep, gfp_t flags, void *objp, unsigned long caller) { WARN_ON_ONCE(cachep->ctor && (flags & __GFP_ZERO)); - if (!objp) + if (!objp || is_kfence_address(objp)) return objp; if (cachep->flags & SLAB_POISON) { check_poison_obj(cachep, objp);
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc3 commit 0aa41cae92c1e2e61ae5b3a2dde8e674172e40ac category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Some architectures prefix all functions with a constant string ('.' on ppc64). Add ARCH_FUNC_PREFIX, which may optionally be defined in <asm/kfence.h>, so that get_stack_skipnr() can work properly.
Link: https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f058@csgroup.eu Link: https://lkml.kernel.org/r/20210304144000.1148590-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Reported-by: Christophe Leroy christophe.leroy@csgroup.eu Tested-by: Christophe Leroy christophe.leroy@csgroup.eu Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Jann Horn jannh@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/report.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/mm/kfence/report.c b/mm/kfence/report.c index 519f037720f5..e3f71451ad9e 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -20,6 +20,11 @@
#include "kfence.h"
+/* May be overridden by <asm/kfence.h>. */ +#ifndef ARCH_FUNC_PREFIX +#define ARCH_FUNC_PREFIX "" +#endif + extern bool no_hash_pointers;
/* Helper function to either print to a seq_file or to console. */ @@ -67,8 +72,9 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries for (skipnr = 0; skipnr < num_entries; skipnr++) { int len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skipnr]);
- if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") || - !strncmp(buf, "__slab_free", len)) { + if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfence_") || + str_has_prefix(buf, ARCH_FUNC_PREFIX "__kfence_") || + !strncmp(buf, ARCH_FUNC_PREFIX "__slab_free", len)) { /* * In case of tail calls from any of the below * to any of the above. @@ -77,10 +83,10 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries }
/* Also the *_bulk() variants by only checking prefixes. */ - if (str_has_prefix(buf, "kfree") || - str_has_prefix(buf, "kmem_cache_free") || - str_has_prefix(buf, "__kmalloc") || - str_has_prefix(buf, "kmem_cache_alloc")) + if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfree") || + str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_free") || + str_has_prefix(buf, ARCH_FUNC_PREFIX "__kmalloc") || + str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_alloc")) goto found; } if (fallback < num_entries)
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc5 commit 9551158069ba8fcc893798d42dc4f978b62ef60f category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Because memblock allocations are registered with kmemleak, the KFENCE pool was seen by kmemleak as one large object. Later allocations through kfence_alloc() that were registered with kmemleak via slab_post_alloc_hook() would then overlap and trigger a warning. Therefore, once the pool is initialized, we can remove (free) it from kmemleak again, since it should be treated as allocator-internal and be seen as "free memory".
The second problem is that kmemleak is passed the rounded size, and not the originally requested size, which is also the size of KFENCE objects. To avoid kmemleak scanning past the end of an object and trigger a KFENCE out-of-bounds error, fix the size if it is a KFENCE object.
For simplicity, to avoid a call to kfence_ksize() in slab_post_alloc_hook() (and avoid new IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) guard), just call kfence_ksize() in mm/kmemleak.c:create_object().
Link: https://lkml.kernel.org/r/20210317084740.3099921-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Reported-by: Luis Henriques lhenriques@suse.de Reviewed-by: Catalin Marinas catalin.marinas@arm.com Tested-by: Luis Henriques lhenriques@suse.de Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Jann Horn jannh@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 9 +++++++++ mm/kmemleak.c | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 3b8ec938470a..d53c91f881a4 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -12,6 +12,7 @@ #include <linux/debugfs.h> #include <linux/kcsan-checks.h> #include <linux/kfence.h> +#include <linux/kmemleak.h> #include <linux/list.h> #include <linux/lockdep.h> #include <linux/memblock.h> @@ -480,6 +481,14 @@ static bool __init kfence_init_pool(void) addr += 2 * PAGE_SIZE; }
+ /* + * The pool is live and will never be deallocated from this point on. + * Remove the pool object from the kmemleak object tree, as it would + * otherwise overlap with allocations returned by kfence_alloc(), which + * are registered with kmemleak through the slab post-alloc hook. + */ + kmemleak_free(__kfence_pool); + return true;
err: diff --git a/mm/kmemleak.c b/mm/kmemleak.c index c0014d3b91c1..fe6e3ae8e8c6 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -97,6 +97,7 @@ #include <linux/atomic.h>
#include <linux/kasan.h> +#include <linux/kfence.h> #include <linux/kmemleak.h> #include <linux/memory_hotplug.h>
@@ -589,7 +590,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, atomic_set(&object->use_count, 1); object->flags = OBJECT_ALLOCATED; object->pointer = ptr; - object->size = size; + object->size = kfence_ksize((void *)ptr) ?: size; object->excess_ref = 0; object->min_count = min_count; object->count = 0; /* white color initially */
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.12-rc7 commit 6a77d38efcda40f555a920909eab22ee0917fd0d category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
On systems with KPTI enabled, we can currently observe the following warning:
BUG: using smp_processor_id() in preemptible caller is invalidate_user_asid+0x13/0x50 CPU: 6 PID: 1075 Comm: dmesg Not tainted 5.12.0-rc4-gda4a2b1a5479-kfence_1+ #1 Hardware name: Hewlett-Packard HP Pro 3500 Series/2ABF, BIOS 8.11 10/24/2012 Call Trace: dump_stack+0x7f/0xad check_preemption_disabled+0xc8/0xd0 invalidate_user_asid+0x13/0x50 flush_tlb_one_kernel+0x5/0x20 kfence_protect+0x56/0x80 ...
While it normally makes sense to require preemption to be off, so that the expected CPU's TLB is flushed and not another, in our case it really is best-effort (see comments in kfence_protect_page()).
Avoid the warning by disabling preemption around flush_tlb_one_kernel().
Link: https://lore.kernel.org/lkml/YGIDBAboELGgMgXy@elver.google.com/ Link: https://lkml.kernel.org/r/20210330065737.652669-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Reported-by: Tomi Sarvela tomi.p.sarvela@intel.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Jann Horn jannh@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/x86/include/asm/kfence.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h index 97bbb4a9083a..05b48b33baf0 100644 --- a/arch/x86/include/asm/kfence.h +++ b/arch/x86/include/asm/kfence.h @@ -56,8 +56,13 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect) else set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
- /* Flush this CPU's TLB. */ + /* + * Flush this CPU's TLB, assuming whoever did the allocation/free is + * likely to continue running on this CPU. + */ + preempt_disable(); flush_tlb_one_kernel(addr); + preempt_enable(); return true; }
From: Christophe Leroy christophe.leroy@csgroup.eu
mainline inclusion from mainline-v5.11-rc1 commit 035b19a15a98907916a42a6b1d025877c42f10ad category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Since commit 2b279c0348af ("powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOC"), there is no real situation where mapping without BATs is required.
In order to simplify memory handling, always map kernel text and rodata with BATs even when "nobats" kernel parameter is set.
Also fix the 603 TLB miss exceptions that don't require anymore kernel page table if DEBUG_PAGEALLOC.
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/da51f7ec632825a4ce43290a904aad61648408c0.160628501... Conflicts: arch/powerpc/kernel/head_book3s_32.S [Peng Liu: adjust context] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/powerpc/kernel/head_book3s_32.S | 4 ++-- arch/powerpc/mm/book3s32/mmu.c | 8 +++----- 2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S index 96b45901da64..b5df032726a7 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -456,13 +456,13 @@ InstructionTLBMiss: */ /* Get PTE (linux-style) and check access */ mfspr r3,SPRN_IMISS -#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) +#ifdef CONFIG_MODULES lis r1, TASK_SIZE@h /* check if kernel address */ cmplw 0,r1,r3 #endif mfspr r2, SPRN_SPRG_PGDIR li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER -#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) +#ifdef CONFIG_MODULES bgt- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c index a59e7ec98180..5c60dcade90a 100644 --- a/arch/powerpc/mm/book3s32/mmu.c +++ b/arch/powerpc/mm/book3s32/mmu.c @@ -157,11 +157,9 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) unsigned long done; unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
- if (__map_without_bats) { - pr_debug("RAM mapped without BATs\n"); - return base; - } - if (debug_pagealloc_enabled()) { + + if (debug_pagealloc_enabled() || __map_without_bats) { + pr_debug_once("Read-Write memory mapped without BATs\n"); if (base >= border) return base; if (top >= border)
From: Christophe Leroy christophe.leroy@csgroup.eu
mainline inclusion from mainline-v5.13-rc1 commit 90cbac0e995dd92f7bcf82f74aa50250bf194a4a category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Add architecture specific implementation details for KFENCE and enable KFENCE for the ppc32 architecture. In particular, this implements the required interface in <asm/kfence.h>.
KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the Read/Write linear map to be mapped at page granularity.
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Acked-by: Marco Elver elver@google.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/8dfe1bd2abde26337c1d8c1ad0acfcc82185e0d5.161486844... Conflicts: arch/powerpc/Kconfig [Peng Liu: adjust context] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/powerpc/Kconfig | 13 ++++++------ arch/powerpc/include/asm/kfence.h | 33 +++++++++++++++++++++++++++++++ arch/powerpc/mm/book3s32/mmu.c | 2 +- arch/powerpc/mm/fault.c | 7 ++++++- arch/powerpc/mm/init_32.c | 3 +++ arch/powerpc/mm/mmu_decl.h | 5 +++++ arch/powerpc/mm/nohash/8xx.c | 4 ++-- 7 files changed, 57 insertions(+), 10 deletions(-) create mode 100644 arch/powerpc/include/asm/kfence.h
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2288b8d52af8..81a629c5133c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -184,6 +184,7 @@ config PPC select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14 select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14 select HAVE_ARCH_KGDB + select HAVE_ARCH_KFENCE if PPC32 select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT select HAVE_ARCH_NVRAM_OPS @@ -798,7 +799,7 @@ config THREAD_SHIFT config DATA_SHIFT_BOOL bool "Set custom data alignment" depends on ADVANCED_OPTIONS - depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC + depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && \ (!PIN_TLB_TEXT || !STRICT_KERNEL_RWX)) help @@ -811,13 +812,13 @@ config DATA_SHIFT_BOOL config DATA_SHIFT int "Data shift" if DATA_SHIFT_BOOL default 24 if STRICT_KERNEL_RWX && PPC64 - range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_BOOK3S_32 - range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_8xx + range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_BOOK3S_32 + range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_8xx default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32 - default 18 if DEBUG_PAGEALLOC && PPC_BOOK3S_32 + default 18 if (DEBUG_PAGEALLOC || KFENCE) && PPC_BOOK3S_32 default 23 if STRICT_KERNEL_RWX && PPC_8xx - default 23 if DEBUG_PAGEALLOC && PPC_8xx && PIN_TLB_DATA - default 19 if DEBUG_PAGEALLOC && PPC_8xx + default 23 if (DEBUG_PAGEALLOC || KFENCE) && PPC_8xx && PIN_TLB_DATA + default 19 if (DEBUG_PAGEALLOC || KFENCE) && PPC_8xx default PPC_PAGE_SHIFT help On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO. diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h new file mode 100644 index 000000000000..a9846b68c6b9 --- /dev/null +++ b/arch/powerpc/include/asm/kfence.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * powerpc KFENCE support. + * + * Copyright (C) 2020 CS GROUP France + */ + +#ifndef __ASM_POWERPC_KFENCE_H +#define __ASM_POWERPC_KFENCE_H + +#include <linux/mm.h> +#include <asm/pgtable.h> + +static inline bool arch_kfence_init_pool(void) +{ + return true; +} + +static inline bool kfence_protect_page(unsigned long addr, bool protect) +{ + pte_t *kpte = virt_to_kpte(addr); + + if (protect) { + pte_update(&init_mm, addr, kpte, _PAGE_PRESENT, 0, 0); + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + } else { + pte_update(&init_mm, addr, kpte, 0, _PAGE_PRESENT, 0); + } + + return true; +} + +#endif /* __ASM_POWERPC_KFENCE_H */ diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c index 5c60dcade90a..f23e4295214d 100644 --- a/arch/powerpc/mm/book3s32/mmu.c +++ b/arch/powerpc/mm/book3s32/mmu.c @@ -158,7 +158,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
- if (debug_pagealloc_enabled() || __map_without_bats) { + if (debug_pagealloc_enabled_or_kfence() || __map_without_bats) { pr_debug_once("Read-Write memory mapped without BATs\n"); if (base >= border) return base; diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 55fdba8ba21c..88ba0c65941c 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -32,6 +32,7 @@ #include <linux/context_tracking.h> #include <linux/hugetlb.h> #include <linux/uaccess.h> +#include <linux/kfence.h>
#include <asm/firmware.h> #include <asm/page.h> @@ -417,8 +418,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, * take a page fault to a kernel address or a page fault to a user * address outside of dedicated places */ - if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) + if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) { + if (kfence_handle_page_fault(address, is_write, regs)) + return 0; + return SIGSEGV; + }
/* * If we're in an interrupt, have no user context or are running diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c index 02c7db4087cb..3d690be48e84 100644 --- a/arch/powerpc/mm/init_32.c +++ b/arch/powerpc/mm/init_32.c @@ -97,6 +97,9 @@ static void __init MMU_setup(void) if (IS_ENABLED(CONFIG_PPC_8xx)) return;
+ if (IS_ENABLED(CONFIG_KFENCE)) + __map_without_ltlbs = 1; + if (debug_pagealloc_enabled()) __map_without_ltlbs = 1;
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index a3d35c82f875..63ab2072bbc4 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -194,3 +194,8 @@ void ptdump_check_wx(void); #else static inline void ptdump_check_wx(void) { } #endif + +static inline bool debug_pagealloc_enabled_or_kfence(void) +{ + return IS_ENABLED(CONFIG_KFENCE) || debug_pagealloc_enabled(); +} diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c index 231ca95f9ffb..cf35643a74cf 100644 --- a/arch/powerpc/mm/nohash/8xx.c +++ b/arch/powerpc/mm/nohash/8xx.c @@ -149,7 +149,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) { unsigned long etext8 = ALIGN(__pa(_etext), SZ_8M); unsigned long sinittext = __pa(_sinittext); - bool strict_boundary = strict_kernel_rwx_enabled() || debug_pagealloc_enabled(); + bool strict_boundary = strict_kernel_rwx_enabled() || debug_pagealloc_enabled_or_kfence(); unsigned long boundary = strict_boundary ? sinittext : etext8; unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);
@@ -161,7 +161,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top) return 0;
mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, true); - if (debug_pagealloc_enabled()) { + if (debug_pagealloc_enabled_or_kfence()) { top = boundary; } else { mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL_TEXT, true);
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.13-rc1 commit 94868a1e127bbe0e03a4467f27196cd668cbc344 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
After an out-of-bounds accesses, zero the guard page before re-protecting in kfence_guarded_free(). On one hand this helps make the failure mode of subsequent out-of-bounds accesses more deterministic, but could also prevent certain information leaks.
Link: https://lkml.kernel.org/r/20210312121653.348518-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Acked-by: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Jann Horn jannh@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index d53c91f881a4..768dbd58170d 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -372,6 +372,7 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z
/* Restore page protection if there was an OOB access. */ if (meta->unprotected_page) { + memzero_explicit((void *)ALIGN_DOWN(meta->unprotected_page, PAGE_SIZE), PAGE_SIZE); kfence_protect(meta->unprotected_page); meta->unprotected_page = 0; }
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.13-rc1 commit 407f1d8c1b5f3ec66a6a3eb835d3b81c76440f4e category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Patch series "kfence: optimize timer scheduling", v2.
We have observed that mostly-idle systems with KFENCE enabled wake up otherwise idle CPUs, preventing such to enter a lower power state. Debugging revealed that KFENCE spends too much active time in toggle_allocation_gate().
While the first version of KFENCE was using all the right bits to be scheduling optimal, and thus power efficient, by simply using wait_event() + wake_up(), that code was unfortunately removed.
As KFENCE was exposed to various different configs and tests, the scheduling optimal code slowly disappeared. First because of hung task warnings, and finally because of deadlocks when an allocation is made by timer code with debug objects enabled. Clearly, the "fixes" were not too friendly for devices that want to be power efficient.
Therefore, let's try a little harder to fix the hung task and deadlock problems that we have with wait_event() + wake_up(), while remaining as scheduling friendly and power efficient as possible.
Crucially, we need to defer the wake_up() to an irq_work, avoiding any potential for deadlock.
The result with this series is that on the devices where we observed a power regression, power usage returns back to baseline levels.
This patch (of 3):
On mostly-idle systems, we have observed that toggle_allocation_gate() is a cause of frequent wake-ups, preventing an otherwise idle CPU to go into a lower power state.
A late change in KFENCE's development, due to a potential deadlock [1], required changing the scheduling-friendly wait_event_timeout() and wake_up() to an open-coded wait-loop using schedule_timeout(). [1] https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
To avoid unnecessary wake-ups, switch to using wait_event_timeout().
Unfortunately, we still cannot use a version with direct wake_up() in __kfence_alloc() due to the same potential for deadlock as in [1]. Instead, add a level of indirection via an irq_work that is scheduled if we determine that the kfence_timer requires a wake_up().
Link: https://lkml.kernel.org/r/20210421105132.3965998-1-elver@google.com Link: https://lkml.kernel.org/r/20210421105132.3965998-2-elver@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Jann Horn jannh@google.com Cc: Mark Rutland mark.rutland@arm.com Cc: Hillf Danton hdanton@sina.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- lib/Kconfig.kfence | 1 + mm/kfence/core.c | 45 +++++++++++++++++++++++++++++---------------- 2 files changed, 30 insertions(+), 16 deletions(-)
diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence index 78f50ccb3b45..e641add33947 100644 --- a/lib/Kconfig.kfence +++ b/lib/Kconfig.kfence @@ -7,6 +7,7 @@ menuconfig KFENCE bool "KFENCE: low-overhead sampling-based memory safety error detector" depends on HAVE_ARCH_KFENCE && (SLAB || SLUB) select STACKTRACE + select IRQ_WORK help KFENCE is a low-overhead sampling-based detector of heap out-of-bounds access, use-after-free, and invalid-free errors. KFENCE is designed diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 768dbd58170d..235d726f88bc 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -10,6 +10,7 @@ #include <linux/atomic.h> #include <linux/bug.h> #include <linux/debugfs.h> +#include <linux/irq_work.h> #include <linux/kcsan-checks.h> #include <linux/kfence.h> #include <linux/kmemleak.h> @@ -587,6 +588,17 @@ late_initcall(kfence_debugfs_init);
/* === Allocation Gate Timer ================================================ */
+#ifdef CONFIG_KFENCE_STATIC_KEYS +/* Wait queue to wake up allocation-gate timer task. */ +static DECLARE_WAIT_QUEUE_HEAD(allocation_wait); + +static void wake_up_kfence_timer(struct irq_work *work) +{ + wake_up(&allocation_wait); +} +static DEFINE_IRQ_WORK(wake_up_kfence_timer_work, wake_up_kfence_timer); +#endif + /* * Set up delayed work, which will enable and disable the static key. We need to * use a work queue (rather than a simple timer), since enabling and disabling a @@ -604,25 +616,13 @@ static void toggle_allocation_gate(struct work_struct *work) if (!READ_ONCE(kfence_enabled)) return;
- /* Enable static key, and await allocation to happen. */ atomic_set(&kfence_allocation_gate, 0); #ifdef CONFIG_KFENCE_STATIC_KEYS + /* Enable static key, and await allocation to happen. */ static_branch_enable(&kfence_allocation_key); - /* - * Await an allocation. Timeout after 1 second, in case the kernel stops - * doing allocations, to avoid stalling this worker task for too long. - */ - { - unsigned long end_wait = jiffies + HZ; - - do { - set_current_state(TASK_UNINTERRUPTIBLE); - if (atomic_read(&kfence_allocation_gate) != 0) - break; - schedule_timeout(1); - } while (time_before(jiffies, end_wait)); - __set_current_state(TASK_RUNNING); - } + + wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate), HZ); + /* Disable static key and reset timer. */ static_branch_disable(&kfence_allocation_key); #endif @@ -729,6 +729,19 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) */ if (atomic_read(&kfence_allocation_gate) || atomic_inc_return(&kfence_allocation_gate) > 1) return NULL; +#ifdef CONFIG_KFENCE_STATIC_KEYS + /* + * waitqueue_active() is fully ordered after the update of + * kfence_allocation_gate per atomic_inc_return(). + */ + if (waitqueue_active(&allocation_wait)) { + /* + * Calling wake_up() here may deadlock when allocations happen + * from within timer code. Use an irq_work to defer it. + */ + irq_work_queue(&wake_up_kfence_timer_work); + } +#endif
if (!READ_ONCE(kfence_enabled)) return NULL;
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.13-rc1 commit 37c9284f6932b915043717703d6496dfd59c85f5 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
The allocation wait timeout was initially added because of warnings due to CONFIG_DETECT_HUNG_TASK=y [1]. While the 1 sec timeout is sufficient to resolve the warnings (given the hung task timeout must be 1 sec or larger) it may cause unnecessary wake-ups if the system is idle:
https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJi...
Fix it by computing the timeout duration in terms of the current sysctl_hung_task_timeout_secs value.
Link: https://lkml.kernel.org/r/20210421105132.3965998-3-elver@google.com Signed-off-by: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Hillf Danton hdanton@sina.com Cc: Jann Horn jannh@google.com Cc: Mark Rutland mark.rutland@arm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 235d726f88bc..9742649f3f88 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -20,6 +20,7 @@ #include <linux/moduleparam.h> #include <linux/random.h> #include <linux/rcupdate.h> +#include <linux/sched/sysctl.h> #include <linux/seq_file.h> #include <linux/slab.h> #include <linux/spinlock.h> @@ -621,7 +622,16 @@ static void toggle_allocation_gate(struct work_struct *work) /* Enable static key, and await allocation to happen. */ static_branch_enable(&kfence_allocation_key);
- wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate), HZ); + if (sysctl_hung_task_timeout_secs) { + /* + * During low activity with no allocations we might wait a + * while; let's avoid the hung task warning. + */ + wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate), + sysctl_hung_task_timeout_secs * HZ / 2); + } else { + wait_event(allocation_wait, atomic_read(&kfence_allocation_gate)); + }
/* Disable static key and reset timer. */ static_branch_disable(&kfence_allocation_key);
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.13-rc1 commit 36f0b35d0894576fe63268ede80d9f5aa140be09 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Use the power-efficient work queue, to avoid the pathological case where we keep pinning ourselves on the same possibly idle CPU on systems that want to be power-efficient (https://lwn.net/Articles/731052/).
Link: https://lkml.kernel.org/r/20210421105132.3965998-4-elver@google.com Signed-off-by: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Hillf Danton hdanton@sina.com Cc: Jann Horn jannh@google.com Cc: Mark Rutland mark.rutland@arm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 9742649f3f88..e18fbbd5d9b4 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -636,7 +636,8 @@ static void toggle_allocation_gate(struct work_struct *work) /* Disable static key and reset timer. */ static_branch_disable(&kfence_allocation_key); #endif - schedule_delayed_work(&kfence_timer, msecs_to_jiffies(kfence_sample_interval)); + queue_delayed_work(system_power_efficient_wq, &kfence_timer, + msecs_to_jiffies(kfence_sample_interval)); } static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);
@@ -665,7 +666,7 @@ void __init kfence_init(void) }
WRITE_ONCE(kfence_enabled, true); - schedule_delayed_work(&kfence_timer, 0); + queue_delayed_work(system_power_efficient_wq, &kfence_timer, 0); pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, (void *)(__kfence_pool + KFENCE_POOL_SIZE));
From: Jisheng Zhang Jisheng.Zhang@synaptics.com
mainline inclusion from mainline-v5.13-rc4 commit e69012400b0cb42b2070748322cb72f9effec00f category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
When we added KFENCE support for arm64, we intended that it would force the entire linear map to be mapped at page granularity, but we only enforced this in arch_add_memory() and not in map_mem(), so memory mapped at boot time can be mapped at a larger granularity.
When booting a kernel with KFENCE=y and RODATA_FULL=n, this results in the following WARNING at boot:
[ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/memory.c:2462 apply_to_pmd_range+0xec/0x190 [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc1+ #10 [ 0.000000] Hardware name: linux,dummy-virt (DT) [ 0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--) [ 0.000000] pc : apply_to_pmd_range+0xec/0x190 [ 0.000000] lr : __apply_to_page_range+0x94/0x170 [ 0.000000] sp : ffffffc010573e20 [ 0.000000] x29: ffffffc010573e20 x28: ffffff801f400000 x27: ffffff801f401000 [ 0.000000] x26: 0000000000000001 x25: ffffff801f400fff x24: ffffffc010573f28 [ 0.000000] x23: ffffffc01002b710 x22: ffffffc0105fa450 x21: ffffffc010573ee4 [ 0.000000] x20: ffffff801fffb7d0 x19: ffffff801f401000 x18: 00000000fffffffe [ 0.000000] x17: 000000000000003f x16: 000000000000000a x15: ffffffc01060b940 [ 0.000000] x14: 0000000000000000 x13: 0098968000000000 x12: 0000000098968000 [ 0.000000] x11: 0000000000000000 x10: 0000000098968000 x9 : 0000000000000001 [ 0.000000] x8 : 0000000000000000 x7 : ffffffc010573ee4 x6 : 0000000000000001 [ 0.000000] x5 : ffffffc010573f28 x4 : ffffffc01002b710 x3 : 0000000040000000 [ 0.000000] x2 : ffffff801f5fffff x1 : 0000000000000001 x0 : 007800005f400705 [ 0.000000] Call trace: [ 0.000000] apply_to_pmd_range+0xec/0x190 [ 0.000000] __apply_to_page_range+0x94/0x170 [ 0.000000] apply_to_page_range+0x10/0x20 [ 0.000000] __change_memory_common+0x50/0xdc [ 0.000000] set_memory_valid+0x30/0x40 [ 0.000000] kfence_init_pool+0x9c/0x16c [ 0.000000] kfence_init+0x20/0x98 [ 0.000000] start_kernel+0x284/0x3f8
Fixes: 840b23986344 ("arm64, kfence: enable KFENCE for ARM64") Cc: stable@vger.kernel.org # 5.12.x Signed-off-by: Jisheng Zhang Jisheng.Zhang@synaptics.com Acked-by: Mark Rutland mark.rutland@arm.com Acked-by: Marco Elver elver@google.com Tested-by: Marco Elver elver@google.com Link: https://lore.kernel.org/r/20210525104551.2ec37f77@xhacker.debian Signed-off-by: Catalin Marinas catalin.marinas@arm.com Conflicts: arch/arm64/mm/mmu.c [Peng Liu: adjust context] Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/arm64/mm/mmu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 347c929fb3a7..b6a9895d6655 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -492,7 +492,8 @@ static void __init map_mem(pgd_t *pgdp) int flags = 0; u64 i;
- if (rodata_full || crash_mem_map || debug_pagealloc_enabled()) + if (rodata_full || crash_mem_map || debug_pagealloc_enabled() || + IS_ENABLED(CONFIG_KFENCE)) flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
/*
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.13-rc5 commit 8fd0e995cc7b6a7a8a40bc03d52a2cd445beeff4 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an allocation counts towards load. However, for KFENCE, this does not make any sense, since there is no busy work we're awaiting.
Instead, use TASK_IDLE via wait_event_idle() to not count towards load.
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565 Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com Fixes: 407f1d8c1b5f ("kfence: await for allocation using wait_event") Signed-off-by: Marco Elver elver@google.com Cc: Mel Gorman mgorman@suse.de Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: David Laight David.Laight@ACULAB.COM Cc: Hillf Danton hdanton@sina.com Cc: stable@vger.kernel.org [5.12+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index e18fbbd5d9b4..4d21ac44d5d3 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -627,10 +627,10 @@ static void toggle_allocation_gate(struct work_struct *work) * During low activity with no allocations we might wait a * while; let's avoid the hung task warning. */ - wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate), - sysctl_hung_task_timeout_secs * HZ / 2); + wait_event_idle_timeout(allocation_wait, atomic_read(&kfence_allocation_gate), + sysctl_hung_task_timeout_secs * HZ / 2); } else { - wait_event(allocation_wait, atomic_read(&kfence_allocation_gate)); + wait_event_idle(allocation_wait, atomic_read(&kfence_allocation_gate)); }
/* Disable static key and reset timer. */
From: Hyeonggon Yoo 42.hyeyoo@gmail.com
mainline inclusion from mainline-v5.14-rc1 commit 588c7fa022d7b2361500ead5660d9a1a2ecd9b7d category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Currently when size is not supported by kmalloc_index, compiler will generate a run-time BUG() while compile-time error is also possible, and better. So change BUG to BUILD_BUG_ON_MSG to make compile-time check possible.
Also remove code that allocates more than 32MB because current implementation supports only up to 32MB.
[42.hyeyoo@gmail.com: fix support for clang 10] Link: https://lkml.kernel.org/r/20210518181247.GA10062@hyeyoo [vbabka@suse.cz: fix false-positive assert in kernel/bpf/local_storage.c] Link: https://lkml.kernel.org/r/bea97388-01df-8eac-091b-a3c89b4a4a09@suse.czLink: https://lkml.kernel.org/r/20210511173448.GA54466@hyeyoo [elver@google.com: kfence fix] Link: https://lkml.kernel.org/r/20210512195227.245000695c9014242e9a00e5@linux-foun...
Signed-off-by: Hyeonggon Yoo 42.hyeyoo@gmail.com Signed-off-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Vlastimil Babka vbabka@suse.cz Signed-off-by: Marco Elver elver@google.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Marco Elver elver@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- include/linux/slab.h | 17 ++++++++++++++--- mm/kfence/kfence_test.c | 5 +++-- mm/slab_common.c | 7 +++---- 3 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h index dd6897f62010..b27f7372f365 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -342,8 +342,14 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) * 1 = 65 .. 96 bytes * 2 = 129 .. 192 bytes * n = 2^(n-1)+1 .. 2^n + * + * Note: __kmalloc_index() is compile-time optimized, and not runtime optimized; + * typical usage is via kmalloc_index() and therefore evaluated at compile-time. + * Callers where !size_is_constant should only be test modules, where runtime + * overheads of __kmalloc_index() can be tolerated. Also see kmalloc_slab(). */ -static __always_inline unsigned int kmalloc_index(size_t size) +static __always_inline unsigned int __kmalloc_index(size_t size, + bool size_is_constant) { if (!size) return 0; @@ -378,12 +384,17 @@ static __always_inline unsigned int kmalloc_index(size_t size) if (size <= 8 * 1024 * 1024) return 23; if (size <= 16 * 1024 * 1024) return 24; if (size <= 32 * 1024 * 1024) return 25; - if (size <= 64 * 1024 * 1024) return 26; - BUG(); + + if ((IS_ENABLED(CONFIG_CC_IS_GCC) || CONFIG_CLANG_VERSION >= 110000) + && !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) + BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); + else + BUG();
/* Will never be reached. Needed because the compiler may complain */ return -1; } +#define kmalloc_index(s) __kmalloc_index(s, true) #endif /* !CONFIG_SLOB */
void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __malloc; diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index 4acf4251ee04..7f24b9bcb2ec 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -197,7 +197,7 @@ static void test_cache_destroy(void)
static inline size_t kmalloc_cache_alignment(size_t size) { - return kmalloc_caches[kmalloc_type(GFP_KERNEL)][kmalloc_index(size)]->align; + return kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)]->align; }
/* Must always inline to match stack trace against caller. */ @@ -267,7 +267,8 @@ static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocat
if (is_kfence_address(alloc)) { struct page *page = virt_to_head_page(alloc); - struct kmem_cache *s = test_cache ?: kmalloc_caches[kmalloc_type(GFP_KERNEL)][kmalloc_index(size)]; + struct kmem_cache *s = test_cache ?: + kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)];
/* * Verify that various helpers return the right values diff --git a/mm/slab_common.c b/mm/slab_common.c index 89022933dfa6..5f1a1c38a815 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -655,8 +655,8 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
/* * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time. - * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is - * kmalloc-67108864. + * kmalloc_index() supports up to 2^25=32MB, so the final entry of the table is + * kmalloc-32M. */ const struct kmalloc_info_struct kmalloc_info[] __initconst = { INIT_KMALLOC_INFO(0, 0), @@ -684,8 +684,7 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = { INIT_KMALLOC_INFO(4194304, 4M), INIT_KMALLOC_INFO(8388608, 8M), INIT_KMALLOC_INFO(16777216, 16M), - INIT_KMALLOC_INFO(33554432, 32M), - INIT_KMALLOC_INFO(67108864, 64M) + INIT_KMALLOC_INFO(33554432, 32M) };
/*
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.14-rc1 commit 3d1a90ab0ed93362ec8ac85cf291243c87260c21 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Unconditionally use unbound work queue, and not just if wq_power_efficient is true. Because if the system is idle, KFENCE may wait, and by being run on the unbound work queue, we permit the scheduler to make better scheduling decisions and not require pinning KFENCE to the same CPU upon waking up.
Link: https://lkml.kernel.org/r/20210521111630.472579-1-elver@google.com Fixes: 36f0b35d0894 ("kfence: use power-efficient work queue to run delayed work") Signed-off-by: Marco Elver elver@google.com Reported-by: Hillf Danton hdanton@sina.com Reviewed-by: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 4d21ac44d5d3..d7666ace9d2e 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -636,7 +636,7 @@ static void toggle_allocation_gate(struct work_struct *work) /* Disable static key and reset timer. */ static_branch_disable(&kfence_allocation_key); #endif - queue_delayed_work(system_power_efficient_wq, &kfence_timer, + queue_delayed_work(system_unbound_wq, &kfence_timer, msecs_to_jiffies(kfence_sample_interval)); } static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate); @@ -666,7 +666,7 @@ void __init kfence_init(void) }
WRITE_ONCE(kfence_enabled, true); - queue_delayed_work(system_power_efficient_wq, &kfence_timer, 0); + queue_delayed_work(system_unbound_wq, &kfence_timer, 0); pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, (void *)(__kfence_pool + KFENCE_POOL_SIZE));
From: Weizhao Ouyang o451686892@gmail.com
mainline inclusion from mainline-v5.14-rc3 commit 32ae8a0669392248a92d7545a7363004543f3932 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
kfence_test_init and kunit_init both use the same level late_initcall, which means if kfence_test_init linked ahead of kunit_init, kfence_test_init will get a NULL debugfs_rootdir as parent dentry, then kfence_test_init and kfence_debugfs_init both create a debugfs node named "kfence" under debugfs_mount->mnt_root, and it will throw out "debugfs: Directory 'kfence' with parent '/' already present!" with EEXIST. So kfence_test_init should be deferred.
Link: https://lkml.kernel.org/r/20210714113140.2949995-1-o451686892@gmail.com Signed-off-by: Weizhao Ouyang o451686892@gmail.com Tested-by: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/kfence_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index 7f24b9bcb2ec..942cbc16ad26 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -852,7 +852,7 @@ static void kfence_test_exit(void) tracepoint_synchronize_unregister(); }
-late_initcall(kfence_test_init); +late_initcall_sync(kfence_test_init); module_exit(kfence_test_exit);
MODULE_LICENSE("GPL v2");
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.14-rc3 commit 235a85cb32bb123854ad31de46fdbf04c1d57cda category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Check the allocation size before toggling kfence_allocation_gate.
This way allocations that can't be served by KFENCE will not result in waiting for another CONFIG_KFENCE_SAMPLE_INTERVAL without allocating anything.
Link: https://lkml.kernel.org/r/20210714092222.1890268-1-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Suggested-by: Marco Elver elver@google.com Reviewed-by: Marco Elver elver@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Marco Elver elver@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: stable@vger.kernel.org [5.12+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index d7666ace9d2e..2623ff401a10 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -733,6 +733,13 @@ void kfence_shutdown_cache(struct kmem_cache *s)
void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { + /* + * Perform size check before switching kfence_allocation_gate, so that + * we don't disable KFENCE without making an allocation. + */ + if (size > PAGE_SIZE) + return NULL; + /* * allocation_gate only needs to become non-zero, so it doesn't make * sense to continue writing to it and pay the associated contention @@ -757,9 +764,6 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) if (!READ_ONCE(kfence_enabled)) return NULL;
- if (size > PAGE_SIZE) - return NULL; - return kfence_guarded_alloc(s, size, flags); }
From: Alexander Potapenko glider@google.com
mainline inclusion from mainline-v5.14-rc3 commit 236e9f1538523d3d380dda1cc99571d587058f37 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Allocation requests outside ZONE_NORMAL (MOVABLE, HIGHMEM or DMA) cannot be fulfilled by KFENCE, because KFENCE memory pool is located in a zone different from the requested one.
Because callers of kmem_cache_alloc() may actually rely on the allocation to reside in the requested zone (e.g. memory allocations done with __GFP_DMA must be DMAable), skip all allocations done with GFP_ZONEMASK and/or respective SLAB flags (SLAB_CACHE_DMA and SLAB_CACHE_DMA32).
Link: https://lkml.kernel.org/r/20210714092222.1890268-2-glider@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: Marco Elver elver@google.com Acked-by: Souptick Joarder jrdr.linux@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Souptick Joarder jrdr.linux@gmail.com Cc: stable@vger.kernel.org [5.12+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/core.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 2623ff401a10..575c685aa642 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -740,6 +740,15 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) if (size > PAGE_SIZE) return NULL;
+ /* + * Skip allocations from non-default zones, including DMA. We cannot + * guarantee that pages in the KFENCE pool will have the requested + * properties (e.g. reside in DMAable memory). + */ + if ((flags & GFP_ZONEMASK) || + (s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) + return NULL; + /* * allocation_gate only needs to become non-zero, so it doesn't make * sense to continue writing to it and pay the associated contention
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.14-rc7 commit a7cb5d23eaea148f8582229846f8dfff192f05c3 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Originally the addr != NULL check was meant to take care of the case where __kfence_pool == NULL (KFENCE is disabled). However, this does not work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE.
This can be the case on NULL-deref where addr > 0 && addr < PAGE_SIZE or any other faulting access with addr < KFENCE_POOL_SIZE. While the kernel would likely crash, the stack traces and report might be confusing due to double faults upon KFENCE's attempt to unprotect such an address.
Fix it by just checking that __kfence_pool != NULL instead.
Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Marco Elver elver@google.com Reported-by: Kuan-Ying Lee Kuan-Ying.Lee@mediatek.com Acked-by: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: stable@vger.kernel.org [5.12+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- include/linux/kfence.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/linux/kfence.h b/include/linux/kfence.h index a70d1ea03532..3fe6dd8a18c1 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -51,10 +51,11 @@ extern atomic_t kfence_allocation_gate; static __always_inline bool is_kfence_address(const void *addr) { /* - * The non-NULL check is required in case the __kfence_pool pointer was - * never initialized; keep it in the slow-path after the range-check. + * The __kfence_pool != NULL check is required to deal with the case + * where __kfence_pool == NULL && addr < KFENCE_POOL_SIZE. Keep it in + * the slow-path after the range-check! */ - return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && addr); + return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && __kfence_pool); }
/**
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.15-rc1 commit 00e67bf030e74a01afab8e0109244b9b0d7e2e43 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
x86's <asm/tlbflush.h> only declares non-module accessible functions (such as flush_tlb_one_kernel) if !MODULE.
In preparation of including <asm/kfence.h> from the KFENCE test module, only define the helpers if !MODULE to avoid breaking the build with CONFIG_KFENCE_KUNIT_TEST=m.
Signed-off-by: Marco Elver elver@google.com Link: https://lore.kernel.org/r/YQJdarx6XSUQ1tFZ@elver.google.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/x86/include/asm/kfence.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h index 05b48b33baf0..ff5c7134a37a 100644 --- a/arch/x86/include/asm/kfence.h +++ b/arch/x86/include/asm/kfence.h @@ -8,6 +8,8 @@ #ifndef _ASM_X86_KFENCE_H #define _ASM_X86_KFENCE_H
+#ifndef MODULE + #include <linux/bug.h> #include <linux/kfence.h>
@@ -66,4 +68,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect) return true; }
+#endif /* !MODULE */ + #endif /* _ASM_X86_KFENCE_H */
From: Sven Schnelle svens@linux.ibm.com
mainline inclusion from mainline-v5.15-rc1 commit f99e12b21b84feb1fd9d845a15096772f1659461 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
s390 only reports the page address during a translation fault. To make the kfence unit tests pass, add a function that might be implemented by architectures to mask out address bits.
Signed-off-by: Sven Schnelle svens@linux.ibm.com Reviewed-by: Marco Elver elver@google.com Link: https://lore.kernel.org/r/20210728190254.3921642-3-hca@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/kfence_test.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index 942cbc16ad26..eb6307c199ea 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -23,8 +23,15 @@ #include <linux/tracepoint.h> #include <trace/events/printk.h>
+#include <asm/kfence.h> + #include "kfence.h"
+/* May be overridden by <asm/kfence.h>. */ +#ifndef arch_kfence_test_address +#define arch_kfence_test_address(addr) (addr) +#endif + /* Report as observed from console. */ static struct { spinlock_t lock; @@ -82,6 +89,7 @@ static const char *get_access_type(const struct expect_report *r) /* Check observed report matches information in @r. */ static bool report_matches(const struct expect_report *r) { + unsigned long addr = (unsigned long)r->addr; bool ret = false; unsigned long flags; typeof(observed.lines) expect; @@ -131,22 +139,25 @@ static bool report_matches(const struct expect_report *r) switch (r->type) { case KFENCE_ERROR_OOB: cur += scnprintf(cur, end - cur, "Out-of-bounds %s at", get_access_type(r)); + addr = arch_kfence_test_address(addr); break; case KFENCE_ERROR_UAF: cur += scnprintf(cur, end - cur, "Use-after-free %s at", get_access_type(r)); + addr = arch_kfence_test_address(addr); break; case KFENCE_ERROR_CORRUPTION: cur += scnprintf(cur, end - cur, "Corrupted memory at"); break; case KFENCE_ERROR_INVALID: cur += scnprintf(cur, end - cur, "Invalid %s at", get_access_type(r)); + addr = arch_kfence_test_address(addr); break; case KFENCE_ERROR_INVALID_FREE: cur += scnprintf(cur, end - cur, "Invalid free of"); break; }
- cur += scnprintf(cur, end - cur, " 0x%p", (void *)r->addr); + cur += scnprintf(cur, end - cur, " 0x%p", (void *)addr);
spin_lock_irqsave(&observed.lock, flags); if (!report_available())
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.13-rc1 commit 9f961c2e08741579aa53095d0dbffbcb25a9ae66 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Do not show no_hash_pointers message multiple times if the option was passed more than once (e.g. via generated command line).
Signed-off-by: Marco Elver elver@google.com Reviewed-by: Petr Mladek pmladek@suse.com Signed-off-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20210305194206.3165917-1-elver@google.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- lib/vsprintf.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c index dc7d6a269e2f..931d850891b6 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -2178,6 +2178,9 @@ EXPORT_SYMBOL_GPL(no_hash_pointers);
static int __init no_hash_pointers_enable(char *str) { + if (no_hash_pointers) + return 0; + no_hash_pointers = true;
pr_warn("**********************************************************\n");
From: Vlastimil Babka vbabka@suse.cz
mainline inclusion from mainline-v5.13-rc1 commit a48849e2358ecf1a347a03b33dc27b9b2f25f8fd category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
We have several modifiers for plain pointers (%p, %px and %pK) and now also the no_hash_pointers boot parameter. The documentation should help to choose which variant to use. Importantly, we should discourage %px in favor of %p (with the new boot parameter when debugging), and stress that %pK should be only used for procfs and similar files, not dmesg buffer. This patch clarifies the documentation in that regard.
Signed-off-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Reviewed-by: Petr Mladek pmladek@suse.com Signed-off-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20210225164639.27212-1-vbabka@suse.cz Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- Documentation/core-api/printk-formats.rst | 26 ++++++++++++++++++++++- lib/vsprintf.c | 7 ++++-- 2 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 93c3e48ff30d..dd7ea06eda03 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -79,7 +79,19 @@ Pointers printed without a specifier extension (i.e unadorned %p) are hashed to prevent leaking information about the kernel memory layout. This has the added benefit of providing a unique identifier. On 64-bit machines the first 32 bits are zeroed. The kernel will print ``(ptrval)`` until it -gathers enough entropy. If you *really* want the address see %px below. +gathers enough entropy. + +When possible, use specialised modifiers such as %pS or %pB (described below) +to avoid the need of providing an unhashed address that has to be interpreted +post-hoc. If not possible, and the aim of printing the address is to provide +more information for debugging, use %p and boot the kernel with the +``no_hash_pointers`` parameter during debugging, which will print all %p +addresses unmodified. If you *really* always want the unmodified address, see +%px below. + +If (and only if) you are printing addresses as a content of a virtual file in +e.g. procfs or sysfs (using e.g. seq_printf(), not printk()) read by a +userspace process, use the %pK modifier described below instead of %p or %px.
Error Pointers -------------- @@ -139,6 +151,11 @@ For printing kernel pointers which should be hidden from unprivileged users. The behaviour of %pK depends on the kptr_restrict sysctl - see Documentation/admin-guide/sysctl/kernel.rst for more details.
+This modifier is *only* intended when producing content of a file read by +userspace from e.g. procfs or sysfs, not for dmesg. Please refer to the +section about %p above for discussion about how to manage hashing pointers +in printk(). + Unmodified Addresses --------------------
@@ -153,6 +170,13 @@ equivalent to %lx (or %lu). %px is preferred because it is more uniquely grep'able. If in the future we need to modify the way the kernel handles printing pointers we will be better equipped to find the call sites.
+Before using %px, consider if using %p is sufficient together with enabling the +``no_hash_pointers`` kernel parameter during debugging sessions (see the %p +description above). One valid scenario for %px might be printing information +immediately before a panic, which prevents any sensitive information to be +exploited anyway, and with %px there would be no need to reproduce the panic +with no_hash_pointers. + Pointer Differences -------------------
diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 931d850891b6..a8d311811d90 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -2271,7 +2271,9 @@ early_param("no_hash_pointers", no_hash_pointers_enable); * Implements a "recursive vsnprintf". * Do not use this feature without some mechanism to verify the * correctness of the format string and va_list arguments. - * - 'K' For a kernel pointer that should be hidden from unprivileged users + * - 'K' For a kernel pointer that should be hidden from unprivileged users. + * Use only for procfs, sysfs and similar files, not printk(); please + * read the documentation (path below) first. * - 'NF' For a netdev_features_t * - 'h[CDN]' For a variable-length buffer, it prints it as a hex string with * a certain separator (' ' by default): @@ -2310,7 +2312,8 @@ early_param("no_hash_pointers", no_hash_pointers_enable); * Without an option prints the full name of the node * f full name * P node name, including a possible unit address - * - 'x' For printing the address. Equivalent to "%lx". + * - 'x' For printing the address unmodified. Equivalent to "%lx". + * Please read the documentation (path below) before using! * - '[ku]s' For a BPF/tracing related format specifier, e.g. used out of * bpf_trace_printk() where [ku] prefix specifies either kernel (k) * or user (u) memory to probe, and:
From: Stephen Boyd swboyd@chromium.org
mainline inclusion from mainline-v5.14-rc1 commit 792702911f581f7793962fbeb99d5c3a1b28f4c3 category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Obscuring the pointers that slub shows when debugging makes for some confusing slub debug messages:
Padding overwritten. 0x0000000079f0674a-0x000000000d4dce17
Those addresses are hashed for kernel security reasons. If we're trying to be secure with slub_debug on the commandline we have some big problems given that we dump whole chunks of kernel memory to the kernel logs. Let's force on the no_hash_pointers commandline flag when slub_debug is on the commandline. This makes slub debug messages more meaningful and if by chance a kernel address is in some slub debug object dump we will have a better chance of figuring out what went wrong.
Note that we don't use %px in the slub code because we want to reduce the number of places that %px is used in the kernel. This also nicely prints a big fat warning at kernel boot if slub_debug is on the commandline so that we know that this kernel shouldn't be used on production systems.
[akpm@linux-foundation.org: fix build with CONFIG_SLUB_DEBUG=n]
Link: https://lkml.kernel.org/r/20210601182202.3011020-5-swboyd@chromium.org Signed-off-by: Stephen Boyd swboyd@chromium.org Acked-by: Vlastimil Babka vbabka@suse.cz Acked-by: Petr Mladek pmladek@suse.com Cc: Joe Perches joe@perches.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Muchun Song songmuchun@bytedance.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- include/linux/kernel.h | 2 ++ lib/vsprintf.c | 2 +- mm/slub.c | 20 +++++++++++++++++++- 3 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 189c51afc877..601ecfb13690 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -511,6 +511,8 @@ int sscanf(const char *, const char *, ...); extern __scanf(2, 0) int vsscanf(const char *, const char *, va_list);
+extern int no_hash_pointers_enable(char *str); + extern int get_option(char **str, int *pint); extern char *get_options(const char *str, int nints, int *ints); extern unsigned long long memparse(const char *ptr, char **retptr); diff --git a/lib/vsprintf.c b/lib/vsprintf.c index a8d311811d90..e5751a33dfa0 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -2176,7 +2176,7 @@ char *fwnode_string(char *buf, char *end, struct fwnode_handle *fwnode, bool no_hash_pointers __ro_after_init; EXPORT_SYMBOL_GPL(no_hash_pointers);
-static int __init no_hash_pointers_enable(char *str) +int __init no_hash_pointers_enable(char *str) { if (no_hash_pointers) return 0; diff --git a/mm/slub.c b/mm/slub.c index 4d8ad3f9f8e0..865b5b967619 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -117,12 +117,26 @@ */
#ifdef CONFIG_SLUB_DEBUG + #ifdef CONFIG_SLUB_DEBUG_ON DEFINE_STATIC_KEY_TRUE(slub_debug_enabled); #else DEFINE_STATIC_KEY_FALSE(slub_debug_enabled); #endif -#endif + +static inline bool __slub_debug_enabled(void) +{ + return static_branch_unlikely(&slub_debug_enabled); +} + +#else /* CONFIG_SLUB_DEBUG */ + +static inline bool __slub_debug_enabled(void) +{ + return false; +} + +#endif /* CONFIG_SLUB_DEBUG */
static inline bool kmem_cache_debug(struct kmem_cache *s) { @@ -4389,6 +4403,10 @@ void __init kmem_cache_init(void) if (debug_guardpage_minorder()) slub_max_order = 0;
+ /* Print slub debugging pointers without hashing */ + if (__slub_debug_enabled()) + no_hash_pointers_enable(NULL); + kmem_cache_node = &boot_kmem_cache_node; kmem_cache = &boot_kmem_cache;
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.15-rc1 commit c40c6e593bf978e232bae1fab81f4111f2e2c956 category: bugfix bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Fail kfence_test fast if KFENCE was disabled at boot, instead of each test case trying several seconds to allocate from KFENCE and failing. KUnit will fail all test cases if kunit_suite::init returns an error.
Even if KFENCE was disabled, we still want the test to fail, so that CI systems that parse KUnit output will alert on KFENCE being disabled (accidentally or otherwise).
Link: https://lkml.kernel.org/r/20210825105533.1247922-1-elver@google.com Signed-off-by: Marco Elver elver@google.com Reported-by: Kefeng Wang wangkefeng.wang@huawei.com Tested-by: Kefeng Wang wangkefeng.wang@huawei.com Acked-by: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- mm/kfence/kfence_test.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c index eb6307c199ea..f1690cf54199 100644 --- a/mm/kfence/kfence_test.c +++ b/mm/kfence/kfence_test.c @@ -800,6 +800,9 @@ static int test_init(struct kunit *test) unsigned long flags; int i;
+ if (!__kfence_pool) + return -EINVAL; + spin_lock_irqsave(&observed.lock, flags); for (i = 0; i < ARRAY_SIZE(observed.lines); i++) observed.lines[i][0] = '\0';
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.15-rc1 commit 4bbf04aa9aa88ae41205e387d35743a9bf5e933d category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
Record cpu and timestamp on allocations and frees, and show them in reports. Upon an error, this can help correlate earlier messages in the kernel log via allocation and free timestamps.
Link: https://lkml.kernel.org/r/20210714175312.2947941-1-elver@google.com Suggested-by: Joern Engel joern@purestorage.com Signed-off-by: Marco Elver elver@google.com Acked-by: Alexander Potapenko glider@google.com Acked-by: Joern Engel joern@purestorage.com Cc: Yuanyuan Zhong yzhong@purestorage.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- Documentation/dev-tools/kfence.rst | 98 ++++++++++++++++-------------- mm/kfence/core.c | 3 + mm/kfence/kfence.h | 2 + mm/kfence/report.c | 18 ++++-- 4 files changed, 70 insertions(+), 51 deletions(-)
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst index fdf04e741ea5..0fbe3308bf37 100644 --- a/Documentation/dev-tools/kfence.rst +++ b/Documentation/dev-tools/kfence.rst @@ -65,25 +65,27 @@ Error reports A typical out-of-bounds access looks like this::
================================================================== - BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b + BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234
- Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17): - test_out_of_bounds_read+0xa3/0x22b - kunit_try_run_case+0x51/0x85 + Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72): + test_out_of_bounds_read+0xa6/0x234 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507: - test_alloc+0xf3/0x25b - test_out_of_bounds_read+0x98/0x22b - kunit_try_run_case+0x51/0x85 + kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32 + + allocated by task 484 on cpu 0 at 32.919330s: + test_alloc+0xfe/0x738 + test_out_of_bounds_read+0x9b/0x234 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ==================================================================
The header of the report provides a short summary of the function involved in @@ -96,30 +98,32 @@ Use-after-free accesses are reported as:: ================================================================== BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143
- Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24): + Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79): test_use_after_free_read+0xb3/0x143 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507: - test_alloc+0xf3/0x25b + kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32 + + allocated by task 488 on cpu 2 at 33.871326s: + test_alloc+0xfe/0x738 test_use_after_free_read+0x76/0x143 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- freed by task 507: + freed by task 488 on cpu 2 at 33.871358s: test_use_after_free_read+0xa8/0x143 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ==================================================================
KFENCE also reports on invalid frees, such as double-frees:: @@ -127,30 +131,32 @@ KFENCE also reports on invalid frees, such as double-frees:: ================================================================== BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
- Invalid free of 0xffffffffb6741000: + Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81): test_double_free+0xdc/0x171 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507: - test_alloc+0xf3/0x25b + kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32 + + allocated by task 490 on cpu 1 at 34.175321s: + test_alloc+0xfe/0x738 test_double_free+0x76/0x171 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- freed by task 507: + freed by task 490 on cpu 1 at 34.175348s: test_double_free+0xa8/0x171 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ==================================================================
KFENCE also uses pattern-based redzones on the other side of an object's guard @@ -160,23 +166,25 @@ These are reported on frees:: ================================================================== BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
- Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69): + Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156): test_kmalloc_aligned_oob_write+0xef/0x184 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507: - test_alloc+0xf3/0x25b + kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96 + + allocated by task 502 on cpu 7 at 42.159302s: + test_alloc+0xfe/0x738 test_kmalloc_aligned_oob_write+0x57/0x184 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30
- CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ==================================================================
For such errors, the address where the corruption occurred as well as the diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 575c685aa642..7a97db8bc8e7 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -20,6 +20,7 @@ #include <linux/moduleparam.h> #include <linux/random.h> #include <linux/rcupdate.h> +#include <linux/sched/clock.h> #include <linux/sched/sysctl.h> #include <linux/seq_file.h> #include <linux/slab.h> @@ -196,6 +197,8 @@ static noinline void metadata_update_state(struct kfence_metadata *meta, */ track->num_stack_entries = stack_trace_save(track->stack_entries, KFENCE_STACK_DEPTH, 1); track->pid = task_pid_nr(current); + track->cpu = raw_smp_processor_id(); + track->ts_nsec = local_clock(); /* Same source as printk timestamps. */
/* * Pairs with READ_ONCE() in diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index 24065321ff8a..c1f23c61e5f9 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -36,6 +36,8 @@ enum kfence_object_state { /* Alloc/free tracking information. */ struct kfence_track { pid_t pid; + int cpu; + u64 ts_nsec; int num_stack_entries; unsigned long stack_entries[KFENCE_STACK_DEPTH]; }; diff --git a/mm/kfence/report.c b/mm/kfence/report.c index e3f71451ad9e..c0e01edfe6b6 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -100,6 +100,13 @@ static void kfence_print_stack(struct seq_file *seq, const struct kfence_metadat bool show_alloc) { const struct kfence_track *track = show_alloc ? &meta->alloc_track : &meta->free_track; + u64 ts_sec = track->ts_nsec; + unsigned long rem_nsec = do_div(ts_sec, NSEC_PER_SEC); + + /* Timestamp matches printk timestamp format. */ + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", + show_alloc ? "allocated" : "freed", track->pid, + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
if (track->num_stack_entries) { /* Skip allocation/free internals stack. */ @@ -126,15 +133,14 @@ void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *met return; }
- seq_con_printf(seq, - "kfence-#%td [0x%p-0x%p" - ", size=%d, cache=%s] allocated by task %d:\n", - meta - kfence_metadata, (void *)start, (void *)(start + size - 1), size, - (cache && cache->name) ? cache->name : "<destroyed>", meta->alloc_track.pid); + seq_con_printf(seq, "kfence-#%td: 0x%p-0x%p, size=%d, cache=%s\n\n", + meta - kfence_metadata, (void *)start, (void *)(start + size - 1), + size, (cache && cache->name) ? cache->name : "<destroyed>"); + kfence_print_stack(seq, meta, true);
if (meta->state == KFENCE_OBJECT_FREED) { - seq_con_printf(seq, "\nfreed by task %d:\n", meta->free_track.pid); + seq_con_printf(seq, "\n"); kfence_print_stack(seq, meta, false); } }
From: Kefeng Wang wangkefeng.wang@huawei.com
hulk inclusion category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
-----------------------------------------------
This function validates and invalidates PTE entries.
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/arm/include/asm/set_memory.h | 5 ++++ arch/arm/mm/pageattr.c | 41 +++++++++++++++++++++++-------- 2 files changed, 36 insertions(+), 10 deletions(-)
diff --git a/arch/arm/include/asm/set_memory.h b/arch/arm/include/asm/set_memory.h index a1ceff4295d3..b37e425ec3fd 100644 --- a/arch/arm/include/asm/set_memory.h +++ b/arch/arm/include/asm/set_memory.h @@ -11,11 +11,16 @@ int set_memory_ro(unsigned long addr, int numpages); int set_memory_rw(unsigned long addr, int numpages); int set_memory_x(unsigned long addr, int numpages); int set_memory_nx(unsigned long addr, int numpages); +int set_memory_valid(unsigned long addr, int numpages, int enable); #else static inline int set_memory_ro(unsigned long addr, int numpages) { return 0; } static inline int set_memory_rw(unsigned long addr, int numpages) { return 0; } static inline int set_memory_x(unsigned long addr, int numpages) { return 0; } static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; } +static inline int set_memory_valid(unsigned long addr, int numpages, int enable) +{ + return 0; +} #endif
#ifdef CONFIG_STRICT_KERNEL_RWX diff --git a/arch/arm/mm/pageattr.c b/arch/arm/mm/pageattr.c index 9790ae3a8c68..7612a1c6b614 100644 --- a/arch/arm/mm/pageattr.c +++ b/arch/arm/mm/pageattr.c @@ -31,6 +31,24 @@ static bool in_range(unsigned long start, unsigned long size, return start >= range_start && start < range_end && size <= range_end - start; } +/* + * This function assumes that the range is mapped with PAGE_SIZE pages. + */ +static int __change_memory_common(unsigned long start, unsigned long size, + pgprot_t set_mask, pgprot_t clear_mask) +{ + struct page_change_data data; + int ret; + + data.set_mask = set_mask; + data.clear_mask = clear_mask; + + ret = apply_to_page_range(&init_mm, start, size, change_page_range, + &data); + + flush_tlb_kernel_range(start, start + size); + return ret; +}
static int change_memory_common(unsigned long addr, int numpages, pgprot_t set_mask, pgprot_t clear_mask) @@ -38,8 +56,6 @@ static int change_memory_common(unsigned long addr, int numpages, unsigned long start = addr & PAGE_MASK; unsigned long end = PAGE_ALIGN(addr) + numpages * PAGE_SIZE; unsigned long size = end - start; - int ret; - struct page_change_data data;
WARN_ON_ONCE(start != addr);
@@ -50,14 +66,7 @@ static int change_memory_common(unsigned long addr, int numpages, !in_range(start, size, VMALLOC_START, VMALLOC_END)) return -EINVAL;
- data.set_mask = set_mask; - data.clear_mask = clear_mask; - - ret = apply_to_page_range(&init_mm, start, size, change_page_range, - &data); - - flush_tlb_kernel_range(start, end); - return ret; + return __change_memory_common(start, size, set_mask, clear_mask); }
int set_memory_ro(unsigned long addr, int numpages) @@ -87,3 +96,15 @@ int set_memory_x(unsigned long addr, int numpages) __pgprot(0), __pgprot(L_PTE_XN)); } + +int set_memory_valid(unsigned long addr, int numpages, int enable) +{ + if (enable) + return __change_memory_common(addr, PAGE_SIZE * numpages, + __pgprot(L_PTE_VALID), + __pgprot(0)); + else + return __change_memory_common(addr, PAGE_SIZE * numpages, + __pgprot(0), + __pgprot(L_PTE_VALID)); +}
From: Kefeng Wang wangkefeng.wang@huawei.com
hulk inclusion category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
-----------------------------------------------
The function will check whether the fault is caused by a write access.
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/arm/mm/fault.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 64df4c6cfb8d..792a99f66c55 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -112,6 +112,11 @@ void show_pte(const char *lvl, struct mm_struct *mm, unsigned long addr) { } #endif /* CONFIG_MMU */
+static inline bool is_write_fault(unsigned int fsr) +{ + return (fsr & FSR_WRITE) && !(fsr & FSR_CM); +} + static void die_kernel_fault(const char *msg, struct mm_struct *mm, unsigned long addr, unsigned int fsr, struct pt_regs *regs) @@ -274,7 +279,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) if (user_mode(regs)) flags |= FAULT_FLAG_USER;
- if ((fsr & FSR_WRITE) && !(fsr & FSR_CM)) { + if (is_write_fault(fsr)) { flags |= FAULT_FLAG_WRITE; vm_flags = VM_WRITE; }
From: Kefeng Wang wangkefeng.wang@huawei.com
hulk inclusion category: feature bugzilla: 181005 https://gitee.com/openeuler/kernel/issues/I4EUY7
-----------------------------------------------
Add architecture specific implementation details for KFENCE and enable KFENCE on ARM. In particular, this implements the required interface in <asm/kfence.h>.
KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the kfence pool to be mapped at page granularity.
Testing this patch using the testcases in kfence_test.c and all passed with or without ARM_LPAE.
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Peng Liu liupeng256@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/arm/Kconfig | 1 + arch/arm/include/asm/kfence.h | 53 +++++++++++++++++++++++++++++++++++ arch/arm/mm/fault.c | 9 ++++-- 3 files changed, 61 insertions(+), 2 deletions(-) create mode 100644 arch/arm/include/asm/kfence.h
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 257dbc0ce341..3d7affdf4b68 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -67,6 +67,7 @@ config ARM select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU + select HAVE_ARCH_KFENCE if MMU select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL select HAVE_ARCH_MMAP_RND_BITS if MMU diff --git a/arch/arm/include/asm/kfence.h b/arch/arm/include/asm/kfence.h new file mode 100644 index 000000000000..8335a84034a2 --- /dev/null +++ b/arch/arm/include/asm/kfence.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ASM_ARM_KFENCE_H +#define __ASM_ARM_KFENCE_H + +#include <linux/kfence.h> +#include <asm/set_memory.h> +#include <asm/pgalloc.h> + +static inline int split_pmd_page(pmd_t *pmd, unsigned long addr) +{ + int i; + unsigned long pfn = PFN_DOWN(__pa((addr & PMD_MASK))); + pte_t *pte = pte_alloc_one_kernel(&init_mm); + + if (!pte) + return -ENOMEM; + + for (i = 0; i < PTRS_PER_PTE; i++) + set_pte_ext(pte + i, pfn_pte(pfn + i, PAGE_KERNEL), 0); + smp_wmb(); + pmd_populate_kernel(&init_mm, pmd, pte); + + flush_tlb_kernel_range(addr, addr + PMD_SIZE); + return 0; +} + +static inline bool arch_kfence_init_pool(void) +{ + unsigned long addr; + pmd_t *pmd; + + for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr); + addr += PAGE_SIZE) { + pmd = pmd_off_k(addr); + + if (pmd_leaf(*pmd)) { + if (split_pmd_page(pmd, addr)) + return false; + } + } + + return true; +} + +static inline bool kfence_protect_page(unsigned long addr, bool protect) +{ + set_memory_valid(addr, 1, !protect); + + return true; +} + +#endif /* __ASM_ARM_KFENCE_H */ diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 792a99f66c55..91965fb043de 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -17,6 +17,7 @@ #include <linux/sched/debug.h> #include <linux/highmem.h> #include <linux/perf_event.h> +#include <linux/kfence.h>
#include <asm/system_misc.h> #include <asm/system_info.h> @@ -149,10 +150,14 @@ __do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr, /* * No handler, we'll have to terminate things with extreme prejudice. */ - if (addr < PAGE_SIZE) + if (addr < PAGE_SIZE) { msg = "NULL pointer dereference"; - else + } else { + if (kfence_handle_page_fault(addr, is_write_fault(fsr), regs)) + return; + msg = "paging request"; + }
die_kernel_fault(msg, mm, addr, fsr, regs); }