GONG Ruiqi (1):
  Randomized slab caches for kmalloc()

Jesper Dangaard Brouer (1):
  mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
 include/linux/percpu.h  | 12 ++++++--
 include/linux/slab.h    | 49 +++++++++++++++++++++++++++++---
 init/Kconfig            | 19 ++++++++++++-
 mm/kfence/kfence_test.c | 14 +++++----
 mm/slab.c               |  8 +++---
 mm/slab.h               |  7 +++--
 mm/slab_common.c        | 63 +++++++++++++++++++++++++++++++++--------
 mm/slub.c               |  8 +++---
 8 files changed, 143 insertions(+), 37 deletions(-)
Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been successfully converted to a pull request! Pull request link: https://gitee.com/openeuler/kernel/pulls/12527 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/6...
From: Jesper Dangaard Brouer <brouer@redhat.com>
mainline inclusion
from mainline-v6.5-rc1
commit d0bf7d5759c1d89fb013aa41cca5832e00b9632a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9K8D1
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Allow API users of kmem_cache_create to specify that they don't want any slab merge or aliasing (with similar sized objects). Use this in kfence_test.
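For reference (not part of this patch), a minimal sketch of how a subsystem could opt out of merging with the new flag; the cache name and object type below are hypothetical:

  #include <linux/errno.h>
  #include <linux/init.h>
  #include <linux/slab.h>

  struct foo_obj {
  	u64 id;
  	void *payload;
  };

  static struct kmem_cache *foo_cache;

  static int __init foo_cache_init(void)
  {
  	/* SLAB_NO_MERGE keeps this cache from being aliased with other
  	 * caches that happen to have a compatible object size and flags.
  	 */
  	foo_cache = kmem_cache_create("foo_obj", sizeof(struct foo_obj),
  				      0, SLAB_NO_MERGE, NULL);
  	return foo_cache ? 0 : -ENOMEM;
  }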
The SKB (sk_buff) kmem_cache slab is critical for network performance. The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain performance by amortising the alloc/free cost.
For the bulk API to perform efficiently, slab fragmentation needs to be low. Especially for the SLUB allocator, the efficiency of the bulk free API depends on objects belonging to the same slab (page).
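As an illustration only (not from this patch), the bulk APIs mentioned above are used roughly as follows; the batch size and cache are placeholders:

  #include <linux/slab.h>

  #define BULK_BATCH 16

  static void bulk_cycle(struct kmem_cache *cache)
  {
  	void *objs[BULK_BATCH];
  	int got;

  	/* Allocate a batch in one call; returns the number of objects
  	 * actually allocated, 0 on failure.
  	 */
  	got = kmem_cache_alloc_bulk(cache, GFP_KERNEL, BULK_BATCH, objs);
  	if (!got)
  		return;

  	/* Freeing is cheapest when the objects share the same slab,
  	 * which is why fragmentation hurts the bulk free path.
  	 */
  	kmem_cache_free_bulk(cache, got, objs);
  }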
When running different network performance microbenchmarks, I started to notice that performance was reduced (slightly) when machines had longer uptimes. I believe the cause was that 'skbuff_head_cache' got aliased/merged into the general slab cache for 256-byte sized objects (with my kernel config, without CONFIG_HARDENED_USERCOPY).
For the SKB kmem_cache, the network stack has reasons for not merging, but whether merging happens varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE for this kmem_cache.
Another use case for the flag has been described by David Sterba [1]:
This can be used for more fine-grained control over the caches, or for debugging builds where separate slabs can verify that no objects leak.
The slab_nomerge boot option is too coarse and would need to be enabled on all testing hosts. There are other ways to disable merging, e.g. a slab constructor, but that disables poisoning and adds extra overhead. Other flags are internal and may have other semantics.
A concrete example of what motivates the flag: during 'btrfs balance', slabtop reported a huge increase in caches like
  1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
  1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
  8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot
which was confusing, and slab merging was not the first explanation that came to mind. After rebooting with slab_nomerge, all the caches were from the btrfs_ namespace as expected.
[1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
[ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the one proposed by David so it does not collide with internal SLAB/SLUB flags, write a comment for the flag, expand changelog, drop the skbuff part to be handled separately ]
Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@fi...
Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Conflicts:
	mm/slab.h
	mm/slab_common.c
[Conflicts come from the lack of some SLAB_* flags in 5.10.]
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
---
 include/linux/slab.h    | 12 ++++++++++++
 mm/kfence/kfence_test.c |  7 +++----
 mm/slab.h               |  5 +++--
 mm/slab_common.c        |  2 +-
 4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index b27f7372f365..58686a11f998 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -93,6 +93,18 @@
 /* Avoid kmemleak tracing */
 #define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
 
+/*
+ * Prevent merging with compatible kmem caches. This flag should be used
+ * cautiously. Valid use cases:
+ *
+ * - caches created for self-tests (e.g. kunit)
+ * - general caches created and used by a subsystem, only when a
+ *   (subsystem-specific) debug option is enabled
+ * - performance critical caches, should be very rare and consulted with slab
+ *   maintainers, and not used together with CONFIG_SLUB_TINY
+ */
+#define SLAB_NO_MERGE		((slab_flags_t __force)0x01000000U)
+
 /* Fault injection mark */
 #ifdef CONFIG_FAILSLAB
 # define SLAB_FAILSLAB		((slab_flags_t __force)0x02000000U)
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 0acbc7365412..08a3c1e793c1 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -186,11 +186,10 @@ static size_t setup_test_cache(struct kunit *test, size_t size, slab_flags_t fla
 	kunit_info(test, "%s: size=%zu, ctor=%ps\n", __func__, size, ctor);
 
 	/*
-	 * Use SLAB_NOLEAKTRACE to prevent merging with existing caches. Any
-	 * other flag in SLAB_NEVER_MERGE also works. Use SLAB_ACCOUNT to
-	 * allocate via memcg, if enabled.
+	 * Use SLAB_NO_MERGE to prevent merging with existing caches.
+	 * Use SLAB_ACCOUNT to allocate via memcg, if enabled.
 	 */
-	flags |= SLAB_NOLEAKTRACE | SLAB_ACCOUNT;
+	flags |= SLAB_NO_MERGE | SLAB_ACCOUNT;
 	test_cache = kmem_cache_create("test", size, 1, flags, ctor);
 	KUNIT_ASSERT_TRUE_MSG(test, test_cache, "could not create cache");
 
diff --git a/mm/slab.h b/mm/slab.h
index 8414c345127b..594f8b0c65b9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -142,10 +142,10 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
 #if defined(CONFIG_SLAB)
 #define SLAB_CACHE_FLAGS (SLAB_MEM_SPREAD | SLAB_NOLEAKTRACE | \
 			  SLAB_RECLAIM_ACCOUNT | SLAB_TEMPORARY | \
-			  SLAB_ACCOUNT)
+			  SLAB_ACCOUNT | SLAB_NO_MERGE)
 #elif defined(CONFIG_SLUB)
 #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
-			  SLAB_TEMPORARY | SLAB_ACCOUNT)
+			  SLAB_TEMPORARY | SLAB_ACCOUNT | SLAB_NO_MERGE)
 #else
 #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE)
 #endif
@@ -164,6 +164,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
 			      SLAB_NOLEAKTRACE | \
 			      SLAB_RECLAIM_ACCOUNT | \
 			      SLAB_TEMPORARY | \
+			      SLAB_NO_MERGE | \
 			      SLAB_ACCOUNT)
 
 bool __kmem_cache_empty(struct kmem_cache *);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 5f1a1c38a815..1b8857d07e06 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -54,7 +54,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
  */
 #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
 		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
-		SLAB_FAILSLAB | SLAB_KASAN)
+		SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_KASAN)
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
 			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
mainline inclusion
from mainline-v6.6-rc1
commit 3c6152940584290668b35fa0800026f6a1ae05fe
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9K8D1
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it overwrites the memory area of a vulnerable object by triggering allocations in other subsystems or modules, thereby getting a reference to the targeted memory location. It is usable against various types of vulnerability, including use-after-free (UAF), heap out-of-bounds write, etc.
There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill.
To efficiently prevent heap spraying, we propose the following approach: create multiple copies of the generic slab caches that will never be merged, and have a random one of them used at each allocation. The random selection is based on the address of the code that calls `kmalloc()`, which means it is static at runtime (rather than determined dynamically at each allocation, which could be bypassed by repeated spraying in brute force). In other words, the randomness of the cache selection is with respect to the code address rather than time: allocations in different code paths will most likely pick different caches, while kmalloc() at a given call site uses the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed.
Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents an attacker from finding a usable kmalloc() that happens to pick the same cache as the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is fixed for the lifetime of each boot, but differs across system startups.
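For illustration only, the selection described above boils down to the following standalone userspace sketch, which mirrors the kernel's hash_64()-based logic; the seed value and caller addresses are made up:

  #include <stdint.h>
  #include <stdio.h>

  /* Multiplicative constant used by the kernel's hash_64(). */
  #define GOLDEN_RATIO_64 0x61C8864680B583EBull

  static uint64_t per_boot_seed;	/* stands in for random_kmalloc_seed */

  /* 16 cache copies => 4 hash bits: keep the top 4 bits of the product. */
  static unsigned int pick_cache_copy(uint64_t caller_ip)
  {
  	return (unsigned int)(((caller_ip ^ per_boot_seed) * GOLDEN_RATIO_64) >> 60);
  }

  int main(void)
  {
  	per_boot_seed = 0x1234abcd5678ef00ull;	/* get_random_u64() at boot */

  	/* Different call sites usually map to different copies, but a
  	 * given call site always maps to the same copy within one boot.
  	 */
  	printf("site A -> copy %u\n", pick_cache_copy(0xffffffff81234567ull));
  	printf("site B -> copy %u\n", pick_cache_copy(0xffffffff8189abcdull));
  	return 0;
  }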
The performance overhead has been tested on a 40-core x86 server by comparing the results of `perf bench all` between kernels with and without this patch, based on the latest linux-next kernel; the difference is minor. A subset of the benchmarks is listed below:
                   sched/     sched/   syscall/      mem/       mem/
                messaging      pipe      basic     memcpy     memset
                    (sec)      (sec)     (sec)    (GB/sec)   (GB/sec)

control1            0.019      5.459     0.733   15.258789  51.398026
control2            0.019      5.439     0.730   16.009221  48.828125
control3            0.019      5.282     0.735   16.009221  48.828125
control_avg         0.019      5.393     0.733   15.759077  49.684759

experiment1         0.019      5.374     0.741   15.500992  46.502976
experiment2         0.019      5.440     0.746   16.276042  51.398026
experiment3         0.019      5.242     0.752   15.258789  51.398026
experiment_avg      0.019      5.352     0.746   15.678608  49.766343
The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies:
             control  4 copies  8 copies  16 copies
total         969.8M    968.2M    968.2M     968.2M
used           20.0M     21.9M     24.1M      26.7M
free          936.9M    933.6M    931.4M     928.6M
available     932.2M    928.8M    926.6M     923.9M
Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: GONG, Ruiqi <gongruiqi1@huaweicloud.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Dennis Zhou <dennis@kernel.org> # percpu
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Conflicts:
	include/linux/percpu.h
	include/linux/slab.h
	init/Kconfig
	mm/Kconfig
	mm/kfence/kfence_test.c
	mm/slab.c
	mm/slab.h
	mm/slab_common.c
	mm/slub.c
[There's a big difference between 5.10 and 6.6 with regards to the code structure of SLUB, and a bit of code refactoring is needed.]
Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
---
 include/linux/percpu.h  | 12 ++++++--
 include/linux/slab.h    | 37 ++++++++++++++++++++++---
 init/Kconfig            | 19 ++++++++++++-
 mm/kfence/kfence_test.c |  7 +++--
 mm/slab.c               |  8 +++---
 mm/slab.h               |  2 +-
 mm/slab_common.c        | 61 +++++++++++++++++++++++++++++++++--------
 mm/slub.c               |  8 +++---
 8 files changed, 124 insertions(+), 30 deletions(-)
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 5e76af742c80..c5bf45dead9d 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -35,6 +35,12 @@
 #define PCPU_BITMAP_BLOCK_BITS		(PCPU_BITMAP_BLOCK_SIZE >>	\
					 PCPU_MIN_ALLOC_SHIFT)
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define PERCPU_DYNAMIC_SIZE_SHIFT	12
+#else
+#define PERCPU_DYNAMIC_SIZE_SHIFT	10
+#endif
+
 /*
  * Percpu allocator can serve percpu allocations before slab is
  * initialized which allows slab to depend on the percpu allocator.
@@ -43,7 +49,7 @@
  * larger than PERCPU_DYNAMIC_EARLY_SIZE.
  */
 #define PERCPU_DYNAMIC_EARLY_SLOTS	128
-#define PERCPU_DYNAMIC_EARLY_SIZE	(12 << 10)
+#define PERCPU_DYNAMIC_EARLY_SIZE	(12 << PERCPU_DYNAMIC_SIZE_SHIFT)
 
 /*
  * PERCPU_DYNAMIC_RESERVE indicates the amount of free area to piggy
@@ -57,9 +63,9 @@
  * intelligent way to determine this would be nice.
  */
 #if BITS_PER_LONG > 32
-#define PERCPU_DYNAMIC_RESERVE		(28 << 10)
+#define PERCPU_DYNAMIC_RESERVE		(28 << PERCPU_DYNAMIC_SIZE_SHIFT)
 #else
-#define PERCPU_DYNAMIC_RESERVE		(20 << 10)
+#define PERCPU_DYNAMIC_RESERVE		(20 << PERCPU_DYNAMIC_SIZE_SHIFT)
 #endif
 
 extern void *pcpu_base_addr;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 58686a11f998..d78d8fbd8c16 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -17,6 +17,7 @@
 #include <linux/types.h>
 #include <linux/workqueue.h>
 #include <linux/percpu-refcount.h>
+#include <linux/hash.h>
 
 /*
@@ -310,12 +311,26 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
 #define SLAB_OBJ_MIN_SIZE	(KMALLOC_MIN_SIZE < 16 ? \
				(KMALLOC_MIN_SIZE) : 16)
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define RANDOM_KMALLOC_CACHES_NR	15 // # of cache copies
+#else
+#define RANDOM_KMALLOC_CACHES_NR	0
+#endif
+
 /*
  * Whenever changing this, take care of that kmalloc_type() and
  * create_kmalloc_caches() still work as intended.
  */
 enum kmalloc_cache_type {
	KMALLOC_NORMAL = 0,
+	/*
+	 * This config control, not present in Linux mainline, is just for
+	 * kABI maintenance.
+	 */
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+	KMALLOC_RANDOM_START = KMALLOC_NORMAL,
+	KMALLOC_RANDOM_END = KMALLOC_RANDOM_START + RANDOM_KMALLOC_CACHES_NR,
+#endif
	KMALLOC_RECLAIM,
 #ifdef CONFIG_ZONE_DMA
	KMALLOC_DMA,
@@ -327,7 +342,9 @@ enum kmalloc_cache_type {
 extern struct kmem_cache *
 kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
 
-static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
+extern unsigned long random_kmalloc_seed;
+
+static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
 {
 #ifdef CONFIG_ZONE_DMA
	/*
@@ -335,7 +352,13 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
	 * with a single branch for both flags.
	 */
	if (likely((flags & (__GFP_DMA | __GFP_RECLAIMABLE)) == 0))
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+		/* RANDOM_KMALLOC_CACHES_NR (=15) copies + the KMALLOC_NORMAL */
+		return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
+						      ilog2(RANDOM_KMALLOC_CACHES_NR + 1));
+#else
		return KMALLOC_NORMAL;
+#endif
 
	/*
	 * At least one of the flags has to be set. If both are, __GFP_DMA
@@ -343,7 +366,13 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
	 */
	return flags & __GFP_DMA ? KMALLOC_DMA : KMALLOC_RECLAIM;
 #else
-	return flags & __GFP_RECLAIMABLE ? KMALLOC_RECLAIM : KMALLOC_NORMAL;
+	return flags & __GFP_RECLAIMABLE ? KMALLOC_RECLAIM :
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+		KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
+					       ilog2(RANDOM_KMALLOC_CACHES_NR + 1));
+#else
+		KMALLOC_NORMAL;
+#endif
 #endif
 }
 
@@ -573,7 +602,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
			return ZERO_SIZE_PTR;
 
		return kmem_cache_alloc_trace(
-				kmalloc_caches[kmalloc_type(flags)][index],
+				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
				flags, size);
 #endif
 }
@@ -591,7 +620,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
			return ZERO_SIZE_PTR;
 
		return kmem_cache_alloc_node_trace(
-				kmalloc_caches[kmalloc_type(flags)][i],
+				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][i],
						flags, node, size);
	}
 #endif
diff --git a/init/Kconfig b/init/Kconfig
index 9dcc12704729..be284bca406c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2203,6 +2203,23 @@ config SLUB_CPU_PARTIAL
	  which requires the taking of locks that may cause latency spikes.
	  Typically one would choose no for a realtime system.
 
+config RANDOM_KMALLOC_CACHES
+	default n
+	depends on SLUB && !SLUB_TINY
+	bool "Randomize slab caches for normal kmalloc"
+	help
+	  A hardening feature that creates multiple copies of slab caches for
+	  normal kmalloc allocation and makes kmalloc randomly pick one based
+	  on code address, which makes the attackers more difficult to spray
+	  vulnerable memory objects on the heap for the purpose of exploiting
+	  memory vulnerabilities.
+
+	  Currently the number of copies is set to 16, a reasonably large value
+	  that effectively diverges the memory objects allocated for different
+	  subsystems or modules into different caches, at the expense of a
+	  limited degree of memory and CPU overhead that relates to hardware and
+	  system workload.
+
 config MMAP_ALLOW_UNINITIALIZED
	bool "Allow mmapped anonymous memory to be uninitialized"
	depends on EXPERT && !MMU
@@ -2605,4 +2622,4 @@ config NOKASLR_MEM_RANGE
	default n
	help
	  Say y here and add kernel parameters as nokaslr=nn[KMG]-ss[KMG] to avoid kaslr
-	  place kernel image in such memory regions
\ No newline at end of file
+	  place kernel image in such memory regions
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 08a3c1e793c1..81cac28ca34e 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -207,7 +207,9 @@ static void test_cache_destroy(void)
 
 static inline size_t kmalloc_cache_alignment(size_t size)
 {
-	return kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)]->align;
+	/* just to get ->align so no need to pass in the real caller */
+	enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, 0);
+	return kmalloc_caches[type][__kmalloc_index(size, false)]->align;
 }
 
 /* Must always inline to match stack trace against caller. */
@@ -277,8 +279,9 @@ static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocat
 
	if (is_kfence_address(alloc)) {
		struct page *page = virt_to_head_page(alloc);
+		enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, _RET_IP_);
		struct kmem_cache *s = test_cache ?:
-			kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)];
+			kmalloc_caches[type][__kmalloc_index(size, false)];
 
		/*
		 * Verify that various helpers return the right values
diff --git a/mm/slab.c b/mm/slab.c
index ae84578f3fde..ca71d3f3e34b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1684,7 +1684,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
			size_t freelist_size;
 
			freelist_size = num * sizeof(freelist_idx_t);
-			freelist_cache = kmalloc_slab(freelist_size, 0u);
+			freelist_cache = kmalloc_slab(freelist_size, 0u, _RET_IP_);
			if (!freelist_cache)
				continue;
 
@@ -2074,7 +2074,7 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
 
	if (OFF_SLAB(cachep)) {
		cachep->freelist_cache =
-			kmalloc_slab(cachep->freelist_size, 0u);
+			kmalloc_slab(cachep->freelist_size, 0u, _RET_IP_);
	}
 
	err = setup_cpu_cache(cachep, gfp);
@@ -3628,7 +3628,7 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 
	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
		return NULL;
-	cachep = kmalloc_slab(size, flags);
+	cachep = kmalloc_slab(size, flags, _RET_IP_);
	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
		return cachep;
	ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
@@ -3667,7 +3667,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
 
	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
		return NULL;
-	cachep = kmalloc_slab(size, flags);
+	cachep = kmalloc_slab(size, flags, _RET_IP_);
	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
		return cachep;
	ret = slab_alloc(cachep, flags, size, caller);
diff --git a/mm/slab.h b/mm/slab.h
index 594f8b0c65b9..a0e92203e99f 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -86,7 +86,7 @@ void setup_kmalloc_cache_index_table(void);
 void create_kmalloc_caches(slab_flags_t);
 
 /* Find the kmalloc slab corresponding for a certain size */
-struct kmem_cache *kmalloc_slab(size_t, gfp_t);
+struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags, unsigned long caller);
 #endif
 
 gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1b8857d07e06..fbfc628c5e3a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -576,6 +576,11 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init = {
 /* initialization for https://bugs.llvm.org/show_bug.cgi?id=42570 */
 };
 EXPORT_SYMBOL(kmalloc_caches);
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+unsigned long random_kmalloc_seed __ro_after_init;
+EXPORT_SYMBOL(random_kmalloc_seed);
+#endif
+
 /*
  * Conversion table for small slabs sizes / 8 to the index in the
  * kmalloc array. This is necessary for slabs < 192 since we have non power
@@ -618,7 +623,7 @@ static inline unsigned int size_index_elem(unsigned int bytes)
  * Find the kmem_cache structure that serves a given size of
  * allocation
  */
-struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
+struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
 {
	unsigned int index;
 
@@ -633,25 +638,51 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
		index = fls(size - 1);
	}
 
-	return kmalloc_caches[kmalloc_type(flags)][index];
+	return kmalloc_caches[kmalloc_type(flags, caller)][index];
 }
 
 #ifdef CONFIG_ZONE_DMA
-#define INIT_KMALLOC_INFO(__size, __short_size)			\
-{								\
-	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
-	.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size,	\
-	.name[KMALLOC_DMA]     = "dma-kmalloc-" #__short_size,	\
-	.size = __size,						\
-}
+#define KMALLOC_DMA_NAME(sz)	.name[KMALLOC_DMA] = "dma-kmalloc-" #sz,
+#else
+#define KMALLOC_DMA_NAME(sz)
+#endif
+
+#ifndef CONFIG_SLUB_TINY
+#define KMALLOC_RCL_NAME(sz)	.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #sz,
 #else
+#define KMALLOC_RCL_NAME(sz)
+#endif
+
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
+#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
+#define KMA_RAND_1(sz)                  .name[KMALLOC_RANDOM_START +  1] = "kmalloc-rnd-01-" #sz,
+#define KMA_RAND_2(sz)  KMA_RAND_1(sz)  .name[KMALLOC_RANDOM_START +  2] = "kmalloc-rnd-02-" #sz,
+#define KMA_RAND_3(sz)  KMA_RAND_2(sz)  .name[KMALLOC_RANDOM_START +  3] = "kmalloc-rnd-03-" #sz,
+#define KMA_RAND_4(sz)  KMA_RAND_3(sz)  .name[KMALLOC_RANDOM_START +  4] = "kmalloc-rnd-04-" #sz,
+#define KMA_RAND_5(sz)  KMA_RAND_4(sz)  .name[KMALLOC_RANDOM_START +  5] = "kmalloc-rnd-05-" #sz,
+#define KMA_RAND_6(sz)  KMA_RAND_5(sz)  .name[KMALLOC_RANDOM_START +  6] = "kmalloc-rnd-06-" #sz,
+#define KMA_RAND_7(sz)  KMA_RAND_6(sz)  .name[KMALLOC_RANDOM_START +  7] = "kmalloc-rnd-07-" #sz,
+#define KMA_RAND_8(sz)  KMA_RAND_7(sz)  .name[KMALLOC_RANDOM_START +  8] = "kmalloc-rnd-08-" #sz,
+#define KMA_RAND_9(sz)  KMA_RAND_8(sz)  .name[KMALLOC_RANDOM_START +  9] = "kmalloc-rnd-09-" #sz,
+#define KMA_RAND_10(sz) KMA_RAND_9(sz)  .name[KMALLOC_RANDOM_START + 10] = "kmalloc-rnd-10-" #sz,
+#define KMA_RAND_11(sz) KMA_RAND_10(sz) .name[KMALLOC_RANDOM_START + 11] = "kmalloc-rnd-11-" #sz,
+#define KMA_RAND_12(sz) KMA_RAND_11(sz) .name[KMALLOC_RANDOM_START + 12] = "kmalloc-rnd-12-" #sz,
+#define KMA_RAND_13(sz) KMA_RAND_12(sz) .name[KMALLOC_RANDOM_START + 13] = "kmalloc-rnd-13-" #sz,
+#define KMA_RAND_14(sz) KMA_RAND_13(sz) .name[KMALLOC_RANDOM_START + 14] = "kmalloc-rnd-14-" #sz,
+#define KMA_RAND_15(sz) KMA_RAND_14(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-rnd-15-" #sz,
+#else // CONFIG_RANDOM_KMALLOC_CACHES
+#define KMALLOC_RANDOM_NAME(N, sz)
+#endif
+
 #define INIT_KMALLOC_INFO(__size, __short_size)			\
 {								\
	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
-	.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size,	\
+	KMALLOC_RCL_NAME(__short_size)				\
+	KMALLOC_DMA_NAME(__short_size)				\
+	KMALLOC_RANDOM_NAME(RANDOM_KMALLOC_CACHES_NR, __short_size)	\
	.size = __size,						\
 }
-#endif
 
 /*
  * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
@@ -740,6 +771,11 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
	if (type == KMALLOC_RECLAIM)
		flags |= SLAB_RECLAIM_ACCOUNT;
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+	if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
+		flags |= SLAB_NO_MERGE;
+#endif
+
	kmalloc_caches[type][idx] = create_kmalloc_cache(
					kmalloc_info[idx].name[type],
					kmalloc_info[idx].size, flags, 0,
@@ -774,6 +810,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
				new_kmalloc_cache(2, type, flags);
		}
	}
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+	random_kmalloc_seed = get_random_u64();
+#endif
 
	/* Kmalloc array is now usable */
	slab_state = UP;
diff --git a/mm/slub.c b/mm/slub.c
index ec1c3a376d36..9dd4cc478ec3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4012,7 +4012,7 @@ void *__kmalloc(size_t size, gfp_t flags)
	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
		return kmalloc_large(size, flags);
 
-	s = kmalloc_slab(size, flags);
+	s = kmalloc_slab(size, flags, _RET_IP_);
 
	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;
@@ -4060,7 +4060,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
		return ret;
	}
 
-	s = kmalloc_slab(size, flags);
+	s = kmalloc_slab(size, flags, _RET_IP_);
 
	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;
@@ -4520,7 +4520,7 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
		return kmalloc_large(size, gfpflags);
 
-	s = kmalloc_slab(size, gfpflags);
+	s = kmalloc_slab(size, gfpflags, _RET_IP_);
 
	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;
@@ -4551,7 +4551,7 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
		return ret;
	}
 
-	s = kmalloc_slab(size, gfpflags);
+	s = kmalloc_slab(size, gfpflags, _RET_IP_);
 
	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;