From: Vlastimil Babka vbabka@suse.cz
mainline inclusion from mainline-v5.17-rc1 commit 2dba5eb1c73b6ba2988ced07250edeac0f8cbf5a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated from memblock, even if stack depot ends up not actually used. The default size of stack_table is 4MB on 32-bit, 8MB on 64-bit.
This is fine for use-cases such as KASAN which is also a config option and has overhead on its own. But it's an issue for functionality that has to be actually enabled on boot (page_owner) or depends on hardware (GPU drivers) and thus the memory might be wasted. This was raised as an issue [1] when attempting to add stackdepot support for SLUB's debug object tracking functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot only when needed, or create only specific kmem caches with debugging for testing purposes.
It would thus be more efficient if stackdepot's table was allocated only when actually going to be used. This patch thus makes the allocation (and whole stack_depot_init() call) optional:
- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN select this flag.
- Other users have to call stack_depot_init() as part of their own init when it's determined that stack depot will actually be used. This may depend on both config and runtime conditions. Convert current users which are page_owner and several in the DRM subsystem. Same will be done for SLUB later.
- Because the init might now be called after the boot-time memblock allocation has given all memory to the buddy allocator, change stack_depot_init() to allocate stack_table with kvmalloc() when memblock is no longer available. Also handle allocation failure by disabling stackdepot (could have theoretically happened even with memblock allocation previously), and don't unnecessarily align the memblock allocation to its own size anymore.
[1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvb...
Link: https://lkml.kernel.org/r/20211013073005.11351-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Acked-by: Dmitry Vyukov dvyukov@google.com Reviewed-by: Marco Elver elver@google.com # stackdepot Cc: Marco Elver elver@google.com Cc: Vijayanand Jitta vjitta@codeaurora.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Oliver Glitta glittao@gmail.com Cc: Imran Khan imran.f.khan@oracle.com From: Colin Ian King colin.king@canonical.com Subject: lib/stackdepot: fix spelling mistake and grammar in pr_err message
There is a spelling mistake of the work allocation so fix this and re-phrase the message to make it easier to read.
Link: https://lkml.kernel.org/r/20211015104159.11282-1-colin.king@canonical.com Signed-off-by: Colin Ian King colin.king@canonical.com Cc: Vlastimil Babka vbabka@suse.cz From: Vlastimil Babka vbabka@suse.cz Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup
On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init() which means stack_depot_init() (called by page owner init) will not recognize properly it should use kvmalloc() and not memblock_alloc(). memblock_alloc() will also not issue a warning and return a block memory that can be invalid and cause kernel page fault when saving stacks, as reported by the kernel test robot [1].
Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so that slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't have this issue, as it doesn't do page_ext_init_flatmem_late(), but a different page_ext_init() even later in the boot process.
Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue.
While at it, also actually resolve a checkpatch warning in stack_depot_init() from DRM CI, which was supposed to be in the original patch already.
[1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/6abd9213-19a9-6d58-cedc-2414386d2d81@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Reported-by: kernel test robot oliver.sang@intel.com Cc: Mike Rapoport rppt@kernel.org Cc: Stephen Rothwell sfr@canb.auug.org.au From: Vlastimil Babka vbabka@suse.cz Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3
Due to cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks without backoff") landing recently to -next adding a new stack depot user in drivers/gpu/drm/drm_modeset_lock.c we need to add an appropriate call to stack_depot_init() there as well.
Link: https://lkml.kernel.org/r/2a692365-cfa1-64f2-34e0-8aa5674dce5e@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Cc: Jani Nikula jani.nikula@intel.com Cc: Naresh Kamboju naresh.kamboju@linaro.org Cc: Marco Elver elver@google.com Cc: Vijayanand Jitta vjitta@codeaurora.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Oliver Glitta glittao@gmail.com Cc: Imran Khan imran.f.khan@oracle.com Cc: Stephen Rothwell sfr@canb.auug.org.au From: Vlastimil Babka vbabka@suse.cz Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4
Due to 4e66934eaadc ("lib: add reference counting tracking infrastructure") landing recently to net-next adding a new stack depot user in lib/ref_tracker.c we need to add an appropriate call to stack_depot_init() there as well.
Link: https://lkml.kernel.org/r/45c1b738-1a2f-5b5f-2f6d-86fab206d01c@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Eric Dumazet edumazet@google.com Cc: Jiri Slab jirislaby@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: drivers/gpu/drm/drm_modeset_lock.c include/linux/stackdepot.h init/main.c lib/Kconfig.kasan lib/Kconfig [conflicts in drivers/gpu/drm/drm_modeset_lock.c is due to prior commit cd06ab2fd48f2c0243b06344a36056e811d263b8 isn't merged, other conflicts is context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- drivers/gpu/drm/drm_dp_mst_topology.c | 1 + drivers/gpu/drm/drm_mm.c | 4 +++ drivers/gpu/drm/i915/intel_runtime_pm.c | 3 +++ include/linux/stackdepot.h | 25 ++++++++++++------- init/main.c | 9 ++++--- lib/Kconfig | 4 +++ lib/Kconfig.kasan | 4 +-- lib/stackdepot.c | 33 +++++++++++++++++++++---- mm/page_owner.c | 2 ++ 9 files changed, 66 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 27305f339881..58bff96e43a6 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -5441,6 +5441,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr, mutex_init(&mgr->probe_lock); #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS) mutex_init(&mgr->topology_ref_history_lock); + stack_depot_init(); #endif INIT_LIST_HEAD(&mgr->tx_msg_downq); INIT_LIST_HEAD(&mgr->destroy_port_list); diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index a4a04d246135..52faf3369ce2 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -983,6 +983,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size) add_hole(&mm->head_node);
mm->scan_active = 0; + +#ifdef CONFIG_DRM_DEBUG_MM + stack_depot_init(); +#endif } EXPORT_SYMBOL(drm_mm_init);
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index 8b725efb2254..b3d7ba685d9c 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -78,6 +78,9 @@ static void __print_depot_stack(depot_stack_handle_t stack, static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm) { spin_lock_init(&rpm->debug.lock); + + if (rpm->available) + stack_depot_init(); }
static noinline depot_stack_handle_t diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index 22919a94ca19..30dfc8549b5f 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -13,19 +13,26 @@
typedef u32 depot_stack_handle_t;
+/* + * Every user of stack depot has to call this during its own init when it's + * decided that it will be calling stack_depot_save() later. + * + * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot + * enabled as part of mm_init(), for subsystems where it's known at compile time + * that stack depot will be used. + */ +int stack_depot_init(void); + +#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT +static inline int stack_depot_early_init(void) { return stack_depot_init(); } +#else +static inline int stack_depot_early_init(void) { return 0; } +#endif + depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int nr_entries, gfp_t gfp_flags);
unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned long **entries);
-#ifdef CONFIG_STACKDEPOT -int stack_depot_init(void); -#else -static inline int stack_depot_init(void) -{ - return 0; -} -#endif /* CONFIG_STACKDEPOT */ - #endif diff --git a/init/main.c b/init/main.c index 68284805d173..b8306e52d046 100644 --- a/init/main.c +++ b/init/main.c @@ -830,11 +830,14 @@ static void __init mm_init(void) init_debug_pagealloc(); kfence_alloc_pool(); report_meminit(); - stack_depot_init(); + stack_depot_early_init(); mem_init(); - /* page_owner must be initialized after buddy is ready */ - page_ext_init_flatmem_late(); kmem_cache_init(); + /* + * page_owner must be initialized after buddy is ready, and also after + * slab is ready so that stack_depot_init() works properly + */ + page_ext_init_flatmem_late(); kmemleak_init(); pgtable_init(); debug_objects_mem_init(); diff --git a/lib/Kconfig b/lib/Kconfig index 8026964596fd..497304f1647c 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -664,6 +664,10 @@ config STACKDEPOT bool select STACKTRACE
+config STACKDEPOT_ALWAYS_INIT + bool + select STACKDEPOT + config SBITMAP bool
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan index 542a9c18398e..24d309819b74 100644 --- a/lib/Kconfig.kasan +++ b/lib/Kconfig.kasan @@ -49,7 +49,7 @@ config KASAN_GENERIC depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB) select SLUB_DEBUG if SLUB select CONSTRUCTORS - select STACKDEPOT + select STACKDEPOT_ALWAYS_INIT help Enables generic KASAN mode.
@@ -73,7 +73,7 @@ config KASAN_SW_TAGS depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB) select SLUB_DEBUG if SLUB select CONSTRUCTORS - select STACKDEPOT + select STACKDEPOT_ALWAYS_INIT help Enables software tag-based KASAN mode.
diff --git a/lib/stackdepot.c b/lib/stackdepot.c index fba6332e645d..ede2a433055c 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -23,6 +23,7 @@ #include <linux/jhash.h> #include <linux/kernel.h> #include <linux/mm.h> +#include <linux/mutex.h> #include <linux/percpu.h> #include <linux/printk.h> #include <linux/slab.h> @@ -163,18 +164,40 @@ static int __init is_stack_depot_disabled(char *str) } early_param("stack_depot_disable", is_stack_depot_disabled);
-int __init stack_depot_init(void) +/* + * __ref because of memblock_alloc(), which will not be actually called after + * the __init code is gone, because at that point slab_is_available() is true + */ +__ref int stack_depot_init(void) { - if (!stack_depot_disable) { + static DEFINE_MUTEX(stack_depot_init_mutex); + + mutex_lock(&stack_depot_init_mutex); + if (!stack_depot_disable && !stack_table) { size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); int i;
- stack_table = memblock_alloc(size, size); - for (i = 0; i < STACK_HASH_SIZE; i++) - stack_table[i] = NULL; + if (slab_is_available()) { + pr_info("Stack Depot allocating hash table with kvmalloc\n"); + stack_table = kvmalloc(size, GFP_KERNEL); + } else { + pr_info("Stack Depot allocating hash table with memblock_alloc\n"); + stack_table = memblock_alloc(size, SMP_CACHE_BYTES); + } + if (stack_table) { + for (i = 0; i < STACK_HASH_SIZE; i++) + stack_table[i] = NULL; + } else { + pr_err("Stack Depot hash table allocation failed, disabling\n"); + stack_depot_disable = true; + mutex_unlock(&stack_depot_init_mutex); + return -ENOMEM; + } } + mutex_unlock(&stack_depot_init_mutex); return 0; } +EXPORT_SYMBOL_GPL(stack_depot_init);
/* Calculate hash for a stack */ static inline u32 hash_stack(unsigned long *entries, unsigned int size) diff --git a/mm/page_owner.c b/mm/page_owner.c index 5b93fc85dc73..61ba829e895c 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -86,6 +86,8 @@ static void init_page_owner(void) if (!page_owner_enabled) return;
+ stack_depot_init(); + register_dummy_stack(); register_failure_stack(); register_early_stack();