Support page_owner_module feature.
Akira Yokosawa (2): docs: vm/page_owner: use literal blocks for param description docs: vm/page_owner: tweak literal block in STANDARD FORMAT SPECIFIERS
Charan Teja Kalla (2): mm: fix use-after free of page_ext after race with memory-offline mm/page_exit: fix kernel doc warning in page_ext_put()
Chen Xiao (1): docs: mm/page_owner: fix spelling mistakes
Chongxi Zhao (1): tools/vm/page_owner_sort.c: support sorting pid and time
Eric Dumazet (1): mm/page_owner: use strscpy() instead of strlcpy()
Haowen Bai (1): tools/vm/page_owner: support debug log to avoid huge log print
Hyeonggon Yoo (1): mm/page_owner: record single timestamp value for high order allocations
Jiajian Ye (9): tools/vm/page_owner_sort.c: add a security check tools/vm/page_owner_sort.c: support sorting by tgid and update documentation tools/vm/page_owner_sort: fix three trivival places tools/vm/page_owner_sort: support for sorting by task command name tools/vm/page_owner_sort.c: support for selecting by PID, TGID or task command name tools/vm/page_owner_sort.c: support for user-defined culling rules tools/vm/page_owner_sort.c: use fprintf() to send error messages to stderr tools/vm/page_owner_sort.c: support for multi-value selection in single argument tools/vm/page_owner_sort.c: support sorting blocks by multiple keys
Jianlin Lv (1): tools/vm/page_owner_sort: free memory before exit
Jinjiang Tu (11): mm/page_owner: support identifying pages allocated by modules mm/page_owner: support filtering by modules when dumping page_owner mm/page_owner: show top modules allocating pages when oom occurred mm/page_owner: support configuring the num of modules to dump mm/page_owner: supporting showing the module statistics via debugfs mm/page_owner: record leaked module into individual list Documentation: update page_owner document for module statistics tools/vm/page_owner_sort: support for selecting by module name tools/vm/page_owner_sort: support for culling by module name tools/vm/page_owner_sort: support sorting by module name Documentation: update document for page_owner_sort tool
Jonathan Corbet (1): docs: fix RST error in vm/page_owner.rst
Sean Anderson (2): tools/vm/page_owner_sort.c: sort by stacktrace before culling tools/vm/page_owner_sort.c: support sorting by stack trace
Sergei Trofimovich (1): mm: page_owner: detect page_owner recursion via task_struct
Shenghong Han (2): tools/vm/page_owner_sort.c: two trivial fixes Documentation/vm/page_owner.rst: update the documentation
Steve Chou (1): tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
Tang Bin (1): tools/vm/page_owner_sort.c: check malloc() return
Ting Liu (1): mm: make some vars and functions static or __init
Vijayanand Jitta (2): lib: stackdepot: add support to disable stack depot lib: stackdepot: fix ignoring return value warning
Ville Syrjälä (1): drm/i915: Fix oops due to missing stack depot
Vlastimil Babka (3): lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() lib/stackdepot: allow requesting early initialization dynamically lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
Waiman Long (3): mm/page_owner: use scnprintf() to avoid excessive buffer overrun check mm/page_owner: print memcg information mm/page_owner: record task command name
Yinan Zhang (3): tools/vm/page_owner_sort.c: add switch between culling by stacktrace and txt tools/vm/page_owner_sort.c: remove -c option doc/vm/page_owner.rst: remove content related to -c option
Yixuan Cao (7): mm/page_owner.c: record tgid tools/vm/page_owner_sort.c: delete invalid duplicate code tools/vm/page_owner_sort.c: fix the instructions for use tools/vm/page_owner_sort.c: provide allocator labelling and update --cull and --sort options tools/vm/page_owner_sort.c: avoid repeated judgments tools/vm/page_owner_sort.c: adjust the indent in is_need() tools/vm/page_owner_sort: fix -f option
Yogesh Lal (1): lib: stackdepot: add support to configure STACK_HASH_SIZE
Zhenhua Huang (1): mm/page_owner.c: remove redundant drain_all_pages
Zhenliang Wei (1): tools/vm/page_owner_sort.c: count and sort by mem
 .../admin-guide/kernel-parameters.txt   |   6 +
 Documentation/vm/page_owner.rst         | 129 ++-
 drivers/gpu/drm/drm_dp_mst_topology.c   |   1 +
 drivers/gpu/drm/drm_mm.c                |   4 +
 drivers/gpu/drm/i915/intel_runtime_pm.c |   1 +
 include/linux/page_ext.h                |  17 +-
 include/linux/page_idle.h               |  34 +-
 include/linux/sched.h                   |   3 +
 include/linux/stackdepot.h              |  34 +-
 init/main.c                             |   9 +-
 lib/Kconfig                             |   4 +
 lib/Kconfig.kasan                       |   4 +-
 lib/stackdepot.c                        | 126 ++-
 mm/Kconfig.debug                        |   9 +
 mm/Makefile                             |   1 +
 mm/page_ext.c                           | 108 ++-
 mm/page_owner.c                         | 209 ++--
 mm/page_owner.h                         |  71 ++
 mm/page_owner_module.c                  | 402 ++++++++
 tools/vm/page_owner_sort.c              | 904 +++++++++++++++++-
 20 files changed, 1927 insertions(+), 149 deletions(-)
 create mode 100644 mm/page_owner.h
 create mode 100644 mm/page_owner_module.c
From: Vijayanand Jitta vjitta@codeaurora.org
mainline inclusion
from mainline-v5.12-rc1
commit e1fdc403349c64fa58f4c163f4bf9b860b4db808
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add a kernel parameter, stack_depot_disable, to disable stack depot, so that the stack hash table does not consume any memory when stack depot is disabled.
The use case is CONFIG_PAGE_OWNER without page_owner=on. Without this patch, stack depot allocates memory for the hash table even though it is never used. By default the table is 8 MB, which is far from trivial.
With this option, a system built with CONFIG_PAGE_OWNER and booted with page_owner=off and stack_depot_disable on the kernel command line saves the memory that would otherwise be wasted on the hash table.
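As a rough illustration of where the 8 MB figure comes from, the hypothetical helper below computes the size of the bucket array, which holds one pointer per hash bucket. It assumes the default hash order of 20 (2^20 buckets), which this patch does not state. It returns 0 when stack depot is disabled, modelling the effect of stack_depot_disable on the allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: default hash order of 20, i.e. 2^20 buckets. This mirrors the
 * usual STACK_HASH_SIZE default and is not taken from this patch. */
#define SKETCH_STACK_HASH_ORDER 20
#define SKETCH_STACK_HASH_SIZE  (1UL << SKETCH_STACK_HASH_ORDER)

struct stack_record; /* opaque: the table only stores pointers to records */

/* Bytes taken by the bucket array: one pointer per bucket, or nothing at
 * all when stack depot has been disabled on the command line. */
static size_t stack_table_bytes(bool depot_disabled, size_t ptr_size)
{
	if (depot_disabled)
		return 0;
	return SKETCH_STACK_HASH_SIZE * ptr_size;
}
```

With 8-byte pointers this gives 8 MiB, and with 4-byte pointers 4 MiB. These values match the 64-bit and 32-bit figures quoted later in this series.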
[akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build]
Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeauror... Signed-off-by: Vinayak Menon vinmenon@codeaurora.org Signed-off-by: Vijayanand Jitta vjitta@codeaurora.org Cc: Alexander Potapenko glider@google.com Cc: Minchan Kim minchan@kernel.org Cc: Yogesh Lal ylal@codeaurora.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: Documentation/admin-guide/kernel-parameters.txt include/linux/stackdepot.h init/main.c lib/stackdepot.c [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- .../admin-guide/kernel-parameters.txt | 6 ++++ include/linux/stackdepot.h | 8 +++++ init/main.c | 2 ++ lib/stackdepot.c | 32 ++++++++++++++++--- 4 files changed, 44 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 5ad5cd51d015..de3afdddcf6a 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5543,6 +5543,12 @@ growing up) the main stack are reserved for no other mapping. Default value is 256 pages.
+ stack_depot_disable= [KNL] + Setting this to true through kernel command line will + disable the stack depot thereby saving the static memory + consumed by the stack hash table. By default this is set + to false. + stacktrace [FTRACE] Enabled the stack tracer on boot up.
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index 1cbe5ad0577d..22919a94ca19 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -19,5 +19,13 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned long **entries);
+#ifdef CONFIG_STACKDEPOT +int stack_depot_init(void); +#else +static inline int stack_depot_init(void) +{ + return 0; +} +#endif /* CONFIG_STACKDEPOT */
#endif diff --git a/init/main.c b/init/main.c index f06fbe79a84a..68284805d173 100644 --- a/init/main.c +++ b/init/main.c @@ -98,6 +98,7 @@ #include <linux/jump_label.h> #include <linux/kcsan.h> #include <linux/init_syscalls.h> +#include <linux/stackdepot.h> #include <linux/randomize_kstack.h>
#include <asm/io.h> @@ -829,6 +830,7 @@ static void __init mm_init(void) init_debug_pagealloc(); kfence_alloc_pool(); report_meminit(); + stack_depot_init(); mem_init(); /* page_owner must be initialized after buddy is ready */ page_ext_init_flatmem_late(); diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 3cab9ba618df..33e22f20ffa4 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -30,6 +30,7 @@ #include <linux/stackdepot.h> #include <linux/string.h> #include <linux/types.h> +#include <linux/memblock.h>
#define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
@@ -146,9 +147,32 @@ static struct stack_record *depot_alloc_stack(unsigned long *entries, int size, #define STACK_HASH_MASK (STACK_HASH_SIZE - 1) #define STACK_HASH_SEED 0x9747b28c
-static struct stack_record *stack_table[STACK_HASH_SIZE] = { - [0 ... STACK_HASH_SIZE - 1] = NULL -}; +static bool stack_depot_disable; +static struct stack_record **stack_table; + +static int __init is_stack_depot_disabled(char *str) +{ + kstrtobool(str, &stack_depot_disable); + if (stack_depot_disable) { + pr_info("Stack Depot is disabled\n"); + stack_table = NULL; + } + return 0; +} +early_param("stack_depot_disable", is_stack_depot_disabled); + +int __init stack_depot_init(void) +{ + if (!stack_depot_disable) { + size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); + int i; + + stack_table = memblock_alloc(size, size); + for (i = 0; i < STACK_HASH_SIZE; i++) + stack_table[i] = NULL; + } + return 0; +}
/* Calculate hash for a stack */ static inline u32 hash_stack(unsigned long *entries, unsigned int size) @@ -242,7 +266,7 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned long flags; u32 hash;
- if (unlikely(nr_entries == 0)) + if (unlikely(nr_entries == 0) || stack_depot_disable) goto fast_exit;
hash = hash_stack(entries, nr_entries);
From: Vijayanand Jitta vjitta@codeaurora.org
mainline inclusion
from mainline-v5.12-rc1
commit 64427985c76fcb54c783de617edf353009499a03
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Fix the following "ignoring return value" warning for kstrtobool() in is_stack_depot_disabled():
lib/stackdepot.c: In function 'is_stack_depot_disabled': lib/stackdepot.c:154:2: warning: ignoring return value of 'kstrtobool' declared with attribute 'warn_unused_result' [-Wunused-result]
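The fix ensures that stack_depot_disable is acted on only when its value actually parsed. Below is a minimal userspace sketch of that pattern. parse_bool() is a hypothetical, simplified stand-in for kstrtobool(): it accepts only y/Y/1/on and n/N/0/off, leaves the output untouched on error, and is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool depot_disabled;

/* Hypothetical, simplified kstrtobool()-like parser: 0 on success,
 * -EINVAL (with *res unchanged) for anything it does not recognise. */
static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}

/* Mirrors the fixed is_stack_depot_disabled(): check the parse result
 * before trusting the flag; return 0 like the original handler. */
static int handle_stack_depot_disable(const char *str)
{
	int ret = parse_bool(str, &depot_disabled);

	if (!ret && depot_disabled)
		printf("Stack Depot is disabled\n");
	return 0;
}
```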
Link: https://lkml.kernel.org/r/1612163048-28026-1-git-send-email-vjitta@codeauror... Fixes: b9779abb09a8 ("lib: stackdepot: add support to disable stack depot") Signed-off-by: Vijayanand Jitta vjitta@codeaurora.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- lib/stackdepot.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 33e22f20ffa4..fba6332e645d 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -152,8 +152,10 @@ static struct stack_record **stack_table;
static int __init is_stack_depot_disabled(char *str) { - kstrtobool(str, &stack_depot_disable); - if (stack_depot_disable) { + int ret; + + ret = kstrtobool(str, &stack_depot_disable); + if (!ret && stack_depot_disable) { pr_info("Stack Depot is disabled\n"); stack_table = NULL; }
From: Vlastimil Babka vbabka@suse.cz
mainline inclusion
from mainline-v5.17-rc1
commit 2dba5eb1c73b6ba2988ced07250edeac0f8cbf5a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated from memblock, even if stack depot ends up not actually being used. The default size of stack_table is 4 MB on 32-bit and 8 MB on 64-bit.
This is fine for use-cases such as KASAN which is also a config option and has overhead on its own. But it's an issue for functionality that has to be actually enabled on boot (page_owner) or depends on hardware (GPU drivers) and thus the memory might be wasted. This was raised as an issue [1] when attempting to add stackdepot support for SLUB's debug object tracking functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot only when needed, or create only specific kmem caches with debugging for testing purposes.
It would thus be more efficient if stackdepot's table were allocated only when it is actually going to be used. This patch therefore makes the allocation (and the whole stack_depot_init() call) optional:
- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN select this flag.
- Other users have to call stack_depot_init() as part of their own init when it's determined that stack depot will actually be used. This may depend on both config and runtime conditions. Convert current users which are page_owner and several in the DRM subsystem. Same will be done for SLUB later.
- Because the init might now be called after the boot-time memblock allocation has given all memory to the buddy allocator, change stack_depot_init() to allocate stack_table with kvmalloc() when memblock is no longer available. Also handle allocation failure by disabling stackdepot (could have theoretically happened even with memblock allocation previously), and don't unnecessarily align the memblock allocation to its own size anymore.
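The following userspace sketch models the init scheme described above, under stated assumptions. Any number of users may call the init function, only the first call allocates the table (under a mutex), and an allocation failure disables the depot for good. In this sketch, pthread_mutex_t stands in for DEFINE_MUTEX, and calloc() stands in for kvmalloc() plus the explicit NULLing of buckets. The memblock_alloc() branch for the !slab_is_available() case is left out. The table size and the sketch_fail_alloc knob are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_HASH_SIZE 1024 /* illustrative only; the real table is far larger */

struct stack_record;

static struct stack_record **stack_table;
static bool stack_depot_disable;
static bool sketch_fail_alloc; /* hypothetical knob to simulate allocation failure */

static struct stack_record **table_alloc(void)
{
	if (sketch_fail_alloc)
		return NULL;
	/* calloc() stands in for kvmalloc() followed by NULLing every bucket */
	return calloc(SKETCH_HASH_SIZE, sizeof(struct stack_record *));
}

/* Safe to call from every user: only the first successful call allocates,
 * and a failed allocation permanently disables the depot. */
static int sketch_stack_depot_init(void)
{
	static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
	int ret = 0;

	pthread_mutex_lock(&init_mutex);
	if (!stack_depot_disable && !stack_table) {
		stack_table = table_alloc();
		if (!stack_table) {
			stack_depot_disable = true;
			ret = -ENOMEM;
		}
	}
	pthread_mutex_unlock(&init_mutex);
	return ret;
}
```

The `!stack_table` check makes repeated calls cheap no-ops. This is why page_owner and the DRM users can each call the init function independently without coordinating.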
[1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvb...
Link: https://lkml.kernel.org/r/20211013073005.11351-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Acked-by: Dmitry Vyukov dvyukov@google.com Reviewed-by: Marco Elver elver@google.com # stackdepot Cc: Marco Elver elver@google.com Cc: Vijayanand Jitta vjitta@codeaurora.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Oliver Glitta glittao@gmail.com Cc: Imran Khan imran.f.khan@oracle.com
From: Colin Ian King colin.king@canonical.com
Subject: lib/stackdepot: fix spelling mistake and grammar in pr_err message
There is a spelling mistake in the word "allocation"; fix it and rephrase the message to make it easier to read.
Link: https://lkml.kernel.org/r/20211015104159.11282-1-colin.king@canonical.com Signed-off-by: Colin Ian King colin.king@canonical.com Cc: Vlastimil Babka vbabka@suse.cz
From: Vlastimil Babka vbabka@suse.cz
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup
On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init(). This means stack_depot_init() (called by page owner init) will not properly recognize that it should use kvmalloc() rather than memblock_alloc(). memblock_alloc() also does not issue a warning. Instead, it returns a block of memory that can be invalid and causes a kernel page fault when stacks are saved, as reported by the kernel test robot [1].
Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so that slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't have this issue, as it doesn't do page_ext_init_flatmem_late(), but a different page_ext_init() even later in the boot process.
Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue.
While at it, also actually resolve a checkpatch warning in stack_depot_init() from DRM CI, which was supposed to be in the original patch already.
[1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/6abd9213-19a9-6d58-cedc-2414386d2d81@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Reported-by: kernel test robot oliver.sang@intel.com Cc: Mike Rapoport rppt@kernel.org Cc: Stephen Rothwell sfr@canb.auug.org.au
From: Vlastimil Babka vbabka@suse.cz
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3
Commit cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks without backoff") recently landed in -next and adds a new stack depot user in drivers/gpu/drm/drm_modeset_lock.c. We therefore need to add an appropriate call to stack_depot_init() there as well.
Link: https://lkml.kernel.org/r/2a692365-cfa1-64f2-34e0-8aa5674dce5e@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Cc: Jani Nikula jani.nikula@intel.com Cc: Naresh Kamboju naresh.kamboju@linaro.org Cc: Marco Elver elver@google.com Cc: Vijayanand Jitta vjitta@codeaurora.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Oliver Glitta glittao@gmail.com Cc: Imran Khan imran.f.khan@oracle.com Cc: Stephen Rothwell sfr@canb.auug.org.au
From: Vlastimil Babka vbabka@suse.cz
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4
Commit 4e66934eaadc ("lib: add reference counting tracking infrastructure") recently landed in net-next and adds a new stack depot user in lib/ref_tracker.c. We therefore need to add an appropriate call to stack_depot_init() there as well.
Link: https://lkml.kernel.org/r/45c1b738-1a2f-5b5f-2f6d-86fab206d01c@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Eric Dumazet edumazet@google.com Cc: Jiri Slab jirislaby@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: drivers/gpu/drm/drm_modeset_lock.c include/linux/stackdepot.h init/main.c lib/Kconfig.kasan lib/Kconfig [the conflict in drivers/gpu/drm/drm_modeset_lock.c arises because prior commit cd06ab2fd48f2c0243b06344a36056e811d263b8 is not merged; the other conflicts are context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- drivers/gpu/drm/drm_dp_mst_topology.c | 1 + drivers/gpu/drm/drm_mm.c | 4 +++ drivers/gpu/drm/i915/intel_runtime_pm.c | 3 +++ include/linux/stackdepot.h | 25 ++++++++++++------- init/main.c | 9 ++++--- lib/Kconfig | 4 +++ lib/Kconfig.kasan | 4 +-- lib/stackdepot.c | 33 +++++++++++++++++++++---- mm/page_owner.c | 2 ++ 9 files changed, 66 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 27305f339881..58bff96e43a6 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -5441,6 +5441,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr, mutex_init(&mgr->probe_lock); #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS) mutex_init(&mgr->topology_ref_history_lock); + stack_depot_init(); #endif INIT_LIST_HEAD(&mgr->tx_msg_downq); INIT_LIST_HEAD(&mgr->destroy_port_list); diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index a4a04d246135..52faf3369ce2 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -983,6 +983,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size) add_hole(&mm->head_node);
mm->scan_active = 0; + +#ifdef CONFIG_DRM_DEBUG_MM + stack_depot_init(); +#endif } EXPORT_SYMBOL(drm_mm_init);
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index 8b725efb2254..b3d7ba685d9c 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -78,6 +78,9 @@ static void __print_depot_stack(depot_stack_handle_t stack, static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm) { spin_lock_init(&rpm->debug.lock); + + if (rpm->available) + stack_depot_init(); }
static noinline depot_stack_handle_t diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index 22919a94ca19..30dfc8549b5f 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -13,19 +13,26 @@
typedef u32 depot_stack_handle_t;
+/* + * Every user of stack depot has to call this during its own init when it's + * decided that it will be calling stack_depot_save() later. + * + * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot + * enabled as part of mm_init(), for subsystems where it's known at compile time + * that stack depot will be used. + */ +int stack_depot_init(void); + +#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT +static inline int stack_depot_early_init(void) { return stack_depot_init(); } +#else +static inline int stack_depot_early_init(void) { return 0; } +#endif + depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int nr_entries, gfp_t gfp_flags);
unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned long **entries);
-#ifdef CONFIG_STACKDEPOT -int stack_depot_init(void); -#else -static inline int stack_depot_init(void) -{ - return 0; -} -#endif /* CONFIG_STACKDEPOT */ - #endif diff --git a/init/main.c b/init/main.c index 68284805d173..b8306e52d046 100644 --- a/init/main.c +++ b/init/main.c @@ -830,11 +830,14 @@ static void __init mm_init(void) init_debug_pagealloc(); kfence_alloc_pool(); report_meminit(); - stack_depot_init(); + stack_depot_early_init(); mem_init(); - /* page_owner must be initialized after buddy is ready */ - page_ext_init_flatmem_late(); kmem_cache_init(); + /* + * page_owner must be initialized after buddy is ready, and also after + * slab is ready so that stack_depot_init() works properly + */ + page_ext_init_flatmem_late(); kmemleak_init(); pgtable_init(); debug_objects_mem_init(); diff --git a/lib/Kconfig b/lib/Kconfig index 8026964596fd..497304f1647c 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -664,6 +664,10 @@ config STACKDEPOT bool select STACKTRACE
+config STACKDEPOT_ALWAYS_INIT + bool + select STACKDEPOT + config SBITMAP bool
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan index 542a9c18398e..24d309819b74 100644 --- a/lib/Kconfig.kasan +++ b/lib/Kconfig.kasan @@ -49,7 +49,7 @@ config KASAN_GENERIC depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB) select SLUB_DEBUG if SLUB select CONSTRUCTORS - select STACKDEPOT + select STACKDEPOT_ALWAYS_INIT help Enables generic KASAN mode.
@@ -73,7 +73,7 @@ config KASAN_SW_TAGS depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB) select SLUB_DEBUG if SLUB select CONSTRUCTORS - select STACKDEPOT + select STACKDEPOT_ALWAYS_INIT help Enables software tag-based KASAN mode.
diff --git a/lib/stackdepot.c b/lib/stackdepot.c index fba6332e645d..ede2a433055c 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -23,6 +23,7 @@ #include <linux/jhash.h> #include <linux/kernel.h> #include <linux/mm.h> +#include <linux/mutex.h> #include <linux/percpu.h> #include <linux/printk.h> #include <linux/slab.h> @@ -163,18 +164,40 @@ static int __init is_stack_depot_disabled(char *str) } early_param("stack_depot_disable", is_stack_depot_disabled);
-int __init stack_depot_init(void) +/* + * __ref because of memblock_alloc(), which will not be actually called after + * the __init code is gone, because at that point slab_is_available() is true + */ +__ref int stack_depot_init(void) { - if (!stack_depot_disable) { + static DEFINE_MUTEX(stack_depot_init_mutex); + + mutex_lock(&stack_depot_init_mutex); + if (!stack_depot_disable && !stack_table) { size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); int i;
- stack_table = memblock_alloc(size, size); - for (i = 0; i < STACK_HASH_SIZE; i++) - stack_table[i] = NULL; + if (slab_is_available()) { + pr_info("Stack Depot allocating hash table with kvmalloc\n"); + stack_table = kvmalloc(size, GFP_KERNEL); + } else { + pr_info("Stack Depot allocating hash table with memblock_alloc\n"); + stack_table = memblock_alloc(size, SMP_CACHE_BYTES); + } + if (stack_table) { + for (i = 0; i < STACK_HASH_SIZE; i++) + stack_table[i] = NULL; + } else { + pr_err("Stack Depot hash table allocation failed, disabling\n"); + stack_depot_disable = true; + mutex_unlock(&stack_depot_init_mutex); + return -ENOMEM; + } } + mutex_unlock(&stack_depot_init_mutex); return 0; } +EXPORT_SYMBOL_GPL(stack_depot_init);
/* Calculate hash for a stack */ static inline u32 hash_stack(unsigned long *entries, unsigned int size) diff --git a/mm/page_owner.c b/mm/page_owner.c index 5b93fc85dc73..61ba829e895c 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -86,6 +86,8 @@ static void init_page_owner(void) if (!page_owner_enabled) return;
+ stack_depot_init(); + register_dummy_stack(); register_failure_stack(); register_early_stack();
From: Vlastimil Babka vbabka@suse.cz
mainline inclusion
from mainline-v5.19-rc1
commit a5f1783be29adae15666fd803efd7d2979130869
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In a later patch we want to add stackdepot support for object owner tracking in slub caches, which is enabled by the slub_debug boot parameter. This creates a bootstrap problem: some caches are created early in boot when slab_is_available() is false, so stack_depot_init() tries to use memblock. But, as reported by Hyeonggon Yoo [1], at that point we are already past memblock_free_all(). Ideally the memblock allocation would fail. Instead it succeeds, and the system crashes later; that is handled as a separate issue.
To resolve this bootstrap issue in a robust way, this patch adds another way to request stack_depot_early_init(), which happens at a well-defined point in time. In addition to the build-time CONFIG_STACKDEPOT_ALWAYS_INIT, code that runs early enough (for example, code processing boot parameters) can call a new function, stack_depot_want_early_init(). That function sets a flag that stack_depot_early_init() will check.
In this patch we also convert page_owner to this approach. page_owner does not have the same bootstrap issue as slub, but it is also enabled by a boot parameter. It can thus request stack_depot_early_init() with memblock allocation instead of later initialization with kvmalloc().
As suggested by Mike, make stack_depot_early_init() only attempt memblock allocation and stack_depot_init() only attempt kvmalloc(). Also change the latter to kvcalloc(). In both cases we can drop the explicit array zeroing, because both allocations already return zeroed memory.
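Below is a minimal userspace sketch of the two-flag handshake between stack_depot_want_early_init() and stack_depot_early_init(). All names are hypothetical. A boolean stands in for a successful memblock_alloc(), and fprintf() stands in for WARN_ON(). SKETCH_ALWAYS_INIT models IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT) and is assumed to be off here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Models IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT); assumed off in this sketch. */
#define SKETCH_ALWAYS_INIT false

static bool want_early_init = SKETCH_ALWAYS_INIT;
static bool early_init_passed;
static bool depot_disabled;  /* would be set by stack_depot_disable= */
static bool table_ready;     /* stands in for a successful memblock_alloc() */

/* Called while parsing boot parameters, e.g. when page_owner=on is seen. */
static void sketch_want_early_init(void)
{
	if (early_init_passed) /* WARN_ON() in the kernel: request came too late */
		fprintf(stderr, "too late to request early init\n");
	want_early_init = true;
}

/* Called exactly once, at the mm_init() equivalent. */
static int sketch_early_init(void)
{
	if (early_init_passed) {
		fprintf(stderr, "early init called twice\n");
		return 0;
	}
	early_init_passed = true;

	if (!want_early_init || depot_disabled)
		return 0;

	table_ready = true;
	return 0;
}
```

Because the request is only recorded in a flag, the allocation itself always happens at one well-defined point. A request that arrives after that point is reported instead of silently falling back to a late memblock allocation.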
As suggested by Marco, provide empty implementations of the init functions for !CONFIG_STACKDEPOT builds to simplify the callers.
[1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1...
Reported-by: Hyeonggon Yoo 42.hyeyoo@gmail.com Suggested-by: Mike Rapoport rppt@linux.ibm.com Suggested-by: Marco Elver elver@google.com Signed-off-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Marco Elver elver@google.com Reviewed-and-tested-by: Hyeonggon Yoo 42.hyeyoo@gmail.com Reviewed-by: Mike Rapoport rppt@linux.ibm.com Acked-by: David Rientjes rientjes@google.com
Conflicts: include/linux/stackdepot.h lib/stackdepot.c mm/page_owner.c [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- include/linux/stackdepot.h | 25 +++++++++++--- lib/stackdepot.c | 67 +++++++++++++++++++++++++------------- mm/page_owner.c | 5 +-- 3 files changed, 69 insertions(+), 28 deletions(-)
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index 30dfc8549b5f..563e92754035 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -14,18 +14,35 @@ typedef u32 depot_stack_handle_t;
/* - * Every user of stack depot has to call this during its own init when it's - * decided that it will be calling stack_depot_save() later. + * Every user of stack depot has to call stack_depot_init() during its own init + * when it's decided that it will be calling stack_depot_save() later. This is + * recommended for e.g. modules initialized later in the boot process, when + * slab_is_available() is true. * * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot * enabled as part of mm_init(), for subsystems where it's known at compile time * that stack depot will be used. + * + * Another alternative is to call stack_depot_want_early_init(), when the + * decision to use stack depot is taken e.g. when evaluating kernel boot + * parameters, which precedes the enablement point in mm_init(). + * + * stack_depot_init() and stack_depot_want_early_init() can be called regardless + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print + * functions should only be called from code that makes sure CONFIG_STACKDEPOT + * is enabled. */ +#ifdef CONFIG_STACKDEPOT int stack_depot_init(void); +void __init stack_depot_want_early_init(void);
-#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT -static inline int stack_depot_early_init(void) { return stack_depot_init(); } +/* This is supposed to be called only from mm_init() */ +int __init stack_depot_early_init(void); #else +static inline int stack_depot_init(void) { return 0; } + +static inline void stack_depot_want_early_init(void) { } + static inline int stack_depot_early_init(void) { return 0; } #endif
diff --git a/lib/stackdepot.c b/lib/stackdepot.c index ede2a433055c..8aed500d5ee8 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -66,6 +66,9 @@ struct stack_record { unsigned long entries[1]; /* Variable-sized array of entries. */ };
+static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT); +static bool __stack_depot_early_init_passed __initdata; + static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
static int depot_index; @@ -164,38 +167,58 @@ static int __init is_stack_depot_disabled(char *str) } early_param("stack_depot_disable", is_stack_depot_disabled);
-/* - * __ref because of memblock_alloc(), which will not be actually called after - * the __init code is gone, because at that point slab_is_available() is true - */ -__ref int stack_depot_init(void) +void __init stack_depot_want_early_init(void) +{ + /* Too late to request early init now */ + WARN_ON(__stack_depot_early_init_passed); + + __stack_depot_want_early_init = true; +} + +int __init stack_depot_early_init(void) +{ + size_t size; + + /* This is supposed to be called only once, from mm_init() */ + if (WARN_ON(__stack_depot_early_init_passed)) + return 0; + + __stack_depot_early_init_passed = true; + + if (!__stack_depot_want_early_init || stack_depot_disable) + return 0; + + size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); + pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n", + size); + stack_table = memblock_alloc(size, SMP_CACHE_BYTES); + + if (!stack_table) { + pr_err("Stack Depot hash table allocation failed, disabling\n"); + stack_depot_disable = true; + return -ENOMEM; + } + + return 0; +} + +int stack_depot_init(void) { static DEFINE_MUTEX(stack_depot_init_mutex); + int ret = 0;
mutex_lock(&stack_depot_init_mutex); if (!stack_depot_disable && !stack_table) { - size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); - int i; - - if (slab_is_available()) { - pr_info("Stack Depot allocating hash table with kvmalloc\n"); - stack_table = kvmalloc(size, GFP_KERNEL); - } else { - pr_info("Stack Depot allocating hash table with memblock_alloc\n"); - stack_table = memblock_alloc(size, SMP_CACHE_BYTES); - } - if (stack_table) { - for (i = 0; i < STACK_HASH_SIZE; i++) - stack_table[i] = NULL; - } else { + pr_info("Stack Depot allocating hash table with kvcalloc\n"); + stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL); + if (!stack_table) { pr_err("Stack Depot hash table allocation failed, disabling\n"); stack_depot_disable = true; - mutex_unlock(&stack_depot_init_mutex); - return -ENOMEM; + ret = -ENOMEM; } } mutex_unlock(&stack_depot_init_mutex); - return 0; + return ret; } EXPORT_SYMBOL_GPL(stack_depot_init);
diff --git a/mm/page_owner.c b/mm/page_owner.c index 61ba829e895c..a80917074147 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -48,6 +48,9 @@ static int __init early_page_owner_param(char *buf) if (strcmp(buf, "on") == 0) page_owner_enabled = true;
+ if (page_owner_enabled) + stack_depot_want_early_init(); + return 0; } early_param("page_owner", early_page_owner_param); @@ -86,8 +89,6 @@ static void init_page_owner(void) if (!page_owner_enabled) return;
- stack_depot_init(); - register_dummy_stack(); register_failure_stack(); register_early_stack();
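The patch above splits stack depot setup into three steps. First, a boot-time request (`stack_depot_want_early_init()`, called from the `page_owner=` handler). Second, a one-shot early allocation that only happens if it was requested. Third, a mutex-guarded late `stack_depot_init()` that is idempotent and returns an error code. The user-space sketch below mirrors that control flow. All names are hypothetical stand-ins. `calloc()` replaces both `memblock_alloc()` and `kvcalloc()`, a pthread mutex replaces `DEFINE_MUTEX`, and `fprintf()` replaces `WARN_ON()`. It is an illustration of the pattern, not the kernel implementation.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the stack depot globals (not kernel code). */
static bool want_early_init;    /* like __stack_depot_want_early_init */
static bool early_init_passed;  /* like __stack_depot_early_init_passed */
static bool depot_disabled;     /* like stack_depot_disable */
static void **table;            /* like stack_table */
#define TABLE_ENTRIES 4096      /* assumed size, for illustration only */

/* Called from a boot-parameter handler, e.g. for "page_owner=on". */
static void depot_want_early_init(void)
{
	/* Mirrors the WARN_ON(): asking after the early phase is too late. */
	if (early_init_passed)
		fprintf(stderr, "depot: early init requested too late\n");
	want_early_init = true;
}

/* Runs once during boot and allocates only if someone asked for it. */
static int depot_early_init(void)
{
	if (early_init_passed) {
		fprintf(stderr, "depot: early init called twice\n");
		return 0;
	}
	early_init_passed = true;

	if (!want_early_init || depot_disabled)
		return 0;

	/* calloc() stands in for memblock_alloc(), which returns zeroed memory. */
	table = calloc(TABLE_ENTRIES, sizeof(*table));
	if (!table) {
		depot_disabled = true;
		return -ENOMEM;
	}
	return 0;
}

/* Late, idempotent init: safe to call from any user, any number of times. */
static int depot_init(void)
{
	static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
	int ret = 0;

	pthread_mutex_lock(&init_lock);
	if (!depot_disabled && !table) {
		/* calloc() stands in for kvcalloc(..., GFP_KERNEL). */
		table = calloc(TABLE_ENTRIES, sizeof(*table));
		if (!table) {
			depot_disabled = true;
			ret = -ENOMEM;
		}
	}
	pthread_mutex_unlock(&init_lock);
	return ret;
}
```

Because the late path checks `!table` under the mutex, a caller that runs after an early allocation (or after another caller) does not allocate a second table.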
From: Ville Syrjälä ville.syrjala@linux.intel.com
mainline inclusion from mainline-v5.17-rc4 commit eb48d42198792f1330bbb3e82ac725d43c13fe02 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We call __save_depot_stack() unconditionally so the stack depot must always be initialized or else we'll oops on platforms without runtime pm support.
Presumably we've not seen this in CI due to stack_depot_init() already getting called via drm_mm_init()+CONFIG_DRM_DEBUG_MM.
Cc: Vlastimil Babka vbabka@suse.cz Cc: Dmitry Vyukov dvyukov@google.com Cc: Marco Elver elver@google.com # stackdepot Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Imre Deak imre.deak@intel.com Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()") Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20220126081539.23227-1-ville.s... Acked-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Imre Deak imre.deak@intel.com (cherry picked from commit 751a9d69b19702af35b0fedfb8ff362027c1cf0c) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- drivers/gpu/drm/i915/intel_runtime_pm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index b3d7ba685d9c..c90210ac5fb7 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -78,9 +78,7 @@ static void __print_depot_stack(depot_stack_handle_t stack, static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm) { spin_lock_init(&rpm->debug.lock); - - if (rpm->available) - stack_depot_init(); + stack_depot_init(); }
static noinline depot_stack_handle_t
From: Yogesh Lal ylal@codeaurora.org
mainline inclusion from mainline-v5.12-rc1 commit d262093656a0eec6d6114a3178a9d887fddd0ded category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Use CONFIG_STACK_HASH_ORDER to configure STACK_HASH_SIZE.
The aim is to have a configurable value for STACK_HASH_SIZE, so that it can be tuned depending on the use case.
One example is Page Owner: CONFIG_PAGE_OWNER only takes effect if page_owner=on is passed on the kernel command line of a system built with CONFIG_PAGE_OWNER. Thus, unless the admin enables it via the command-line option, the stackdepot will just waste 8MB of memory without any user.
Making it configurable and using a lower value helps to enable features like CONFIG_PAGE_OWNER without any significant overhead.
Link: https://lkml.kernel.org/r/1611749198-24316-1-git-send-email-vjitta@codeauror... Signed-off-by: Yogesh Lal ylal@codeaurora.org Signed-off-by: Vinayak Menon vinmenon@codeaurora.org Signed-off-by: Vijayanand Jitta vjitta@codeaurora.org Reviewed-by: Minchan Kim minchan@kernel.org Reviewed-by: Alexander Potapenko glider@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: lib/Kconfig [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- lib/Kconfig | 9 +++++++++ lib/stackdepot.c | 3 +-- 2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/lib/Kconfig b/lib/Kconfig index 497304f1647c..06a796a94828 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -664,6 +664,15 @@ config STACKDEPOT bool select STACKTRACE
+config STACK_HASH_ORDER + int "stack depot hash size (12 => 4KB, 20 => 1024KB)" + range 12 20 + default 20 + depends on STACKDEPOT + help + Select the hash size as a power of 2 for the stackdepot hash table. + Choose a lower value to reduce the memory impact. + config STACKDEPOT_ALWAYS_INIT bool select STACKDEPOT diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 8aed500d5ee8..e3275f07076c 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -146,8 +146,7 @@ static struct stack_record *depot_alloc_stack(unsigned long *entries, int size, return stack; }
-#define STACK_HASH_ORDER 20 -#define STACK_HASH_SIZE (1L << STACK_HASH_ORDER) +#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER) #define STACK_HASH_MASK (STACK_HASH_SIZE - 1) #define STACK_HASH_SEED 0x9747b28c
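The arithmetic behind CONFIG_STACK_HASH_ORDER is summarized in the small sketch below, under the assumption of one pointer per bucket as in `stack_table`. On a 64-bit kernel each bucket is an 8-byte pointer, so the default order 20 costs 8MB, which matches the figure in the commit message. The Kconfig prompt's "4KB"/"1024KB" labels appear to count buckets (2^12 = 4K, 2^20 = 1M) rather than bytes. The power-of-two size is what lets `STACK_HASH_MASK` replace a modulo. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of buckets for a given hash order (2^order). */
static unsigned long hash_entries(unsigned int order)
{
	return 1UL << order;
}

/* Bytes needed for the bucket-pointer array: one pointer per bucket. */
static size_t hash_table_bytes(unsigned int order)
{
	return hash_entries(order) * sizeof(void *);
}

/* Bucket index: with a power-of-two size, "hash % size" becomes a mask. */
static unsigned long hash_bucket(uint32_t hash, unsigned int order)
{
	return hash & (hash_entries(order) - 1);
}
```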
From: Vlastimil Babka vbabka@suse.cz
mainline inclusion from mainline-v6.0-rc1 commit f9987921cb541b1187a648141a9048547ea89ffb category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As Linus explained [1], setting the stackdepot hash table size as a config option is suboptimal, especially as stackdepot becomes a dependency of less "expert" subsystems than initially (e.g. DRM, networking, SLUB_DEBUG):
: (a) it introduces a new compile-time question that isn't sane to ask : a regular user, but is now exposed to regular users.
: (b) this by default uses 1MB of memory for a feature that didn't in : the past, so now if you have small machines you need to make sure you : make a special kernel config for them.
Ideally we would employ rhashtable for fully automatic resizing, which should be feasible for many of the new users, but problematic for the original users with restricted context that call __stack_depot_save() with can_alloc == false, i.e. KASAN.
However, we can easily remove the config option and scale the hash table automatically with system memory. The STACK_HASH_MASK constant becomes the stack_hash_mask variable and is used only in one mask operation, so the overhead should be negligible to none. For early allocation we can employ the existing alloc_large_system_hash() function, and perform similar scaling for the late allocation.
The existing limits of the config option (between 4k and 1M buckets) are preserved, and the scaling factor is set to one bucket per 16kB of memory, so on 64-bit the maximum of 1M buckets (8MB of memory) is reached with a 16GB system, while a 1GB system will use 512kB.
Because KASAN is reported to need the maximum number of buckets even with smaller amounts of memory [2], set it as such when kasan_enabled().
If needed, the automatic scaling could be complemented with a boot-time kernel parameter, but it feels pointless to add it without a specific use case.
[1] https://lore.kernel.org/all/CAHk-=wjC5nS+fnf6EzRD9yQRJApAhxx7gRB87ZV+pAWo9oV... [2] https://lore.kernel.org/all/CACT4Y+Y4GZfXOru2z5tFPzFdaSUd+GFc6KVL=bsa0+1m197...
Link: https://lkml.kernel.org/r/20220620150249.16814-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Reported-by: Linus Torvalds torvalds@linux-foundation.org Acked-by: Dmitry Vyukov dvyukov@google.com Cc: Marco Elver elver@google.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts: lib/Kconfig lib/stackdepot.c [lib/stackdepot.c needs to call kasan_enabled() that hasn't been merged, call IS_ENABLED(CONFIG_KASAN) directly. other conflicts are context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- lib/Kconfig | 9 -------- lib/stackdepot.c | 59 ++++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 49 insertions(+), 19 deletions(-)
diff --git a/lib/Kconfig b/lib/Kconfig index 06a796a94828..497304f1647c 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -664,15 +664,6 @@ config STACKDEPOT bool select STACKTRACE
-config STACK_HASH_ORDER - int "stack depot hash size (12 => 4KB, 20 => 1024KB)" - range 12 20 - default 20 - depends on STACKDEPOT - help - Select the hash size as a power of 2 for the stackdepot hash table. - Choose a lower value to reduce the memory impact. - config STACKDEPOT_ALWAYS_INIT bool select STACKDEPOT diff --git a/lib/stackdepot.c b/lib/stackdepot.c index e3275f07076c..2f73c861e81c 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -32,6 +32,7 @@ #include <linux/string.h> #include <linux/types.h> #include <linux/memblock.h> +extern unsigned long nr_free_buffer_pages(void);
#define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
@@ -146,10 +147,16 @@ static struct stack_record *depot_alloc_stack(unsigned long *entries, int size, return stack; }
-#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER) -#define STACK_HASH_MASK (STACK_HASH_SIZE - 1) +/* one hash table bucket entry per 16kB of memory */ +#define STACK_HASH_SCALE 14 +/* limited between 4k and 1M buckets */ +#define STACK_HASH_ORDER_MIN 12 +#define STACK_HASH_ORDER_MAX 20 #define STACK_HASH_SEED 0x9747b28c
+static unsigned int stack_hash_order; +static unsigned int stack_hash_mask; + static bool stack_depot_disable; static struct stack_record **stack_table;
@@ -176,7 +183,7 @@ void __init stack_depot_want_early_init(void)
int __init stack_depot_early_init(void) { - size_t size; + unsigned long entries = 0;
/* This is supposed to be called only once, from mm_init() */ if (WARN_ON(__stack_depot_early_init_passed)) @@ -184,13 +191,23 @@ int __init stack_depot_early_init(void)
__stack_depot_early_init_passed = true;
+ if (IS_ENABLED(CONFIG_KASAN) && !stack_hash_order) + stack_hash_order = STACK_HASH_ORDER_MAX; + if (!__stack_depot_want_early_init || stack_depot_disable) return 0;
- size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); - pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n", - size); - stack_table = memblock_alloc(size, SMP_CACHE_BYTES); + if (stack_hash_order) + entries = 1UL << stack_hash_order; + stack_table = alloc_large_system_hash("stackdepot", + sizeof(struct stack_record *), + entries, + STACK_HASH_SCALE, + HASH_EARLY | HASH_ZERO, + NULL, + &stack_hash_mask, + 1UL << STACK_HASH_ORDER_MIN, + 1UL << STACK_HASH_ORDER_MAX);
if (!stack_table) { pr_err("Stack Depot hash table allocation failed, disabling\n"); @@ -208,13 +225,35 @@ int stack_depot_init(void)
mutex_lock(&stack_depot_init_mutex); if (!stack_depot_disable && !stack_table) { - pr_info("Stack Depot allocating hash table with kvcalloc\n"); - stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL); + unsigned long entries; + int scale = STACK_HASH_SCALE; + + if (stack_hash_order) { + entries = 1UL << stack_hash_order; + } else { + entries = nr_free_buffer_pages(); + entries = roundup_pow_of_two(entries); + + if (scale > PAGE_SHIFT) + entries >>= (scale - PAGE_SHIFT); + else + entries <<= (PAGE_SHIFT - scale); + } + + if (entries < 1UL << STACK_HASH_ORDER_MIN) + entries = 1UL << STACK_HASH_ORDER_MIN; + if (entries > 1UL << STACK_HASH_ORDER_MAX) + entries = 1UL << STACK_HASH_ORDER_MAX; + + pr_info("Stack Depot allocating hash table of %lu entries with kvcalloc\n", + entries); + stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL); if (!stack_table) { pr_err("Stack Depot hash table allocation failed, disabling\n"); stack_depot_disable = true; ret = -ENOMEM; } + stack_hash_mask = entries - 1; } mutex_unlock(&stack_depot_init_mutex); return ret; @@ -317,7 +356,7 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, goto fast_exit;
hash = hash_stack(entries, nr_entries); - bucket = &stack_table[hash & STACK_HASH_MASK]; + bucket = &stack_table[hash & stack_hash_mask];
/* * Fast path: look the stack trace up without locking.
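The sizing logic in the late `stack_depot_init()` path can be checked against the numbers quoted in the commit message. The user-space sketch below reproduces that computation. It makes two assumptions: 4KB pages (`PAGE_SHIFT` 12), and a plain page count standing in for the value `nr_free_buffer_pages()` returns. It only mirrors the late `kvcalloc()` path and the explicit-order override. The early path delegates the same scale and bounds to `alloc_large_system_hash()`.

```c
#define PAGE_SHIFT 12            /* assumption: 4KB pages */
#define STACK_HASH_SCALE 14      /* one bucket per 16kB of memory */
#define STACK_HASH_ORDER_MIN 12  /* at least 4k buckets */
#define STACK_HASH_ORDER_MAX 20  /* at most 1M buckets */

/* Smallest power of two >= n (n > 0), like the kernel helper. */
static unsigned long roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bucket count for a memory size given in pages (order 0 = auto-scale). */
static unsigned long stack_hash_entries(unsigned long nr_pages,
					unsigned int order)
{
	unsigned long entries;
	int scale = STACK_HASH_SCALE;

	if (order) {
		entries = 1UL << order;
	} else {
		entries = roundup_pow_of_two(nr_pages);
		/* Convert "pages" into "one entry per 2^scale bytes". */
		if (scale > PAGE_SHIFT)
			entries >>= (scale - PAGE_SHIFT);
		else
			entries <<= (PAGE_SHIFT - scale);
	}

	/* Clamp to the old CONFIG_STACK_HASH_ORDER range. */
	if (entries < 1UL << STACK_HASH_ORDER_MIN)
		entries = 1UL << STACK_HASH_ORDER_MIN;
	if (entries > 1UL << STACK_HASH_ORDER_MAX)
		entries = 1UL << STACK_HASH_ORDER_MAX;
	return entries;
}
```

A 16GB system (2^22 pages) yields 2^20 buckets, and a 1GB system (2^18 pages) yields 2^16 buckets, that is 512kB of pointers on 64-bit. The mask is then simply `entries - 1`, which is why `stack_hash_mask` can replace the old constant.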
From: Sergei Trofimovich slyfox@gentoo.org
mainline inclusion from mainline-v5.13-rc1 commit 8e9b16c47680f6e7d6e5864a37f313f905a91cf5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Before this change, page_owner recursion was detected by fetching a backtrace and inspecting it for the current instruction pointer. This has a few problems:
- it is slightly slow as it requires extra backtrace and a linear stack scan of the result
- it is too late to check if backtrace fetching required memory allocation itself (ia64's unwinder requires it).
To simplify recursion tracking let's use page_owner recursion flag in 'struct task_struct'.
This change makes page_owner=on work on ia64 by avoiding infinite recursion in:
  kmalloc()
  -> __set_page_owner()
  -> save_stack()
  -> unwind() [ia64-specific]
  -> build_script()
  -> kmalloc()
  -> __set_page_owner() [we short-circuit here]
  -> save_stack()
  -> unwind() [recursion]
Link: https://lkml.kernel.org/r/20210402115342.1463781-1-slyfox@gentoo.org Signed-off-by: Sergei Trofimovich slyfox@gentoo.org Reviewed-by: Andrew Morton akpm@linux-foundation.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Juri Lelli juri.lelli@redhat.com Cc: Vincent Guittot vincent.guittot@linaro.org Cc: Dietmar Eggemann dietmar.eggemann@arm.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Ben Segall bsegall@google.com Cc: Mel Gorman mgorman@suse.de Cc: Daniel Bristot de Oliveira bristot@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: include/linux/sched.h [This patch adds a bit field in struct task_struct. Checked by pahole, there are holes to store the bit. Use macro KABI_FILL_HOLE to avoid kabi breakage warning.] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- include/linux/sched.h | 3 +++ mm/page_owner.c | 32 ++++++++++---------------------- 2 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index b4ab407cab37..40022e4a48a6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -884,6 +884,9 @@ struct task_struct { #ifdef CONFIG_IOMMU_SVA KABI_FILL_HOLE(unsigned pasid_activated:1) #endif +#ifdef CONFIG_PAGE_OWNER + KABI_FILL_HOLE(unsigned in_page_owner:1) +#endif
unsigned long atomic_flags; /* Flags requiring atomic access. */
diff --git a/mm/page_owner.c b/mm/page_owner.c index a80917074147..15f2b8f1e0c5 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -107,42 +107,30 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext) return (void *)page_ext + page_owner_ops.offset; }
-static inline bool check_recursive_alloc(unsigned long *entries, - unsigned int nr_entries, - unsigned long ip) -{ - unsigned int i; - - for (i = 0; i < nr_entries; i++) { - if (entries[i] == ip) - return true; - } - return false; -} - static noinline depot_stack_handle_t save_stack(gfp_t flags) { unsigned long entries[PAGE_OWNER_STACK_DEPTH]; depot_stack_handle_t handle; unsigned int nr_entries;
- nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2); - /* - * We need to check recursion here because our request to - * stackdepot could trigger memory allocation to save new - * entry. New memory allocation would reach here and call - * stack_depot_save_entries() again if we don't catch it. There is - * still not enough memory in stackdepot so it would try to - * allocate memory again and loop forever. + * Avoid recursion. + * + * Sometimes page metadata allocation tracking requires more + * memory to be allocated: + * - when new stack trace is saved to stack depot + * - when backtrace itself is calculated (ia64) */ - if (check_recursive_alloc(entries, nr_entries, _RET_IP_)) + if (current->in_page_owner) return dummy_handle; + current->in_page_owner = 1;
+ nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2); handle = stack_depot_save(entries, nr_entries, flags); if (!handle) handle = failure_handle;
+ current->in_page_owner = 0; return handle; }
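The new guard works like a classic re-entrancy flag. The flag is set before any work that might allocate, a nested call sees it and returns the dummy handle immediately, and the flag is cleared on the way out. Below is a user-space sketch of the same idea. It assumes a C11 `_Thread_local` flag as the analogue of the per-task `current->in_page_owner` bit. `allocate_tracked()` is a hypothetical stand-in for an allocation that re-enters owner tracking, such as the ia64 unwinder or the stack depot allocating memory.

```c
#include <stdbool.h>

/* Hypothetical handle values; the kernel uses depot_stack_handle_t. */
enum { DUMMY_HANDLE = 1, REAL_HANDLE = 2 };

/* Per-thread flag playing the role of current->in_page_owner. */
static _Thread_local bool in_page_owner;

/* Counts how often the guard short-circuited a nested call. */
static int nested_calls;

static void allocate_tracked(void);

/* Stand-in for save_stack(): saving a trace may itself allocate memory. */
static int save_stack(void)
{
	if (in_page_owner) {
		nested_calls++;
		return DUMMY_HANDLE;   /* recursion detected: bail out */
	}
	in_page_owner = true;
	allocate_tracked();        /* e.g. unwinder or depot allocation */
	in_page_owner = false;
	return REAL_HANDLE;
}

/* Stand-in for an allocation that triggers page_owner tracking again. */
static void allocate_tracked(void)
{
	(void)save_stack();
}
```

Without the flag, `save_stack()` and `allocate_tracked()` would call each other forever. With it, the nested call returns at once, and unlike the old backtrace scan this needs no stack walk before the check.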
From: Ting Liu liuting.0x7c00@bytedance.com
mainline inclusion from mainline-v5.17-rc1 commit cab0a7c115546a4865fb7439558af9077a569574 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
"page_idle_ops" is a global variable, but it is only used within this file, so it should be static.
"page_ext_ops" is a variable that is only used during kernel initialization, and the other functions touched here are also only used during kernel initialization. So they should be marked __init or __initdata so that their memory can be reclaimed.
Link: https://lkml.kernel.org/r/20211217095023.67293-1-liuting.0x7c00@bytedance.co... Signed-off-by: Ting Liu liuting.0x7c00@bytedance.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: include/linux/page_idle.h [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- include/linux/page_idle.h | 1 - mm/page_ext.c | 4 ++-- mm/page_owner.c | 4 ++-- 3 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index d8a6aecf99cb..8ac4926d396f 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -43,7 +43,6 @@ static inline void clear_page_idle(struct page *page) * If there is not enough space to store Idle and Young bits in page flags, use * page ext flags instead. */ -extern struct page_ext_operations page_idle_ops;
static inline bool page_is_young(struct page *page) { diff --git a/mm/page_ext.c b/mm/page_ext.c index 8e59da0f4367..e807366017ff 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -63,12 +63,12 @@ static bool need_page_idle(void) { return true; } -struct page_ext_operations page_idle_ops = { +static struct page_ext_operations page_idle_ops __initdata = { .need = need_page_idle, }; #endif
-static struct page_ext_operations *page_ext_ops[] = { +static struct page_ext_operations *page_ext_ops[] __initdata = { #ifdef CONFIG_PAGE_OWNER &page_owner_ops, #endif diff --git a/mm/page_owner.c b/mm/page_owner.c index 15f2b8f1e0c5..bd1b0e9e11de 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -55,7 +55,7 @@ static int __init early_page_owner_param(char *buf) } early_param("page_owner", early_page_owner_param);
-static bool need_page_owner(void) +static __init bool need_page_owner(void) { return page_owner_enabled; } @@ -84,7 +84,7 @@ static noinline void register_early_stack(void) early_handle = create_dummy_stack(); }
-static void init_page_owner(void) +static __init void init_page_owner(void) { if (!page_owner_enabled) return;
From: Waiman Long longman@redhat.com
mainline inclusion from mainline-v5.18-rc1 commit 3ebc439761273274ea00258da84d997841f01e72 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The snprintf() function can return a length greater than the given input size. That will require a check for buffer overrun after each invocation of snprintf(). scnprintf(), on the other hand, will never return a greater length.
By using scnprintf() in selected places, we can avoid some buffer overrun checks except after stack_depot_snprint() and after the last snprintf().
Link: https://lkml.kernel.org/r/20220202203036.744010-3-longman@redhat.com Signed-off-by: Waiman Long longman@redhat.com Acked-by: David Rientjes rientjes@google.com Reviewed-by: Sergey Senozhatsky senozhatsky@chromium.org Acked-by: Rafael Aquini aquini@redhat.com Acked-by: Mike Rapoport rppt@linux.ibm.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: Ira Weiny ira.weiny@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@kernel.org Cc: Petr Mladek pmladek@suse.com Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Steven Rostedt (Google) rostedt@goodmis.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: mm/page_owner.c [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c index bd1b0e9e11de..46521c006665 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -350,19 +350,16 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, if (!kbuf) return -ENOMEM;
- ret = snprintf(kbuf, count, + ret = scnprintf(kbuf, count, "Page allocated via order %u, mask %#x(%pGg), pid %d, ts %llu ns, free_ts %llu ns\n", page_owner->order, page_owner->gfp_mask, &page_owner->gfp_mask, page_owner->pid, page_owner->ts_nsec, page_owner->free_ts_nsec);
- if (ret >= count) - goto err; - /* Print information relevant to grouping pages by mobility */ pageblock_mt = get_pageblock_migratetype(page); page_mt = gfp_migratetype(page_owner->gfp_mask); - ret += snprintf(kbuf + ret, count - ret, + ret += scnprintf(kbuf + ret, count - ret, "PFN %lu type %s Block %lu type %s Flags %#lx(%pGp)\n", pfn, migratetype_names[page_mt], @@ -370,20 +367,15 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, migratetype_names[pageblock_mt], page->flags, &page->flags);
- if (ret >= count) - goto err; - nr_entries = stack_depot_fetch(handle, &entries); ret += stack_trace_snprint(kbuf + ret, count - ret, entries, nr_entries, 0); if (ret >= count) goto err;
if (page_owner->last_migrate_reason != -1) { - ret += snprintf(kbuf + ret, count - ret, + ret += scnprintf(kbuf + ret, count - ret, "Page has been migrated, last migrate reason: %s\n", migrate_reason_names[page_owner->last_migrate_reason]); - if (ret >= count) - goto err; }
ret += snprintf(kbuf + ret, count - ret, "\n");
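The reason `scnprintf()` removes the intermediate `ret >= count` checks is its return value. `snprintf()` returns the length the output *would* have had, so after truncation `count - ret` can underflow. `scnprintf()` returns what was actually stored, so `ret` never exceeds `count - 1`. The sketch below re-implements that contract in user space on top of `vsnprintf()`. `my_scnprintf()` and `fill()` are hypothetical names, and the format strings are shortened versions of the page_owner output.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * User-space version of the kernel's scnprintf() contract: return the
 * number of characters actually written to buf (excluding the NUL),
 * never the untruncated length that snprintf() would report.
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (i < 0)
		return 0;
	if ((size_t)i >= size)
		return (int)(size - 1);  /* truncated: report what fits */
	return i;
}

/* Chained appends in the style of print_page_owner(). */
static int fill(char *kbuf, size_t count)
{
	int ret = my_scnprintf(kbuf, count, "Page allocated via order %u\n", 3u);

	/* Safe without a check: ret <= count - 1, so count - ret >= 1. */
	ret += my_scnprintf(kbuf + ret, count - ret, "PFN %lu\n", 12345UL);
	return ret;
}
```

With a 16-byte buffer, `fill()` returns 15 and the buffer stays NUL-terminated. With plain `snprintf()` the first call would return 27, and `count - ret` would wrap around to a huge `size_t`. A final check is still needed wherever the code must detect truncation, as the commit message notes for the last `snprintf()`.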
From: Waiman Long longman@redhat.com
mainline inclusion from mainline-v5.18-rc1 commit fcf8935832b86d3437f00e732c6d0d4d2819d6a9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
It was found that a number of offline memcgs were not freed because they were pinned by some charged pages that were present. Even "echo 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These offline but not freed memcgs tend to increase in number over time with the side effect that percpu memory consumption as shown in /proc/meminfo also increases over time.
In order to find out more information about those pages that pin offline memcgs, the page_owner feature is extended to print memory cgroup information especially whether the cgroup is offline or not. RCU read lock is taken when memcg is being accessed to make sure that it won't be freed.
Link: https://lkml.kernel.org/r/20220202203036.744010-4-longman@redhat.com Signed-off-by: Waiman Long longman@redhat.com Acked-by: David Rientjes rientjes@google.com Acked-by: Roman Gushchin guro@fb.com Acked-by: Rafael Aquini aquini@redhat.com Acked-by: Mike Rapoport rppt@linux.ibm.com Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: Ira Weiny ira.weiny@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@kernel.org Cc: Petr Mladek pmladek@suse.com Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Sergey Senozhatsky senozhatsky@chromium.org Cc: Steven Rostedt (Google) rostedt@goodmis.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+)
diff --git a/mm/page_owner.c b/mm/page_owner.c index 46521c006665..08f8a861b25c 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -10,6 +10,7 @@ #include <linux/migrate.h> #include <linux/stackdepot.h> #include <linux/seq_file.h> +#include <linux/memcontrol.h> #include <linux/sched/clock.h>
#include "internal.h" @@ -335,6 +336,45 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m, seq_putc(m, '\n'); }
+/* + * Looking for memcg information and print it out + */ +static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret, + struct page *page) +{ +#ifdef CONFIG_MEMCG + unsigned long memcg_data; + struct mem_cgroup *memcg; + bool online; + char name[80]; + + rcu_read_lock(); + memcg_data = READ_ONCE(page->memcg_data); + if (!memcg_data) + goto out_unlock; + + if (memcg_data & MEMCG_DATA_OBJCGS) + ret += scnprintf(kbuf + ret, count - ret, + "Slab cache page\n"); + + memcg = page_memcg_check(page); + if (!memcg) + goto out_unlock; + + online = (memcg->css.flags & CSS_ONLINE); + cgroup_name(memcg->css.cgroup, name, sizeof(name)); + ret += scnprintf(kbuf + ret, count - ret, + "Charged %sto %smemcg %s\n", + PageMemcgKmem(page) ? "(via objcg) " : "", + online ? "" : "offline ", + name); +out_unlock: + rcu_read_unlock(); +#endif /* CONFIG_MEMCG */ + + return ret; +} + static ssize_t print_page_owner(char __user *buf, size_t count, unsigned long pfn, struct page *page, struct page_owner *page_owner, @@ -378,6 +418,8 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, migrate_reason_names[page_owner->last_migrate_reason]); }
+ ret = print_page_owner_memcg(kbuf, count, ret, page); + ret += snprintf(kbuf + ret, count - ret, "\n"); if (ret >= count) goto err;
From: Waiman Long longman@redhat.com
mainline inclusion from mainline-v5.18-rc1 commit 865ed6a3278654ce4a55eb74c5283eeb82ad4699 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The page_owner information currently includes the pid of the calling task. That is useful as long as the task is still running. Otherwise, the number is meaningless. To have more information about the allocating tasks that had exited by the time the page_owner information is retrieved, we need to store the command name of the task.
Add a new comm field into page_owner structure to store the command name and display it when the page_owner information is retrieved.
Link: https://lkml.kernel.org/r/20220202203036.744010-5-longman@redhat.com Signed-off-by: Waiman Long longman@redhat.com Acked-by: Rafael Aquini aquini@redhat.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: David Rientjes rientjes@google.com Cc: Ira Weiny ira.weiny@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@kernel.org Cc: Mike Rapoport rppt@kernel.org Cc: Petr Mladek pmladek@suse.com Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Sergey Senozhatsky senozhatsky@chromium.org Cc: Steven Rostedt (Google) rostedt@goodmis.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: mm/page_owner.c [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c index 08f8a861b25c..0b4b5b204257 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -29,6 +29,7 @@ struct page_owner { depot_stack_handle_t free_handle; u64 ts_nsec; u64 free_ts_nsec; + char comm[TASK_COMM_LEN]; pid_t pid; };
@@ -172,6 +173,8 @@ static inline void __set_page_owner_handle(struct page *page, page_owner->last_migrate_reason = -1; page_owner->pid = current->pid; page_owner->ts_nsec = local_clock(); + strlcpy(page_owner->comm, current->comm, + sizeof(page_owner->comm)); __set_bit(PAGE_EXT_OWNER, &page_ext->flags); __set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
@@ -239,6 +242,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage) new_page_owner->pid = old_page_owner->pid; new_page_owner->ts_nsec = old_page_owner->ts_nsec; new_page_owner->free_ts_nsec = old_page_owner->ts_nsec; + strcpy(new_page_owner->comm, old_page_owner->comm);
/* * We don't clear the bit on the oldpage as it's going to be freed @@ -391,10 +395,11 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, return -ENOMEM;
ret = scnprintf(kbuf, count, - "Page allocated via order %u, mask %#x(%pGg), pid %d, ts %llu ns, free_ts %llu ns\n", + "Page allocated via order %u, mask %#x(%pGg), pid %d (%s), ts %llu ns, free_ts %llu ns\n", page_owner->order, page_owner->gfp_mask, &page_owner->gfp_mask, page_owner->pid, - page_owner->ts_nsec, page_owner->free_ts_nsec); + page_owner->comm, page_owner->ts_nsec, + page_owner->free_ts_nsec);
/* Print information relevant to grouping pages by mobility */ pageblock_mt = get_pageblock_migratetype(page); @@ -464,9 +469,10 @@ void __dump_page_owner(struct page *page) else pr_alert("page_owner tracks the page as freed\n");
- pr_alert("page last allocated via order %u, migratetype %s, gfp_mask %#x(%pGg), pid %d, ts %llu, free_ts %llu\n", + pr_alert("page last allocated via order %u, migratetype %s, gfp_mask %#x(%pGg), pid %d (%s), ts %llu, free_ts %llu\n", page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask, - page_owner->pid, page_owner->ts_nsec, page_owner->free_ts_nsec); + page_owner->pid, page_owner->comm, page_owner->ts_nsec, + page_owner->free_ts_nsec);
handle = READ_ONCE(page_owner->handle); if (!handle) {
From: Yixuan Cao caoyixuan2019@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit bf215eab785a30756ea1e53b62a5638d1177a795 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In a single-threaded process, the pid in the kernel task_struct is the same as the tgid, so it identifies the process that allocated the page. In a multithreaded process, however, only the thread group leader's task_struct has a pid equal to the tgid; the other threads have pids that differ from the tgid. Therefore, tgid is recorded to provide useful information for debugging and for gathering statistics on multithreaded programs.
The process can also be identified by its task name (executable file name). However, when the same program is started multiple times, the task name stays the same while the tgid differs. Therefore, when debugging multithreaded programs, combining the task name and the tgid gives more accurate runtime information about a particular run of the program.
Link: https://lkml.kernel.org/r/20220219180450.2399-1-caoyixuan2019@email.szu.edu.... Signed-off-by: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Waiman Long longman@redhat.com Cc: Rafael Aquini aquini@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: mm/page_owner.c [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c index 0b4b5b204257..ce44df1d603a 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -31,6 +31,7 @@ struct page_owner { u64 free_ts_nsec; char comm[TASK_COMM_LEN]; pid_t pid; + pid_t tgid; };
static bool page_owner_enabled = false; @@ -172,6 +173,7 @@ static inline void __set_page_owner_handle(struct page *page, page_owner->gfp_mask = gfp_mask; page_owner->last_migrate_reason = -1; page_owner->pid = current->pid; + page_owner->tgid = current->tgid; page_owner->ts_nsec = local_clock(); strlcpy(page_owner->comm, current->comm, sizeof(page_owner->comm)); @@ -240,6 +242,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage) old_page_owner->last_migrate_reason; new_page_owner->handle = old_page_owner->handle; new_page_owner->pid = old_page_owner->pid; + new_page_owner->tgid = old_page_owner->tgid; new_page_owner->ts_nsec = old_page_owner->ts_nsec; new_page_owner->free_ts_nsec = old_page_owner->ts_nsec; strcpy(new_page_owner->comm, old_page_owner->comm); @@ -395,11 +398,11 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, return -ENOMEM;
ret = scnprintf(kbuf, count, - "Page allocated via order %u, mask %#x(%pGg), pid %d (%s), ts %llu ns, free_ts %llu ns\n", + "Page allocated via order %u, mask %#x(%pGg), pid %d, tgid %d (%s), ts %llu ns, free_ts %llu ns\n", page_owner->order, page_owner->gfp_mask, &page_owner->gfp_mask, page_owner->pid, - page_owner->comm, page_owner->ts_nsec, - page_owner->free_ts_nsec); + page_owner->tgid, page_owner->comm, + page_owner->ts_nsec, page_owner->free_ts_nsec);
/* Print information relevant to grouping pages by mobility */ pageblock_mt = get_pageblock_migratetype(page); @@ -469,10 +472,10 @@ void __dump_page_owner(struct page *page) else pr_alert("page_owner tracks the page as freed\n");
- pr_alert("page last allocated via order %u, migratetype %s, gfp_mask %#x(%pGg), pid %d (%s), ts %llu, free_ts %llu\n", + pr_alert("page last allocated via order %u, migratetype %s, gfp_mask %#x(%pGg), pid %d, tgid %d (%s), ts %llu, free_ts %llu\n", page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask, - page_owner->pid, page_owner->comm, page_owner->ts_nsec, - page_owner->free_ts_nsec); + page_owner->pid, page_owner->tgid, page_owner->comm, + page_owner->ts_nsec, page_owner->free_ts_nsec);
handle = READ_ONCE(page_owner->handle); if (!handle) {
From: Eric Dumazet edumazet@google.com
mainline inclusion
from mainline-v5.19-rc1
commit cd8c1fd8cdd14158f2d8bea2d1bfe8015dccfa3a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
current->comm[] is not a string (no guarantee for a zero byte in it).
strlcpy(s1, s2, l) calls strlen(s2), potentially causing an out-of-bounds access, as reported by syzbot:
detected buffer overflow in __fortify_strlen
------------[ cut here ]------------
kernel BUG at lib/string_helpers.c:980!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 4087 Comm: dhcpcd-run-hooks Not tainted 5.18.0-rc3-syzkaller-01537-g20b87e7c29df #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fortify_panic+0x18/0x1a lib/string_helpers.c:980
Code: 8c e8 c5 ba e1 fa e9 23 0f bf fa e8 0b 5d 8c f8 eb db 55 48 89 fd e8 e0 49 40 f8 48 89 ee 48 c7 c7 80 f5 26 8a e8 99 09 f1 ff <0f> 0b e8 ca 49 40 f8 48 8b 54 24 18 4c 89 f1 48 c7 c7 00 00 27 8a
RSP: 0018:ffffc900000074a8 EFLAGS: 00010286
RAX: 000000000000002c RBX: ffff88801226b728 RCX: 0000000000000000
RDX: ffff8880198e0000 RSI: ffffffff81600458 RDI: fffff52000000e87
RBP: ffffffff89da2aa0 R08: 000000000000002c R09: 0000000000000000
R10: ffffffff815fae2e R11: 0000000000000000 R12: ffff88801226b700
R13: ffff8880198e0830 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5876ad6ff8 CR3: 000000001a48c000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
 <IRQ>
 __fortify_strlen include/linux/fortify-string.h:128 [inline]
 strlcpy include/linux/fortify-string.h:143 [inline]
 __set_page_owner_handle+0x2b1/0x3e0 mm/page_owner.c:171
 __set_page_owner+0x3e/0x50 mm/page_owner.c:190
 prep_new_page mm/page_alloc.c:2441 [inline]
 get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408
 alloc_pages+0x1aa/0x310 mm/mempolicy.c:2272
 alloc_slab_page mm/slub.c:1799 [inline]
 allocate_slab+0x26c/0x3c0 mm/slub.c:1944
 new_slab mm/slub.c:2004 [inline]
 ___slab_alloc+0x8df/0xf20 mm/slub.c:3005
 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3092
 slab_alloc_node mm/slub.c:3183 [inline]
 slab_alloc mm/slub.c:3225 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3232 [inline]
 kmem_cache_alloc+0x360/0x3b0 mm/slub.c:3242
 dst_alloc+0x146/0x1f0 net/core/dst.c:92
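For illustration only, a userspace sketch of why the bound matters: strlcpy() returns strlen(src), so it must walk the source until it finds a NUL byte, whereas a strscpy()-style copy never reads more than size bytes. bounded_copy() below is a hypothetical helper that mimics strscpy()'s documented behaviour (always NUL-terminates, returns -E2BIG on truncation); it is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical strscpy()-like copy: reads at most @size bytes of @src,
 * always NUL-terminates @dst (when @size > 0) and returns the copied
 * length, or -E2BIG if @src did not fit.
 */
static ssize_t bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;

	/* Copy until a NUL is seen or only room for the terminator is left. */
	for (i = 0; i + 1 < size && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* i <= size - 1 here, so this read stays within the first @size bytes. */
	if (src[i] != '\0')
		return -E2BIG;
	return (ssize_t)i;
}
```

With a 4-byte source that has no terminating NUL (like a comm[] buffer that is not a string), bounded_copy(out, comm, 4) reads exactly 4 bytes, stores "abc" plus a NUL in out, and reports truncation instead of running off the end of the buffer.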
Link: https://lkml.kernel.org/r/20220509145949.265184-1-eric.dumazet@gmail.com
Fixes: 865ed6a32786 ("mm/page_owner: record task command name")
Signed-off-by: Eric Dumazet edumazet@google.com
Reported-by: syzbot syzkaller@googlegroups.com
Acked-by: Waiman Long longman@redhat.com
Acked-by: Shakeel Butt shakeelb@google.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com
---
 mm/page_owner.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c index ce44df1d603a..84f6d08632c3 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -175,7 +175,7 @@ static inline void __set_page_owner_handle(struct page *page, page_owner->pid = current->pid; page_owner->tgid = current->tgid; page_owner->ts_nsec = local_clock(); - strlcpy(page_owner->comm, current->comm, + strscpy(page_owner->comm, current->comm, sizeof(page_owner->comm)); __set_bit(PAGE_EXT_OWNER, &page_ext->flags); __set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
From: Charan Teja Kalla quic_charante@quicinc.com
mainline inclusion
from mainline-v6.1-rc1
commit b1d5488a252dc9c0d9574100d0b8d807bf154603
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Below is one path where a race between page_ext access and offlining of the respective memory block causes a use-after-free of the page_ext structure.
process1                             process2
---------                            ---------
a) doing /proc/page_owner            doing memory offline
                                     through offline_pages.

b) PageBuddy check fails, so
   proceed to get the page_owner
   information through page_ext
   access:
   page_ext = lookup_page_ext(page);

                                     migrate_pages();
                                     .................
                                     Since all pages are successfully
                                     migrated as part of the offline
                                     operation, send the MEM_OFFLINE
                                     notification, where for page_ext
                                     it calls:
                                     offline_page_ext()-->
                                       __free_page_ext()-->
                                         free_page_ext()-->
                                           vfree(ms->page_ext)
                                       mem_section->page_ext = NULL

c) Checking the PAGE_EXT flags
   through page_ext->flags results
   in a use-after-free (leading to
   translation faults).
As shown above, there is no synchronization at all between page_ext access and its freeing in the memory offline path.
The memory offline steps (roughly) on a memory block are as follows:
1) Isolate all the pages.
2) while (1): try to free the pages to buddy (->free_list[MIGRATE_ISOLATE]).
3) Delete the pages from this buddy list.
4) Then free page_ext. (Note: the struct page is still alive, as it is freed only during hot-remove of the memory, which frees the memmap; the user might never perform that step.)
This design leads to a state where the struct page is alive but its struct page_ext is freed, even though the latter is ideally part of the former, just representing the page flags (see [3] for why this design was chosen).
The above-mentioned race is just one example, __but the problem persists in other paths too that involve page_ext->flags access (e.g. page_is_idle())__.
Fix all the paths where offline races with page_ext access by synchronizing them with the RCU read lock. This is achieved in 3 steps:
1) Invalidate all the page_ext's of the sections of a memory block by storing a flag in the LSB of mem_section->page_ext.
2) Wait with synchronize_rcu() until all existing readers have finished working with the ->page_ext's. Any parallel process that starts after this call will not get a page_ext, through lookup_page_ext(), for the block on which the parallel offline operation is being performed.
3) Now safely free the ->page_ext's of all sections of the block on which the offline operation is being performed.
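Step 1 relies on tagging the least significant bit of a suitably aligned pointer. A minimal userspace sketch of just that tagging logic follows; struct ext, struct section and the sec_*() helpers are hypothetical stand-ins for struct page_ext, struct mem_section, lookup_page_ext() and __free_page_ext(). Step 2 (synchronize_rcu()) is only marked by a comment, because it needs the kernel or an RCU library, and the pointer/integer round trip assumes a mainstream platform where it is well behaved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct page_ext and struct mem_section. */
struct ext { unsigned long flags; };
struct section { struct ext *ext; };

#define EXT_INVALID 0x1UL	/* plays the role of PAGE_EXT_INVALID */

static bool ext_invalid(const struct ext *e)
{
	return !e || ((uintptr_t)e & EXT_INVALID) == EXT_INVALID;
}

/* Step 1: set the LSB; struct ext is word aligned, so the bit is free. */
static void sec_invalidate(struct section *s)
{
	if (s->ext)
		s->ext = (struct ext *)((uintptr_t)s->ext + EXT_INVALID);
}

/* Reader side: an invalidated pointer is treated as "no page_ext". */
static struct ext *sec_lookup(const struct section *s)
{
	return ext_invalid(s->ext) ? NULL : s->ext;
}

/*
 * Step 3: after all readers have drained (step 2, synchronize_rcu() in
 * the kernel, omitted here), strip the tag and hand back the real
 * pointer so the caller can free it.
 */
static struct ext *sec_detach(struct section *s)
{
	struct ext *base = s->ext;

	if (base && ext_invalid(base))
		base = (struct ext *)((uintptr_t)base - EXT_INVALID);
	s->ext = NULL;
	return base;
}
```

Tagging the existing pointer, rather than clearing it, keeps the original address recoverable for the free in step 3 while new lookups already fail.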
Note: if synchronize_rcu() turns out to take too long, this path can be optimized by using call_rcu() [2].
Thanks to David Hildenbrand for his views/suggestions on the initial discussion [1] and to Pavan Kondeti for various inputs on this patch.
[1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@quicin...
[2] https://lore.kernel.org/all/a26ce299-aed1-b8ad-711e-a49e82bdd180@quicinc.com...
[3] https://lore.kernel.org/all/6fa6b7aa-731e-891c-3efb-a03d6a700efa@redhat.com/
[quic_charante@quicinc.com: rename label `loop' to `ext_put_continue' per David]
Link: https://lkml.kernel.org/r/1661496993-11473-1-git-send-email-quic_charante@qu...
Link: https://lkml.kernel.org/r/1660830600-9068-1-git-send-email-quic_charante@qui...
Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com
Suggested-by: David Hildenbrand david@redhat.com
Suggested-by: Michal Hocko mhocko@suse.com
Acked-by: Michal Hocko mhocko@suse.com
Acked-by: David Hildenbrand david@redhat.com
Cc: Fernand Sieber sieberf@amazon.com
Cc: Minchan Kim minchan@google.com
Cc: Pasha Tatashin pasha.tatashin@soleen.com
Cc: Pavan Kondeti quic_pkondeti@quicinc.com
Cc: SeongJae Park sjpark@amazon.de
Cc: Shakeel Butt shakeelb@google.com
Cc: Vlastimil Babka vbabka@suse.cz
Cc: William Kucharski william.kucharski@oracle.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts:
	include/linux/page_idle.h
	mm/page_ext.c
	mm/page_owner.c
	mm/page_table.c
[mm/page_table.c does not exist because commit df4e817b710809425d899340dbfa8504a3ca4ba5 is not backported. Other conflicts are context conflicts.]
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com
---
 include/linux/page_ext.h | 17 ++++---
 include/linux/page_idle.h | 33 ++++++++----
 mm/page_ext.c | 104 +++++++++++++++++++++++++++++++++++---
 mm/page_owner.c | 72 +++++++++++++++++++-------
 4 files changed, 186 insertions(+), 40 deletions(-)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index fabb2e1e087f..ed27198cdaf4 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -55,7 +55,8 @@ static inline void page_ext_init(void) } #endif
-struct page_ext *lookup_page_ext(const struct page *page); +extern struct page_ext *page_ext_get(struct page *page); +extern void page_ext_put(struct page_ext *page_ext);
static inline struct page_ext *page_ext_next(struct page_ext *curr) { @@ -71,11 +72,6 @@ static inline void pgdat_page_ext_init(struct pglist_data *pgdat) { }
-static inline struct page_ext *lookup_page_ext(const struct page *page) -{ - return NULL; -} - static inline void page_ext_init(void) { } @@ -87,5 +83,14 @@ static inline void page_ext_init_flatmem_late(void) static inline void page_ext_init_flatmem(void) { } + +static inline struct page_ext *page_ext_get(struct page *page) +{ + return NULL; +} + +static inline void page_ext_put(struct page_ext *page_ext) +{ +} #endif /* CONFIG_PAGE_EXTENSION */ #endif /* __LINUX_PAGE_EXT_H */ diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 8ac4926d396f..83ccdf07d29f 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -46,62 +46,77 @@ static inline void clear_page_idle(struct page *page)
static inline bool page_is_young(struct page *page) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page); + bool page_young;
if (unlikely(!page_ext)) return false;
- return test_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_young = test_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_ext_put(page_ext); + + return page_young; }
static inline void set_page_young(struct page *page) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page);
if (unlikely(!page_ext)) return;
set_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_ext_put(page_ext); }
static inline bool test_and_clear_page_young(struct page *page) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page); + bool page_young;
if (unlikely(!page_ext)) return false;
- return test_and_clear_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_young = test_and_clear_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_ext_put(page_ext); + + return page_young; }
static inline bool page_is_idle(struct page *page) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page); + bool page_idle;
if (unlikely(!page_ext)) return false;
- return test_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_idle = test_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_ext_put(page_ext); + + return page_idle; }
static inline void set_page_idle(struct page *page) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page);
if (unlikely(!page_ext)) return;
set_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_ext_put(page_ext); }
static inline void clear_page_idle(struct page *page) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page);
if (unlikely(!page_ext)) return;
clear_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_ext_put(page_ext); } #endif /* CONFIG_64BIT */
diff --git a/mm/page_ext.c b/mm/page_ext.c index e807366017ff..c40d3f7456df 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -8,6 +8,7 @@ #include <linux/kmemleak.h> #include <linux/page_owner.h> #include <linux/page_idle.h> +#include <linux/rcupdate.h>
/* * struct page extension @@ -58,6 +59,10 @@ * can utilize this callback to initialize the state of it correctly. */
+#ifdef CONFIG_SPARSEMEM +#define PAGE_EXT_INVALID (0x1) +#endif + #if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) static bool need_page_idle(void) { @@ -80,6 +85,7 @@ static struct page_ext_operations *page_ext_ops[] __initdata = { unsigned long page_ext_size = sizeof(struct page_ext);
static unsigned long total_usage; +static struct page_ext *lookup_page_ext(const struct page *page);
static bool __init invoke_need_callbacks(void) { @@ -121,6 +127,48 @@ static inline struct page_ext *get_entry(void *base, unsigned long index) return base + page_ext_size * index; }
+/** + * page_ext_get() - Get the extended information for a page. + * @page: The page we're interested in. + * + * Ensures that the page_ext will remain valid until page_ext_put() + * is called. + * + * Return: NULL if no page_ext exists for this page. + * Context: Any context. Caller may not sleep until they have called + * page_ext_put(). + */ +struct page_ext *page_ext_get(struct page *page) +{ + struct page_ext *page_ext; + + rcu_read_lock(); + page_ext = lookup_page_ext(page); + if (!page_ext) { + rcu_read_unlock(); + return NULL; + } + + return page_ext; +} + +/** + * page_ext_put() - Working with page extended information is done. + * @page_ext - Page extended information received from page_ext_get(). + * + * The page extended information of the page may not be valid after this + * function is called. + * + * Return: None. + * Context: Any context with corresponding page_ext_get() is called. + */ +void page_ext_put(struct page_ext *page_ext) +{ + if (unlikely(!page_ext)) + return; + + rcu_read_unlock(); +} #ifndef CONFIG_SPARSEMEM
@@ -129,12 +177,13 @@ void __meminit pgdat_page_ext_init(struct pglist_data *pgdat) pgdat->node_page_ext = NULL; }
-struct page_ext *lookup_page_ext(const struct page *page) +static struct page_ext *lookup_page_ext(const struct page *page) { unsigned long pfn = page_to_pfn(page); unsigned long index; struct page_ext *base;
+ WARN_ON_ONCE(!rcu_read_lock_held()); base = NODE_DATA(page_to_nid(page))->node_page_ext; /* * The sanity checks the page allocator does upon freeing a @@ -203,19 +252,27 @@ void __init page_ext_init_flatmem(void)
#else /* CONFIG_FLAT_NODE_MEM_MAP */
-struct page_ext *lookup_page_ext(const struct page *page) +static bool page_ext_invalid(struct page_ext *page_ext) +{ + return !page_ext || (((unsigned long)page_ext & PAGE_EXT_INVALID) == PAGE_EXT_INVALID); +} + +static struct page_ext *lookup_page_ext(const struct page *page) { unsigned long pfn = page_to_pfn(page); struct mem_section *section = __pfn_to_section(pfn); + struct page_ext *page_ext = READ_ONCE(section->page_ext); + + WARN_ON_ONCE(!rcu_read_lock_held()); /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are * allocated when feeding a range of pages to the allocator * for the first time during bootup or memory hotplug. */ - if (!section->page_ext) + if (page_ext_invalid(page_ext)) return NULL; - return get_entry(section->page_ext, pfn); + return get_entry(page_ext, pfn); }
static void *__meminit alloc_page_ext(size_t size, int nid) @@ -294,9 +351,30 @@ static void __free_page_ext(unsigned long pfn) ms = __pfn_to_section(pfn); if (!ms || !ms->page_ext) return; - base = get_entry(ms->page_ext, pfn); + + base = READ_ONCE(ms->page_ext); + /* + * page_ext here can be valid while doing the roll back + * operation in online_page_ext(). + */ + if (page_ext_invalid(base)) + base = (void *)base - PAGE_EXT_INVALID; + WRITE_ONCE(ms->page_ext, NULL); + + base = get_entry(base, pfn); free_page_ext(base); - ms->page_ext = NULL; +} + +static void __invalidate_page_ext(unsigned long pfn) +{ + struct mem_section *ms; + void *val; + + ms = __pfn_to_section(pfn); + if (!ms || !ms->page_ext) + return; + val = (void *)ms->page_ext + PAGE_EXT_INVALID; + WRITE_ONCE(ms->page_ext, val); }
static int __meminit online_page_ext(unsigned long start_pfn, @@ -339,6 +417,20 @@ static int __meminit offline_page_ext(unsigned long start_pfn, start = SECTION_ALIGN_DOWN(start_pfn); end = SECTION_ALIGN_UP(start_pfn + nr_pages);
+ /* + * Freeing of page_ext is done in 3 steps to avoid + * use-after-free of it: + * 1) Traverse all the sections and mark their page_ext + * as invalid. + * 2) Wait for all the existing users of page_ext who + * started before invalidation to finish. + * 3) Free the page_ext. + */ + for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) + __invalidate_page_ext(pfn); + + synchronize_rcu(); + for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) __free_page_ext(pfn); return 0; diff --git a/mm/page_owner.c b/mm/page_owner.c index 84f6d08632c3..6aad07c996b3 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -147,7 +147,7 @@ void __reset_page_owner(struct page *page, unsigned int order)
handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
- page_ext = lookup_page_ext(page); + page_ext = page_ext_get(page); if (unlikely(!page_ext)) return; for (i = 0; i < (1 << order); i++) { @@ -157,6 +157,7 @@ void __reset_page_owner(struct page *page, unsigned int order) page_owner->free_ts_nsec = free_ts_nsec; page_ext = page_ext_next(page_ext); } + page_ext_put(page_ext); }
static inline void __set_page_owner_handle(struct page *page, @@ -187,19 +188,22 @@ static inline void __set_page_owner_handle(struct page *page, noinline void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext; depot_stack_handle_t handle;
+ handle = save_stack(gfp_mask); + + page_ext = page_ext_get(page); if (unlikely(!page_ext)) return;
- handle = save_stack(gfp_mask); __set_page_owner_handle(page, page_ext, handle, order, gfp_mask); + page_ext_put(page_ext); }
void __set_page_owner_migrate_reason(struct page *page, int reason) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page); struct page_owner *page_owner;
if (unlikely(!page_ext)) @@ -207,12 +211,13 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
page_owner = get_page_owner(page_ext); page_owner->last_migrate_reason = reason; + page_ext_put(page_ext); }
void __split_page_owner(struct page *page, unsigned int nr) { int i; - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page); struct page_owner *page_owner;
if (unlikely(!page_ext)) @@ -223,17 +228,25 @@ void __split_page_owner(struct page *page, unsigned int nr) page_owner->order = 0; page_ext = page_ext_next(page_ext); } + page_ext_put(page_ext); }
void __copy_page_owner(struct page *oldpage, struct page *newpage) { - struct page_ext *old_ext = lookup_page_ext(oldpage); - struct page_ext *new_ext = lookup_page_ext(newpage); + struct page_ext *old_ext; + struct page_ext *new_ext; struct page_owner *old_page_owner, *new_page_owner;
- if (unlikely(!old_ext || !new_ext)) + old_ext = page_ext_get(oldpage); + if (unlikely(!old_ext)) return;
+ new_ext = page_ext_get(newpage); + if (unlikely(!new_ext)) { + page_ext_put(old_ext); + return; + } + old_page_owner = get_page_owner(old_ext); new_page_owner = get_page_owner(new_ext); new_page_owner->order = old_page_owner->order; @@ -258,6 +271,8 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage) */ __set_bit(PAGE_EXT_OWNER, &new_ext->flags); __set_bit(PAGE_EXT_OWNER_ALLOCATED, &new_ext->flags); + page_ext_put(new_ext); + page_ext_put(old_ext); }
void pagetypeinfo_showmixedcount_print(struct seq_file *m, @@ -314,12 +329,12 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m, if (PageReserved(page)) continue;
- page_ext = lookup_page_ext(page); + page_ext = page_ext_get(page); if (unlikely(!page_ext)) continue;
if (!test_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags)) - continue; + goto ext_put_continue;
page_owner = get_page_owner(page_ext); page_mt = gfp_migratetype(page_owner->gfp_mask); @@ -330,9 +345,12 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m, count[pageblock_mt]++;
pfn = block_end_pfn; + page_ext_put(page_ext); break; } pfn += (1UL << page_owner->order) - 1; +ext_put_continue: + page_ext_put(page_ext); } }
@@ -445,7 +463,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
void __dump_page_owner(struct page *page) { - struct page_ext *page_ext = lookup_page_ext(page); + struct page_ext *page_ext = page_ext_get(page); struct page_owner *page_owner; depot_stack_handle_t handle; unsigned long *entries; @@ -464,6 +482,7 @@ void __dump_page_owner(struct page *page)
if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) { pr_alert("page_owner info is not present (never set?)\n"); + page_ext_put(page_ext); return; }
@@ -497,6 +516,7 @@ void __dump_page_owner(struct page *page) if (page_owner->last_migrate_reason != -1) pr_alert("page has been migrated, last migrate reason: %s\n", migrate_reason_names[page_owner->last_migrate_reason]); + page_ext_put(page_ext); }
static ssize_t @@ -522,6 +542,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
/* Find an allocated page */ for (; pfn < max_pfn; pfn++) { + /* + * This temporary page_owner is required so + * that we can avoid the context switches while holding + * the rcu lock and copying the page owner information to + * user through copy_to_user() or GFP_KERNEL allocations. + */ + struct page_owner page_owner_tmp; + /* * If the new page is in a new MAX_ORDER_NR_PAGES area, * validate the area as existing, skip it if not @@ -544,7 +572,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) continue; }
- page_ext = lookup_page_ext(page); + page_ext = page_ext_get(page); if (unlikely(!page_ext)) continue;
@@ -553,14 +581,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) * because we don't hold the zone lock. */ if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) - continue; + goto ext_put_continue;
/* * Although we do have the info about past allocation of free * pages, it's not relevant for current memory usage. */ if (!test_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags)) - continue; + goto ext_put_continue;
page_owner = get_page_owner(page_ext);
@@ -569,7 +597,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) * would inflate the stats. */ if (!IS_ALIGNED(pfn, 1 << page_owner->order)) - continue; + goto ext_put_continue;
/* * Access to page_ext->handle isn't synchronous so we should @@ -577,13 +605,17 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) */ handle = READ_ONCE(page_owner->handle); if (!handle) - continue; + goto ext_put_continue;
/* Record the next PFN to read in the file offset */ *ppos = (pfn - min_low_pfn) + 1;
+ page_owner_tmp = *page_owner; + page_ext_put(page_ext); return print_page_owner(buf, count, pfn, page, - page_owner, handle); + &page_owner_tmp, handle); +ext_put_continue: + page_ext_put(page_ext); }
return 0; @@ -641,18 +673,20 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone) if (PageReserved(page)) continue;
- page_ext = lookup_page_ext(page); + page_ext = page_ext_get(page); if (unlikely(!page_ext)) continue;
/* Maybe overlapping zone */ if (test_bit(PAGE_EXT_OWNER, &page_ext->flags)) - continue; + goto ext_put_continue;
/* Found early allocated page */ __set_page_owner_handle(page, page_ext, early_handle, 0, 0); count++; +ext_put_continue: + page_ext_put(page_ext); } cond_resched(); }
From: Zhenhua Huang quic_zhenhuah@quicinc.com
mainline inclusion
from mainline-v6.1-rc1
commit 0bba9af03d55d2cc1aa7616a8b9e522ceb49d180
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Remove an expensive and unnecessary operation: PCP pages are already safely skipped when reading page owner. PCP pages can be skipped because PAGE_EXT_OWNER_ALLOCATED is cleared for them.
Draining the PCP pages moves them to the buddy list, so they can be identified as buddy pages and skipped quickly. Although this improves the efficiency of the PFN walker, the drain is guaranteed to be expensive, and that cost is unlikely to be offset by the slight gain in efficiency when skipping free pages.
PAGE_EXT_OWNER_ALLOCATED is cleared in the page owner reset path below:
free_unref_page
  -> free_unref_page_prepare
    -> free_pcp_prepare
      -> free_pages_prepare, which does the page owner reset
  -> free_unref_page_commit, which adds pages to the pcp list
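The reader-side consequence can be sketched as follows: a PFN walker that tests the owner and owner-allocated bits skips a freed page whether it sits on a PCP list or in the buddy allocator, so draining is not needed for correctness. The bit numbers, struct ext and count_reported() are hypothetical stand-ins, not the kernel's enum page_ext_flags or read_page_owner().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit numbers; the kernel's enum page_ext_flags differs. */
enum { EXT_OWNER = 0, EXT_OWNER_ALLOCATED = 1 };

struct ext { unsigned long flags; };

static int test_flag(const struct ext *e, int bit)
{
	return (int)((e->flags >> bit) & 1UL);
}

/*
 * Count the pages a page_owner reader would report. A page freed to a
 * PCP list has had EXT_OWNER_ALLOCATED cleared by the reset path, so it
 * is skipped here without first draining it to the buddy allocator.
 */
static size_t count_reported(const struct ext *exts, size_t n)
{
	size_t i, reported = 0;

	for (i = 0; i < n; i++) {
		if (!test_flag(&exts[i], EXT_OWNER))
			continue;	/* owner info never set */
		if (!test_flag(&exts[i], EXT_OWNER_ALLOCATED))
			continue;	/* freed, e.g. sitting on a PCP list */
		reported++;
	}
	return reported;
}
```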
Link: https://lkml.kernel.org/r/1662704326-15899-1-git-send-email-quic_zhenhuah@qu...
Link: https://lkml.kernel.org/r/1662633204-10044-1-git-send-email-quic_zhenhuah@qu...
Link: https://lkml.kernel.org/r/1662537673-9392-1-git-send-email-quic_zhenhuah@qui...
Signed-off-by: Zhenhua Huang quic_zhenhuah@quicinc.com
Acked-by: Mel Gorman mgorman@techsingularity.net
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com
---
 mm/page_owner.c | 2 --
 1 file changed, 2 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c index 6aad07c996b3..ea1a1e4063b2 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -538,8 +538,6 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0) pfn++;
- drain_all_pages(NULL); - /* Find an allocated page */ for (; pfn < max_pfn; pfn++) { /*
From: Hyeonggon Yoo 42.hyeyoo@gmail.com
mainline inclusion
from mainline-v6.3-rc1
commit 05a421995503b746095d8ac93fa0ddadfc3c81bc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When allocating a high-order page, a separate allocation timestamp is recorded for each sub-page, resulting in different timestamp values between them.
This behavior is not consistent with the behavior when recording the free timestamp, and it causes confusion when analyzing memory dumps. Record a single timestamp for the entire allocation, aligning with the behavior for free timestamps.
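A userspace sketch of the change, assuming a hypothetical per-sub-page record (struct owner_rec) and helper names (now_ns(), set_alloc_ts()): the clock is read once before the loop, so all 1 << order entries share one value. CLOCK_MONOTONIC stands in for local_clock(), which exists only in the kernel.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-sub-page record; only the timestamp is modelled. */
struct owner_rec { uint64_t ts_nsec; };

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Read the clock once, then stamp every sub-page of the allocation. */
static void set_alloc_ts(struct owner_rec *recs, unsigned int order)
{
	uint64_t ts_nsec = now_ns();
	unsigned long i;

	for (i = 0; i < (1UL << order); i++)
		recs[i].ts_nsec = ts_nsec;
}
```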
Link: https://lkml.kernel.org/r/20230121165054.520507-1-42.hyeyoo@gmail.com
Signed-off-by: Hyeonggon Yoo 42.hyeyoo@gmail.com
Cc: David Hildenbrand david@redhat.com
Cc: David Rientjes rientjes@google.com
Cc: Joonsoo Kim iamjoonsoo.kim@lge.com
Cc: Michal Hocko mhocko@kernel.org
Cc: Mike Rapoport rppt@linux.ibm.com
Cc: Vlastimil Babka vbabka@suse.cz
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com
---
 mm/page_owner.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c index ea1a1e4063b2..119f028ef559 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -166,6 +166,7 @@ static inline void __set_page_owner_handle(struct page *page, { struct page_owner *page_owner; int i; + u64 ts_nsec = local_clock();
for (i = 0; i < (1 << order); i++) { page_owner = get_page_owner(page_ext); @@ -175,7 +176,7 @@ static inline void __set_page_owner_handle(struct page *page, page_owner->last_migrate_reason = -1; page_owner->pid = current->pid; page_owner->tgid = current->tgid; - page_owner->ts_nsec = local_clock(); + page_owner->ts_nsec = ts_nsec; strscpy(page_owner->comm, current->comm, sizeof(page_owner->comm)); __set_bit(PAGE_EXT_OWNER, &page_ext->flags);
From: Charan Teja Kalla quic_charante@quicinc.com
mainline inclusion
from mainline-v6.1-rc7
commit ed86b74874f839f0e579bdf92ea0a5aabdfabebb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Fix the compiler warning below, reported by 'make W=1 mm/':

mm/page_ext.c:178: warning: Function parameter or member 'page_ext' not described in 'page_ext_put'.
[quic_pkondeti@quicinc.com: better patch title]
Link: https://lkml.kernel.org/r/1667884582-2465-1-git-send-email-quic_charante@qui...
Fixes: b1d5488a252dc9 ("mm: fix use-after free of page_ext after race with memory-offline")
Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com
Reported-by: Vlastimil Babka vbabka@suse.cz
Tested-by: Vlastimil Babka vbabka@suse.cz
Cc: Pavan Kondeti quic_pkondeti@quicinc.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com
---
 mm/page_ext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_ext.c b/mm/page_ext.c index c40d3f7456df..d521c93b3ed6 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -154,7 +154,7 @@ struct page_ext *page_ext_get(struct page *page)
/** * page_ext_put() - Working with page extended information is done. - * @page_ext - Page extended information received from page_ext_get(). + * @page_ext: Page extended information received from page_ext_get(). * * The page extended information of the page may not be valid after this * function is called.
From: Tang Bin tangbin@cmss.chinamobile.com
mainline inclusion
from mainline-v5.14-rc1
commit 85f29cd6a12d430706c39247e7d0207590f581df
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Link: https://lkml.kernel.org/r/20210506131402.10416-1-tangbin@cmss.chinamobile.co...
Signed-off-by: Zhang Shengju zhangshengju@cmss.chinamobile.com
Signed-off-by: Tang Bin tangbin@cmss.chinamobile.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com
---
 tools/vm/page_owner_sort.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 85eb65ea16d3..0e75f22c9475 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -132,6 +132,10 @@ int main(int argc, char **argv) qsort(list, list_size, sizeof(list[0]), compare_txt);
list2 = malloc(sizeof(*list) * list_size); + if (!list2) { + printf("Out of memory\n"); + exit(1); + }
printf("culling\n");
From: Zhenliang Wei weizhenliang@huawei.com
mainline inclusion
from mainline-v5.16-rc1
commit f7df2b1cf03af680354bd4300f48f7ea11316ce8
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing page owner information, we may be more interested in the total memory than in the number of times a stack appears. Therefore, the following adjustments are made:
1. Add statistics on the total number of pages.
2. Add the optional parameter "-m" to make the program sort by memory (total pages).
The general output of page_owner is as follows:
Page allocated via order XXX, ...
PFN XXX ...
// Detailed stack

Page allocated via order XXX, ...
PFN XXX ...
// Detailed stack
The original page_owner_sort ignores PFN rows, puts the remaining rows in buf, counts the times of buf, and finally sorts them according to the times. General output:
XXX times: Page allocated via order XXX, ... // Detailed stack
Now, we use regexp to extract the page order value from the buf, and count the total pages for the buf. General output:
XXX times, XXX pages: Page allocated via order XXX, ... // Detailed stack
By default, it is still sorted by the times of buf; If you want to sort by the pages nums of buf, use the new -m parameter.
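A minimal standalone sketch of the order-to-pages step, assuming the header format `Page allocated via order N, ...`. The helper name `pages_from_header()` is hypothetical and is not the tool's `get_page_num()`; for brevity it compiles the regex on every call, whereas the tool compiles `order_pattern` once in `main()`. It uses the POSIX class `[[:space:]]` because `\s` is not defined for POSIX extended regexes (glibc accepts it as an extension) and would have to be written as `\\s` inside a C string literal.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch: return the number of pages (1 << order) described by a block's
 * "Page allocated via order N, ..." header, or 0 if no valid order is found.
 */
static int pages_from_header(const char *buf)
{
	regex_t re;
	regmatch_t m[2];
	char digits[8] = {0};
	size_t len;
	long order;
	int pages = 0;

	/* POSIX bracket class instead of the non-standard "\s" escape. */
	if (regcomp(&re, "order[[:space:]]*([0-9]+),", REG_EXTENDED) != 0)
		return 0;

	if (regexec(&re, buf, 2, m, 0) == 0 && m[1].rm_so != -1) {
		len = (size_t)(m[1].rm_eo - m[1].rm_so);
		if (len < sizeof(digits)) {
			memcpy(digits, buf + m[1].rm_so, len);
			order = strtol(digits, NULL, 10);
			/* Keep the shift well-defined for a 32-bit int. */
			if (order >= 0 && order < 31)
				pages = 1 << order;
		}
	}
	regfree(&re);
	return pages;
}
```

In the tool, each matching block adds 1 to its "times" counter and its page count to the "pages" counter; `-m` then sorts on the latter.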
Link: https://lkml.kernel.org/r/1631678242-41033-1-git-send-email-weizhenliang@hua... Signed-off-by: Zhenliang Wei weizhenliang@huawei.com Cc: Tang Bin tangbin@cmss.chinamobile.com Cc: Zhang Shengju zhangshengju@cmss.chinamobile.com Cc: Zhenliang Wei weizhenliang@huawei.com Cc: Xiaoming Ni nixiaoming@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 23 +++++++- tools/vm/page_owner_sort.c | 94 +++++++++++++++++++++++++++++---- 2 files changed, 107 insertions(+), 10 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 2175465c9bf2..9837fc8147dd 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -85,5 +85,26 @@ Usage cat /sys/kernel/debug/page_owner > page_owner_full.txt ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
+ The general output of ``page_owner_full.txt`` is as follows: + + Page allocated via order XXX, ... + PFN XXX ... + // Detailed stack + + Page allocated via order XXX, ... + PFN XXX ... + // Detailed stack + + The ``page_owner_sort`` tool ignores ``PFN`` rows, puts the remaining rows + in buf, uses regexp to extract the page order value, counts the times + and pages of buf, and finally sorts them according to the times. + See the result about who allocated each page - in the ``sorted_page_owner.txt``. + in the ``sorted_page_owner.txt``. General output: + + XXX times, XXX pages: + Page allocated via order XXX, ... + // Detailed stack + + By default, ``page_owner_sort`` is sorted according to the times of buf. + If you want to sort by the pages nums of buf, use the ``-m`` parameter. diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 0e75f22c9475..9ebb84a9c731 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -5,6 +5,8 @@ * Example use: * cat /sys/kernel/debug/page_owner > page_owner_full.txt * ./page_owner_sort page_owner_full.txt sorted_page_owner.txt + * Or sort by total memory: + * ./page_owner_sort -m page_owner_full.txt sorted_page_owner.txt * * See Documentation/vm/page_owner.rst */ @@ -16,14 +18,18 @@ #include <fcntl.h> #include <unistd.h> #include <string.h> +#include <regex.h> +#include <errno.h>
struct block_list { char *txt; int len; int num; + int page_num; };
- +static int sort_by_memory; +static regex_t order_pattern; static struct block_list *list; static int list_size; static int max_size; @@ -59,12 +65,50 @@ static int compare_num(const void *p1, const void *p2) return l2->num - l1->num; }
+static int compare_page_num(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return l2->page_num - l1->page_num; +} + +static int get_page_num(char *buf) +{ + int err, val_len, order_val; + char order_str[4] = {0}; + char *endptr; + regmatch_t pmatch[2]; + + err = regexec(&order_pattern, buf, 2, pmatch, REG_NOTBOL); + if (err != 0 || pmatch[1].rm_so == -1) { + printf("no order pattern in %s\n", buf); + return 0; + } + val_len = pmatch[1].rm_eo - pmatch[1].rm_so; + if (val_len > 2) /* max_order should not exceed 2 digits */ + goto wrong_order; + + memcpy(order_str, buf + pmatch[1].rm_so, val_len); + + errno = 0; + order_val = strtol(order_str, &endptr, 10); + if (errno != 0 || endptr == order_str || *endptr != '\0') + goto wrong_order; + + return 1 << order_val; + +wrong_order: + printf("wrong order in follow buf:\n%s\n", buf); + return 0; +} + static void add_list(char *buf, int len) { if (list_size != 0 && len == list[list_size-1].len && memcmp(buf, list[list_size-1].txt, len) == 0) { list[list_size-1].num++; + list[list_size-1].page_num += get_page_num(buf); return; } if (list_size == max_size) { @@ -74,6 +118,7 @@ static void add_list(char *buf, int len) list[list_size].txt = malloc(len+1); list[list_size].len = len; list[list_size].num = 1; + list[list_size].page_num = get_page_num(buf); memcpy(list[list_size].txt, buf, len); list[list_size].txt[len] = 0; list_size++; @@ -85,6 +130,13 @@ static void add_list(char *buf, int len)
#define BUF_SIZE (128 * 1024)
+static void usage(void) +{ + printf("Usage: ./page_owner_sort [-m] <input> <output>\n" + "-m Sort by total memory. If this option is unset, sort by times\n" + ); +} + int main(int argc, char **argv) { FILE *fin, *fout; @@ -92,21 +144,39 @@ int main(int argc, char **argv) int ret, i, count; struct block_list *list2; struct stat st; + int err; + int opt;
- if (argc < 3) { - printf("Usage: ./program <input> <output>\n"); - perror("open: "); + while ((opt = getopt(argc, argv, "m")) != -1) + switch (opt) { + case 'm': + sort_by_memory = 1; + break; + default: + usage(); + exit(1); + } + + if (optind >= (argc - 1)) { + usage(); exit(1); }
- fin = fopen(argv[1], "r"); - fout = fopen(argv[2], "w"); + fin = fopen(argv[optind], "r"); + fout = fopen(argv[optind + 1], "w"); if (!fin || !fout) { - printf("Usage: ./program <input> <output>\n"); + usage(); perror("open: "); exit(1); }
+ err = regcomp(&order_pattern, "order\s*([0-9]*),", REG_EXTENDED|REG_NEWLINE); + if (err != 0 || order_pattern.re_nsub != 1) { + printf("%s: Invalid pattern 'order\s*([0-9]*),' code %d\n", + argv[0], err); + exit(1); + } + fstat(fileno(fin), &st); max_size = st.st_size / 100; /* hack ... */
@@ -145,13 +215,19 @@ int main(int argc, char **argv) list2[count++] = list[i]; } else { list2[count-1].num += list[i].num; + list2[count-1].page_num += list[i].page_num; } }
- qsort(list2, count, sizeof(list[0]), compare_num); + if (sort_by_memory) + qsort(list2, count, sizeof(list[0]), compare_page_num); + else + qsort(list2, count, sizeof(list[0]), compare_num);
for (i = 0; i < count; i++) - fprintf(fout, "%d times:\n%s\n", list2[i].num, list2[i].txt); + fprintf(fout, "%d times, %d pages:\n%s\n", + list2[i].num, list2[i].page_num, list2[i].txt);
+ regfree(&order_pattern); return 0; }
From: Sean Anderson seanga2@gmail.com
mainline inclusion from mainline-v5.18-rc1 commit ba5a396be51cd232d6647f70f7d792c6dcc63223 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The contents of page_owner have changed to include more information than the stack trace. On a modern kernel, the blocks look like this:

Page allocated via order 0, mask 0x0(), pid 1, ts 165564237 ns, free_ts 0 ns
 register_early_stack+0x4b/0x90
 init_page_owner+0x39/0x250
 kernel_init_freeable+0x11e/0x242
 kernel_init+0x16/0x130
Sorting by the contents of .txt will result in almost no repeated pages, as the pid, ts, and free_ts will almost never be the same. Instead, sort by the contents of the stack trace, which we assume to be whatever is after the first line.
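The change can be illustrated with a small sketch of the key being compared: everything from the first newline onward, so the header line with pid, ts and free_ts is ignored. The helper names `stack_part()` and `cmp_by_stack()` are hypothetical; the patch itself stores this pointer in a new `stacktrace` field using the GNU `?:` extension, whereas this sketch uses a standard conditional. Falling back to an empty string rather than NULL matches the NULL-pointer fix noted in the commit message.

```c
#include <string.h>

/*
 * Sketch: the key used when comparing by stack trace is everything from
 * the first newline on, so the header line (pid, ts, free_ts, ...) is
 * ignored. A block without a newline gets an empty key, never NULL.
 */
static const char *stack_part(const char *txt)
{
	const char *nl = strchr(txt, '\n');

	return nl ? nl : "";
}

/* strcmp()-style comparison of two block texts by stack trace only. */
static int cmp_by_stack(const char *a, const char *b)
{
	return strcmp(stack_part(a), stack_part(b));
}
```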
[seanga2@gmail.com: fix NULL-pointer dereference when comparing stack traces] Link: https://lkml.kernel.org/r/20211125162653.1855958-1-seanga2@gmail.com
Link: https://lkml.kernel.org/r/20211124193709.1805776-1-seanga2@gmail.com Signed-off-by: Sean Anderson seanga2@gmail.com Cc: Changhee Han ch0.han@lge.com Cc: Tang Bin tangbin@cmss.chinamobile.com Cc: Zhang Shengju zhangshengju@cmss.chinamobile.com Cc: Zhenliang Wei weizhenliang@huawei.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 9ebb84a9c731..5582d8454d3b 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -23,6 +23,7 @@
struct block_list { char *txt; + char *stacktrace; int len; int num; int page_num; @@ -51,11 +52,11 @@ int read_block(char *buf, int buf_size, FILE *fin) return -1; /* EOF or no space left in buf. */ }
-static int compare_txt(const void *p1, const void *p2) +static int compare_stacktrace(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2;
- return strcmp(l1->txt, l2->txt); + return strcmp(l1->stacktrace, l2->stacktrace); }
static int compare_num(const void *p1, const void *p2) @@ -121,6 +122,7 @@ static void add_list(char *buf, int len) list[list_size].page_num = get_page_num(buf); memcpy(list[list_size].txt, buf, len); list[list_size].txt[len] = 0; + list[list_size].stacktrace = strchr(list[list_size].txt, '\n') ?: ""; list_size++; if (list_size % 1000 == 0) { printf("loaded %d\r", list_size); @@ -199,7 +201,7 @@ int main(int argc, char **argv)
printf("sorting ....\n");
- qsort(list, list_size, sizeof(list[0]), compare_txt); + qsort(list, list_size, sizeof(list[0]), compare_stacktrace);
list2 = malloc(sizeof(*list) * list_size); if (!list2) { @@ -211,7 +213,7 @@ int main(int argc, char **argv)
for (i = count = 0; i < list_size; i++) { if (count == 0 || - strcmp(list2[count-1].txt, list[i].txt) != 0) { + strcmp(list2[count-1].stacktrace, list[i].stacktrace) != 0) { list2[count++] = list[i]; } else { list2[count-1].num += list[i].num;
From: Sean Anderson seanga2@gmail.com
mainline inclusion from mainline-v5.18-rc1 commit 82f5ebc2beb3c803bf17bead4e38c1d5bf2c8d78 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This adds the ability to sort by stacktraces. This is helpful when comparing multiple dumps of page_owner taken at different times, since blocks will not be reordered if they were allocated/free'd.
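A sketch of the comparator dispatch that replaces the `sort_by_memory` flag: each option maps to a `qsort()` comparator through a function pointer. The trimmed `struct blk`, `cmp_for_option()` and `sort_blocks()` are hypothetical and return an error instead of calling `usage()`; the comparators use `(a > b) - (a < b)` rather than subtraction. Sorting with `-s` yields an order that depends only on the stack text, which is what lets two dumps line up for comparison.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down block record for this sketch. */
struct blk {
	const char *stacktrace;
	int num;
	int page_num;
};

typedef int (*cmp_fn)(const void *, const void *);

/* -t: most frequent first. */
static int by_num(const void *p1, const void *p2)
{
	const struct blk *l1 = p1, *l2 = p2;

	return (l2->num > l1->num) - (l2->num < l1->num);
}

/* -m: most pages first. */
static int by_pages(const void *p1, const void *p2)
{
	const struct blk *l1 = p1, *l2 = p2;

	return (l2->page_num > l1->page_num) - (l2->page_num < l1->page_num);
}

/* -s: lexical order of the stack trace, independent of the counters. */
static int by_stack(const void *p1, const void *p2)
{
	const struct blk *l1 = p1, *l2 = p2;

	return strcmp(l1->stacktrace, l2->stacktrace);
}

/* Map an option character to a comparator; NULL for unknown options. */
static cmp_fn cmp_for_option(int opt)
{
	switch (opt) {
	case 'm': return by_pages;
	case 's': return by_stack;
	case 't': return by_num;
	default:  return NULL;
	}
}

/* Sort with the comparator chosen by opt; -1 for an unknown option. */
static int sort_blocks(struct blk *b, size_t n, int opt)
{
	cmp_fn cmp = cmp_for_option(opt);

	if (!cmp)
		return -1;
	qsort(b, n, sizeof(*b), cmp);
	return 0;
}
```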
Link: https://lkml.kernel.org/r/20211124193709.1805776-2-seanga2@gmail.com Signed-off-by: Sean Anderson seanga2@gmail.com Cc: Zhenliang Wei weizhenliang@huawei.com Cc: Changhee Han ch0.han@lge.com Cc: Tang Bin tangbin@cmss.chinamobile.com Cc: Zhang Shengju zhangshengju@cmss.chinamobile.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 5582d8454d3b..1b2acf02d3cd 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -29,7 +29,6 @@ struct block_list { int page_num; };
-static int sort_by_memory; static regex_t order_pattern; static struct block_list *list; static int list_size; @@ -134,13 +133,16 @@ static void add_list(char *buf, int len)
static void usage(void) { - printf("Usage: ./page_owner_sort [-m] <input> <output>\n" - "-m Sort by total memory. If this option is unset, sort by times\n" + printf("Usage: ./page_owner_sort [OPTIONS] <input> <output>\n" + "-m Sort by total memory.\n" + "-s Sort by the stack trace.\n" + "-t Sort by times (default).\n" ); }
int main(int argc, char **argv) { + int (*cmp)(const void *, const void *) = compare_num; FILE *fin, *fout; char *buf; int ret, i, count; @@ -149,10 +151,16 @@ int main(int argc, char **argv) int err; int opt;
- while ((opt = getopt(argc, argv, "m")) != -1) + while ((opt = getopt(argc, argv, "mst")) != -1) switch (opt) { case 'm': - sort_by_memory = 1; + cmp = compare_page_num; + break; + case 's': + cmp = compare_stacktrace; + break; + case 't': + cmp = compare_num; break; default: usage(); @@ -221,10 +229,7 @@ int main(int argc, char **argv) } }
- if (sort_by_memory) - qsort(list2, count, sizeof(list[0]), compare_page_num); - else - qsort(list2, count, sizeof(list[0]), compare_num); + qsort(list2, count, sizeof(list[0]), cmp);
for (i = 0; i < count; i++) fprintf(fout, "%d times, %d pages:\n%s\n",
From: Yinan Zhang zhangyinan2019@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit cd75ea0e32620c622aeaef531ef7b9f59c67f7a6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Culling by comparing stack traces can cause loss of information. For example, suppose there are two blocks that have the same stack trace but different header info:

Page allocated via order 0, mask 0x108c48(...), pid 73696, ts 1578829190639010 ns, free_ts 1576583851324450 ns
 prep_new_page+0x80/0xb8
 get_page_from_freelist+0x924/0xee8
 __alloc_pages+0x138/0xc18
 alloc_pages+0x80/0xf0
 __page_cache_alloc+0x90/0xc8

Page allocated via order 0, mask 0x108c48(...), pid 61806, ts 1354113726046100 ns, free_ts 1354104926841400 ns
 prep_new_page+0x80/0xb8
 get_page_from_freelist+0x924/0xee8
 __alloc_pages+0x138/0xc18
 alloc_pages+0x80/0xf0
 __page_cache_alloc+0x90/0xc8

After culling, it would look like this:

2 times, 2 pages:
Page allocated via order 0, mask 0x108c48(...), pid 73696, ts 1578829190639010 ns, free_ts 1576583851324450 ns
 prep_new_page+0x80/0xb8
 get_page_from_freelist+0x924/0xee8
 __alloc_pages+0x138/0xc18
 alloc_pages+0x80/0xf0
 __page_cache_alloc+0x90/0xc8

The info of the second block is lost. So, add -c to turn on culling by stack trace. By default, the tool culls by txt.
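A sketch of the culling pass with a selectable key, assuming the list has already been sorted by that same key. Instead of the pointer-offset expression used in the patch (`*(&list2[count-1].txt+offset)`, which relies on `txt` and `stacktrace` both being `char *` members of the same struct), this version passes a small key-selector function. `struct blk`, `key_txt()`, `key_stack()` and `cull()` are hypothetical names.

```c
#include <string.h>

/* Hypothetical, trimmed-down block record for this sketch. */
struct blk {
	const char *txt;        /* whole block: header line plus stack trace */
	const char *stacktrace; /* points at the first '\n' in txt, or "" */
	int num;
	int page_num;
};

/* Key selectors: whole text (default) or stack trace only (-c). */
typedef const char *(*key_fn)(const struct blk *);

static const char *key_txt(const struct blk *b) { return b->txt; }
static const char *key_stack(const struct blk *b) { return b->stacktrace; }

/*
 * Merge adjacent blocks with equal keys, summing their counters.
 * 'in' must already be sorted by the same key; 'out' needs n slots.
 * Returns the number of merged blocks written to 'out'.
 */
static int cull(const struct blk *in, int n, struct blk *out, key_fn key)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (count == 0 || strcmp(key(&out[count - 1]), key(&in[i])) != 0) {
			out[count++] = in[i];
		} else {
			out[count - 1].num += in[i].num;
			out[count - 1].page_num += in[i].page_num;
		}
	}
	return count;
}
```

In `main()`, the selection would be `key_fn key = cull_st ? key_stack : key_txt;`. With `key_stack`, two blocks like the ones above merge into a single entry that keeps only the first block's header, which is the information loss described; with `key_txt` they stay separate.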
Link: https://lkml.kernel.org/r/20211129145658.2491-1-zhangyinan2019@email.szu.edu... Signed-off-by: Yinan Zhang zhangyinan2019@email.szu.edu.cn Cc: Changhee Han ch0.han@lge.com Cc: Sean Anderson seanga2@gmail.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Tang Bin tangbin@cmss.chinamobile.com Cc: Zhang Shengju zhangshengju@cmss.chinamobile.com Cc: Zhenliang Wei weizhenliang@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 1b2acf02d3cd..492be7f752c0 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -51,6 +51,13 @@ int read_block(char *buf, int buf_size, FILE *fin) return -1; /* EOF or no space left in buf. */ }
+static int compare_txt(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return strcmp(l1->txt, l2->txt); +} + static int compare_stacktrace(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2; @@ -137,12 +144,14 @@ static void usage(void) "-m Sort by total memory.\n" "-s Sort by the stack trace.\n" "-t Sort by times (default).\n" + "-c cull by comparing stacktrace instead of total block.\n" ); }
int main(int argc, char **argv) { int (*cmp)(const void *, const void *) = compare_num; + int cull_st = 0; FILE *fin, *fout; char *buf; int ret, i, count; @@ -151,7 +160,7 @@ int main(int argc, char **argv) int err; int opt;
- while ((opt = getopt(argc, argv, "mst")) != -1) + while ((opt = getopt(argc, argv, "mstc")) != -1) switch (opt) { case 'm': cmp = compare_page_num; @@ -162,6 +171,9 @@ int main(int argc, char **argv) case 't': cmp = compare_num; break; + case 'c': + cull_st = 1; + break; default: usage(); exit(1); @@ -209,7 +221,10 @@ int main(int argc, char **argv)
printf("sorting ....\n");
- qsort(list, list_size, sizeof(list[0]), compare_stacktrace); + if (cull_st == 1) + qsort(list, list_size, sizeof(list[0]), compare_stacktrace); + else + qsort(list, list_size, sizeof(list[0]), compare_txt);
list2 = malloc(sizeof(*list) * list_size); if (!list2) { @@ -219,9 +234,11 @@ int main(int argc, char **argv)
printf("culling\n");
+ long offset = cull_st ? &list[0].stacktrace - &list[0].txt : 0; + for (i = count = 0; i < list_size; i++) { if (count == 0 || - strcmp(list2[count-1].stacktrace, list[i].stacktrace) != 0) { + strcmp(*(&list2[count-1].txt+offset), *(&list[i].txt+offset)) != 0) { list2[count++] = list[i]; } else { list2[count-1].num += list[i].num;
From: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 8f9c447e2e2b53c2db4fac85fc42ecada8b39e52 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing page owner information, we would like to be able to sort it by PID, so that we can quickly correlate the information with the corresponding program.
We would also like to sort it by time. Sorting by time helps to examine the program's running status around the time interval in which it hangs.
Finally, we would like page_owner_sort.c to be able to reduce its output and print only the information of blocks whose memory has not been released, which can help us locate problems in the program faster. Therefore, the following adjustments have been made:
1. Add the static functions search_pattern and check_regcomp to make the code cleaner.
2. Add member attributes and their corresponding sorting methods. When comparing times, returning the difference as an int would overflow because the values are unsigned long long, so the ternary operator is used.
3. Add the -f parameter to filter out the information of blocks whose memory has not been released.
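The overflow issue in item 2 can also be avoided with the usual three-way comparison idiom, sketched here under the assumption that timestamps are stored as 64-bit unsigned integers (`uint64_t` stands in for `__u64`). Returning `l1->ts_nsec - l2->ts_nsec` as an `int` would truncate the 64-bit difference, and the ternary in the patch returns 1 whenever the two values are equal, so two equal timestamps each compare as "greater" than the other. `(a > b) - (a < b)` returns -1, 0 or 1 without either problem. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record holding only the timestamps used by -a and -r. */
struct blk_ts {
	uint64_t ts_nsec;
	uint64_t free_ts_nsec;
};

/* Three-way compare: -1, 0 or 1; no subtraction, so no truncation. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* -a: ascending allocation time, usable as a qsort() comparator. */
static int by_alloc_ts(const void *p1, const void *p2)
{
	const struct blk_ts *l1 = p1, *l2 = p2;

	return cmp_u64(l1->ts_nsec, l2->ts_nsec);
}

/* -r: ascending release time (0 when no free has been recorded). */
static int by_free_ts(const void *p1, const void *p2)
{
	const struct blk_ts *l1 = p1, *l2 = p2;

	return cmp_u64(l1->free_ts_nsec, l2->free_ts_nsec);
}
```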
Link: https://lkml.kernel.org/r/20211206165653.5093-1-zhaochongxi2019@email.szu.ed... Signed-off-by: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn Reviewed-by: Sean Anderson seanga2@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 177 +++++++++++++++++++++++++++++++------ 1 file changed, 148 insertions(+), 29 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 492be7f752c0..c9fedc1806d5 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -20,6 +20,7 @@ #include <string.h> #include <regex.h> #include <errno.h> +#include <linux/types.h>
struct block_list { char *txt; @@ -27,9 +28,15 @@ struct block_list { int len; int num; int page_num; + pid_t pid; + __u64 ts_nsec; + __u64 free_ts_nsec; };
static regex_t order_pattern; +static regex_t pid_pattern; +static regex_t ts_nsec_pattern; +static regex_t free_ts_nsec_pattern; static struct block_list *list; static int list_size; static int max_size; @@ -79,34 +86,124 @@ static int compare_page_num(const void *p1, const void *p2) return l2->page_num - l1->page_num; }
-static int get_page_num(char *buf) +static int compare_pid(const void *p1, const void *p2) { - int err, val_len, order_val; - char order_str[4] = {0}; - char *endptr; + const struct block_list *l1 = p1, *l2 = p2; + + return l1->pid - l2->pid; +} + +static int compare_ts(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return l1->ts_nsec < l2->ts_nsec ? -1 : 1; +} + +static int compare_free_ts(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return l1->free_ts_nsec < l2->free_ts_nsec ? -1 : 1; +} + +static int search_pattern(regex_t *pattern, char *pattern_str, char *buf) +{ + int err, val_len; regmatch_t pmatch[2];
- err = regexec(&order_pattern, buf, 2, pmatch, REG_NOTBOL); + err = regexec(pattern, buf, 2, pmatch, REG_NOTBOL); if (err != 0 || pmatch[1].rm_so == -1) { - printf("no order pattern in %s\n", buf); - return 0; + printf("no matching pattern in %s\n", buf); + return -1; } val_len = pmatch[1].rm_eo - pmatch[1].rm_so; - if (val_len > 2) /* max_order should not exceed 2 digits */ - goto wrong_order;
- memcpy(order_str, buf + pmatch[1].rm_so, val_len); + memcpy(pattern_str, buf + pmatch[1].rm_so, val_len); + + return 0; +} + +static void check_regcomp(regex_t *pattern, const char *regex) +{ + int err; + + err = regcomp(pattern, regex, REG_EXTENDED | REG_NEWLINE); + if (err != 0 || pattern->re_nsub != 1) { + printf("Invalid pattern %s code %d\n", regex, err); + exit(1); + } +} + +# define FIELD_BUFF 25 + +static int get_page_num(char *buf) +{ + int order_val; + char order_str[FIELD_BUFF] = {0}; + char *endptr;
+ search_pattern(&order_pattern, order_str, buf); errno = 0; order_val = strtol(order_str, &endptr, 10); - if (errno != 0 || endptr == order_str || *endptr != '\0') - goto wrong_order; + if (order_val > 64 || errno != 0 || endptr == order_str || *endptr != '\0') { + printf("wrong order in follow buf:\n%s\n", buf); + return 0; + }
return 1 << order_val; +}
-wrong_order: - printf("wrong order in follow buf:\n%s\n", buf); - return 0; +static pid_t get_pid(char *buf) +{ + pid_t pid; + char pid_str[FIELD_BUFF] = {0}; + char *endptr; + + search_pattern(&pid_pattern, pid_str, buf); + errno = 0; + pid = strtol(pid_str, &endptr, 10); + if (errno != 0 || endptr == pid_str || *endptr != '\0') { + printf("wrong/invalid pid in follow buf:\n%s\n", buf); + return -1; + } + + return pid; + +} + +static __u64 get_ts_nsec(char *buf) +{ + __u64 ts_nsec; + char ts_nsec_str[FIELD_BUFF] = {0}; + char *endptr; + + search_pattern(&ts_nsec_pattern, ts_nsec_str, buf); + errno = 0; + ts_nsec = strtoull(ts_nsec_str, &endptr, 10); + if (errno != 0 || endptr == ts_nsec_str || *endptr != '\0') { + printf("wrong ts_nsec in follow buf:\n%s\n", buf); + return -1; + } + + return ts_nsec; +} + +static __u64 get_free_ts_nsec(char *buf) +{ + __u64 free_ts_nsec; + char free_ts_nsec_str[FIELD_BUFF] = {0}; + char *endptr; + + search_pattern(&free_ts_nsec_pattern, free_ts_nsec_str, buf); + errno = 0; + free_ts_nsec = strtoull(free_ts_nsec_str, &endptr, 10); + if (errno != 0 || endptr == free_ts_nsec_str || *endptr != '\0') { + printf("wrong free_ts_nsec in follow buf:\n%s\n", buf); + return -1; + } + + return free_ts_nsec; }
static void add_list(char *buf, int len) @@ -129,6 +226,11 @@ static void add_list(char *buf, int len) memcpy(list[list_size].txt, buf, len); list[list_size].txt[len] = 0; list[list_size].stacktrace = strchr(list[list_size].txt, '\n') ?: ""; + list[list_size].pid = get_pid(buf); + list[list_size].ts_nsec = get_ts_nsec(buf); + list[list_size].free_ts_nsec = get_free_ts_nsec(buf); + memcpy(list[list_size].txt, buf, len); + list[list_size].txt[len] = 0; list_size++; if (list_size % 1000 == 0) { printf("loaded %d\r", list_size); @@ -144,6 +246,9 @@ static void usage(void) "-m Sort by total memory.\n" "-s Sort by the stack trace.\n" "-t Sort by times (default).\n" + "-p Sort by pid.\n" + "-a Sort by memory allocate time.\n" + "-r Sort by memory release time.\n" "-c cull by comparing stacktrace instead of total block.\n" ); } @@ -152,28 +257,40 @@ int main(int argc, char **argv) { int (*cmp)(const void *, const void *) = compare_num; int cull_st = 0; + int filter = 0; FILE *fin, *fout; char *buf; int ret, i, count; struct block_list *list2; struct stat st; - int err; int opt;
- while ((opt = getopt(argc, argv, "mstc")) != -1) + while ((opt = getopt(argc, argv, "acfmprst")) != -1) switch (opt) { + case 'a': + cmp = compare_ts; + break; + case 'c': + cull_st = 1; + break; + case 'f': + filter = 1; + break; case 'm': cmp = compare_page_num; break; + case 'p': + cmp = compare_pid; + break; + case 'r': + cmp = compare_free_ts; + break; case 's': cmp = compare_stacktrace; break; case 't': cmp = compare_num; break; - case 'c': - cull_st = 1; - break; default: usage(); exit(1); @@ -192,13 +309,10 @@ int main(int argc, char **argv) exit(1); }
- err = regcomp(&order_pattern, "order\s*([0-9]*),", REG_EXTENDED|REG_NEWLINE); - if (err != 0 || order_pattern.re_nsub != 1) { - printf("%s: Invalid pattern 'order\s*([0-9]*),' code %d\n", - argv[0], err); - exit(1); - } - + check_regcomp(&order_pattern, "order\s*([0-9]*),"); + check_regcomp(&pid_pattern, "pid\s*([0-9]*),"); + check_regcomp(&ts_nsec_pattern, "ts\s*([0-9]*)\s*ns,"); + check_regcomp(&free_ts_nsec_pattern, "free_ts\s*([0-9]*)\s*ns"); fstat(fileno(fin), &st); max_size = st.st_size / 100; /* hack ... */
@@ -248,10 +362,15 @@ int main(int argc, char **argv)
qsort(list2, count, sizeof(list[0]), cmp);
- for (i = 0; i < count; i++) + for (i = 0; i < count; i++) { + if (filter == 1 && list2[i].free_ts_nsec != 0) + continue; fprintf(fout, "%d times, %d pages:\n%s\n", list2[i].num, list2[i].page_num, list2[i].txt); - + } regfree(&order_pattern); + regfree(&pid_pattern); + regfree(&ts_nsec_pattern); + regfree(&free_ts_nsec_pattern); return 0; }
From: Shenghong Han hanshenghong2019@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit e7a3f6776905b4e0b61692add3d0808315379c89 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
1) There is an unused variable; it's better to delete it.
2) One case is missing in usage().
Link: https://lkml.kernel.org/r/20211213164518.2461-1-hanshenghong2019@email.szu.e... Signed-off-by: Shenghong Han hanshenghong2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index c9fedc1806d5..284a5070402c 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -41,8 +41,6 @@ static struct block_list *list; static int list_size; static int max_size;
-struct block_list *block_head; - int read_block(char *buf, int buf_size, FILE *fin) { char *curr = buf, *const buf_end = buf + buf_size; @@ -249,7 +247,8 @@ static void usage(void) "-p Sort by pid.\n" "-a Sort by memory allocate time.\n" "-r Sort by memory release time.\n" - "-c cull by comparing stacktrace instead of total block.\n" + "-c Cull by comparing stacktrace instead of total block.\n" + "-f Filter out the information of blocks whose memory has not been released.\n" ); }
From: Yixuan Cao caoyixuan2019@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 41ed64347b5db8f6c9359d4f87f768db1a83bd79 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
I noticed that there are two redundant lines of duplicate code. It's better to delete them.
Link: https://lkml.kernel.org/r/20211213095743.3630-1-caoyixuan2019@email.szu.edu.... Signed-off-by: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Mark Brown broonie@kernel.org Cc: Sean Anderson seanga2@gmail.com Cc: Zhenliang Wei weizhenliang@huawei.com Cc: Tang Bin tangbin@cmss.chinamobile.com Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 284a5070402c..c8ec2d6b314d 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -227,8 +227,6 @@ static void add_list(char *buf, int len) list[list_size].pid = get_pid(buf); list[list_size].ts_nsec = get_ts_nsec(buf); list[list_size].free_ts_nsec = get_free_ts_nsec(buf); - memcpy(list[list_size].txt, buf, len); - list[list_size].txt[len] = 0; list_size++; if (list_size % 1000 == 0) { printf("loaded %d\r", list_size);
From: Yixuan Cao caoyixuan2019@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 49e495a015e9cc127728b0f353a15d16da138fb8 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
I noticed a discrepancy between the usage() text and the code logic.
If we enable the -f option, it should be "Filter out the information of blocks whose memory has been released".
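A sketch of the `-f` decision as the tool implements it: when filtering is on, any block with a non-zero `free_ts_nsec` is skipped, so the printed blocks are those the tool treats as not yet released. The helpers `should_print()` and `count_printed()` are hypothetical; they only reproduce the check `filter == 1 && list2[i].free_ts_nsec != 0` from the output loop in `main()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the tool's -f test: skip blocks with a non-zero free_ts. */
static int should_print(int filter, uint64_t free_ts_nsec)
{
	return !(filter && free_ts_nsec != 0);
}

/* Count how many of n blocks would be printed with the given filter. */
static size_t count_printed(const uint64_t *free_ts, size_t n, int filter)
{
	size_t i, printed = 0;

	for (i = 0; i < n; i++)
		printed += should_print(filter, free_ts[i]);
	return printed;
}
```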
Link: https://lkml.kernel.org/r/20220219143106.2805-1-caoyixuan2019@email.szu.edu.... Signed-off-by: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Sean Anderson seanga2@gmail.com Cc: Muchun Song songmuchun@bytedance.com Cc: Zhenliang Wei weizhenliang@huawei.com Cc: Tang Bin tangbin@cmss.chinamobile.com Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index c8ec2d6b314d..79d69c3b84ed 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -246,7 +246,7 @@ static void usage(void) "-a Sort by memory allocate time.\n" "-r Sort by memory release time.\n" "-c Cull by comparing stacktrace instead of total block.\n" - "-f Filter out the information of blocks whose memory has not been released.\n" + "-f Filter out the information of blocks whose memory has been released.\n" ); }
From: Jiajian Ye yejiajian2018@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 56465a38305f22bca3469c2738d7320a0c333e72 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add a security check after using malloc() to allocate memory.
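The same allocate-check-exit pattern now appears in more than one place, so it can be factored into a small wrapper. This is a common userspace idiom shown only as a sketch: `xmalloc()` is a hypothetical helper, not part of `page_owner_sort.c`, and it reports the failure on stderr rather than stdout.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate size bytes or exit with a message. */
static void *xmalloc(size_t size)
{
	void *p = malloc(size);

	if (!p) {
		fprintf(stderr, "Out of memory\n");
		exit(1);
	}
	return p;
}
```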
Link: https://lkml.kernel.org/r/20220301151438.166118-2-yejiajian2018@email.szu.ed... Signed-off-by: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Cc: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Zhenliang Wei weizhenliang@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 79d69c3b84ed..69fb6ca7c0b7 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -217,7 +217,13 @@ static void add_list(char *buf, int len) printf("max_size too small??\n"); exit(1); } + list[list_size].txt = malloc(len+1); + if (!list[list_size].txt) { + printf("Out of memory\n"); + exit(1); + } + list[list_size].len = len; list[list_size].num = 1; list[list_size].page_num = get_page_num(buf);
From: Shenghong Han hanshenghong2019@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 57f2b54a937987847e666aaf56d207aa457adee6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Update the documentation of ``page_owner``.
[akpm@linux-foundation.org: small grammatical tweaks]
Link: https://lkml.kernel.org/r/20211214134736.2569-1-hanshenghong2019@email.szu.e... Signed-off-by: Shenghong Han hanshenghong2019@email.szu.edu.cn Cc: Jonathan Corbet corbet@lwn.net Cc: Vlastimil Babka vbabka@suse.cz Cc: Georgi Djakov georgi.djakov@linaro.org Cc: Liam Mark lmark@codeaurora.org Cc: Tang Bin tangbin@cmss.chinamobile.com Cc: Zhang Shengju zhangshengju@cmss.chinamobile.com Cc: Zhenliang Wei weizhenliang@huawei.com Cc: Xiaoming Ni nixiaoming@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 9837fc8147dd..602cf6eefcb5 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -97,7 +97,7 @@ Usage
The ``page_owner_sort`` tool ignores ``PFN`` rows, puts the remaining rows in buf, uses regexp to extract the page order value, counts the times - and pages of buf, and finally sorts them according to the times. + and pages of buf, and finally sorts them according to the parameter(s).
See the result about who allocated each page in the ``sorted_page_owner.txt``. General output: @@ -107,4 +107,23 @@ Usage // Detailed stack
By default, ``page_owner_sort`` is sorted according to the times of buf. - If you want to sort by the pages nums of buf, use the ``-m`` parameter. + If you want to sort by the page nums of buf, use the ``-m`` parameter. + The detailed parameters are: + + fundamental function: + + Sort: + -a Sort by memory allocation time. + -m Sort by total memory. + -p Sort by pid. + -r Sort by memory release time. + -s Sort by stack trace. + -t Sort by times (default). + + additional function: + + Cull: + -c Cull by comparing stacktrace instead of total block. + + Filter: + -f Filter out the information of blocks whose memory has not been released.
From: Jiajian Ye yejiajian2018@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit cf3c2c8678a0b21052d00b64d7a5903f3b1d1197 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing the page owner information, it is also useful to have the blocks sorted by TGID.
Therefore, the following adjustments have been made:
1. Add a new -P option to sort the information of blocks by TGID in ascending order.
2. Reorder the member variables of struct block_list to avoid a 4-byte hole.
3. Add an explanation of the -P option to the document.
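The hole mentioned in item 2 can be made visible with offsetof(). The sketch below compares a hypothetical layout in which tgid is simply appended after pid with the reordered layout from this patch. It assumes an LP64 target such as x86_64 or arm64 (8-byte pointers, 8-byte-aligned 64-bit integers, 4-byte pid_t) and uses uint64_t in place of the kernel's __u64; the struct and helper names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical layout: tgid appended right after pid. */
struct block_list_naive {
	char *txt;
	char *stacktrace;
	int len;
	int num;
	int page_num;
	pid_t pid;
	pid_t tgid;
	uint64_t ts_nsec;      /* needs 8-byte alignment, so padding precedes it */
	uint64_t free_ts_nsec;
};

/* Layout after this patch: 64-bit members first, 4-byte members last. */
struct block_list_reordered {
	char *txt;
	char *stacktrace;
	uint64_t ts_nsec;
	uint64_t free_ts_nsec;
	int len;
	int num;
	int page_num;
	pid_t pid;
	pid_t tgid;
};

/* Padding between the end of tgid and the start of ts_nsec (a "hole"). */
static size_t naive_hole(void)
{
	return offsetof(struct block_list_naive, ts_nsec) -
	       (offsetof(struct block_list_naive, tgid) + sizeof(pid_t));
}

/* Padding between the end of tgid and the end of the struct. */
static size_t reordered_tail_padding(void)
{
	return sizeof(struct block_list_reordered) -
	       (offsetof(struct block_list_reordered, tgid) + sizeof(pid_t));
}
```

On such a target both structs are 56 bytes: the reordering does not shrink struct block_list, it moves the 4 bytes of padding from between tgid and ts_nsec to the tail of the struct.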
Link: https://lkml.kernel.org/r/20220301151438.166118-3-yejiajian2018@email.szu.ed... Signed-off-by: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Zhenliang Wei weizhenliang@huawei.com Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 1 + tools/vm/page_owner_sort.c | 40 ++++++++++++++++++++++++++++++--- 2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 602cf6eefcb5..b83332fd5eed 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -116,6 +116,7 @@ Usage -a Sort by memory allocation time. -m Sort by total memory. -p Sort by pid. + -P Sort by tgid. -r Sort by memory release time. -s Sort by stack trace. -t Sort by times (default). diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 69fb6ca7c0b7..d166f2f900eb 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -25,16 +25,18 @@ struct block_list { char *txt; char *stacktrace; + __u64 ts_nsec; + __u64 free_ts_nsec; int len; int num; int page_num; pid_t pid; - __u64 ts_nsec; - __u64 free_ts_nsec; + pid_t tgid; };
static regex_t order_pattern; static regex_t pid_pattern; +static regex_t tgid_pattern; static regex_t ts_nsec_pattern; static regex_t free_ts_nsec_pattern; static struct block_list *list; @@ -91,6 +93,13 @@ static int compare_pid(const void *p1, const void *p2) return l1->pid - l2->pid; }
+static int compare_tgid(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return l1->tgid - l2->tgid; +} + static int compare_ts(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2; @@ -170,6 +179,24 @@ static pid_t get_pid(char *buf)
}
+static pid_t get_tgid(char *buf) +{ + pid_t tgid; + char tgid_str[FIELD_BUFF] = {0}; + char *endptr; + + search_pattern(&tgid_pattern, tgid_str, buf); + errno = 0; + tgid = strtol(tgid_str, &endptr, 10); + if (errno != 0 || endptr == tgid_str || *endptr != '\0') { + printf("wrong/invalid tgid in follow buf:\n%s\n", buf); + return -1; + } + + return tgid; + +} + static __u64 get_ts_nsec(char *buf) { __u64 ts_nsec; @@ -231,6 +258,7 @@ static void add_list(char *buf, int len) list[list_size].txt[len] = 0; list[list_size].stacktrace = strchr(list[list_size].txt, '\n') ?: ""; list[list_size].pid = get_pid(buf); + list[list_size].tgid = get_tgid(buf); list[list_size].ts_nsec = get_ts_nsec(buf); list[list_size].free_ts_nsec = get_free_ts_nsec(buf); list_size++; @@ -249,6 +277,7 @@ static void usage(void) "-s Sort by the stack trace.\n" "-t Sort by times (default).\n" "-p Sort by pid.\n" + "-P Sort by tgid.\n" "-a Sort by memory allocate time.\n" "-r Sort by memory release time.\n" "-c Cull by comparing stacktrace instead of total block.\n" @@ -268,7 +297,7 @@ int main(int argc, char **argv) struct stat st; int opt;
- while ((opt = getopt(argc, argv, "acfmprst")) != -1) + while ((opt = getopt(argc, argv, "acfmprstP")) != -1) switch (opt) { case 'a': cmp = compare_ts; @@ -294,6 +323,9 @@ int main(int argc, char **argv) case 't': cmp = compare_num; break; + case 'P': + cmp = compare_tgid; + break; default: usage(); exit(1); @@ -314,6 +346,7 @@ int main(int argc, char **argv)
check_regcomp(&order_pattern, "order\s*([0-9]*),"); check_regcomp(&pid_pattern, "pid\s*([0-9]*),"); + check_regcomp(&tgid_pattern, "tgid\s*([0-9]*) "); check_regcomp(&ts_nsec_pattern, "ts\s*([0-9]*)\s*ns,"); check_regcomp(&free_ts_nsec_pattern, "free_ts\s*([0-9]*)\s*ns"); fstat(fileno(fin), &st); @@ -373,6 +406,7 @@ int main(int argc, char **argv) } regfree(&order_pattern); regfree(&pid_pattern); + regfree(&tgid_pattern); regfree(&ts_nsec_pattern); regfree(&free_ts_nsec_pattern); return 0;
From: Jiajian Ye yejiajian2018@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 578d8f2761a828f6c3409d0931b036bf3a999246 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The following adjustments are made:
1. Instead of copying the culled blocks into a second array after sorting, cull them in place in the existing array, so no new array needs to be allocated.
2. When the '-f' option is used to filter out blocks that have been released, add only the unreleased blocks to the list, rather than adding all blocks to the list and filtering them when printing the result.
3. When the '-c' option is used to cull blocks by comparing stacktraces, print the stacktrace rather than the whole block.
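The in-place culling in item 1 relies on the array being sorted by the culling key first, so that duplicates are adjacent, and on the write index never overtaking the read index. A minimal sketch of that loop, using a trimmed-down hypothetical struct block instead of the tool's struct block_list and comparing the whole text as the default culling does:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down record: only what culling needs. */
struct block {
	const char *txt;  /* text used as the culling key */
	int num;          /* how many times this block was seen */
	int page_num;     /* total pages across those occurrences */
};

static int compare_txt(const void *p1, const void *p2)
{
	const struct block *l1 = p1, *l2 = p2;

	return strcmp(l1->txt, l2->txt);
}

/*
 * Sort by key, then merge each run of equal keys into the first element of
 * the run, writing survivors back to the front of the same array.
 * Returns the number of distinct blocks left; no second array is allocated.
 */
static int cull_in_place(struct block *list, int list_size)
{
	int i, count;

	qsort(list, list_size, sizeof(list[0]), compare_txt);
	for (i = count = 0; i < list_size; i++) {
		if (count == 0 || strcmp(list[count - 1].txt, list[i].txt) != 0) {
			list[count++] = list[i];          /* start a new group */
		} else {
			list[count - 1].num += list[i].num;   /* fold into group */
			list[count - 1].page_num += list[i].page_num;
		}
	}
	return count;
}
```

Because count <= i at every step, list[count++] = list[i] only overwrites slots that have already been read, which is what makes dropping the second array (list2 in the old code) safe.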
Link: https://lkml.kernel.org/r/20220306030640.43054-1-yejiajian2018@email.szu.edu... Signed-off-by: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: hanshenghong2019@email.szu.edu.cn Cc: Sean Anderson seanga2@gmail.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: zhangyinan2019@email.szu.edu.cn Cc: Zhenliang Wei weizhenliang@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 37 +++++++++++++++++++------------------ 1 file changed, 19 insertions(+), 18 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index d166f2f900eb..b8d2867b5b18 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -42,6 +42,8 @@ static regex_t free_ts_nsec_pattern; static struct block_list *list; static int list_size; static int max_size; +static int cull_st; +static int filter;
int read_block(char *buf, int buf_size, FILE *fin) { @@ -245,6 +247,9 @@ static void add_list(char *buf, int len) exit(1); }
+ list[list_size].free_ts_nsec = get_free_ts_nsec(buf); + if (filter == 1 && list[list_size].free_ts_nsec != 0) + return; list[list_size].txt = malloc(len+1); if (!list[list_size].txt) { printf("Out of memory\n"); @@ -257,10 +262,11 @@ static void add_list(char *buf, int len) memcpy(list[list_size].txt, buf, len); list[list_size].txt[len] = 0; list[list_size].stacktrace = strchr(list[list_size].txt, '\n') ?: ""; + if (*list[list_size].stacktrace == '\n') + list[list_size].stacktrace++; list[list_size].pid = get_pid(buf); list[list_size].tgid = get_tgid(buf); list[list_size].ts_nsec = get_ts_nsec(buf); - list[list_size].free_ts_nsec = get_free_ts_nsec(buf); list_size++; if (list_size % 1000 == 0) { printf("loaded %d\r", list_size); @@ -288,12 +294,9 @@ static void usage(void) int main(int argc, char **argv) { int (*cmp)(const void *, const void *) = compare_num; - int cull_st = 0; - int filter = 0; FILE *fin, *fout; char *buf; int ret, i, count; - struct block_list *list2; struct stat st; int opt;
@@ -376,11 +379,7 @@ int main(int argc, char **argv) else qsort(list, list_size, sizeof(list[0]), compare_txt);
- list2 = malloc(sizeof(*list) * list_size); - if (!list2) { - printf("Out of memory\n"); - exit(1); - } +
printf("culling\n");
@@ -388,21 +387,23 @@ int main(int argc, char **argv)
for (i = count = 0; i < list_size; i++) { if (count == 0 || - strcmp(*(&list2[count-1].txt+offset), *(&list[i].txt+offset)) != 0) { - list2[count++] = list[i]; + strcmp(*(&list[count-1].txt+offset), *(&list[i].txt+offset)) != 0) { + list[count++] = list[i]; } else { - list2[count-1].num += list[i].num; - list2[count-1].page_num += list[i].page_num; + list[count-1].num += list[i].num; + list[count-1].page_num += list[i].page_num; } }
- qsort(list2, count, sizeof(list[0]), cmp); + qsort(list, count, sizeof(list[0]), cmp);
for (i = 0; i < count; i++) { - if (filter == 1 && list2[i].free_ts_nsec != 0) - continue; - fprintf(fout, "%d times, %d pages:\n%s\n", - list2[i].num, list2[i].page_num, list2[i].txt); + if (cull_st == 0) + fprintf(fout, "%d times, %d pages:\n%s\n", + list[i].num, list[i].page_num, list[i].txt); + else + fprintf(fout, "%d times, %d pages:\n%s\n", + list[i].num, list[i].page_num, list[i].stacktrace); } regfree(&order_pattern); regfree(&pid_pattern);
From: Jiajian Ye yejiajian2018@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 194d52d771b8f7cf5bcf0f81f87dd76e492c355c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing page owner information, we may also need the blocks to be sorted by task command name. Therefore, the following adjustments are made:
1. Add a member variable to record the task command name of each block.
2. Add a new -n option to sort the information of blocks by task command name.
3. Add an explanation of the -n option to the document.
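The -n option depends on pulling the task command name out of each block's header line, which the patch does with a regular expression. Below is a minimal sketch with POSIX regcomp()/regexec(), assuming a header of the form "... pid 1, tgid 1 (task_name_demo), ts ...". The helper name extract_comm() is hypothetical, and unlike the pattern in the patch this one stops at the first ')', so a command name that itself contains ')' would be cut short.

```c
#include <regex.h>
#include <string.h>

#define TASK_COMM_LEN 16

/*
 * Copy the task command name, i.e. the text in parentheses after
 * "tgid <n>", from a page_owner header line into comm.
 * Returns 0 on success, -1 if the line does not match.
 */
static int extract_comm(const char *line, char comm[TASK_COMM_LEN])
{
	regex_t re;
	regmatch_t m[2];
	size_t len;
	int ret = -1;

	/* In ERE, "\\(" is a literal parenthesis and "(...)" is group 1. */
	if (regcomp(&re, "tgid[[:space:]]*[0-9]+[[:space:]]*\\(([^)]*)\\)",
		    REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so != -1) {
		len = (size_t)(m[1].rm_eo - m[1].rm_so);
		/* Truncate to TASK_COMM_LEN - 1 chars, like the kernel's comm. */
		if (len >= TASK_COMM_LEN)
			len = TASK_COMM_LEN - 1;
		memcpy(comm, line + m[1].rm_so, len);
		comm[len] = '\0';
		ret = 0;
	}
	regfree(&re);
	return ret;
}
```

The tool compiles its patterns once with check_regcomp() and reuses them; compiling on every call here only keeps the sketch self-contained.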
Link: https://lkml.kernel.org/r/20220306030640.43054-2-yejiajian2018@email.szu.edu... Signed-off-by: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Sean Anderson seanga2@gmail.com Cc: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Zhenliang Wei weizhenliang@huawei.com Cc: zhaochongxi2019@email.szu.edu.cn Cc: hanshenghong2019@email.szu.edu.cn Cc: zhangyinan2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 1 + tools/vm/page_owner_sort.c | 35 ++++++++++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index b83332fd5eed..73a10b2f368c 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -117,6 +117,7 @@ Usage -m Sort by total memory. -p Sort by pid. -P Sort by tgid. + -n Sort by task command name. -r Sort by memory release time. -s Sort by stack trace. -t Sort by times (default). diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index b8d2867b5b18..e508abd8f665 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -24,6 +24,7 @@
struct block_list { char *txt; + char *comm; // task command name char *stacktrace; __u64 ts_nsec; __u64 free_ts_nsec; @@ -37,6 +38,7 @@ struct block_list { static regex_t order_pattern; static regex_t pid_pattern; static regex_t tgid_pattern; +static regex_t comm_pattern; static regex_t ts_nsec_pattern; static regex_t free_ts_nsec_pattern; static struct block_list *list; @@ -102,6 +104,13 @@ static int compare_tgid(const void *p1, const void *p2) return l1->tgid - l2->tgid; }
+static int compare_comm(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return strcmp(l1->comm, l2->comm); +} + static int compare_ts(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2; @@ -145,6 +154,7 @@ static void check_regcomp(regex_t *pattern, const char *regex) }
# define FIELD_BUFF 25 +# define TASK_COMM_LEN 16
static int get_page_num(char *buf) { @@ -233,6 +243,22 @@ static __u64 get_free_ts_nsec(char *buf) return free_ts_nsec; }
+static char *get_comm(char *buf) +{ + char *comm_str = malloc(TASK_COMM_LEN); + + memset(comm_str, 0, TASK_COMM_LEN); + + search_pattern(&comm_pattern, comm_str, buf); + errno = 0; + if (errno != 0) { + printf("wrong comm in follow buf:\n%s\n", buf); + return NULL; + } + + return comm_str; +} + static void add_list(char *buf, int len) { if (list_size != 0 && @@ -266,6 +292,7 @@ static void add_list(char *buf, int len) list[list_size].stacktrace++; list[list_size].pid = get_pid(buf); list[list_size].tgid = get_tgid(buf); + list[list_size].comm = get_comm(buf); list[list_size].ts_nsec = get_ts_nsec(buf); list_size++; if (list_size % 1000 == 0) { @@ -284,6 +311,7 @@ static void usage(void) "-t Sort by times (default).\n" "-p Sort by pid.\n" "-P Sort by tgid.\n" + "-n Sort by task command name.\n" "-a Sort by memory allocate time.\n" "-r Sort by memory release time.\n" "-c Cull by comparing stacktrace instead of total block.\n" @@ -300,7 +328,7 @@ int main(int argc, char **argv) struct stat st; int opt;
- while ((opt = getopt(argc, argv, "acfmprstP")) != -1) + while ((opt = getopt(argc, argv, "acfmnprstP")) != -1) switch (opt) { case 'a': cmp = compare_ts; @@ -329,6 +357,9 @@ int main(int argc, char **argv) case 'P': cmp = compare_tgid; break; + case 'n': + cmp = compare_comm; + break; default: usage(); exit(1); @@ -350,6 +381,7 @@ int main(int argc, char **argv) check_regcomp(&order_pattern, "order\s*([0-9]*),"); check_regcomp(&pid_pattern, "pid\s*([0-9]*),"); check_regcomp(&tgid_pattern, "tgid\s*([0-9]*) "); + check_regcomp(&comm_pattern, "tgid\s*[0-9]*\s*\((.*)\),\s*ts"); check_regcomp(&ts_nsec_pattern, "ts\s*([0-9]*)\s*ns,"); check_regcomp(&free_ts_nsec_pattern, "free_ts\s*([0-9]*)\s*ns"); fstat(fileno(fin), &st); @@ -408,6 +440,7 @@ int main(int argc, char **argv) regfree(&order_pattern); regfree(&pid_pattern); regfree(&tgid_pattern); + regfree(&comm_pattern); regfree(&ts_nsec_pattern); regfree(&free_ts_nsec_pattern); return 0;
From: Jiajian Ye yejiajian2018@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 8ea8613a616aff184df9e3ea2d3ec39d90832867 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing page owner information, we may also need to select the blocks by PID, TGID or task command name, which helps to get more accurate page allocation information as needed.
Therefore, the following adjustments are made:
1. Add three new options, --pid, --tgid and --name, to support selecting information blocks by a specific pid, tgid or task command name. In addition, multiple options can be used at the same time.
./page_owner_sort [input] [output] --pid <PID>
./page_owner_sort [input] [output] --tgid <TGID>
./page_owner_sort [input] [output] --name <TASK_COMMAND_NAME>
Consider a scenario in which a multi-threaded program, ./demo (PID = 5280), is running and has created a child process (PID = 5281).
$ ps
  PID TTY          TIME CMD
 5215 pts/0    00:00:00 bash
 5280 pts/0    00:00:00 ./demo
 5281 pts/0    00:00:00 ./demo
 5282 pts/0    00:00:00 ps
When debugging the parent process, it is then useful to select only the records with tgid=5280 and the task name "demo", which can be done with:
./page_owner_sort [input] [output] --tgid 5280 --name demo
2. Add explanations of the three new options, --pid, --tgid and --name, to the document.
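Because each of --pid, --tgid and --name only sets its own bit in a filter mask, combining options means a block must satisfy all of them. A minimal sketch of that selection logic, working on already-parsed header fields rather than re-parsing the block text as is_need() does; the struct and enum names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

#define TASK_COMM_LEN 16

/* Selection bits, analogous to the patch's FILTER_BIT enum. */
enum {
	SELECT_PID  = 1 << 0,
	SELECT_TGID = 1 << 1,
	SELECT_COMM = 1 << 2,
};

/* Hypothetical parsed header fields of one block. */
struct block_hdr {
	pid_t pid;
	pid_t tgid;
	char comm[TASK_COMM_LEN];
};

/* Values taken from --pid, --tgid and --name. */
struct select_cond {
	unsigned int mask;   /* which of the conditions below are active */
	pid_t pid;
	pid_t tgid;
	char comm[TASK_COMM_LEN];
};

/* A block is kept only if it satisfies every active condition (logical AND). */
static bool is_selected(const struct block_hdr *b, const struct select_cond *c)
{
	if ((c->mask & SELECT_PID) && b->pid != c->pid)
		return false;
	if ((c->mask & SELECT_TGID) && b->tgid != c->tgid)
		return false;
	if ((c->mask & SELECT_COMM) &&
	    strncmp(b->comm, c->comm, TASK_COMM_LEN) != 0)
		return false;
	return true;
}
```

Applied to the scenario above with --tgid 5280 --name demo, a thread of ./demo (same tgid, hypothetical PID 5283) is selected, while the child process (tgid 5281) is not. Working on parsed fields also avoids a per-check allocation: in the diff below, is_need() frees the string returned by get_comm() only on the mismatch path.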
This work is coauthored by Shenghong Han hanshenghong2019@email.szu.edu.cn, Yixuan Cao caoyixuan2019@email.szu.edu.cn, Yinan Zhang zhangyinan2019@email.szu.edu.cn, Chongxi Zhao zhaochongxi2019@email.szu.edu.cn, Yuhong Feng yuhongf@szu.edu.cn.
Link: https://lkml.kernel.org/r/1646835223-7584-1-git-send-email-yejiajian2018@ema... Signed-off-by: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Sean Anderson seanga2@gmail.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Zhenliang Wei weizhenliang@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 5 ++ tools/vm/page_owner_sort.c | 120 +++++++++++++++++++++++++------- 2 files changed, 98 insertions(+), 27 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 73a10b2f368c..4014bfb529b2 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -129,3 +129,8 @@ Usage
Filter: -f Filter out the information of blocks whose memory has not been released. + + Select: + --pid <PID> Select by pid. + --tgid <TGID> Select by tgid. + --name <command> Select by task command name. \ No newline at end of file diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index e508abd8f665..e873eff84462 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -21,6 +21,12 @@ #include <regex.h> #include <errno.h> #include <linux/types.h> +#include <getopt.h> + +#define bool int +#define true 1 +#define false 0 +#define TASK_COMM_LEN 16
struct block_list { char *txt; @@ -34,7 +40,18 @@ struct block_list { pid_t pid; pid_t tgid; }; - +enum FILTER_BIT { + FILTER_UNRELEASE = 1<<1, + FILTER_PID = 1<<2, + FILTER_TGID = 1<<3, + FILTER_TASK_COMM_NAME = 1<<4 +}; +struct filter_condition { + pid_t tgid; + pid_t pid; + char comm[TASK_COMM_LEN]; +}; +static struct filter_condition fc; static regex_t order_pattern; static regex_t pid_pattern; static regex_t tgid_pattern; @@ -154,7 +171,6 @@ static void check_regcomp(regex_t *pattern, const char *regex) }
# define FIELD_BUFF 25 -# define TASK_COMM_LEN 16
static int get_page_num(char *buf) { @@ -259,11 +275,30 @@ static char *get_comm(char *buf) return comm_str; }
+static bool is_need(char *buf) +{ + if ((filter & FILTER_UNRELEASE) != 0 && get_free_ts_nsec(buf) != 0) + return false; + if ((filter & FILTER_PID) != 0 && get_pid(buf) != fc.pid) + return false; + if ((filter & FILTER_TGID) != 0 && get_tgid(buf) != fc.tgid) + return false; + + char *comm = get_comm(buf); + + if ((filter & FILTER_TASK_COMM_NAME) != 0 && + strncmp(comm, fc.comm, TASK_COMM_LEN) != 0) { + free(comm); + return false; + } + return true; +} + static void add_list(char *buf, int len) { if (list_size != 0 && - len == list[list_size-1].len && - memcmp(buf, list[list_size-1].txt, len) == 0) { + len == list[list_size-1].len && + memcmp(buf, list[list_size-1].txt, len) == 0) { list[list_size-1].num++; list[list_size-1].page_num += get_page_num(buf); return; @@ -272,28 +307,27 @@ static void add_list(char *buf, int len) printf("max_size too small??\n"); exit(1); } - - list[list_size].free_ts_nsec = get_free_ts_nsec(buf); - if (filter == 1 && list[list_size].free_ts_nsec != 0) + if (!is_need(buf)) return; + list[list_size].pid = get_pid(buf); + list[list_size].tgid = get_tgid(buf); + list[list_size].comm = get_comm(buf); list[list_size].txt = malloc(len+1); if (!list[list_size].txt) { printf("Out of memory\n"); exit(1); } - + memcpy(list[list_size].txt, buf, len); + list[list_size].txt[len] = 0; list[list_size].len = len; list[list_size].num = 1; list[list_size].page_num = get_page_num(buf); - memcpy(list[list_size].txt, buf, len); - list[list_size].txt[len] = 0; + list[list_size].stacktrace = strchr(list[list_size].txt, '\n') ?: ""; if (*list[list_size].stacktrace == '\n') list[list_size].stacktrace++; - list[list_size].pid = get_pid(buf); - list[list_size].tgid = get_tgid(buf); - list[list_size].comm = get_comm(buf); list[list_size].ts_nsec = get_ts_nsec(buf); + list[list_size].free_ts_nsec = get_free_ts_nsec(buf); list_size++; if (list_size % 1000 == 0) { printf("loaded %d\r", list_size); @@ -306,16 +340,19 @@ static void add_list(char *buf, int len) static 
void usage(void) { printf("Usage: ./page_owner_sort [OPTIONS] <input> <output>\n" - "-m Sort by total memory.\n" - "-s Sort by the stack trace.\n" - "-t Sort by times (default).\n" - "-p Sort by pid.\n" - "-P Sort by tgid.\n" - "-n Sort by task command name.\n" - "-a Sort by memory allocate time.\n" - "-r Sort by memory release time.\n" - "-c Cull by comparing stacktrace instead of total block.\n" - "-f Filter out the information of blocks whose memory has been released.\n" + "-m\t\tSort by total memory.\n" + "-s\t\tSort by the stack trace.\n" + "-t\t\tSort by times (default).\n" + "-p\t\tSort by pid.\n" + "-P\t\tSort by tgid.\n" + "-n\t\tSort by task command name.\n" + "-a\t\tSort by memory allocate time.\n" + "-r\t\tSort by memory release time.\n" + "-c\t\tCull by comparing stacktrace instead of total block.\n" + "-f\t\tFilter out the information of blocks whose memory has been released.\n" + "--pid <PID>\tSelect by pid. This selects the information of blocks whose process ID number equals to <PID>.\n" + "--tgid <TGID>\tSelect by tgid. This selects the information of blocks whose Thread Group ID number equals to <TGID>.\n" + "--name <command>\n\t\tSelect by command name. This selects the information of blocks whose command name identical to <command>.\n" ); }
@@ -323,12 +360,18 @@ int main(int argc, char **argv) { int (*cmp)(const void *, const void *) = compare_num; FILE *fin, *fout; - char *buf; + char *buf, *endptr; int ret, i, count; struct stat st; int opt; - - while ((opt = getopt(argc, argv, "acfmnprstP")) != -1) + struct option longopts[] = { + { "pid", required_argument, NULL, 1 }, + { "tgid", required_argument, NULL, 2 }, + { "name", required_argument, NULL, 3 }, + { 0, 0, 0, 0}, + }; + + while ((opt = getopt_long(argc, argv, "acfmnprstP", longopts, NULL)) != -1) switch (opt) { case 'a': cmp = compare_ts; @@ -337,7 +380,7 @@ int main(int argc, char **argv) cull_st = 1; break; case 'f': - filter = 1; + filter = filter | FILTER_UNRELEASE; break; case 'm': cmp = compare_page_num; @@ -360,6 +403,29 @@ int main(int argc, char **argv) case 'n': cmp = compare_comm; break; + case 1: + filter = filter | FILTER_PID; + errno = 0; + fc.pid = strtol(optarg, &endptr, 10); + if (errno != 0 || endptr == optarg || *endptr != '\0') { + printf("wrong/invalid pid in from the command line:%s\n", optarg); + exit(1); + } + break; + case 2: + filter = filter | FILTER_TGID; + errno = 0; + fc.tgid = strtol(optarg, &endptr, 10); + if (errno != 0 || endptr == optarg || *endptr != '\0') { + printf("wrong/invalid tgid in from the command line:%s\n", optarg); + exit(1); + } + break; + case 3: + filter = filter | FILTER_TASK_COMM_NAME; + strncpy(fc.comm, optarg, TASK_COMM_LEN); + fc.comm[TASK_COMM_LEN-1] = '\0'; + break; default: usage(); exit(1);
From: Jiajian Ye yejiajian2018@email.szu.edu.cn
mainline inclusion from mainline-v5.18-rc1 commit 9c8a0a8e599f4a949ef18207ba495fb557dd1016 category: feature bugzilla: 188978
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing page owner information, we may want to cull blocks of information with our own rules, so the culling function is enhanced to support user-defined culling rules. The following adjustments are made:
1. Add a --cull option to support culling blocks of information with user-defined culling rules.
./page_owner_sort <input> <output> --cull=<rules>
./page_owner_sort <input> <output> --cull <rules>
<rules> is a single argument in the form of a comma-separated list of keys k1,k2,..., each of which specifies an individual culling rule. Abbreviated and complete forms of keys may be mixed.
For reference, please see the document (Documentation/vm/page_owner.rst).
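The patch parses <rules> with its own explode() helper followed by a chain of strcmp() calls in parse_cull_args(). The sketch below does the same job with POSIX strtok_r() and a lookup table of long and abbreviated keys; the table-driven approach and the name parse_cull_rules() are illustrative, not the tool's code.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() and strtok_r() */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Bits with the same values as the patch's CULL_BIT enum. */
enum {
	CULL_UNRELEASE  = 1 << 1,
	CULL_PID        = 1 << 2,
	CULL_TGID       = 1 << 3,
	CULL_COMM       = 1 << 4,
	CULL_STACKTRACE = 1 << 5,
};

/* Each key is accepted in its complete or abbreviated form. */
static const struct {
	const char *lng, *abbr;
	int bit;
} cull_keys[] = {
	{ "pid", "p", CULL_PID },
	{ "tgid", "tg", CULL_TGID },
	{ "name", "n", CULL_COMM },
	{ "stacktrace", "st", CULL_STACKTRACE },
	{ "free", "f", CULL_UNRELEASE },
};

/*
 * Parse "k1,k2,..." into a bitmask stored in *mask.
 * Returns false on an unknown key or if the copy cannot be allocated.
 */
static bool parse_cull_rules(const char *rules, int *mask)
{
	char *copy = strdup(rules), *save = NULL, *key;
	bool ok = copy != NULL;

	*mask = 0;
	for (key = ok ? strtok_r(copy, ",", &save) : NULL; key;
	     key = strtok_r(NULL, ",", &save)) {
		size_t i, n = sizeof(cull_keys) / sizeof(cull_keys[0]);

		for (i = 0; i < n; i++)
			if (!strcmp(key, cull_keys[i].lng) ||
			    !strcmp(key, cull_keys[i].abbr))
				break;
		if (i == n) {          /* unknown key */
			ok = false;
			break;
		}
		*mask |= cull_keys[i].bit;
	}
	free(copy);
	return ok;
}
```

One behavioral difference: strtok_r() skips empty fields, so "st,,pid" is accepted here, whereas the patch's explode() yields an empty key that parse_cull_args() rejects.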
Now, assume the input file contains the following two blocks:

Page allocated via order 0, mask xxxx, pid 1, tgid 1 (task_name_demo)
PFN xxxx
 prep_new_page+0xd0/0xf8
 get_page_from_freelist+0x4a0/0x1290
 __alloc_pages+0x168/0x340
 alloc_pages+0xb0/0x158

Page allocated via order 0, mask xxxx, pid 32, tgid 32 (task_name_demo)
PFN xxxx
 prep_new_page+0xd0/0xf8
 get_page_from_freelist+0x4a0/0x1290
 __alloc_pages+0x168/0x340
 alloc_pages+0xb0/0x158
If we want to cull the blocks by stacktrace and task command name, we can use this command:
./page_owner_sort <input> <output> --cull=stacktrace,name
The output would look like:

2 times, 2 pages, task_comm_name: task_name_demo
 prep_new_page+0xd0/0xf8
 get_page_from_freelist+0x4a0/0x1290
 __alloc_pages+0x168/0x340
 alloc_pages+0xb0/0x158

As we can see, these two blocks are culled into one, since they share the same stacktrace and task command name.
However, if we want to cull the blocks by pid, stacktrace and task command name, we can use this command:
./page_owner_sort <input> <output> --cull=stacktrace,name,pid
The output would look like:

1 times, 1 pages, PID 1, task_comm_name: task_name_demo
 prep_new_page+0xd0/0xf8
 get_page_from_freelist+0x4a0/0x1290
 __alloc_pages+0x168/0x340
 alloc_pages+0xb0/0x158

1 times, 1 pages, PID 32, task_comm_name: task_name_demo
 prep_new_page+0xd0/0xf8
 get_page_from_freelist+0x4a0/0x1290
 __alloc_pages+0x168/0x340
 alloc_pages+0xb0/0x158

As we can see, these two blocks are not culled together, because their PIDs differ.
2. Add explanations of --cull options to the document.
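Both outcomes follow from how culling groups blocks: after sorting with a comparator that looks only at the selected keys, adjacent blocks that compare equal are merged. A reduced sketch modeled on compare_cull_condition() in this patch, with a hypothetical struct rec carrying just three keys and a global mask standing in for the tool's cull variable:

```c
#include <string.h>
#include <sys/types.h>

/* Same bit values as the patch's CULL_BIT enum (subset). */
enum {
	CULL_PID        = 1 << 2,
	CULL_COMM       = 1 << 4,
	CULL_STACKTRACE = 1 << 5,
};

/* Hypothetical record holding only the keys used below. */
struct rec {
	pid_t pid;
	const char *comm;
	const char *stacktrace;
};

static int cull_mask;  /* bits set from --cull */

/*
 * Compare only on the enabled keys, returning the first difference found.
 * 0 means "same group": the two records would be culled together.
 */
static int compare_cull(const void *p1, const void *p2)
{
	const struct rec *a = p1, *b = p2;
	int r;

	if ((cull_mask & CULL_STACKTRACE) &&
	    (r = strcmp(a->stacktrace, b->stacktrace)) != 0)
		return r;
	if ((cull_mask & CULL_PID) && a->pid != b->pid)
		return a->pid < b->pid ? -1 : 1;
	if ((cull_mask & CULL_COMM) && (r = strcmp(a->comm, b->comm)) != 0)
		return r;
	return 0;
}
```

With CULL_STACKTRACE | CULL_COMM the two example blocks compare equal and are merged; adding CULL_PID makes them differ on pid, so they stay separate, matching the two outputs above. Comparing pids with < rather than subtraction is a defensive choice here; the patch's compare_pid() subtracts, which is fine for the small non-negative values PIDs take.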
This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao and Yuhong Feng.
Link: https://lkml.kernel.org/r/20220312145834.624-1-yejiajian2018@email.szu.edu.c... Signed-off-by: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Shenghong Han hanshenghong2019@email.szu.edu.cn Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Cc: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn Cc: Yuhong Feng yuhongf@szu.edu.cn Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Sean Anderson seanga2@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 31 ++++++- tools/vm/page_owner_sort.c | 150 +++++++++++++++++++++++++++----- 2 files changed, 158 insertions(+), 23 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 4014bfb529b2..23565062ead1 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -126,11 +126,38 @@ Usage
Cull: -c Cull by comparing stacktrace instead of total block. + --cull <rules> + Specify culling rules.Culling syntax is key[,key[,...]].Choose a + multi-letter key from the **STANDARD FORMAT SPECIFIERS** section. + + + <rules> is a single argument in the form of a comma-separated list, + which offers a way to specify individual culling rules. The recognized + keywords are described in the **STANDARD FORMAT SPECIFIERS** section below. + <rules> can be specified by the sequence of keys k1,k2, ..., as described in + the STANDARD SORT KEYS section below. Mixed use of abbreviated and + complete-form of keys is allowed. + + + Examples: + ./page_owner_sort <input> <output> --cull=stacktrace + ./page_owner_sort <input> <output> --cull=st,pid,name + ./page_owner_sort <input> <output> --cull=n,f
Filter: -f Filter out the information of blocks whose memory has not been released.
Select: - --pid <PID> Select by pid. + --pid <PID> Select by pid. --tgid <TGID> Select by tgid. - --name <command> Select by task command name. \ No newline at end of file + --name <command> Select by task command name. + +STANDARD FORMAT SPECIFIERS +========================== + + KEY LONG DESCRIPTION + p pid process ID + tg tgid thread group ID + n name task command name + f free whether the page has been released or not + st stacktrace stace trace of the page allocation \ No newline at end of file diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index e873eff84462..7679335fce5b 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -44,7 +44,14 @@ enum FILTER_BIT { FILTER_UNRELEASE = 1<<1, FILTER_PID = 1<<2, FILTER_TGID = 1<<3, - FILTER_TASK_COMM_NAME = 1<<4 + FILTER_COMM = 1<<4 +}; +enum CULL_BIT { + CULL_UNRELEASE = 1<<1, + CULL_PID = 1<<2, + CULL_TGID = 1<<3, + CULL_COMM = 1<<4, + CULL_STACKTRACE = 1<<5 }; struct filter_condition { pid_t tgid; @@ -61,7 +68,7 @@ static regex_t free_ts_nsec_pattern; static struct block_list *list; static int list_size; static int max_size; -static int cull_st; +static int cull; static int filter;
int read_block(char *buf, int buf_size, FILE *fin) @@ -142,6 +149,36 @@ static int compare_free_ts(const void *p1, const void *p2) return l1->free_ts_nsec < l2->free_ts_nsec ? -1 : 1; }
+ +static int compare_release(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + if (!l1->free_ts_nsec && !l2->free_ts_nsec) + return 0; + if (l1->free_ts_nsec && l2->free_ts_nsec) + return 0; + return l1->free_ts_nsec ? 1 : -1; +} + + +static int compare_cull_condition(const void *p1, const void *p2) +{ + if (cull == 0) + return compare_txt(p1, p2); + if ((cull & CULL_STACKTRACE) && compare_stacktrace(p1, p2)) + return compare_stacktrace(p1, p2); + if ((cull & CULL_PID) && compare_pid(p1, p2)) + return compare_pid(p1, p2); + if ((cull & CULL_TGID) && compare_tgid(p1, p2)) + return compare_tgid(p1, p2); + if ((cull & CULL_COMM) && compare_comm(p1, p2)) + return compare_comm(p1, p2); + if ((cull & CULL_UNRELEASE) && compare_release(p1, p2)) + return compare_release(p1, p2); + return 0; +} + static int search_pattern(regex_t *pattern, char *pattern_str, char *buf) { int err, val_len; @@ -170,6 +207,38 @@ static void check_regcomp(regex_t *pattern, const char *regex) } }
+static char **explode(char sep, const char *str, int *size) +{ + int count = 0, len = strlen(str); + int lastindex = -1, j = 0; + + for (int i = 0; i < len; i++) + if (str[i] == sep) + count++; + char **ret = calloc(++count, sizeof(char *)); + + for (int i = 0; i < len; i++) { + if (str[i] == sep) { + ret[j] = calloc(i - lastindex, sizeof(char)); + memcpy(ret[j++], str + lastindex + 1, i - lastindex - 1); + lastindex = i; + } + } + if (lastindex <= len - 1) { + ret[j] = calloc(len - lastindex, sizeof(char)); + memcpy(ret[j++], str + lastindex + 1, strlen(str) - 1 - lastindex); + } + *size = j; + return ret; +} + +static void free_explode(char **arr, int size) +{ + for (int i = 0; i < size; i++) + free(arr[i]); + free(arr); +} + # define FIELD_BUFF 25
static int get_page_num(char *buf) @@ -277,16 +346,16 @@ static char *get_comm(char *buf)
static bool is_need(char *buf) { - if ((filter & FILTER_UNRELEASE) != 0 && get_free_ts_nsec(buf) != 0) + if ((filter & FILTER_UNRELEASE) && get_free_ts_nsec(buf) != 0) return false; - if ((filter & FILTER_PID) != 0 && get_pid(buf) != fc.pid) + if ((filter & FILTER_PID) && get_pid(buf) != fc.pid) return false; - if ((filter & FILTER_TGID) != 0 && get_tgid(buf) != fc.tgid) + if ((filter & FILTER_TGID) && get_tgid(buf) != fc.tgid) return false;
char *comm = get_comm(buf);
- if ((filter & FILTER_TASK_COMM_NAME) != 0 && + if ((filter & FILTER_COMM) && strncmp(comm, fc.comm, TASK_COMM_LEN) != 0) { free(comm); return false; @@ -335,6 +404,30 @@ static void add_list(char *buf, int len) } }
+static bool parse_cull_args(const char *arg_str) +{ + int size = 0; + char **args = explode(',', arg_str, &size); + + for (int i = 0; i < size; ++i) + if (!strcmp(args[i], "pid") || !strcmp(args[i], "p")) + cull |= CULL_PID; + else if (!strcmp(args[i], "tgid") || !strcmp(args[i], "tg")) + cull |= CULL_TGID; + else if (!strcmp(args[i], "name") || !strcmp(args[i], "n")) + cull |= CULL_COMM; + else if (!strcmp(args[i], "stacktrace") || !strcmp(args[i], "st")) + cull |= CULL_STACKTRACE; + else if (!strcmp(args[i], "free") || !strcmp(args[i], "f")) + cull |= CULL_UNRELEASE; + else { + free_explode(args, size); + return false; + } + free_explode(args, size); + return true; +} + #define BUF_SIZE (128 * 1024)
static void usage(void) @@ -353,6 +446,7 @@ static void usage(void) "--pid <PID>\tSelect by pid. This selects the information of blocks whose process ID number equals to <PID>.\n" "--tgid <TGID>\tSelect by tgid. This selects the information of blocks whose Thread Group ID number equals to <TGID>.\n" "--name <command>\n\t\tSelect by command name. This selects the information of blocks whose command name identical to <command>.\n" + "--cull <rules>\tCull by user-defined rules. <rules> is a single argument in the form of a comma-separated list with some common fields predefined\n" ); }
@@ -368,6 +462,7 @@ int main(int argc, char **argv) { "pid", required_argument, NULL, 1 }, { "tgid", required_argument, NULL, 2 }, { "name", required_argument, NULL, 3 }, + { "cull", required_argument, NULL, 4 }, { 0, 0, 0, 0}, };
@@ -377,7 +472,7 @@ int main(int argc, char **argv) cmp = compare_ts; break; case 'c': - cull_st = 1; + cull = cull | CULL_STACKTRACE; break; case 'f': filter = filter | FILTER_UNRELEASE; @@ -422,10 +517,17 @@ int main(int argc, char **argv) } break; case 3: - filter = filter | FILTER_TASK_COMM_NAME; + filter = filter | FILTER_COMM; strncpy(fc.comm, optarg, TASK_COMM_LEN); fc.comm[TASK_COMM_LEN-1] = '\0'; break; + case 4: + if (!parse_cull_args(optarg)) { + printf("wrong argument after --cull in from the command line:%s\n", + optarg); + exit(1); + } + break; default: usage(); exit(1); @@ -472,20 +574,13 @@ int main(int argc, char **argv)
printf("sorting ....\n");
- if (cull_st == 1) - qsort(list, list_size, sizeof(list[0]), compare_stacktrace); - else - qsort(list, list_size, sizeof(list[0]), compare_txt); - - + qsort(list, list_size, sizeof(list[0]), compare_cull_condition);
printf("culling\n");
- long offset = cull_st ? &list[0].stacktrace - &list[0].txt : 0; - for (i = count = 0; i < list_size; i++) { if (count == 0 || - strcmp(*(&list[count-1].txt+offset), *(&list[i].txt+offset)) != 0) { + compare_cull_condition((void *)(&list[count-1]), (void *)(&list[i])) != 0) { list[count++] = list[i]; } else { list[count-1].num += list[i].num; @@ -496,12 +591,25 @@ int main(int argc, char **argv) qsort(list, count, sizeof(list[0]), cmp);
for (i = 0; i < count; i++) { - if (cull_st == 0) + if (cull == 0) fprintf(fout, "%d times, %d pages:\n%s\n", list[i].num, list[i].page_num, list[i].txt); - else - fprintf(fout, "%d times, %d pages:\n%s\n", - list[i].num, list[i].page_num, list[i].stacktrace); + else { + fprintf(fout, "%d times, %d pages", + list[i].num, list[i].page_num); + if (cull & CULL_PID || filter & FILTER_PID) + fprintf(fout, ", PID %d", list[i].pid); + if (cull & CULL_TGID || filter & FILTER_TGID) + fprintf(fout, ", TGID %d", list[i].pid); + if (cull & CULL_COMM || filter & FILTER_COMM) + fprintf(fout, ", task_comm_name: %s", list[i].comm); + if (cull & CULL_UNRELEASE) + fprintf(fout, " (%s)", + list[i].free_ts_nsec ? "UNRELEASED" : "RELEASED"); + if (cull & CULL_STACKTRACE) + fprintf(fout, ":\n%s", list[i].stacktrace); + fprintf(fout, "\n"); + } } regfree(&order_pattern); regfree(&pid_pattern);
From: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
mainline inclusion
from mainline-v5.18-rc1
commit d8b7b3fa9f9b2dc67fa1df29c4ce98eb10d62824
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The -c option culls by stacktrace. Now that the --cull option has been added to page_owner_sort.c, culling by stacktrace is just one of the rules "--cull" accepts (--cull=stacktrace or --cull=st), so no extra parameter is needed. Remove the -c option.

Remove the parsing of -c from the option handling and drop "-c" from usage().
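A minimal sketch of why -c is redundant: once culling keys are kept in a bitmask, `--cull=st` selects exactly the stacktrace comparison that -c used to hard-wire, and culling itself is a sort followed by merging adjacent equal entries. The names `struct blk`, `parse_cull()`, `cmp_cull()` and `cull_blocks()` and the bit values are hypothetical simplifications of page_owner_sort.c, and only the pid and stacktrace keys are modelled.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of the culling bits (values are assumptions). */
enum { CULL_PID = 1 << 1, CULL_STACKTRACE = 1 << 5 };

struct blk {                    /* simplified stand-in for struct block_list */
	int pid;
	const char *stacktrace;
	int num;                /* number of identical blocks merged so far */
};

static int cull;                /* bitmask of active culling keys */

static bool key_is(const char *key, size_t len, const char *name)
{
	return strlen(name) == len && strncmp(key, name, len) == 0;
}

/* Turn "st", "pid,st", ... into cull bits; false on an unknown key. */
static bool parse_cull(const char *rules)
{
	for (const char *p = rules;; p++) {
		size_t len = strcspn(p, ",");   /* length of the current key */

		if (key_is(p, len, "pid") || key_is(p, len, "p"))
			cull |= CULL_PID;
		else if (key_is(p, len, "stacktrace") || key_is(p, len, "st"))
			cull |= CULL_STACKTRACE;
		else
			return false;
		p += len;               /* now at ',' or at the terminating NUL */
		if (*p == '\0')
			return true;
	}
}

/* Compare only the fields selected in 'cull'; 0 means "same group". */
static int cmp_cull(const void *a, const void *b)
{
	const struct blk *x = a, *y = b;
	int r;

	if ((cull & CULL_STACKTRACE) &&
	    (r = strcmp(x->stacktrace, y->stacktrace)) != 0)
		return r;
	if ((cull & CULL_PID) && x->pid != y->pid)
		return x->pid < y->pid ? -1 : 1;
	return 0;
}

/* Sort so equal groups are adjacent, then fold each run into one entry. */
static int cull_blocks(struct blk *v, int n)
{
	int count = 0;

	qsort(v, n, sizeof(*v), cmp_cull);
	for (int i = 0; i < n; i++) {
		if (count == 0 || cmp_cull(&v[count - 1], &v[i]) != 0)
			v[count++] = v[i];
		else
			v[count - 1].num += v[i].num;
	}
	return count;
}
```

With this shape, the old `-c` behaviour is simply `parse_cull("st")`, and further keys only add more comparisons to `cmp_cull()`.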
This work is coauthored by Shenghong Han, Yixuan Cao, Chongxi Zhao, Jiajian Ye, Yuhong Feng, and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220326085920.1470081-1-zhangyinan2019@email.szu....
Signed-off-by: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Georgi Djakov <georgi.djakov@linaro.org>
Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 tools/vm/page_owner_sort.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 7679335fce5b..7d98e76c2291 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -441,7 +441,6 @@ static void usage(void) "-n\t\tSort by task command name.\n" "-a\t\tSort by memory allocate time.\n" "-r\t\tSort by memory release time.\n" - "-c\t\tCull by comparing stacktrace instead of total block.\n" "-f\t\tFilter out the information of blocks whose memory has been released.\n" "--pid <PID>\tSelect by pid. This selects the information of blocks whose process ID number equals to <PID>.\n" "--tgid <TGID>\tSelect by tgid. This selects the information of blocks whose Thread Group ID number equals to <TGID>.\n" @@ -466,14 +465,11 @@ int main(int argc, char **argv) { 0, 0, 0, 0}, };
- while ((opt = getopt_long(argc, argv, "acfmnprstP", longopts, NULL)) != -1) + while ((opt = getopt_long(argc, argv, "afmnprstP", longopts, NULL)) != -1) switch (opt) { case 'a': cmp = compare_ts; break; - case 'c': - cull = cull | CULL_STACKTRACE; - break; case 'f': filter = filter | FILTER_UNRELEASE; break;
From: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
mainline inclusion
from mainline-v5.18-rc1
commit c89b3ad2dea254ad17ae2585b17c2cf9f78e64d9
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The -c option has been removed from page_owner_sort.c.

Remove the description of the -c option from the documentation as well.
This work is coauthored by Shenghong Han, Yixuan Cao, Chongxi Zhao, Jiajian Ye, Yuhong Feng, and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220326085920.1470081-2-zhangyinan2019@email.szu....
Signed-off-by: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Georgi Djakov <georgi.djakov@linaro.org>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/vm/page_owner.rst | 1 -
 1 file changed, 1 deletion(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 23565062ead1..c0dda23aa7d8 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -125,7 +125,6 @@ Usage additional function:
Cull: - -c Cull by comparing stacktrace instead of total block. --cull <rules> Specify culling rules.Culling syntax is key[,key[,...]].Choose a multi-letter key from the **STANDARD FORMAT SPECIFIERS** section.
From: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
mainline inclusion
from mainline-v5.19-rc1
commit 329687a03d18143f491b535d22be1cccc291bb58
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Error messages should be sent to stderr using fprintf() instead of printf().
This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao, Yuhong Feng, and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220401024856.767-1-yejiajian2018@email.szu.edu.c...
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 tools/vm/page_owner_sort.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 7d98e76c2291..6771003ed5f1 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -186,7 +186,7 @@ static int search_pattern(regex_t *pattern, char *pattern_str, char *buf)
err = regexec(pattern, buf, 2, pmatch, REG_NOTBOL); if (err != 0 || pmatch[1].rm_so == -1) { - printf("no matching pattern in %s\n", buf); + fprintf(stderr, "no matching pattern in %s\n", buf); return -1; } val_len = pmatch[1].rm_eo - pmatch[1].rm_so; @@ -202,7 +202,7 @@ static void check_regcomp(regex_t *pattern, const char *regex)
err = regcomp(pattern, regex, REG_EXTENDED | REG_NEWLINE); if (err != 0 || pattern->re_nsub != 1) { - printf("Invalid pattern %s code %d\n", regex, err); + fprintf(stderr, "Invalid pattern %s code %d\n", regex, err); exit(1); } } @@ -251,7 +251,7 @@ static int get_page_num(char *buf) errno = 0; order_val = strtol(order_str, &endptr, 10); if (order_val > 64 || errno != 0 || endptr == order_str || *endptr != '\0') { - printf("wrong order in follow buf:\n%s\n", buf); + fprintf(stderr, "wrong order in follow buf:\n%s\n", buf); return 0; }
@@ -268,7 +268,7 @@ static pid_t get_pid(char *buf) errno = 0; pid = strtol(pid_str, &endptr, 10); if (errno != 0 || endptr == pid_str || *endptr != '\0') { - printf("wrong/invalid pid in follow buf:\n%s\n", buf); + fprintf(stderr, "wrong/invalid pid in follow buf:\n%s\n", buf); return -1; }
@@ -286,7 +286,7 @@ static pid_t get_tgid(char *buf) errno = 0; tgid = strtol(tgid_str, &endptr, 10); if (errno != 0 || endptr == tgid_str || *endptr != '\0') { - printf("wrong/invalid tgid in follow buf:\n%s\n", buf); + fprintf(stderr, "wrong/invalid tgid in follow buf:\n%s\n", buf); return -1; }
@@ -304,7 +304,7 @@ static __u64 get_ts_nsec(char *buf) errno = 0; ts_nsec = strtoull(ts_nsec_str, &endptr, 10); if (errno != 0 || endptr == ts_nsec_str || *endptr != '\0') { - printf("wrong ts_nsec in follow buf:\n%s\n", buf); + fprintf(stderr, "wrong ts_nsec in follow buf:\n%s\n", buf); return -1; }
@@ -321,7 +321,7 @@ static __u64 get_free_ts_nsec(char *buf) errno = 0; free_ts_nsec = strtoull(free_ts_nsec_str, &endptr, 10); if (errno != 0 || endptr == free_ts_nsec_str || *endptr != '\0') { - printf("wrong free_ts_nsec in follow buf:\n%s\n", buf); + fprintf(stderr, "wrong free_ts_nsec in follow buf:\n%s\n", buf); return -1; }
@@ -337,7 +337,7 @@ static char *get_comm(char *buf) search_pattern(&comm_pattern, comm_str, buf); errno = 0; if (errno != 0) { - printf("wrong comm in follow buf:\n%s\n", buf); + fprintf(stderr, "wrong comm in follow buf:\n%s\n", buf); return NULL; }
@@ -373,7 +373,7 @@ static void add_list(char *buf, int len) return; } if (list_size == max_size) { - printf("max_size too small??\n"); + fprintf(stderr, "max_size too small??\n"); exit(1); } if (!is_need(buf)) @@ -383,7 +383,7 @@ static void add_list(char *buf, int len) list[list_size].comm = get_comm(buf); list[list_size].txt = malloc(len+1); if (!list[list_size].txt) { - printf("Out of memory\n"); + fprintf(stderr, "Out of memory\n"); exit(1); } memcpy(list[list_size].txt, buf, len); @@ -499,7 +499,8 @@ int main(int argc, char **argv) errno = 0; fc.pid = strtol(optarg, &endptr, 10); if (errno != 0 || endptr == optarg || *endptr != '\0') { - printf("wrong/invalid pid in from the command line:%s\n", optarg); + fprintf(stderr, "wrong/invalid pid in from the command line:%s\n", + optarg); exit(1); } break; @@ -508,7 +509,8 @@ int main(int argc, char **argv) errno = 0; fc.tgid = strtol(optarg, &endptr, 10); if (errno != 0 || endptr == optarg || *endptr != '\0') { - printf("wrong/invalid tgid in from the command line:%s\n", optarg); + fprintf(stderr, "wrong/invalid tgid in from the command line:%s\n", + optarg); exit(1); } break; @@ -519,7 +521,7 @@ int main(int argc, char **argv) break; case 4: if (!parse_cull_args(optarg)) { - printf("wrong argument after --cull in from the command line:%s\n", + fprintf(stderr, "wrong argument after --cull option:%s\n", optarg); exit(1); } @@ -554,7 +556,7 @@ int main(int argc, char **argv) list = malloc(max_size * sizeof(*list)); buf = malloc(BUF_SIZE); if (!list || !buf) { - printf("Out of memory\n"); + fprintf(stderr, "Out of memory\n"); exit(1); }
From: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
mainline inclusion
from mainline-v5.19-rc1
commit 75382a2dca0e9e9e57e88b479cf537549461a934
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing page owner information, we may want to select blocks whose PID/TGID/TASK_COMM_NAME appears in a user-specified list for data analysis and aggregation. However, page_owner_sort currently only supports selecting blocks associated with a single PID/TGID/TASK_COMM_NAME.
Therefore, the following adjustments are made:

1. Enhance the selection function to support selecting multiple PIDs/TGIDs/TASK_COMM_NAMEs.
The enhanced usages are as follows:

	--pid <pidlist>		Select by pid. This selects the blocks whose
				PID numbers appear in <pidlist>.
	--tgid <tgidlist>	Select by tgid. This selects the blocks whose
				TGID numbers appear in <tgidlist>.
	--name <cmdlist>	Select by task command name. This selects the blocks
				whose task command name appears in <cmdlist>.
Here <pidlist>, <tgidlist> and <cmdlist> are single arguments in the form of a comma-separated list, which offers a way to specify individual selection rules.
For example, to select blocks whose tgids are 1, 2 or 3, you currently have to run 4 commands:

./page_owner_sort <input> <output1> --tgid=1
./page_owner_sort <input> <output2> --tgid=2
./page_owner_sort <input> <output3> --tgid=3
cat <output1> <output2> <output3> > <output>
With this patch, a single command produces the same result:

./page_owner_sort <input> <output> --tgid=1,2,3
2. Update the explanations of --pid, --tgid and --name in usage() and in the documentation (Documentation/vm/page_owner.rst).
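A minimal sketch of how a comma-separated selection list can be parsed and matched. `parse_num_list()` is a hypothetical stand-in for the patch's `explode()` plus `parse_nums_list()`, and `match_num_list()` mirrors the helper of the same name in the patch. The sketch resets errno before every strtol() call and rejects empty fields such as "1,,3".

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a comma-separated list such as "1,2,3" into a newly allocated int
 * array. Returns NULL (leaving *size untouched) on an empty field, a
 * non-numeric field or an out-of-range value.
 */
static int *parse_num_list(const char *arg, int *size)
{
	int cap = 1, n = 0;
	const char *p = arg;
	int *list;

	for (const char *q = arg; *q != '\0'; q++)   /* fields = commas + 1 */
		if (*q == ',')
			cap++;
	list = calloc(cap, sizeof(*list));
	if (list == NULL)
		return NULL;

	for (;;) {
		char *end;
		long v;

		errno = 0;
		v = strtol(p, &end, 10);
		if (errno != 0 || end == p || (*end != ',' && *end != '\0') ||
		    v < INT_MIN || v > INT_MAX) {
			free(list);
			return NULL;
		}
		list[n++] = (int)v;
		if (*end == '\0')
			break;
		p = end + 1;                         /* skip the comma */
	}
	*size = n;
	return list;
}

/* True if num is one of the selected values. */
static bool match_num_list(int num, const int *list, int size)
{
	for (int i = 0; i < size; i++)
		if (list[i] == num)
			return true;
	return false;
}
```

Selecting with --tgid=1,2,3 then amounts to keeping a block only when `match_num_list(tgid, list, size)` returns true.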
This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao, Yuhong Feng, and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220401024856.767-2-yejiajian2018@email.szu.edu.c...
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/vm/page_owner.rst | 20 ++++++---
 tools/vm/page_owner_sort.c      | 78 ++++++++++++++++++++++++---------
 2 files changed, 72 insertions(+), 26 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index c0dda23aa7d8..4712159e75b9 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -129,7 +129,6 @@ Usage Specify culling rules.Culling syntax is key[,key[,...]].Choose a multi-letter key from the **STANDARD FORMAT SPECIFIERS** section.
- <rules> is a single argument in the form of a comma-separated list, which offers a way to specify individual culling rules. The recognized keywords are described in the **STANDARD FORMAT SPECIFIERS** section below. @@ -137,7 +136,6 @@ Usage the STANDARD SORT KEYS section below. Mixed use of abbreviated and complete-form of keys is allowed.
- Examples: ./page_owner_sort <input> <output> --cull=stacktrace ./page_owner_sort <input> <output> --cull=st,pid,name @@ -147,9 +145,21 @@ Usage -f Filter out the information of blocks whose memory has not been released.
Select: - --pid <PID> Select by pid. - --tgid <TGID> Select by tgid. - --name <command> Select by task command name. + --pid <pidlist> Select by pid. This selects the blocks whose process ID + numbers appear in <pidlist>. + --tgid <tgidlist> Select by tgid. This selects the blocks whose thread + group ID numbers appear in <tgidlist>. + --name <cmdlist> Select by task command name. This selects the blocks whose + task command name appear in <cmdlist>. + + <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list, + which offers a way to specify individual selecting rules. + + + Examples: + ./page_owner_sort <input> <output> --pid=1 + ./page_owner_sort <input> <output> --tgid=1,2,3 + ./page_owner_sort <input> <output> --name name1,name2s
STANDARD FORMAT SPECIFIERS ========================== diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 6771003ed5f1..16fb034c6a4e 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -54,9 +54,12 @@ enum CULL_BIT { CULL_STACKTRACE = 1<<5 }; struct filter_condition { - pid_t tgid; - pid_t pid; - char comm[TASK_COMM_LEN]; + pid_t *tgids; + int tgids_size; + pid_t *pids; + int pids_size; + char **comms; + int comms_size; }; static struct filter_condition fc; static regex_t order_pattern; @@ -149,7 +152,6 @@ static int compare_free_ts(const void *p1, const void *p2) return l1->free_ts_nsec < l2->free_ts_nsec ? -1 : 1; }
- static int compare_release(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2; @@ -161,7 +163,6 @@ static int compare_release(const void *p1, const void *p2) return l1->free_ts_nsec ? 1 : -1; }
- static int compare_cull_condition(const void *p1, const void *p2) { if (cull == 0) @@ -344,22 +345,40 @@ static char *get_comm(char *buf) return comm_str; }
+static bool match_num_list(int num, int *list, int list_size) +{ + for (int i = 0; i < list_size; ++i) + if (list[i] == num) + return true; + return false; +} + +static bool match_str_list(const char *str, char **list, int list_size) +{ + for (int i = 0; i < list_size; ++i) + if (!strcmp(list[i], str)) + return true; + return false; +} + static bool is_need(char *buf) { if ((filter & FILTER_UNRELEASE) && get_free_ts_nsec(buf) != 0) return false; - if ((filter & FILTER_PID) && get_pid(buf) != fc.pid) + if ((filter & FILTER_PID) && !match_num_list(get_pid(buf), fc.pids, fc.pids_size)) return false; - if ((filter & FILTER_TGID) && get_tgid(buf) != fc.tgid) + if ((filter & FILTER_TGID) && + !match_num_list(get_tgid(buf), fc.tgids, fc.tgids_size)) return false;
char *comm = get_comm(buf);
if ((filter & FILTER_COMM) && - strncmp(comm, fc.comm, TASK_COMM_LEN) != 0) { + !match_str_list(comm, fc.comms, fc.comms_size)) { free(comm); return false; } + free(comm); return true; }
@@ -428,6 +447,27 @@ static bool parse_cull_args(const char *arg_str) return true; }
+static int *parse_nums_list(char *arg_str, int *list_size) +{ + int size = 0; + char **args = explode(',', arg_str, &size); + int *list = calloc(size, sizeof(int)); + + errno = 0; + for (int i = 0; i < size; ++i) { + char *endptr = NULL; + + list[i] = strtol(args[i], &endptr, 10); + if (errno != 0 || endptr == args[i] || *endptr != '\0') { + free(list); + return NULL; + } + } + *list_size = size; + free_explode(args, size); + return list; +} + #define BUF_SIZE (128 * 1024)
static void usage(void) @@ -442,9 +482,9 @@ static void usage(void) "-a\t\tSort by memory allocate time.\n" "-r\t\tSort by memory release time.\n" "-f\t\tFilter out the information of blocks whose memory has been released.\n" - "--pid <PID>\tSelect by pid. This selects the information of blocks whose process ID number equals to <PID>.\n" - "--tgid <TGID>\tSelect by tgid. This selects the information of blocks whose Thread Group ID number equals to <TGID>.\n" - "--name <command>\n\t\tSelect by command name. This selects the information of blocks whose command name identical to <command>.\n" + "--pid <pidlist>\tSelect by pid. This selects the information of blocks whose process ID numbers appear in <pidlist>.\n" + "--tgid <tgidlist>\tSelect by tgid. This selects the information of blocks whose Thread Group ID numbers appear in <tgidlist>.\n" + "--name <cmdlist>\n\t\tSelect by command name. This selects the information of blocks whose command name appears in <cmdlist>.\n" "--cull <rules>\tCull by user-defined rules. 
<rules> is a single argument in the form of a comma-separated list with some common fields predefined\n" ); } @@ -453,7 +493,7 @@ int main(int argc, char **argv) { int (*cmp)(const void *, const void *) = compare_num; FILE *fin, *fout; - char *buf, *endptr; + char *buf; int ret, i, count; struct stat st; int opt; @@ -496,9 +536,8 @@ int main(int argc, char **argv) break; case 1: filter = filter | FILTER_PID; - errno = 0; - fc.pid = strtol(optarg, &endptr, 10); - if (errno != 0 || endptr == optarg || *endptr != '\0') { + fc.pids = parse_nums_list(optarg, &fc.pids_size); + if (fc.pids == NULL) { fprintf(stderr, "wrong/invalid pid in from the command line:%s\n", optarg); exit(1); @@ -506,9 +545,8 @@ int main(int argc, char **argv) break; case 2: filter = filter | FILTER_TGID; - errno = 0; - fc.tgid = strtol(optarg, &endptr, 10); - if (errno != 0 || endptr == optarg || *endptr != '\0') { + fc.tgids = parse_nums_list(optarg, &fc.tgids_size); + if (fc.tgids == NULL) { fprintf(stderr, "wrong/invalid tgid in from the command line:%s\n", optarg); exit(1); @@ -516,8 +554,7 @@ int main(int argc, char **argv) break; case 3: filter = filter | FILTER_COMM; - strncpy(fc.comm, optarg, TASK_COMM_LEN); - fc.comm[TASK_COMM_LEN-1] = '\0'; + fc.comms = explode(',', optarg, &fc.comms_size); break; case 4: if (!parse_cull_args(optarg)) { @@ -564,7 +601,6 @@ int main(int argc, char **argv) ret = read_block(buf, BUF_SIZE, fin); if (ret < 0) break; - add_list(buf, ret); }
From: Akira Yokosawa <akiyks@gmail.com>
mainline inclusion
from mainline-v5.18-rc5
commit 5603f9bdea68406f54132125b6fdddeeb5c0d2e4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Sphinx generates hard-to-read lists of parameters at the bottom of the page. Fix them by putting literal-block markers ("::") in front of them.
Link: https://lkml.kernel.org/r/cfd3bcc0-b51d-0c68-c065-ca1c4c202447@gmail.com
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Fixes: 57f2b54a9379 ("Documentation/vm/page_owner.rst: update the documentation")
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alex Shi <seakeel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/vm/page_owner.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 4712159e75b9..9ab61f75819f 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -110,7 +110,7 @@ Usage If you want to sort by the page nums of buf, use the ``-m`` parameter. The detailed parameters are:
- fundamental function: + fundamental function::
Sort: -a Sort by memory allocation time. @@ -122,7 +122,7 @@ Usage -s Sort by stack trace. -t Sort by times (default).
- additional function: + additional function::
Cull: --cull <rules> @@ -163,6 +163,7 @@ Usage
STANDARD FORMAT SPECIFIERS ========================== +::
KEY LONG DESCRIPTION p pid process ID
From: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
mainline inclusion
from mainline-v5.19-rc1
commit ebbeae36387ccf1326c896167872c3acf6c3c956
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When viewing page owner information, we may want to sort blocks of information by multiple keys, since a single key does not uniquely identify a block. Therefore, the following adjustments are made:
1. Add a new --sort option to support sorting blocks of information by multiple keys.
./page_owner_sort <input> <output> --sort=<order>
./page_owner_sort <input> <output> --sort <order>
<order> is a single argument in the form of a comma-separated list, which offers a way to specify sorting order.
Sorting syntax is [+|-]key[,[+|-]key[,...]]. The ascending or descending order of each key can be specified by adding a + (ascending, the default) or - (descending) prefix to the key:
./page_owner_sort <input> <output> [option] --sort -key1,+key2,key3...
For example, to sort the blocks first by task command name in lexicographic order and then by pid in ascending numerical order, use the following:
./page_owner_sort <input> <output> --sort=name,+pid
To sort the blocks first by pid in ascending order and then by allocation timestamp in descending order, use the following:
./page_owner_sort <input> <output> --sort=pid,-alloc_ts
2. Document the newly added --sort option in usage() and in Documentation/vm/page_owner.rst.
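A minimal sketch of the multi-key ordering described above: each key contributes a comparator and a sign (+1 ascending, -1 descending), and the first comparator that tells two blocks apart decides the order, much like compare_sort_condition() in the patch. Only three keys are modelled, and `struct blk`, `parse_sort()` and `cmp_multi()` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 8

struct blk {                    /* simplified stand-in for struct block_list */
	int pid;
	const char *comm;
	unsigned long long ts_nsec; /* allocation timestamp */
};

typedef int (*cmp_fn)(const void *, const void *);

static int cmp_pid(const void *a, const void *b)
{
	const struct blk *x = a, *y = b;

	return (x->pid > y->pid) - (x->pid < y->pid);
}

static int cmp_comm(const void *a, const void *b)
{
	const struct blk *x = a, *y = b;

	return strcmp(x->comm, y->comm);
}

static int cmp_ts(const void *a, const void *b)
{
	const struct blk *x = a, *y = b;

	return (x->ts_nsec > y->ts_nsec) - (x->ts_nsec < y->ts_nsec);
}

/* Short and long key names map to the same comparator. */
static const struct {
	const char *short_name, *long_name;
	cmp_fn fn;
} keys[] = {
	{ "p", "pid", cmp_pid },
	{ "n", "name", cmp_comm },
	{ "at", "alloc_ts", cmp_ts },
};

static cmp_fn cmps[MAX_KEYS];
static int signs[MAX_KEYS];
static int nkeys;

static int name_is(const char *p, size_t len, const char *name)
{
	return strlen(name) == len && strncmp(p, name, len) == 0;
}

/* Parse "[+|-]key[,[+|-]key[,...]]"; returns 0 on success, -1 on error. */
static int parse_sort(const char *spec)
{
	nkeys = 0;
	for (const char *p = spec;; p++) {
		size_t len = strcspn(p, ",");
		int sign = 1, found = 0;

		if (len > 0 && (*p == '+' || *p == '-')) {
			sign = (*p == '-') ? -1 : 1;
			p++;
			len--;
		}
		for (size_t k = 0; k < sizeof(keys) / sizeof(keys[0]) && !found; k++) {
			if (name_is(p, len, keys[k].short_name) ||
			    name_is(p, len, keys[k].long_name)) {
				if (nkeys == MAX_KEYS)
					return -1;
				cmps[nkeys] = keys[k].fn;
				signs[nkeys++] = sign;
				found = 1;
			}
		}
		if (!found)
			return -1;
		p += len;               /* now at ',' or at the terminating NUL */
		if (*p == '\0')
			return 0;
	}
}

/* The first key that tells two blocks apart decides, scaled by its sign. */
static int cmp_multi(const void *a, const void *b)
{
	for (int i = 0; i < nkeys; i++) {
		int r = cmps[i](a, b);

		if (r != 0)
			return signs[i] * (r > 0 ? 1 : -1);
	}
	return 0;
}
```

Under this sketch, `--sort=name,-at` would be `parse_sort("name,-at")` followed by `qsort(list, n, sizeof(list[0]), cmp_multi)`: blocks are ordered by command name, and ties are broken by the most recent allocation first.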
This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao, Yuhong Feng, and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220401024856.767-3-yejiajian2018@email.szu.edu.c...
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/vm/page_owner.rst |  23 ++++-
 tools/vm/page_owner_sort.c      | 164 +++++++++++++++++++++++++++-----
 2 files changed, 164 insertions(+), 23 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 9ab61f75819f..54df46b554ca 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -121,6 +121,14 @@ Usage -r Sort by memory release time. -s Sort by stack trace. -t Sort by times (default). + --sort <order> Specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]]. + Choose a key from the **STANDARD FORMAT SPECIFIERS** section. The "+" is + optional since default direction is increasing numerical or lexicographic + order. Mixed use of abbreviated and complete-form of keys is allowed. + + Examples: + ./page_owner_sort <input> <output> --sort=n,+pid,-tgid + ./page_owner_sort <input> <output> --sort=at
additional function::
@@ -165,9 +173,22 @@ STANDARD FORMAT SPECIFIERS ========================== ::
+For --sort option: + + KEY LONG DESCRIPTION + p pid process ID + tg tgid thread group ID + n name task command name + st stacktrace stack trace of the page allocation + T txt full text of block + ft free_ts timestamp of the page when it was released + at alloc_ts timestamp of the page when it was allocated + +For --curl option: + KEY LONG DESCRIPTION p pid process ID tg tgid thread group ID n name task command name f free whether the page has been released or not - st stacktrace stace trace of the page allocation \ No newline at end of file + st stacktrace stack trace of the page allocation diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 16fb034c6a4e..beca990707fb 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -53,15 +53,29 @@ enum CULL_BIT { CULL_COMM = 1<<4, CULL_STACKTRACE = 1<<5 }; +enum ARG_TYPE { + ARG_TXT, ARG_COMM, ARG_STACKTRACE, ARG_ALLOC_TS, ARG_FREE_TS, + ARG_CULL_TIME, ARG_PAGE_NUM, ARG_PID, ARG_TGID, ARG_UNKNOWN, ARG_FREE +}; +enum SORT_ORDER { + SORT_ASC = 1, + SORT_DESC = -1, +}; struct filter_condition { - pid_t *tgids; - int tgids_size; pid_t *pids; - int pids_size; + pid_t *tgids; char **comms; + int pids_size; + int tgids_size; int comms_size; }; +struct sort_condition { + int (**cmps)(const void *, const void *); + int *signs; + int size; +}; static struct filter_condition fc; +static struct sort_condition sc; static regex_t order_pattern; static regex_t pid_pattern; static regex_t tgid_pattern; @@ -107,14 +121,14 @@ static int compare_num(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2;
- return l2->num - l1->num; + return l1->num - l2->num; }
static int compare_page_num(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2;
- return l2->page_num - l1->page_num; + return l1->page_num - l2->page_num; }
static int compare_pid(const void *p1, const void *p2) @@ -180,6 +194,16 @@ static int compare_cull_condition(const void *p1, const void *p2) return 0; }
+static int compare_sort_condition(const void *p1, const void *p2) +{ + int cmp = 0; + + for (int i = 0; i < sc.size; ++i) + if (cmp == 0) + cmp = sc.signs[i] * sc.cmps[i](p1, p2); + return cmp; +} + static int search_pattern(regex_t *pattern, char *pattern_str, char *buf) { int err, val_len; @@ -345,6 +369,29 @@ static char *get_comm(char *buf) return comm_str; }
+static int get_arg_type(const char *arg) +{ + if (!strcmp(arg, "pid") || !strcmp(arg, "p")) + return ARG_PID; + else if (!strcmp(arg, "tgid") || !strcmp(arg, "tg")) + return ARG_TGID; + else if (!strcmp(arg, "name") || !strcmp(arg, "n")) + return ARG_COMM; + else if (!strcmp(arg, "stacktrace") || !strcmp(arg, "st")) + return ARG_STACKTRACE; + else if (!strcmp(arg, "free") || !strcmp(arg, "f")) + return ARG_FREE; + else if (!strcmp(arg, "txt") || !strcmp(arg, "T")) + return ARG_TXT; + else if (!strcmp(arg, "free_ts") || !strcmp(arg, "ft")) + return ARG_FREE_TS; + else if (!strcmp(arg, "alloc_ts") || !strcmp(arg, "at")) + return ARG_ALLOC_TS; + else { + return ARG_UNKNOWN; + } +} + static bool match_num_list(int num, int *list, int list_size) { for (int i = 0; i < list_size; ++i) @@ -428,21 +475,86 @@ static bool parse_cull_args(const char *arg_str) int size = 0; char **args = explode(',', arg_str, &size);
- for (int i = 0; i < size; ++i) - if (!strcmp(args[i], "pid") || !strcmp(args[i], "p")) + for (int i = 0; i < size; ++i) { + int arg_type = get_arg_type(args[i]); + + if (arg_type == ARG_PID) cull |= CULL_PID; - else if (!strcmp(args[i], "tgid") || !strcmp(args[i], "tg")) + else if (arg_type == ARG_TGID) cull |= CULL_TGID; - else if (!strcmp(args[i], "name") || !strcmp(args[i], "n")) + else if (arg_type == ARG_COMM) cull |= CULL_COMM; - else if (!strcmp(args[i], "stacktrace") || !strcmp(args[i], "st")) + else if (arg_type == ARG_STACKTRACE) cull |= CULL_STACKTRACE; - else if (!strcmp(args[i], "free") || !strcmp(args[i], "f")) + else if (arg_type == ARG_FREE) cull |= CULL_UNRELEASE; else { free_explode(args, size); return false; } + } + free_explode(args, size); + return true; +} + +static void set_single_cmp(int (*cmp)(const void *, const void *), int sign) +{ + if (sc.signs == NULL || sc.size < 1) + sc.signs = calloc(1, sizeof(int)); + sc.signs[0] = sign; + if (sc.cmps == NULL || sc.size < 1) + sc.cmps = calloc(1, sizeof(int *)); + sc.cmps[0] = cmp; + sc.size = 1; +} + +static bool parse_sort_args(const char *arg_str) +{ + int size = 0; + + if (sc.size != 0) { /* reset sort_condition */ + free(sc.signs); + free(sc.cmps); + size = 0; + } + + char **args = explode(',', arg_str, &size); + + sc.signs = calloc(size, sizeof(int)); + sc.cmps = calloc(size, sizeof(int *)); + for (int i = 0; i < size; ++i) { + int offset = 0; + + sc.signs[i] = SORT_ASC; + if (args[i][0] == '-' || args[i][0] == '+') { + if (args[i][0] == '-') + sc.signs[i] = SORT_DESC; + offset = 1; + } + + int arg_type = get_arg_type(args[i]+offset); + + if (arg_type == ARG_PID) + sc.cmps[i] = compare_pid; + else if (arg_type == ARG_TGID) + sc.cmps[i] = compare_tgid; + else if (arg_type == ARG_COMM) + sc.cmps[i] = compare_comm; + else if (arg_type == ARG_STACKTRACE) + sc.cmps[i] = compare_stacktrace; + else if (arg_type == ARG_ALLOC_TS) + sc.cmps[i] = compare_ts; + else if (arg_type == ARG_FREE_TS) + 
sc.cmps[i] = compare_free_ts; + else if (arg_type == ARG_TXT) + sc.cmps[i] = compare_txt; + else { + free_explode(args, size); + sc.size = 0; + return false; + } + } + sc.size = size; free_explode(args, size); return true; } @@ -485,13 +597,13 @@ static void usage(void) "--pid <pidlist>\tSelect by pid. This selects the information of blocks whose process ID numbers appear in <pidlist>.\n" "--tgid <tgidlist>\tSelect by tgid. This selects the information of blocks whose Thread Group ID numbers appear in <tgidlist>.\n" "--name <cmdlist>\n\t\tSelect by command name. This selects the information of blocks whose command name appears in <cmdlist>.\n" - "--cull <rules>\tCull by user-defined rules. <rules> is a single argument in the form of a comma-separated list with some common fields predefined\n" + "--cull <rules>\tCull by user-defined rules.<rules> is a single argument in the form of a comma-separated list with some common fields predefined\n" + "--sort <order>\tSpecify sort order as: [+|-]key[,[+|-]key[,...]]\n" ); }
int main(int argc, char **argv) { - int (*cmp)(const void *, const void *) = compare_num; FILE *fin, *fout; char *buf; int ret, i, count; @@ -502,37 +614,38 @@ int main(int argc, char **argv) { "tgid", required_argument, NULL, 2 }, { "name", required_argument, NULL, 3 }, { "cull", required_argument, NULL, 4 }, + { "sort", required_argument, NULL, 5 }, { 0, 0, 0, 0}, };
while ((opt = getopt_long(argc, argv, "afmnprstP", longopts, NULL)) != -1) switch (opt) { case 'a': - cmp = compare_ts; + set_single_cmp(compare_ts, SORT_ASC); break; case 'f': filter = filter | FILTER_UNRELEASE; break; case 'm': - cmp = compare_page_num; + set_single_cmp(compare_page_num, SORT_DESC); break; case 'p': - cmp = compare_pid; + set_single_cmp(compare_pid, SORT_ASC); break; case 'r': - cmp = compare_free_ts; + set_single_cmp(compare_free_ts, SORT_ASC); break; case 's': - cmp = compare_stacktrace; + set_single_cmp(compare_stacktrace, SORT_ASC); break; case 't': - cmp = compare_num; + set_single_cmp(compare_num, SORT_DESC); break; case 'P': - cmp = compare_tgid; + set_single_cmp(compare_tgid, SORT_ASC); break; case 'n': - cmp = compare_comm; + set_single_cmp(compare_comm, SORT_ASC); break; case 1: filter = filter | FILTER_PID; @@ -563,6 +676,13 @@ int main(int argc, char **argv) exit(1); } break; + case 5: + if (!parse_sort_args(optarg)) { + fprintf(stderr, "wrong argument after --sort option:%s\n", + optarg); + exit(1); + } + break; default: usage(); exit(1); @@ -622,7 +742,7 @@ int main(int argc, char **argv) } }
- qsort(list, count, sizeof(list[0]), cmp); + qsort(list, count, sizeof(list[0]), compare_sort_condition);
for (i = 0; i < count; i++) { if (cull == 0)
From: Haowen Bai baihaowen@meizu.com
mainline inclusion from mainline-v5.19-rc1 commit a72469aa593881c2a5ad3a38cfb3e7871c50f169 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In normal use, the tool prints a huge parser log and spends a lot of time printing it, so it is preferable to add a "-d" debug option to avoid this problem.
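The change wraps each parser diagnostic in an `if (debug_on)` check, so malformed fields are still rejected but are only reported when "-d" is given. The following is a minimal sketch of the same idea. The `dbg_err()` macro and `parse_pid_field()` helper are hypothetical illustrations modeled on `get_pid()`; they are not part of page_owner_sort.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Set by a "-d" command-line switch; off by default. */
static bool debug_on;

/* Hypothetical macro: print a parse diagnostic only when debugging. */
#define dbg_err(...)                                    \
	do {                                            \
		if (debug_on)                           \
			fprintf(stderr, __VA_ARGS__);   \
	} while (0)

/*
 * Hypothetical helper modeled on get_pid(): parse a decimal PID field.
 * Malformed input always returns -1; the message is printed only with -d.
 */
static long parse_pid_field(const char *str)
{
	char *endptr;
	long val;

	errno = 0;
	val = strtol(str, &endptr, 10);
	if (errno != 0 || endptr == str || *endptr != '\0') {
		dbg_err("wrong/invalid pid in follow buf:\n%s\n", str);
		return -1;
	}
	return val;
}
```

Keeping the check inside a single macro means the error-handling path is unchanged whether or not debugging is on; only the output differs.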
Link: https://lkml.kernel.org/r/1649672446-5685-1-git-send-email-baihaowen@meizu.c... Signed-off-by: Haowen Bai baihaowen@meizu.com Cc: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn Cc: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Shenghong Han hanshenghong2019@email.szu.edu.cn Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Cc: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Yongqiang Liu liuyongqiang13@huawei.com Cc: Yuhong Feng yuhongf@szu.edu.cn Cc: Sean Anderson seanga2@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index beca990707fb..a32e446e5bb2 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -87,6 +87,7 @@ static int list_size; static int max_size; static int cull; static int filter; +static bool debug_on;
int read_block(char *buf, int buf_size, FILE *fin) { @@ -211,7 +212,8 @@ static int search_pattern(regex_t *pattern, char *pattern_str, char *buf)
err = regexec(pattern, buf, 2, pmatch, REG_NOTBOL); if (err != 0 || pmatch[1].rm_so == -1) { - fprintf(stderr, "no matching pattern in %s\n", buf); + if (debug_on) + fprintf(stderr, "no matching pattern in %s\n", buf); return -1; } val_len = pmatch[1].rm_eo - pmatch[1].rm_so; @@ -276,7 +278,8 @@ static int get_page_num(char *buf) errno = 0; order_val = strtol(order_str, &endptr, 10); if (order_val > 64 || errno != 0 || endptr == order_str || *endptr != '\0') { - fprintf(stderr, "wrong order in follow buf:\n%s\n", buf); + if (debug_on) + fprintf(stderr, "wrong order in follow buf:\n%s\n", buf); return 0; }
@@ -293,7 +296,8 @@ static pid_t get_pid(char *buf) errno = 0; pid = strtol(pid_str, &endptr, 10); if (errno != 0 || endptr == pid_str || *endptr != '\0') { - fprintf(stderr, "wrong/invalid pid in follow buf:\n%s\n", buf); + if (debug_on) + fprintf(stderr, "wrong/invalid pid in follow buf:\n%s\n", buf); return -1; }
@@ -311,7 +315,8 @@ static pid_t get_tgid(char *buf) errno = 0; tgid = strtol(tgid_str, &endptr, 10); if (errno != 0 || endptr == tgid_str || *endptr != '\0') { - fprintf(stderr, "wrong/invalid tgid in follow buf:\n%s\n", buf); + if (debug_on) + fprintf(stderr, "wrong/invalid tgid in follow buf:\n%s\n", buf); return -1; }
@@ -329,7 +334,8 @@ static __u64 get_ts_nsec(char *buf) errno = 0; ts_nsec = strtoull(ts_nsec_str, &endptr, 10); if (errno != 0 || endptr == ts_nsec_str || *endptr != '\0') { - fprintf(stderr, "wrong ts_nsec in follow buf:\n%s\n", buf); + if (debug_on) + fprintf(stderr, "wrong ts_nsec in follow buf:\n%s\n", buf); return -1; }
@@ -346,7 +352,8 @@ static __u64 get_free_ts_nsec(char *buf) errno = 0; free_ts_nsec = strtoull(free_ts_nsec_str, &endptr, 10); if (errno != 0 || endptr == free_ts_nsec_str || *endptr != '\0') { - fprintf(stderr, "wrong free_ts_nsec in follow buf:\n%s\n", buf); + if (debug_on) + fprintf(stderr, "wrong free_ts_nsec in follow buf:\n%s\n", buf); return -1; }
@@ -362,7 +369,8 @@ static char *get_comm(char *buf) search_pattern(&comm_pattern, comm_str, buf); errno = 0; if (errno != 0) { - fprintf(stderr, "wrong comm in follow buf:\n%s\n", buf); + if (debug_on) + fprintf(stderr, "wrong comm in follow buf:\n%s\n", buf); return NULL; }
@@ -594,6 +602,7 @@ static void usage(void) "-a\t\tSort by memory allocate time.\n" "-r\t\tSort by memory release time.\n" "-f\t\tFilter out the information of blocks whose memory has been released.\n" + "-d\t\tPrint debug information.\n" "--pid <pidlist>\tSelect by pid. This selects the information of blocks whose process ID numbers appear in <pidlist>.\n" "--tgid <tgidlist>\tSelect by tgid. This selects the information of blocks whose Thread Group ID numbers appear in <tgidlist>.\n" "--name <cmdlist>\n\t\tSelect by command name. This selects the information of blocks whose command name appears in <cmdlist>.\n" @@ -618,11 +627,14 @@ int main(int argc, char **argv) { 0, 0, 0, 0}, };
- while ((opt = getopt_long(argc, argv, "afmnprstP", longopts, NULL)) != -1) + while ((opt = getopt_long(argc, argv, "adfmnprstP", longopts, NULL)) != -1) switch (opt) { case 'a': set_single_cmp(compare_ts, SORT_ASC); break; + case 'd': + debug_on = true; + break; case 'f': filter = filter | FILTER_UNRELEASE; break;
From: Yixuan Cao caoyixuan2019@email.szu.edu.cn
mainline inclusion from mainline-v5.19-rc1 commit f09654bb88127473b4baf3bc0b68d4d4695aca7b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
An application is suspected of having a memory leak when its memory consumption is high and keeps increasing. There are several commonly used memory allocators: slab, CMA, vmalloc, etc. Identifying a memory leak can be sped up if the pages allocated by each allocator can be analyzed separately.
This patch adds memory-allocator labelling for slab, vmalloc, and CMA. Based on the kernel code, pages allocated by slab and CMA can be identified from the "PFN" line, and pages allocated by vmalloc can be identified by analyzing the stack trace. Thanks to Vlastimil Babka for the constructive suggestions.
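The labels are kept as bit flags so that one block can carry more than one of them. A minimal sketch of the PFN-line part of the classification follows, assuming (as `get_allocator()` in the patch does) that the line contains the substring "CMA" for CMA pages and "slab" for slab pages. The function names are hypothetical, and the sample PFN lines in the test are illustrative rather than captured output.

```c
#include <stdio.h>
#include <string.h>

/* Bit values mirror the ALLOCATOR_BIT enum added by this patch. */
enum {
	ALLOCATOR_CMA     = 1 << 1,
	ALLOCATOR_SLAB    = 1 << 2,
	ALLOCATOR_VMALLOC = 1 << 3,
	ALLOCATOR_OTHERS  = 1 << 4,
};

/*
 * Hypothetical helper: classify a block from its "PFN" line alone.
 * Only CMA and slab are detectable here; vmalloc needs the stack trace.
 */
static int label_from_pfn_line(const char *pfn_line)
{
	int allocator = 0;

	if (strstr(pfn_line, "CMA"))
		allocator |= ALLOCATOR_CMA;
	if (strstr(pfn_line, "slab"))
		allocator |= ALLOCATOR_SLAB;
	return allocator ? allocator : ALLOCATOR_OTHERS;
}

/* Render the label bits into buf in the order print_allocator() uses. */
static void format_allocator(char *buf, size_t len, int allocator)
{
	snprintf(buf, len, "%s%s%s%s",
		 (allocator & ALLOCATOR_CMA) ? "CMA " : "",
		 (allocator & ALLOCATOR_SLAB) ? "SLAB " : "",
		 (allocator & ALLOCATOR_VMALLOC) ? "VMALLOC " : "",
		 (allocator & ALLOCATOR_OTHERS) ? "OTHERS " : "");
}
```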
Based on Yinan Zhang's study, the call chain of vmalloc() is vmalloc() -> ... -> __vmalloc_node_range() -> __vmalloc_area_node(). __vmalloc_area_node() requests memory through the buddy allocator interfaces; in the current version it uses four of them: alloc_pages_bulk_array_mempolicy(), alloc_pages_bulk_array_node(), alloc_pages() and alloc_pages_node(). Disassembling the code shows that __vmalloc_area_node() is inlined into __vmalloc_node_range(), so __vmalloc_area_node() does not appear in the stack trace.
On the test machine, the stack trace of pages allocated by vmalloc has the following four forms:
  __alloc_pages_bulk+0x230/0x6a0
  __vmalloc_node_range+0x19c/0x598

  alloc_pages_bulk_array_mempolicy+0xbc/0x278
  __vmalloc_node_range+0x1e8/0x598

  __alloc_pages+0x160/0x2b0
  __vmalloc_node_range+0x234/0x598

  alloc_pages+0xac/0x150
  __vmalloc_node_range+0x44c/0x598
Therefore, if one line of the stack trace contains "alloc_pages" and the next line contains "__vmalloc_node_range", the page can be determined to have been allocated by vmalloc. Function offsets and sizes differ between machines, so they are not matched.
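The rule above can be sketched as a scan over adjacent line pairs, assuming the stack trace is a NUL-terminated string with one frame per line. Unlike `get_allocator()` in the patch, which walks back from the first "__vmalloc_node_range" match only, this sketch checks every pair of adjacent lines. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for GNU memmem(): does s[0..len) contain needle? */
static bool span_contains(const char *s, size_t len, const char *needle)
{
	size_t n = strlen(needle);

	for (size_t i = 0; i + n <= len; i++)
		if (memcmp(s + i, needle, n) == 0)
			return true;
	return false;
}

/*
 * Hypothetical helper: true if some line containing "alloc_pages" is
 * immediately followed by a line containing "__vmalloc_node_range".
 * Offsets such as "+0x19c/0x598" are never compared, only substrings.
 */
static bool is_vmalloc_trace(const char *trace)
{
	const char *line = trace;

	for (;;) {
		const char *eol = strchr(line, '\n');

		if (!eol)
			return false;	/* final line has no successor */

		const char *next = eol + 1;
		const char *next_eol = strchr(next, '\n');
		size_t len = (size_t)(eol - line);
		size_t next_len = next_eol ? (size_t)(next_eol - next)
					   : strlen(next);

		if (span_contains(line, len, "alloc_pages") &&
		    span_contains(next, next_len, "__vmalloc_node_range"))
			return true;
		line = next;
	}
}
```

Substring matching covers all four forms listed above, since "__alloc_pages_bulk", "alloc_pages_bulk_array_mempolicy", "__alloc_pages" and "alloc_pages" all contain "alloc_pages".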
This patch also extends the --cull and --sort options to support merging statistics and sorting by allocator. The additions are fully compatible with the existing options. The new key can be given as "allocator" or abbreviated as "ator". The documentation (Documentation/vm/page_owner.rst) has been updated accordingly.
Example:
  ./page_owner_sort <input> <output> --cull=st,pid,name,allocator
  ./page_owner_sort <input> <output> --sort=ator,pid,name
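With --sort, each comma-separated key selects a comparator, and an optional "+" or "-" prefix selects ascending or descending order; the first key that distinguishes two blocks decides their order. A minimal sketch of how such a key list can drive qsort(), assuming comparator and sign arrays analogous to `sc.cmps` and `sc.signs`, and assuming `SORT_DESC` is -1. `struct block` and the comparators here are hypothetical stand-ins for `struct block_list` and its compare functions.

```c
#include <stdlib.h>

struct block {			/* hypothetical, trimmed-down block_list */
	int allocator;
	int pid;
};

typedef int (*cmp_fn)(const void *, const void *);

enum { SORT_ASC = 1, SORT_DESC = -1 };	/* SORT_DESC value assumed */

static int cmp_allocator(const void *p1, const void *p2)
{
	const struct block *a = p1, *b = p2;

	/* (a > b) - (a < b) avoids overflow, unlike plain subtraction */
	return (a->allocator > b->allocator) - (a->allocator < b->allocator);
}

static int cmp_pid(const void *p1, const void *p2)
{
	const struct block *a = p1, *b = p2;

	return (a->pid > b->pid) - (a->pid < b->pid);
}

/* Keys in priority order, as if parsed from "--sort=ator,-pid". */
static const cmp_fn keys[] = { cmp_allocator, cmp_pid };
static const int signs[] = { SORT_ASC, SORT_DESC };

/* The first key that tells two blocks apart decides their order. */
static int compare_by_keys(const void *p1, const void *p2)
{
	for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
		int r = keys[i](p1, p2);

		if (r != 0)
			return signs[i] * r;
	}
	return 0;
}
```

Usage: `qsort(blocks, n, sizeof(blocks[0]), compare_by_keys);` groups blocks by allocator and, within each allocator, orders them by descending PID.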
This work is coauthored by Jiajian Ye, Yinan Zhang, Shenghong Han, Chongxi Zhao, Yuhong Feng and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220410132932.9402-1-caoyixuan2019@email.szu.edu.... Signed-off-by: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn Cc: Haowen Bai baihaowen@meizu.com Cc: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Sean Anderson seanga2@gmail.com Cc: Shenghong Han hanshenghong2019@email.szu.edu.cn Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Cc: Yongqiang Liu liuyongqiang13@huawei.com Cc: Yuhong Feng yuhongf@szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 2 + tools/vm/page_owner_sort.c | 112 +++++++++++++++++++++++++++----- 2 files changed, 99 insertions(+), 15 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 54df46b554ca..3a7f8dad9c1e 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -183,6 +183,7 @@ For --sort option: T txt full text of block ft free_ts timestamp of the page when it was released at alloc_ts timestamp of the page when it was allocated + ator allocator memory allocator for pages
For --curl option:
@@ -192,3 +193,4 @@ For --curl option: n name task command name f free whether the page has been released or not st stacktrace stack trace of the page allocation + ator allocator memory allocator for pages diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index a32e446e5bb2..fa2e4d2a9d68 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -39,6 +39,7 @@ struct block_list { int page_num; pid_t pid; pid_t tgid; + int allocator; }; enum FILTER_BIT { FILTER_UNRELEASE = 1<<1, @@ -51,11 +52,19 @@ enum CULL_BIT { CULL_PID = 1<<2, CULL_TGID = 1<<3, CULL_COMM = 1<<4, - CULL_STACKTRACE = 1<<5 + CULL_STACKTRACE = 1<<5, + CULL_ALLOCATOR = 1<<6 +}; +enum ALLOCATOR_BIT { + ALLOCATOR_CMA = 1<<1, + ALLOCATOR_SLAB = 1<<2, + ALLOCATOR_VMALLOC = 1<<3, + ALLOCATOR_OTHERS = 1<<4 }; enum ARG_TYPE { ARG_TXT, ARG_COMM, ARG_STACKTRACE, ARG_ALLOC_TS, ARG_FREE_TS, - ARG_CULL_TIME, ARG_PAGE_NUM, ARG_PID, ARG_TGID, ARG_UNKNOWN, ARG_FREE + ARG_CULL_TIME, ARG_PAGE_NUM, ARG_PID, ARG_TGID, ARG_UNKNOWN, ARG_FREE, + ARG_ALLOCATOR }; enum SORT_ORDER { SORT_ASC = 1, @@ -89,15 +98,20 @@ static int cull; static int filter; static bool debug_on;
-int read_block(char *buf, int buf_size, FILE *fin) +static void set_single_cmp(int (*cmp)(const void *, const void *), int sign); + +int read_block(char *buf, char *ext_buf, int buf_size, FILE *fin) { char *curr = buf, *const buf_end = buf + buf_size;
while (buf_end - curr > 1 && fgets(curr, buf_end - curr, fin)) { - if (*curr == '\n') /* empty line */ + if (*curr == '\n') { /* empty line */ return curr - buf; - if (!strncmp(curr, "PFN", 3)) + } + if (!strncmp(curr, "PFN", 3)) { + strcpy(ext_buf, curr); continue; + } curr += strlen(curr); }
@@ -146,6 +160,13 @@ static int compare_tgid(const void *p1, const void *p2) return l1->tgid - l2->tgid; }
+static int compare_allocator(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return l1->allocator - l2->allocator; +} + static int compare_comm(const void *p1, const void *p2) { const struct block_list *l1 = p1, *l2 = p2; @@ -192,6 +213,8 @@ static int compare_cull_condition(const void *p1, const void *p2) return compare_comm(p1, p2); if ((cull & CULL_UNRELEASE) && compare_release(p1, p2)) return compare_release(p1, p2); + if ((cull & CULL_ALLOCATOR) && compare_allocator(p1, p2)) + return compare_allocator(p1, p2); return 0; }
@@ -395,11 +418,42 @@ static int get_arg_type(const char *arg) return ARG_FREE_TS; else if (!strcmp(arg, "alloc_ts") || !strcmp(arg, "at")) return ARG_ALLOC_TS; + else if (!strcmp(arg, "allocator") || !strcmp(arg, "ator")) + return ARG_ALLOCATOR; else { return ARG_UNKNOWN; } }
+static int get_allocator(const char *buf, const char *migrate_info) +{ + char *tmp, *first_line, *second_line; + int allocator = 0; + + if (strstr(migrate_info, "CMA")) + allocator |= ALLOCATOR_CMA; + if (strstr(migrate_info, "slab")) + allocator |= ALLOCATOR_SLAB; + tmp = strstr(buf, "__vmalloc_node_range"); + if (tmp) { + second_line = tmp; + while (*tmp != '\n') + tmp--; + tmp--; + while (*tmp != '\n') + tmp--; + first_line = ++tmp; + tmp = strstr(tmp, "alloc_pages"); + if (tmp) { + if (tmp && first_line <= tmp && tmp < second_line) + allocator |= ALLOCATOR_VMALLOC; + } + } + if (allocator == 0) + allocator = ALLOCATOR_OTHERS; + return allocator; +} + static bool match_num_list(int num, int *list, int list_size) { for (int i = 0; i < list_size; ++i) @@ -437,7 +491,7 @@ static bool is_need(char *buf) return true; }
-static void add_list(char *buf, int len) +static void add_list(char *buf, int len, char *ext_buf) { if (list_size != 0 && len == list[list_size-1].len && @@ -471,6 +525,7 @@ static void add_list(char *buf, int len) list[list_size].stacktrace++; list[list_size].ts_nsec = get_ts_nsec(buf); list[list_size].free_ts_nsec = get_free_ts_nsec(buf); + list[list_size].allocator = get_allocator(buf, ext_buf); list_size++; if (list_size % 1000 == 0) { printf("loaded %d\r", list_size); @@ -496,12 +551,16 @@ static bool parse_cull_args(const char *arg_str) cull |= CULL_STACKTRACE; else if (arg_type == ARG_FREE) cull |= CULL_UNRELEASE; + else if (arg_type == ARG_ALLOCATOR) + cull |= CULL_ALLOCATOR; else { free_explode(args, size); return false; } } free_explode(args, size); + if (sc.size == 0) + set_single_cmp(compare_num, SORT_DESC); return true; }
@@ -556,6 +615,8 @@ static bool parse_sort_args(const char *arg_str) sc.cmps[i] = compare_free_ts; else if (arg_type == ARG_TXT) sc.cmps[i] = compare_txt; + else if (arg_type == ARG_ALLOCATOR) + sc.cmps[i] = compare_allocator; else { free_explode(args, size); sc.size = 0; @@ -588,6 +649,19 @@ static int *parse_nums_list(char *arg_str, int *list_size) return list; }
+static void print_allocator(FILE *out, int allocator) +{ + fprintf(out, "allocated by "); + if (allocator & ALLOCATOR_CMA) + fprintf(out, "CMA "); + if (allocator & ALLOCATOR_SLAB) + fprintf(out, "SLAB "); + if (allocator & ALLOCATOR_VMALLOC) + fprintf(out, "VMALLOC "); + if (allocator & ALLOCATOR_OTHERS) + fprintf(out, "OTHERS "); +} + #define BUF_SIZE (128 * 1024)
static void usage(void) @@ -614,8 +688,8 @@ static void usage(void) int main(int argc, char **argv) { FILE *fin, *fout; - char *buf; - int ret, i, count; + char *buf, *ext_buf; + int i, count; struct stat st; int opt; struct option longopts[] = { @@ -724,16 +798,18 @@ int main(int argc, char **argv)
list = malloc(max_size * sizeof(*list)); buf = malloc(BUF_SIZE); - if (!list || !buf) { + ext_buf = malloc(BUF_SIZE); + if (!list || !buf || !ext_buf) { fprintf(stderr, "Out of memory\n"); exit(1); }
for ( ; ; ) { - ret = read_block(buf, BUF_SIZE, fin); - if (ret < 0) + int buf_len = read_block(buf, ext_buf, BUF_SIZE, fin); + + if (buf_len < 0) break; - add_list(buf, ret); + add_list(buf, buf_len, ext_buf); }
printf("loaded %d\n", list_size); @@ -757,9 +833,11 @@ int main(int argc, char **argv) qsort(list, count, sizeof(list[0]), compare_sort_condition);
for (i = 0; i < count; i++) { - if (cull == 0) - fprintf(fout, "%d times, %d pages:\n%s\n", - list[i].num, list[i].page_num, list[i].txt); + if (cull == 0) { + fprintf(fout, "%d times, %d pages, ", list[i].num, list[i].page_num); + print_allocator(fout, list[i].allocator); + fprintf(fout, ":\n%s\n", list[i].txt); + } else { fprintf(fout, "%d times, %d pages", list[i].num, list[i].page_num); @@ -769,6 +847,10 @@ int main(int argc, char **argv) fprintf(fout, ", TGID %d", list[i].pid); if (cull & CULL_COMM || filter & FILTER_COMM) fprintf(fout, ", task_comm_name: %s", list[i].comm); + if (cull & CULL_ALLOCATOR) { + fprintf(fout, ", "); + print_allocator(fout, list[i].allocator); + } if (cull & CULL_UNRELEASE) fprintf(fout, " (%s)", list[i].free_ts_nsec ? "UNRELEASED" : "RELEASED");
From: Yixuan Cao caoyixuan2019@email.szu.edu.cn
mainline inclusion from mainline-v5.19-rc1 commit c7c4ab859642830a14c45785ca7866659b65fc44 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When judging whether a page is allocated by vmalloc, the value of the variable "tmp" was checked twice. Remove the redundant check.
This work is coauthored by Yinan Zhang, Jiajian Ye, Shenghong Han, Chongxi Zhao, Yuhong Feng and Yongqiang Liu.
Link: https://lkml.kernel.org/r/20220414042744.13896-1-caoyixuan2019@email.szu.edu... Signed-off-by: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn Cc: Haowen Bai baihaowen@meizu.com Cc: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Sean Anderson seanga2@gmail.com Cc: Shenghong Han hanshenghong2019@email.szu.edu.cn Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Cc: Yongqiang Liu liuyongqiang13@huawei.com Cc: Yuhong Feng yuhongf@szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c
index fa2e4d2a9d68..c149427eb1c9 100644
--- a/tools/vm/page_owner_sort.c
+++ b/tools/vm/page_owner_sort.c
@@ -444,10 +444,8 @@ static int get_allocator(const char *buf, const char *migrate_info)
 			tmp--;
 		first_line = ++tmp;
 		tmp = strstr(tmp, "alloc_pages");
-		if (tmp) {
-			if (tmp && first_line <= tmp && tmp < second_line)
-				allocator |= ALLOCATOR_VMALLOC;
-		}
+		if (tmp && first_line <= tmp && tmp < second_line)
+			allocator |= ALLOCATOR_VMALLOC;
 	}
 	if (allocator == 0)
 		allocator = ALLOCATOR_OTHERS;
From: Yixuan Cao caoyixuan2019@email.szu.edu.cn
mainline inclusion from mainline-v6.0-rc1 commit 9b7a4039d6856f66521486da68c76838929039eb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
is_need() is indented by one more level than necessary. Fix the indentation.
Link: https://lkml.kernel.org/r/20220717195506.7602-1-caoyixuan2019@email.szu.edu.... Signed-off-by: Yixuan Cao caoyixuan2019@email.szu.edu.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index c149427eb1c9..69ef83ef7100 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -470,23 +470,23 @@ static bool match_str_list(const char *str, char **list, int list_size)
static bool is_need(char *buf) { - if ((filter & FILTER_UNRELEASE) && get_free_ts_nsec(buf) != 0) - return false; - if ((filter & FILTER_PID) && !match_num_list(get_pid(buf), fc.pids, fc.pids_size)) - return false; - if ((filter & FILTER_TGID) && - !match_num_list(get_tgid(buf), fc.tgids, fc.tgids_size)) - return false; - - char *comm = get_comm(buf); - - if ((filter & FILTER_COMM) && - !match_str_list(comm, fc.comms, fc.comms_size)) { - free(comm); - return false; - } + if ((filter & FILTER_UNRELEASE) && get_free_ts_nsec(buf) != 0) + return false; + if ((filter & FILTER_PID) && !match_num_list(get_pid(buf), fc.pids, fc.pids_size)) + return false; + if ((filter & FILTER_TGID) && + !match_num_list(get_tgid(buf), fc.tgids, fc.tgids_size)) + return false; + + char *comm = get_comm(buf); + + if ((filter & FILTER_COMM) && + !match_str_list(comm, fc.comms, fc.comms_size)) { free(comm); - return true; + return false; + } + free(comm); + return true; }
static void add_list(char *buf, int len, char *ext_buf)
From: Yixuan Cao caoyixuan2019@email.szu.edu.cn
mainline inclusion from mainline-v6.1-rc1 commit 57eb60c04d2c7b0de91eac2bc5d0331f8fe72fd7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The -f option filters out the information of blocks whose memory has been released. However, I noticed that some blocks were filtered out that should have been kept.
Commit 9cc7e96aa846 ("mm/page_owner: record timestamp and pid") records the allocation timestamp (ts_nsec) of all pages.
Commit 866b48526217 ("mm/page_owner: record the timestamp of all pages during free") records the free timestamp (free_ts_nsec) of all pages. When a page is allocated for the first time, its free_ts_nsec is 0; it is set to the corresponding time when the page is freed. However, free_ts_nsec is not reset to 0 when the page is reallocated. In particular, when page migration occurs, the two timestamps are equal.
Currently, with the -f option, page_owner_sort removes every text block whose free_ts_nsec is not 0. This only keeps pages that were allocated once and never freed. If a freed page is reallocated, free_ts_nsec will be less than ts_nsec; if page migration occurs, the two timestamps will be equal. In both cases the page should be considered unreleased.
Fix is_need() so that, with the -f option, it keeps text blocks that meet either of the two conditions above.
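The new test in is_need() drops a block under -f only when `free_ts_nsec != 0 && ts_nsec < free_ts_nsec`. A minimal sketch isolating that condition as a predicate; the name `block_is_released()` is hypothetical, and `uint64_t` stands in for `__u64`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A block counts as released only if it has a free timestamp later than
 * its allocation timestamp:
 *   free_ts == 0          never freed                  -> in use
 *   free_ts <  ts         freed, then reallocated      -> in use
 *   free_ts == ts         left behind by page migration -> in use
 *   free_ts >  ts         freed after this allocation  -> released
 */
static bool block_is_released(uint64_t ts_nsec, uint64_t free_ts_nsec)
{
	return free_ts_nsec != 0 && ts_nsec < free_ts_nsec;
}
```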
Link: https://lkml.kernel.org/r/20220812155515.30846-1-caoyixuan2019@email.szu.edu... Signed-off-by: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn Cc: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Yuhong Feng yuhongf@szu.edu.cn Cc: Liam Mark lmark@codeaurora.org Cc: Georgi Djakov georgi.djakov@linaro.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c
index 69ef83ef7100..4252fd1d347d 100644
--- a/tools/vm/page_owner_sort.c
+++ b/tools/vm/page_owner_sort.c
@@ -470,7 +470,12 @@ static bool match_str_list(const char *str, char **list, int list_size)
 
 static bool is_need(char *buf)
 {
-	if ((filter & FILTER_UNRELEASE) && get_free_ts_nsec(buf) != 0)
+	__u64 ts_nsec, free_ts_nsec;
+
+	ts_nsec = get_ts_nsec(buf);
+	free_ts_nsec = get_free_ts_nsec(buf);
+
+	if ((filter & FILTER_UNRELEASE) && free_ts_nsec != 0 && ts_nsec < free_ts_nsec)
 		return false;
 	if ((filter & FILTER_PID) && !match_num_list(get_pid(buf), fc.pids, fc.pids_size))
 		return false;
From: Jianlin Lv iecedge@gmail.com
mainline inclusion from mainline-v6.3-rc1 commit ef1faf0e370a8e33fe625088ddc5fde02cf8c4c4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Although the kernel releases all memory associated with a process when it terminates, relying on this is neither good style nor proper design. Free the allocated memory before the process exits.
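The patch replaces the exit(1) calls with a goto-based unwind ladder, so every exit path releases exactly what was acquired before it, in reverse order. A standalone sketch of that pattern, assuming two regex patterns and one buffer; `compile_one_group()` and `run()` are hypothetical names. The sketch also frees a pattern that compiled but has the wrong number of groups, a case in which the patched check_regcomp() returns false without calling regfree(). It uses the POSIX `[[:space:]]` class rather than `\s`, which is a GNU extension.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Compile a pattern that must have exactly one capture group. */
static bool compile_one_group(regex_t *re, const char *src)
{
	int err = regcomp(re, src, REG_EXTENDED | REG_NEWLINE);

	if (err != 0) {
		fprintf(stderr, "Invalid pattern %s code %d\n", src, err);
		return false;	/* nothing to free on failure */
	}
	if (re->re_nsub != 1) {
		fprintf(stderr, "Pattern %s needs one group\n", src);
		regfree(re);	/* compiled, so it must be released */
		return false;
	}
	return true;
}

/* Returns 0 on success, 1 on failure; each label undoes one step. */
static int run(void)
{
	regex_t pid_re, ts_re;
	char *buf;
	int ret = 1;

	if (!compile_one_group(&pid_re, "pid[[:space:]]*([0-9]*),"))
		goto out;
	if (!compile_one_group(&ts_re, "ts[[:space:]]*([0-9]*)[[:space:]]*ns,"))
		goto out_pid;

	buf = malloc(128 * 1024);
	if (!buf) {
		fprintf(stderr, "Out of memory\n");
		goto out_ts;
	}

	/* ... read and parse input into buf here ... */
	ret = 0;

	free(buf);
out_ts:
	regfree(&ts_re);
out_pid:
	regfree(&pid_re);
out:
	return ret;
}
```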
Link: https://lkml.kernel.org/r/20221219164917.14132-1-iecedge@gmail.com Signed-off-by: Jianlin Lv iecedge@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 65 ++++++++++++++++++++++++++------------ 1 file changed, 45 insertions(+), 20 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 4252fd1d347d..00b5856f0b2b 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -246,15 +246,16 @@ static int search_pattern(regex_t *pattern, char *pattern_str, char *buf) return 0; }
-static void check_regcomp(regex_t *pattern, const char *regex) +static bool check_regcomp(regex_t *pattern, const char *regex) { int err;
err = regcomp(pattern, regex, REG_EXTENDED | REG_NEWLINE); if (err != 0 || pattern->re_nsub != 1) { fprintf(stderr, "Invalid pattern %s code %d\n", regex, err); - exit(1); + return false; } + return true; }
static char **explode(char sep, const char *str, int *size) @@ -494,28 +495,28 @@ static bool is_need(char *buf) return true; }
-static void add_list(char *buf, int len, char *ext_buf) +static bool add_list(char *buf, int len, char *ext_buf) { if (list_size != 0 && len == list[list_size-1].len && memcmp(buf, list[list_size-1].txt, len) == 0) { list[list_size-1].num++; list[list_size-1].page_num += get_page_num(buf); - return; + return true; } if (list_size == max_size) { fprintf(stderr, "max_size too small??\n"); - exit(1); + return false; } if (!is_need(buf)) - return; + return true; list[list_size].pid = get_pid(buf); list[list_size].tgid = get_tgid(buf); list[list_size].comm = get_comm(buf); list[list_size].txt = malloc(len+1); if (!list[list_size].txt) { fprintf(stderr, "Out of memory\n"); - exit(1); + return false; } memcpy(list[list_size].txt, buf, len); list[list_size].txt[len] = 0; @@ -534,6 +535,7 @@ static void add_list(char *buf, int len, char *ext_buf) printf("loaded %d\r", list_size); fflush(stdout); } + return true; }
static bool parse_cull_args(const char *arg_str) @@ -790,12 +792,19 @@ int main(int argc, char **argv) exit(1); }
- check_regcomp(&order_pattern, "order\s*([0-9]*),"); - check_regcomp(&pid_pattern, "pid\s*([0-9]*),"); - check_regcomp(&tgid_pattern, "tgid\s*([0-9]*) "); - check_regcomp(&comm_pattern, "tgid\s*[0-9]*\s*\((.*)\),\s*ts"); - check_regcomp(&ts_nsec_pattern, "ts\s*([0-9]*)\s*ns,"); - check_regcomp(&free_ts_nsec_pattern, "free_ts\s*([0-9]*)\s*ns"); + if (!check_regcomp(&order_pattern, "order\s*([0-9]*),")) + goto out_order; + if (!check_regcomp(&pid_pattern, "pid\s*([0-9]*),")) + goto out_pid; + if (!check_regcomp(&tgid_pattern, "tgid\s*([0-9]*) ")) + goto out_tgid; + if (!check_regcomp(&comm_pattern, "tgid\s*[0-9]*\s*\((.*)\),\s*ts")) + goto out_comm; + if (!check_regcomp(&ts_nsec_pattern, "ts\s*([0-9]*)\s*ns,")) + goto out_ts; + if (!check_regcomp(&free_ts_nsec_pattern, "free_ts\s*([0-9]*)\s*ns")) + goto out_free_ts; + fstat(fileno(fin), &st); max_size = st.st_size / 100; /* hack ... */
@@ -804,7 +813,7 @@ int main(int argc, char **argv) ext_buf = malloc(BUF_SIZE); if (!list || !buf || !ext_buf) { fprintf(stderr, "Out of memory\n"); - exit(1); + goto out_free; }
for ( ; ; ) { @@ -812,7 +821,8 @@ int main(int argc, char **argv)
if (buf_len < 0) break; - add_list(buf, buf_len, ext_buf); + if (!add_list(buf, buf_len, ext_buf)) + goto out_free; }
printf("loaded %d\n", list_size); @@ -862,11 +872,26 @@ int main(int argc, char **argv) fprintf(fout, "\n"); } } - regfree(&order_pattern); - regfree(&pid_pattern); - regfree(&tgid_pattern); - regfree(&comm_pattern); - regfree(&ts_nsec_pattern); + +out_free: + if (ext_buf) + free(ext_buf); + if (buf) + free(buf); + if (list) + free(list); +out_free_ts: regfree(&free_ts_nsec_pattern); +out_ts: + regfree(&ts_nsec_pattern); +out_comm: + regfree(&comm_pattern); +out_tgid: + regfree(&tgid_pattern); +out_pid: + regfree(&pid_pattern); +out_order: + regfree(&order_pattern); + return 0; }
From: Steve Chou steve_chou@pesi.com.tw
mainline inclusion from mainline-v6.3 commit 9235756885e865070c4be2facda75262dbd85967 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When the --cull option is used with the 'tg' key, fprintf() prints the pid instead of the tgid. Print the tgid instead.
Link: https://lkml.kernel.org/r/20230411034929.2071501-1-steve_chou@pesi.com.tw Fixes: 9c8a0a8e599f4a ("tools/vm/page_owner_sort.c: support for user-defined culling rules") Signed-off-by: Steve Chou steve_chou@pesi.com.tw Cc: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts: tools/mm/page_owner_sort.c [tools/vm is not renamed to tools/mm] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- tools/vm/page_owner_sort.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c
index 00b5856f0b2b..0e1db628291c 100644
--- a/tools/vm/page_owner_sort.c
+++ b/tools/vm/page_owner_sort.c
@@ -857,7 +857,7 @@ int main(int argc, char **argv)
 			if (cull & CULL_PID || filter & FILTER_PID)
 				fprintf(fout, ", PID %d", list[i].pid);
 			if (cull & CULL_TGID || filter & FILTER_TGID)
-				fprintf(fout, ", TGID %d", list[i].pid);
+				fprintf(fout, ", TGID %d", list[i].tgid);
 			if (cull & CULL_COMM || filter & FILTER_COMM)
 				fprintf(fout, ", task_comm_name: %s", list[i].comm);
 			if (cull & CULL_ALLOCATOR) {
From: Akira Yokosawa akiyks@gmail.com
mainline inclusion from mainline-v5.19-rc1 commit d1ed51fcdbd69be3729f6e249b61cc73fb3b2dd8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
A semantic conflict between commit 5603f9bdea68 ("docs: vm/page_owner: use literal blocks for param description") and a change queued for v5.19 authored by Jiajian Ye ("tools/vm/page_owner_sort.c: support sorting blocks by multiple keys") results in a warning from "make htmldocs" saying:
[...]/vm/page_owner.rst:176: WARNING: Literal block expected; none found.
This is because a literal block in ReST ends at a line that has the same indent as the paragraph preceding it, which in this case has no indent.
Indent the two "For --xxxx option:" lines by two columns and make the whole section a literal block.
While at it, fix the white-space indentation of the "ator" key lines.
Link: https://lkml.kernel.org/r/fdfecc82-d41e-6d8a-738d-4beb6faa27fb@gmail.com Signed-off-by: Akira Yokosawa akiyks@gmail.com Reported-by: Shenghong Han hanshenghong2019@email.szu.edu.cn Cc: Jiajian Ye yejiajian2018@email.szu.edu.cn Cc: Chongxi Zhao zhaochongxi2019@email.szu.edu.cn Cc: Yinan Zhang zhangyinan2019@email.szu.edu.cn Cc: Yixuan Cao caoyixuan2019@email.szu.edu.cn Cc: Yongqiang Liu liuyongqiang13@huawei.com Cc: Yuhong Feng yuhongf@szu.edu.cn Cc: Haowen Bai baihaowen@meizu.com Cc: Jonathan Corbet corbet@lwn.net Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflicts: Documentation/vm/page_owner.rst [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 3a7f8dad9c1e..380e9162cd5a 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -173,7 +173,7 @@ STANDARD FORMAT SPECIFIERS ========================== ::
-For --sort option: + For --sort option:
KEY LONG DESCRIPTION p pid process ID @@ -183,9 +183,9 @@ For --sort option: T txt full text of block ft free_ts timestamp of the page when it was released at alloc_ts timestamp of the page when it was allocated - ator allocator memory allocator for pages + ator allocator memory allocator for pages
-For --curl option: + For --curl option:
KEY LONG DESCRIPTION p pid process ID @@ -193,4 +193,4 @@ For --curl option: n name task command name f free whether the page has been released or not st stacktrace stack trace of the page allocation - ator allocator memory allocator for pages + ator allocator memory allocator for pages
From: Chen Xiao abigwc@gmail.com
mainline inclusion from mainline-v6.3-rc1 commit e7951a3e0647f588e83627ab18110fda988a766e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Fix several spelling mistakes in page_owner documentation.
Signed-off-by: Chen Xiao abigwc@gmail.com Reviewed-by: Randy Dunlap rdunlap@infradead.org Link: https://lore.kernel.org/r/1670479443-8484-1-git-send-email-abigwc@gmail.com Signed-off-by: Jonathan Corbet corbet@lwn.net
Conflicts: Documentation/mm/page_owner.rst [Documentation/vm isn't renamed to Documentation/mm] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 380e9162cd5a..5b9043fd9452 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -64,7 +64,7 @@ pages are investigated and marked as allocated in initialization phase. Although it doesn't mean that they have the right owner information, at least, we can tell whether the page is allocated or not, more accurately. On 2GB memory x86-64 VM box, 13343 early allocated pages -are catched and marked, although they are mostly allocated from struct +are caught and marked, although they are mostly allocated from struct page extension feature. Anyway, after that, no page is left in un-tracking state.
@@ -185,7 +185,7 @@ STANDARD FORMAT SPECIFIERS at alloc_ts timestamp of the page when it was allocated ator allocator memory allocator for pages
- For --curl option: + For --cull option:
KEY LONG DESCRIPTION p pid process ID
From: Jonathan Corbet corbet@lwn.net
mainline inclusion from mainline-v5.18-rc1 commit 18ab307823bb643fc985d316448f2d70eb1cb7c3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Commit f7df2b1cf03a ("tools/vm/page_owner_sort.c: count and sort by mem") added a literal text block without the necessary markup, leading to these warnings in the docs build:
Documentation/vm/page_owner.rst:92: WARNING: Unexpected indentation.
Documentation/vm/page_owner.rst:96: WARNING: Unexpected indentation.
Documentation/vm/page_owner.rst:107: WARNING: Unexpected indentation.
Add the necessary colons and make the build quieter.
Signed-off-by: Jonathan Corbet corbet@lwn.net
Conflicts: Documentation/vm/page_owner.rst [context conflicts] Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- Documentation/vm/page_owner.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 5b9043fd9452..b5d71ebb2727 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -85,7 +85,7 @@ Usage cat /sys/kernel/debug/page_owner > page_owner_full.txt ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
- The general output of ``page_owner_full.txt`` is as follows: + The general output of ``page_owner_full.txt`` is as follows::
Page allocated via order XXX, ... PFN XXX ... @@ -100,7 +100,7 @@ Usage and pages of buf, and finally sorts them according to the parameter(s).
See the result about who allocated each page - in the ``sorted_page_owner.txt``. General output: + in the ``sorted_page_owner.txt``. General output::
XXX times, XXX pages: Page allocated via order XXX, ...
Offering: HULK hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
Identify whether pages are allocated by modules based on the saved stack trace. Record the module name in struct page_owner and print it when a user reads /sys/kernel/debug/page_owner.
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/Kconfig.debug | 9 ++++++ mm/Makefile | 1 + mm/page_owner.c | 24 +++++++-------- mm/page_owner.h | 59 +++++++++++++++++++++++++++++++++++ mm/page_owner_module.c | 70 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 150 insertions(+), 13 deletions(-) create mode 100644 mm/page_owner.h create mode 100644 mm/page_owner_module.c
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 864f129f1937..154ece4e7fc5 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -62,6 +62,15 @@ config PAGE_OWNER
If unsure, say N.
+config PAGE_OWNER_MODULE_STAT + bool "Track module allocation with page owner" + depends on PAGE_OWNER && MODULES + help + This tracks if a page is allocated by modules, may help to find the + alloc_page(s) problem in modules. Even if you include this feature + on your build, it is disabled in default. You should pass "page_owner=on" + to boot parameter in order to enable it. + config PAGE_POISONING bool "Poison pages after freeing" select PAGE_POISONING_NO_SANITY if HIBERNATION diff --git a/mm/Makefile b/mm/Makefile index a014a5e08f7b..7194a39e2a90 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -101,6 +101,7 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o obj-$(CONFIG_PAGE_OWNER) += page_owner.o +obj-$(CONFIG_PAGE_OWNER_MODULE_STAT) += page_owner_module.o obj-$(CONFIG_CLEANCACHE) += cleancache.o obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o obj-$(CONFIG_ZPOOL) += zpool.o diff --git a/mm/page_owner.c b/mm/page_owner.c index 119f028ef559..3e3cf55d270e 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -13,6 +13,7 @@ #include <linux/memcontrol.h> #include <linux/sched/clock.h>
+#include "page_owner.h" #include "internal.h"
/* @@ -21,19 +22,6 @@ */ #define PAGE_OWNER_STACK_DEPTH (16)
-struct page_owner { - unsigned short order; - short last_migrate_reason; - gfp_t gfp_mask; - depot_stack_handle_t handle; - depot_stack_handle_t free_handle; - u64 ts_nsec; - u64 free_ts_nsec; - char comm[TASK_COMM_LEN]; - pid_t pid; - pid_t tgid; -}; - static bool page_owner_enabled = false; DEFINE_STATIC_KEY_FALSE(page_owner_inited);
@@ -144,8 +132,10 @@ void __reset_page_owner(struct page *page, unsigned int order) depot_stack_handle_t handle = 0; struct page_owner *page_owner; u64 free_ts_nsec = local_clock(); + char mod_name[MODULE_NAME_LEN] = {0};
handle = save_stack(GFP_NOWAIT | __GFP_NOWARN); + po_find_module_name_with_update(handle, mod_name, MODULE_NAME_LEN);
page_ext = page_ext_get(page); if (unlikely(!page_ext)) @@ -155,6 +145,7 @@ void __reset_page_owner(struct page *page, unsigned int order) page_owner = get_page_owner(page_ext); page_owner->free_handle = handle; page_owner->free_ts_nsec = free_ts_nsec; + po_set_module_name(page_owner, mod_name); page_ext = page_ext_next(page_ext); } page_ext_put(page_ext); @@ -167,6 +158,9 @@ static inline void __set_page_owner_handle(struct page *page, struct page_owner *page_owner; int i; u64 ts_nsec = local_clock(); + char mod_name[MODULE_NAME_LEN] = {0}; + + po_find_module_name_with_update(handle, mod_name, MODULE_NAME_LEN);
for (i = 0; i < (1 << order); i++) { page_owner = get_page_owner(page_ext); @@ -179,6 +173,7 @@ static inline void __set_page_owner_handle(struct page *page, page_owner->ts_nsec = ts_nsec; strscpy(page_owner->comm, current->comm, sizeof(page_owner->comm)); + po_set_module_name(page_owner, mod_name); __set_bit(PAGE_EXT_OWNER, &page_ext->flags); __set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
@@ -260,6 +255,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage) new_page_owner->ts_nsec = old_page_owner->ts_nsec; new_page_owner->free_ts_nsec = old_page_owner->ts_nsec; strcpy(new_page_owner->comm, old_page_owner->comm); + po_copy_module_name(new_page_owner, old_page_owner);
/* * We don't clear the bit on the oldpage as it's going to be freed @@ -434,6 +430,8 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, migratetype_names[pageblock_mt], page->flags, &page->flags);
+ ret += po_module_name_snprint(page_owner, kbuf + ret, count - ret); + nr_entries = stack_depot_fetch(handle, &entries); ret += stack_trace_snprint(kbuf + ret, count - ret, entries, nr_entries, 0); if (ret >= count) diff --git a/mm/page_owner.h b/mm/page_owner.h new file mode 100644 index 000000000000..18ea05e999e7 --- /dev/null +++ b/mm/page_owner.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved. + */ + +#ifndef __MM_PAGE_OWNER_H +#define __MM_PAGE_OWNER_H + +#include <linux/stackdepot.h> + +struct page_owner { + unsigned short order; + short last_migrate_reason; + gfp_t gfp_mask; + depot_stack_handle_t handle; + depot_stack_handle_t free_handle; + u64 ts_nsec; + u64 free_ts_nsec; + char comm[TASK_COMM_LEN]; + pid_t pid; + pid_t tgid; +#ifdef CONFIG_PAGE_OWNER_MODULE_STAT + char module_name[MODULE_NAME_LEN]; +#endif +}; + +#ifdef CONFIG_PAGE_OWNER_MODULE_STAT +void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, size_t size); +void po_set_module_name(struct page_owner *page_owner, char *mod_name); +int po_module_name_snprint(struct page_owner *page_owner, char *kbuf, size_t size); + +static inline void po_copy_module_name(struct page_owner *dst, + struct page_owner *src) +{ + po_set_module_name(dst, src->module_name); +} + +#else +static void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, + size_t size) +{ +} + +static void po_set_module_name(struct page_owner *page_owner, char *mod_name) +{ +} + +static inline int po_module_name_snprint(struct page_owner *page_owner, + char *kbuf, size_t size) +{ + return 0; +} + +static inline void po_copy_module_name(struct page_owner *dst, struct page_owner *src) +{ +} +#endif + +#endif diff --git a/mm/page_owner_module.c b/mm/page_owner_module.c new file mode 100644 index 000000000000..2a2becf975da --- /dev/null +++ b/mm/page_owner_module.c @@ -0,0 +1,70 @@ +// 
SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved. + * + * page_owner_module core file + */ + +#include <linux/module.h> +#include <linux/debugfs.h> +#include <linux/ctype.h> + +#include "page_owner.h" + +void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, size_t size) +{ + int i; + struct module *mod = NULL; + unsigned long *entries; + unsigned int nr_entries; + + if (unlikely(!mod_name)) + return; + + nr_entries = stack_depot_fetch(handle, &entries); + if (!in_task()) + nr_entries = filter_irq_stacks(entries, nr_entries); + for (i = 0; i < nr_entries; i++) { + if (core_kernel_text(entries[i])) + continue; + + preempt_disable(); + mod = __module_address(entries[i]); + preempt_enable(); + + if (!mod) + continue; + + strscpy(mod_name, mod->name, size); + return; + } +} + +void po_set_module_name(struct page_owner *page_owner, char *mod_name) +{ + if (unlikely(!page_owner || !mod_name)) + return; + + if (strlen(mod_name) != 0) + strscpy(page_owner->module_name, mod_name, MODULE_NAME_LEN); + else + memset(page_owner->module_name, 0, MODULE_NAME_LEN); +} + +static inline bool po_is_module(struct page_owner *page_owner) +{ + return strlen(page_owner->module_name) != 0; +} + +int po_module_name_snprint(struct page_owner *page_owner, + char *kbuf, size_t size) +{ + if (unlikely(!page_owner || !kbuf)) + return 0; + + if (po_is_module(page_owner)) + return scnprintf(kbuf, size, "Page allocated by module %s\n", + page_owner->module_name); + + return 0; +}
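The lookup loop in po_find_module_name_with_update() walks the saved stack, skips core kernel text, and takes the first frame that lies inside a module. A user-space sketch of that walk follows. As an assumption for illustration, core_kernel_text() and __module_address() are replaced by a hypothetical address-range table (`struct sketch_range`), and filter_irq_stacks() is omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for kernel state: core-text and module address ranges. */
struct sketch_range {
	unsigned long start, end;	/* half-open range [start, end) */
	const char *name;		/* NULL marks core kernel text */
};

static const struct sketch_range *
sketch_lookup(const struct sketch_range *ranges, size_t nr, unsigned long addr)
{
	for (size_t i = 0; i < nr; i++)
		if (addr >= ranges[i].start && addr < ranges[i].end)
			return &ranges[i];
	return NULL;
}

/*
 * Scan from the innermost frame outward. Core text and unknown addresses are
 * skipped; the first module frame wins. mod_name is left untouched (empty)
 * when no module frame is found.
 */
static void sketch_find_module_name(const unsigned long *entries, size_t nr_entries,
				    const struct sketch_range *ranges, size_t nr_ranges,
				    char *mod_name, size_t size)
{
	assert(mod_name != NULL && size > 0);
	for (size_t i = 0; i < nr_entries; i++) {
		const struct sketch_range *r = sketch_lookup(ranges, nr_ranges, entries[i]);

		if (!r || !r->name)
			continue;
		/* Bounded copy, playing the role of strscpy(). */
		snprintf(mod_name, size, "%s", r->name);
		return;
	}
}
```

Because the innermost module frame wins, a core allocator called from a module is charged to that module rather than to a module further up the stack.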
Offering: HULK hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
Add a new debugfs interface, page_owner_filter, to filter out pages that are not allocated by modules. This saves space and time when users only care about pages allocated by modules.
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner.c | 4 +++ mm/page_owner.h | 11 +++++++ mm/page_owner_module.c | 68 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 83 insertions(+)
diff --git a/mm/page_owner.c b/mm/page_owner.c index 3e3cf55d270e..c060ec8585da 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -604,6 +604,9 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) if (!handle) goto ext_put_continue;
+ if (po_is_filtered(page_owner)) + goto ext_put_continue; + /* Record the next PFN to read in the file offset */ *ppos = (pfn - min_low_pfn) + 1;
@@ -727,6 +730,7 @@ static int __init pageowner_init(void) debugfs_create_file("page_owner", 0400, NULL, NULL, &proc_page_owner_operations);
+ po_module_stat_init(); return 0; } late_initcall(pageowner_init) diff --git a/mm/page_owner.h b/mm/page_owner.h index 18ea05e999e7..c516de29ba6a 100644 --- a/mm/page_owner.h +++ b/mm/page_owner.h @@ -28,6 +28,8 @@ struct page_owner { void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, size_t size); void po_set_module_name(struct page_owner *page_owner, char *mod_name); int po_module_name_snprint(struct page_owner *page_owner, char *kbuf, size_t size); +void po_module_stat_init(void); +bool po_is_filtered(struct page_owner *page_owner);
static inline void po_copy_module_name(struct page_owner *dst, struct page_owner *src) @@ -54,6 +56,15 @@ static inline int po_module_name_snprint(struct page_owner *page_owner, static inline void po_copy_module_name(struct page_owner *dst, struct page_owner *src) { } + +static inline void po_module_stat_init(void) +{ +} + +static inline bool po_is_filtered(struct page_owner *page_owner) +{ + return false; +} #endif
#endif diff --git a/mm/page_owner_module.c b/mm/page_owner_module.c index 2a2becf975da..cc9150d5b6ff 100644 --- a/mm/page_owner_module.c +++ b/mm/page_owner_module.c @@ -11,6 +11,12 @@
#include "page_owner.h"
+#define PAGE_OWNER_FILTER_BUF_SIZE 16 +#define PAGE_OWNER_NONE_FILTER 0 +#define PAGE_OWNER_MODULE_FILTER 1 + +static unsigned int page_owner_filter = PAGE_OWNER_NONE_FILTER; + void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, size_t size) { int i; @@ -68,3 +74,65 @@ int po_module_name_snprint(struct page_owner *page_owner,
return 0; } + +static ssize_t read_page_owner_filter(struct file *file, + char __user *user_buf, size_t count, loff_t *ppos) +{ + char kbuf[PAGE_OWNER_FILTER_BUF_SIZE]; + int kcount; + + if (page_owner_filter & PAGE_OWNER_MODULE_FILTER) + kcount = snprintf(kbuf, sizeof(kbuf), "module\n"); + else + kcount = snprintf(kbuf, sizeof(kbuf), "none\n"); + + return simple_read_from_buffer(user_buf, count, ppos, kbuf, kcount); +} + +static ssize_t write_page_owner_filter(struct file *file, + const char __user *user_buf, size_t count, loff_t *ppos) +{ + char kbuf[PAGE_OWNER_FILTER_BUF_SIZE]; + char *p_kbuf; + size_t kbuf_size; + + kbuf_size = min(count, sizeof(kbuf) - 1); + if (copy_from_user(kbuf, user_buf, kbuf_size)) + return -EFAULT; + + kbuf[kbuf_size] = '\0'; + p_kbuf = strstrip(kbuf); + + if (!strcmp(p_kbuf, "module")) + page_owner_filter = PAGE_OWNER_MODULE_FILTER; + else if (!strcmp(p_kbuf, "none")) + page_owner_filter = PAGE_OWNER_NONE_FILTER; + else + return -EINVAL; + + return count; +} + +static const struct file_operations page_owner_filter_ops = { + .read = read_page_owner_filter, + .write = write_page_owner_filter, + .llseek = default_llseek, +}; + +bool po_is_filtered(struct page_owner *page_owner) +{ + if (unlikely(!page_owner)) + return false; + + if (page_owner_filter & PAGE_OWNER_MODULE_FILTER && + !po_is_module(page_owner)) + return true; + + return false; +} + +void po_module_stat_init(void) +{ + debugfs_create_file("page_owner_filter", 0600, NULL, NULL, + &page_owner_filter_ops); +}
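write_page_owner_filter() copies at most 15 bytes, NUL-terminates them, strips white space, and accepts only "module" or "none". The sketch below mirrors that parsing in user space. memcpy() stands in for copy_from_user(), and `sketch_strstrip()` is a hypothetical replacement for the kernel's strstrip():

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <string.h>

#define SKETCH_NONE_FILTER     0
#define SKETCH_MODULE_FILTER   1
#define SKETCH_FILTER_BUF_SIZE 16

/* Trim trailing, then leading, white space in place. */
static char *sketch_strstrip(char *s)
{
	size_t len = strlen(s);

	while (len && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/* Parse a write to the filter file; returns 0 and sets *filter, or -EINVAL. */
static int sketch_parse_filter(const char *user_buf, size_t count,
			       unsigned int *filter)
{
	char kbuf[SKETCH_FILTER_BUF_SIZE];
	size_t n = count < sizeof(kbuf) - 1 ? count : sizeof(kbuf) - 1;
	char *p;

	assert(filter != NULL);
	memcpy(kbuf, user_buf, n);	/* stands in for copy_from_user() */
	kbuf[n] = '\0';
	p = sketch_strstrip(kbuf);

	if (!strcmp(p, "module"))
		*filter = SKETCH_MODULE_FILTER;
	else if (!strcmp(p, "none"))
		*filter = SKETCH_NONE_FILTER;
	else
		return -EINVAL;	/* the current filter is left unchanged */
	return 0;
}
```

Stripping first is what lets `echo module > page_owner_filter` work despite the trailing newline echo appends.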
Offering: HULK hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
This patch records the number of pages allocated by each module and, when OOM occurs, dumps the top N modules that allocate the most pages. The dump info is as follows:
--------------------------------
Top modules allocating pages:
Module allocate_test_oom allocated 9664116 pages
Module ext4 allocated 218200 pages
Module zram allocated 18268 pages
Module jbd2 allocated 151 pages
Module allocate_test2 allocated 32 pages
--------------------------------
This patch maintains a separate linked list to record the number of pages allocated by each module. A list node is allocated when a module is loaded and freed when it is unloaded.
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner.c | 4 +- mm/page_owner.h | 5 +- mm/page_owner_module.c | 160 ++++++++++++++++++++++++++++++++++++++++- 3 files changed, 164 insertions(+), 5 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c index c060ec8585da..ed948099c7fd 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -135,7 +135,7 @@ void __reset_page_owner(struct page *page, unsigned int order) char mod_name[MODULE_NAME_LEN] = {0};
handle = save_stack(GFP_NOWAIT | __GFP_NOWARN); - po_find_module_name_with_update(handle, mod_name, MODULE_NAME_LEN); + po_find_module_name_with_update(handle, mod_name, MODULE_NAME_LEN, -(1 << order));
page_ext = page_ext_get(page); if (unlikely(!page_ext)) @@ -160,7 +160,7 @@ static inline void __set_page_owner_handle(struct page *page, u64 ts_nsec = local_clock(); char mod_name[MODULE_NAME_LEN] = {0};
- po_find_module_name_with_update(handle, mod_name, MODULE_NAME_LEN); + po_find_module_name_with_update(handle, mod_name, MODULE_NAME_LEN, 1 << order);
for (i = 0; i < (1 << order); i++) { page_owner = get_page_owner(page_ext); diff --git a/mm/page_owner.h b/mm/page_owner.h index c516de29ba6a..a8517d38d3de 100644 --- a/mm/page_owner.h +++ b/mm/page_owner.h @@ -25,7 +25,8 @@ struct page_owner { };
#ifdef CONFIG_PAGE_OWNER_MODULE_STAT -void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, size_t size); +void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, + size_t size, long nr_pages); void po_set_module_name(struct page_owner *page_owner, char *mod_name); int po_module_name_snprint(struct page_owner *page_owner, char *kbuf, size_t size); void po_module_stat_init(void); @@ -39,7 +40,7 @@ static inline void po_copy_module_name(struct page_owner *dst,
#else static void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, - size_t size) + size_t size, long nr_pages) { }
diff --git a/mm/page_owner_module.c b/mm/page_owner_module.c index cc9150d5b6ff..b64cfb4a1671 100644 --- a/mm/page_owner_module.c +++ b/mm/page_owner_module.c @@ -8,6 +8,9 @@ #include <linux/module.h> #include <linux/debugfs.h> #include <linux/ctype.h> +#include <linux/list_sort.h> +#include <linux/oom.h> +#include <linux/slab.h>
#include "page_owner.h"
@@ -15,9 +18,65 @@ #define PAGE_OWNER_NONE_FILTER 0 #define PAGE_OWNER_MODULE_FILTER 1
+#define PO_MODULE_DEFAULT_TOPN 20 + static unsigned int page_owner_filter = PAGE_OWNER_NONE_FILTER;
-void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, size_t size) +struct po_module { + struct list_head list; + struct module *mod; + long nr_pages_used; +}; + +LIST_HEAD(po_module_list); +DEFINE_SPINLOCK(po_module_list_lock); + +static unsigned int po_module_topn = PO_MODULE_DEFAULT_TOPN; + +static int po_module_cmp(void *priv, const struct list_head *h1, + const struct list_head *h2) +{ + struct po_module *lhs, *rhs; + + lhs = container_of(h1, struct po_module, list); + rhs = container_of(h2, struct po_module, list); + + return lhs->nr_pages_used < rhs->nr_pages_used; +} + +static inline struct po_module *po_find_module(const struct module *mod) +{ + struct po_module *po_mod; + + lockdep_assert_held(&po_module_list_lock); + list_for_each_entry(po_mod, &po_module_list, list) { + if (po_mod->mod == mod) + return po_mod; + } + + pr_warn("page_owner_module: failed to find module %s in po_module list\n", + mod->name); + return NULL; +} + +void po_update_module_pages(const struct module *mod, long nr_pages) +{ + struct po_module *po_mod; + unsigned long flags; + + if (unlikely(!mod)) + return; + + spin_lock_irqsave(&po_module_list_lock, flags); + po_mod = po_find_module(mod); + if (po_mod) + po_mod->nr_pages_used += nr_pages; + spin_unlock_irqrestore(&po_module_list_lock, flags); +} + + +void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name, + size_t size, long nr_pages) { int i; struct module *mod = NULL; @@ -42,6 +101,7 @@ void po_find_module_name_with_update(depot_stack_handle_t handle, char *mod_name continue;
strscpy(mod_name, mod->name, size); + po_update_module_pages(mod, nr_pages); return; } } @@ -131,8 +191,106 @@ bool po_is_filtered(struct page_owner *page_owner) return false; }
+static int po_module_coming(struct module *mod) +{ + struct po_module *po_mod; + unsigned long flags; + + po_mod = kmalloc(sizeof(*po_mod), GFP_KERNEL); + if (!po_mod) + return -ENOMEM; + + po_mod->nr_pages_used = 0; + po_mod->mod = mod; + INIT_LIST_HEAD(&po_mod->list); + spin_lock_irqsave(&po_module_list_lock, flags); + list_add_tail(&po_mod->list, &po_module_list); + spin_unlock_irqrestore(&po_module_list_lock, flags); + + return 0; +} + +static void po_module_going(struct module *mod) +{ + struct po_module *po_mod; + unsigned long flags; + + spin_lock_irqsave(&po_module_list_lock, flags); + po_mod = po_find_module(mod); + list_del(&po_mod->list); + spin_unlock_irqrestore(&po_module_list_lock, flags); + kfree(po_mod); +} + +static int po_module_notify(struct notifier_block *self, + unsigned long val, void *data) +{ + struct module *mod = data; + int ret = 0; + + switch (val) { + case MODULE_STATE_COMING: + ret = po_module_coming(mod); + break; + case MODULE_STATE_GOING: + po_module_going(mod); + break; + } + + return notifier_from_errno(ret); +} + +static struct notifier_block po_module_nb = { + .notifier_call = po_module_notify, + .priority = 0 +}; + +static int po_oom_notify(struct notifier_block *self, + unsigned long val, void *data) +{ + struct po_module *po_mod; + unsigned long flags; + unsigned int nr = po_module_topn; + int ret = notifier_from_errno(0); + + if (!nr) + return ret; + + spin_lock_irqsave(&po_module_list_lock, flags); + list_sort(NULL, &po_module_list, po_module_cmp); + pr_info("Top modules allocating pages:\n"); + list_for_each_entry(po_mod, &po_module_list, list) { + pr_info("\tModule %s allocated %ld pages\n", po_mod->mod->name, + po_mod->nr_pages_used); + --nr; + if (!nr) + break; + } + spin_unlock_irqrestore(&po_module_list_lock, flags); + + return ret; +} + +static struct notifier_block po_oom_nb = { + .notifier_call = po_oom_notify, + .priority = 0 +}; + void po_module_stat_init(void) { + int ret; + 
debugfs_create_file("page_owner_filter", 0600, NULL, NULL, &page_owner_filter_ops); + + ret = register_module_notifier(&po_module_nb); + if (ret) { + pr_warn("Failed to register page owner module enter notifier\n"); + return; + } + + ret = register_oom_notifier(&po_oom_nb); + if (ret) + pr_warn("Failed to register page owner oom notifier\n"); + }
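Each allocation adds `1 << order` to the module's counter and each free adds `-(1 << order)`. On OOM the list is sorted in descending order and at most N entries are printed. po_module_cmp() returns `lhs < rhs`, which list_sort() reads as "lhs sorts after rhs", so larger counts come first. The sketch below reproduces that accounting and top-N dump. As an assumption for illustration, it uses an array with qsort() instead of a spinlock-protected list with list_sort(), and all `sketch_*` names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space mirror of struct po_module. */
struct sketch_mod {
	const char *name;
	long nr_pages_used;	/* signed: frees subtract 1 << order */
};

/* qsort comparator: larger page counts sort first. */
static int sketch_cmp_desc(const void *a, const void *b)
{
	const struct sketch_mod *l = a, *r = b;

	return (l->nr_pages_used < r->nr_pages_used) -
	       (l->nr_pages_used > r->nr_pages_used);
}

/* Charge an allocation (+(1 << order)) or a free (-(1 << order)). */
static void sketch_account(struct sketch_mod *m, unsigned int order, int is_free)
{
	long nr = 1L << order;

	assert(m != NULL);
	m->nr_pages_used += is_free ? -nr : nr;
}

/* Sort, then print at most topn entries; topn == 0 prints nothing. */
static unsigned int sketch_dump_topn(struct sketch_mod *mods, size_t nr,
				     unsigned int topn, FILE *out)
{
	unsigned int printed = 0;

	qsort(mods, nr, sizeof(*mods), sketch_cmp_desc);
	for (size_t i = 0; i < nr && printed < topn; i++, printed++)
		fprintf(out, "\tModule %s allocated %ld pages\n",
			mods[i].name, mods[i].nr_pages_used);
	return printed;
}
```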
Offering: HULK hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
This patch adds a new debugfs interface, page_owner_module_show_max, to allow users to configure how many modules are dumped when OOM occurs.
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner_module.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/mm/page_owner_module.c b/mm/page_owner_module.c index b64cfb4a1671..a8a3066b02d7 100644 --- a/mm/page_owner_module.c +++ b/mm/page_owner_module.c @@ -276,6 +276,21 @@ static struct notifier_block po_oom_nb = { .priority = 0 };
+static int po_module_topn_set(void *data, u64 val) +{ + po_module_topn = val; + return 0; +} + +static int po_module_topn_get(void *data, u64 *val) +{ + *val = po_module_topn; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(po_module_topn_fops, po_module_topn_get, + po_module_topn_set, "%llu\n"); + void po_module_stat_init(void) { int ret; @@ -293,4 +308,5 @@ void po_module_stat_init(void) if (ret) pr_warn("Failed to register page owner oom notifier\n");
+ debugfs_create_file("page_owner_module_show_max", 0600, NULL, NULL, &po_module_topn_fops); }
Offering: HULK hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
This patch adds a new debugfs interface, page_owner_module_stats, to allow users to get the module statistics even when OOM doesn't occur.
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner_module.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/mm/page_owner_module.c b/mm/page_owner_module.c index a8a3066b02d7..c184b103f6ae 100644 --- a/mm/page_owner_module.c +++ b/mm/page_owner_module.c @@ -291,6 +291,30 @@ static int po_module_topn_get(void *data, u64 *val) DEFINE_SIMPLE_ATTRIBUTE(po_module_topn_fops, po_module_topn_get, po_module_topn_set, "%llu\n");
+static int page_owner_module_stats_show(struct seq_file *m, void *v) +{ + struct po_module *po_mod; + unsigned long flags; + unsigned int nr = po_module_topn; + + if (!nr) + return 0; + + spin_lock_irqsave(&po_module_list_lock, flags); + list_sort(NULL, &po_module_list, po_module_cmp); + list_for_each_entry(po_mod, &po_module_list, list) { + seq_printf(m, "%s %ld\n", po_mod->mod->name, + po_mod->nr_pages_used); + --nr; + if (!nr) + break; + } + spin_unlock_irqrestore(&po_module_list_lock, flags); + return 0; +} +DEFINE_SHOW_ATTRIBUTE(page_owner_module_stats); + + void po_module_stat_init(void) { int ret; @@ -309,4 +333,6 @@ void po_module_stat_init(void) pr_warn("Failed to register page owner oom notifier\n");
debugfs_create_file("page_owner_module_show_max", 0600, NULL, NULL, &po_module_topn_fops); + debugfs_create_file("page_owner_module_stats", 0400, NULL, NULL, + &page_owner_module_stats_fops); }
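page_owner_module_stats emits one `"%s %ld\n"` line per module through seq_printf(). A hypothetical user-space parser for one such line is sketched below. The 56-byte buffer assumes MODULE_NAME_LEN on a 64-bit build, and the width in the sscanf() format is computed so the name can never overflow its buffer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MODULE_NAME_LEN 56	/* assumption: MODULE_NAME_LEN on 64-bit */

/* Parse one "<module> <pages>" line; returns 1 on success, 0 otherwise. */
static int sketch_parse_stat_line(const char *line, char *name,
				  size_t name_size, long *pages)
{
	char fmt[32];

	assert(name_size > 1);
	/* Build e.g. "%55s %ld" so sscanf() never writes past name. */
	snprintf(fmt, sizeof(fmt), "%%%zus %%ld", name_size - 1);
	return sscanf(line, fmt, name, pages) == 2;
}
```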
Offering: HULK hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
When a module is unloaded without freeing all the pages it allocated, those pages are leaked. In that case, record the module name, page count, and unload time in a separate leaked_po_module_list, and report the leak when OOM occurs or when users read the page_owner_module_stats interface.
Signed-off-by: Jinjiang Tu tujinjiang@huawei.com --- mm/page_owner_module.c | 100 +++++++++++++++++++++++++++++++++-------- 1 file changed, 82 insertions(+), 18 deletions(-)
diff --git a/mm/page_owner_module.c b/mm/page_owner_module.c index c184b103f6ae..cb878bd3ddcb 100644 --- a/mm/page_owner_module.c +++ b/mm/page_owner_module.c @@ -11,6 +11,7 @@ #include <linux/list_sort.h> #include <linux/oom.h> #include <linux/slab.h> +#include <linux/sched/clock.h>
#include "page_owner.h"
@@ -28,7 +29,15 @@ struct po_module { long nr_pages_used; };
+struct leaked_po_module { + struct list_head list; + char module_name[MODULE_NAME_LEN]; + long nr_pages_used; + u64 unload_ns; +}; + LIST_HEAD(po_module_list); +LIST_HEAD(leaked_po_module_list); DEFINE_SPINLOCK(po_module_list_lock);
static unsigned int po_module_topn = PO_MODULE_DEFAULT_TOPN; @@ -210,6 +219,24 @@ static int po_module_coming(struct module *mod) return 0; }
+static void create_leaked_node(struct po_module *po_mod) +{ + struct leaked_po_module *leaked_po_mod; + unsigned long flags; + + leaked_po_mod = kmalloc(sizeof(struct leaked_po_module), GFP_KERNEL); + if (!leaked_po_mod) + return; + + leaked_po_mod->unload_ns = local_clock(); + strscpy(leaked_po_mod->module_name, po_mod->mod->name, MODULE_NAME_LEN); + leaked_po_mod->nr_pages_used = po_mod->nr_pages_used; + INIT_LIST_HEAD(&leaked_po_mod->list); + spin_lock_irqsave(&po_module_list_lock, flags); + list_add_tail(&leaked_po_mod->list, &leaked_po_module_list); + spin_unlock_irqrestore(&po_module_list_lock, flags); +} + static void po_module_going(struct module *mod) { struct po_module *po_mod; @@ -219,6 +246,10 @@ static void po_module_going(struct module *mod) po_mod = po_find_module(mod); list_del(&po_mod->list); spin_unlock_irqrestore(&po_module_list_lock, flags); + + if (unlikely(po_mod->nr_pages_used)) + create_leaked_node(po_mod); + kfree(po_mod); }
@@ -245,10 +276,52 @@ static struct notifier_block po_module_nb = { .priority = 0 };
+static void print_list(unsigned int nr, struct seq_file *m) +{ + struct po_module *po_mod; + + lockdep_assert_held(&po_module_list_lock); + + if (list_empty(&po_module_list)) + return; + + list_sort(NULL, &po_module_list, po_module_cmp); + list_for_each_entry(po_mod, &po_module_list, list) { + if (m) + seq_printf(m, "%s %ld\n", po_mod->mod->name, + po_mod->nr_pages_used); + else + pr_info("\tModule %s allocated %ld pages\n", + po_mod->mod->name, po_mod->nr_pages_used); + --nr; + if (!nr) + break; + } +} + +static void print_leaked_list(struct seq_file *m) +{ + struct leaked_po_module *leaked_po_mod; + + lockdep_assert_held(&po_module_list_lock); + + if (list_empty(&leaked_po_module_list)) + return; + + list_for_each_entry(leaked_po_mod, &leaked_po_module_list, list) { + if (m) + seq_printf(m, "[unloaded %llu]%s %ld\n", leaked_po_mod->unload_ns, + leaked_po_mod->module_name, leaked_po_mod->nr_pages_used); + else + pr_info("\t[unloaded %llu]Module %s allocated %ld pages\n", + leaked_po_mod->unload_ns, leaked_po_mod->module_name, + leaked_po_mod->nr_pages_used); + } +} + static int po_oom_notify(struct notifier_block *self, unsigned long val, void *data) { - struct po_module *po_mod; unsigned long flags; unsigned int nr = po_module_topn; int ret = notifier_from_errno(0); @@ -257,15 +330,11 @@ static int po_oom_notify(struct notifier_block *self, return ret;
spin_lock_irqsave(&po_module_list_lock, flags); - list_sort(NULL, &po_module_list, po_module_cmp); pr_info("Top modules allocating pages:\n"); - list_for_each_entry(po_mod, &po_module_list, list) { - pr_info("\tModule %s allocated %ld pages\n", po_mod->mod->name, - po_mod->nr_pages_used); - --nr; - if (!nr) - break; - } + + print_list(nr, NULL); + print_leaked_list(NULL); + spin_unlock_irqrestore(&po_module_list_lock, flags);
return ret; @@ -293,7 +362,6 @@ DEFINE_SIMPLE_ATTRIBUTE(po_module_topn_fops, po_module_topn_get,
static int page_owner_module_stats_show(struct seq_file *m, void *v) { - struct po_module *po_mod; unsigned long flags; unsigned int nr = po_module_topn;
@@ -301,14 +369,10 @@ static int page_owner_module_stats_show(struct seq_file *m, void *v) return 0;
spin_lock_irqsave(&po_module_list_lock, flags); - list_sort(NULL, &po_module_list, po_module_cmp); - list_for_each_entry(po_mod, &po_module_list, list) { - seq_printf(m, "%s %ld\n", po_mod->mod->name, - po_mod->nr_pages_used); - --nr; - if (!nr) - break; - } + + print_list(nr, m); + print_leaked_list(m); + spin_unlock_irqrestore(&po_module_list_lock, flags); return 0; }
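After this change, page_owner_module_stats prints two kinds of lines. Loaded modules use the format "%s %ld". Modules that were unloaded while still owning pages use "[unloaded %llu]%s %ld". The minimal C sketch below shows how a user-space consumer might parse one line of either form. It assumes the output matches the seq_printf() formats above exactly. The struct mod_stat type and parse_mod_stat_line() are hypothetical names and are not part of the patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of page_owner_module_stats output. */
struct mod_stat {
	char name[64];			/* module name (kernel names are shorter) */
	long nr_pages;			/* pages still attributed to the module */
	bool unloaded;			/* true for print_leaked_list() entries */
	unsigned long long unload_ns;	/* unload timestamp, if unloaded */
};

/*
 * Parse one line in the print_list() format ("%s %ld") or the
 * print_leaked_list() format ("[unloaded %llu]%s %ld").
 * Returns true if all expected fields were read.
 */
static bool parse_mod_stat_line(const char *line, struct mod_stat *st)
{
	memset(st, 0, sizeof(*st));

	if (strncmp(line, "[unloaded ", 10) == 0) {
		st->unloaded = true;
		return sscanf(line, "[unloaded %llu]%63s %ld", &st->unload_ns,
			      st->name, &st->nr_pages) == 3;
	}
	return sscanf(line, "%63s %ld", st->name, &st->nr_pages) == 2;
}
```

For example, the line "[unloaded 12345]zram 7" yields the name "zram", 7 pages and an unload timestamp of 12345.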
Offering: HULK
hulk inclusion
category: doc
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
Update the document to explain the features introduced by CONFIG_PAGE_OWNER_MODULE_STAT.
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/vm/page_owner.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst
index b5d71ebb2727..a73d1ad6db04 100644
--- a/Documentation/vm/page_owner.rst
+++ b/Documentation/vm/page_owner.rst
@@ -68,6 +68,18 @@ are caught and marked, although they are mostly allocated from
 struct page extension feature. Anyway, after that, no page is left in
 un-tracking state.
 
+With CONFIG_PAGE_OWNER_MODULE_STAT enabled, page owner is able to track whether
+pages are allocated by modules. If a page is allocated by a module, the
+information dumped from /sys/kernel/debug/page_owner shows the module
+name. Users can use the user-space helper to analyze how modules allocate
+pages. /sys/kernel/debug/page_owner_filter can be used to filter out the pages
+that are not allocated by modules. The accepted values are "module" and "none". The default value
+is "none", which means that no page is filtered out.
+
+In addition, the top N modules that allocate the most pages are dumped
+when OOM occurs or when users read /sys/kernel/debug/page_owner_module_stats. The N value
+can be configured with /sys/kernel/debug/page_owner_show_max. The default N is 20.
+
 Usage
 =====
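With CONFIG_PAGE_OWNER_MODULE_STAT, a record in /sys/kernel/debug/page_owner may contain a line that names the allocating module. The page_owner_sort helper locates that line with the regular expression "Page allocated by module (.*)". The standalone C sketch below extracts the name the same way, using POSIX regcomp()/regexec(). extract_module_name() is a hypothetical helper. Unlike a plain memcpy() of the match, it truncates a name that does not fit the destination buffer. It also passes REG_NEWLINE so that the trailing newline from fgets() is not captured.

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/*
 * Extract the module name from a "Page allocated by module <name>" line.
 * dst always ends up NUL-terminated (empty if nothing matched).
 * Returns true if a non-empty module name was found.
 */
static bool extract_module_name(const char *line, char *dst, size_t dst_len)
{
	regex_t re;
	regmatch_t m[2];
	bool found = false;

	if (dst_len == 0)
		return false;
	dst[0] = '\0';

	/* REG_NEWLINE: '.' does not match '\n', so the newline is excluded. */
	if (regcomp(&re, "Page allocated by module (.*)",
		    REG_EXTENDED | REG_NEWLINE) != 0)
		return false;

	if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so != -1) {
		size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

		if (len >= dst_len)
			len = dst_len - 1;	/* truncate instead of overflowing */
		memcpy(dst, line + m[1].rm_so, len);
		dst[len] = '\0';
		found = len > 0;
	}
	regfree(&re);
	return found;
}
```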
Offering: HULK
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
Add the -M and --module options to support selecting modules. The -M option filters out all items except those allocated by modules. The --module option selects only the items allocated by one or more given modules. For example, --module=mod1,mod2 selects only the items allocated by mod1 or mod2.
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 tools/vm/page_owner_sort.c | 84 +++++++++++++++++++++++++++++++++-----
 1 file changed, 73 insertions(+), 11 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 0e1db628291c..884cc60be1ab 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -27,6 +27,7 @@ #define true 1 #define false 0 #define TASK_COMM_LEN 16 +#define MODULE_NAME_LEN (64 - sizeof(unsigned long))
struct block_list { char *txt; @@ -40,12 +41,14 @@ struct block_list { pid_t pid; pid_t tgid; int allocator; + char *module; }; enum FILTER_BIT { FILTER_UNRELEASE = 1<<1, FILTER_PID = 1<<2, FILTER_TGID = 1<<3, - FILTER_COMM = 1<<4 + FILTER_COMM = 1<<4, + FILTER_MODULE = 1<<5 }; enum CULL_BIT { CULL_UNRELEASE = 1<<1, @@ -74,9 +77,11 @@ struct filter_condition { pid_t *pids; pid_t *tgids; char **comms; + char **modules; int pids_size; int tgids_size; int comms_size; + int modules_size; }; struct sort_condition { int (**cmps)(const void *, const void *); @@ -91,6 +96,7 @@ static regex_t tgid_pattern; static regex_t comm_pattern; static regex_t ts_nsec_pattern; static regex_t free_ts_nsec_pattern; +static regex_t module_pattern; static struct block_list *list; static int list_size; static int max_size; @@ -100,10 +106,12 @@ static bool debug_on;
static void set_single_cmp(int (*cmp)(const void *, const void *), int sign);
-int read_block(char *buf, char *ext_buf, int buf_size, FILE *fin) +int read_block(char *buf, char *ext_buf, char *mod_buf, int buf_size, FILE *fin) { char *curr = buf, *const buf_end = buf + buf_size; + char *mod_string = "Page allocated by module";
+ mod_buf[0] = '\0'; while (buf_end - curr > 1 && fgets(curr, buf_end - curr, fin)) { if (*curr == '\n') { /* empty line */ return curr - buf; @@ -112,6 +120,10 @@ int read_block(char *buf, char *ext_buf, int buf_size, FILE *fin) strcpy(ext_buf, curr); continue; } + if (!strncmp(curr, mod_string, strlen(mod_string))) { + strcpy(mod_buf, curr); + continue; + } curr += strlen(curr); }
@@ -401,6 +413,16 @@ static char *get_comm(char *buf) return comm_str; }
+static char *get_module(char *buf) +{ + char *mod = malloc(MODULE_NAME_LEN); + + memset(mod, 0, MODULE_NAME_LEN); + search_pattern(&module_pattern, mod, buf); + + return mod; +} + static int get_arg_type(const char *arg) { if (!strcmp(arg, "pid") || !strcmp(arg, "p")) @@ -469,7 +491,24 @@ static bool match_str_list(const char *str, char **list, int list_size) return false; }
-static bool is_need(char *buf) +static bool is_module_filtered(char *mod_buf) +{ + char *mod = get_module(mod_buf); + int ret = true; + + if (!strlen(mod)) + goto out; + + if (fc.modules_size == 0 || + match_str_list(mod, fc.modules, fc.modules_size)) + ret = false; + +out: + free(mod); + return ret; +} + +static bool is_need(char *buf, char *mod_buf) { __u64 ts_nsec, free_ts_nsec;
@@ -484,6 +523,9 @@ static bool is_need(char *buf) !match_num_list(get_tgid(buf), fc.tgids, fc.tgids_size)) return false;
+ if ((filter & FILTER_MODULE) && is_module_filtered(mod_buf)) + return false; + char *comm = get_comm(buf);
if ((filter & FILTER_COMM) && @@ -495,7 +537,7 @@ static bool is_need(char *buf) return true; }
-static bool add_list(char *buf, int len, char *ext_buf) +static bool add_list(char *buf, int len, char *ext_buf, char *mod_buf) { if (list_size != 0 && len == list[list_size-1].len && @@ -508,7 +550,7 @@ static bool add_list(char *buf, int len, char *ext_buf) fprintf(stderr, "max_size too small??\n"); return false; } - if (!is_need(buf)) + if (!is_need(buf, mod_buf)) return true; list[list_size].pid = get_pid(buf); list[list_size].tgid = get_tgid(buf); @@ -530,6 +572,7 @@ static bool add_list(char *buf, int len, char *ext_buf) list[list_size].ts_nsec = get_ts_nsec(buf); list[list_size].free_ts_nsec = get_free_ts_nsec(buf); list[list_size].allocator = get_allocator(buf, ext_buf); + list[list_size].module = get_module(mod_buf); list_size++; if (list_size % 1000 == 0) { printf("loaded %d\r", list_size); @@ -681,19 +724,21 @@ static void usage(void) "-a\t\tSort by memory allocate time.\n" "-r\t\tSort by memory release time.\n" "-f\t\tFilter out the information of blocks whose memory has been released.\n" + "-M\t\tFilter out the information of blocks whose memory isn't allocated by modules.\n" "-d\t\tPrint debug information.\n" "--pid <pidlist>\tSelect by pid. This selects the information of blocks whose process ID numbers appear in <pidlist>.\n" "--tgid <tgidlist>\tSelect by tgid. This selects the information of blocks whose Thread Group ID numbers appear in <tgidlist>.\n" "--name <cmdlist>\n\t\tSelect by command name. This selects the information of blocks whose command name appears in <cmdlist>.\n" "--cull <rules>\tCull by user-defined rules.<rules> is a single argument in the form of a comma-separated list with some common fields predefined\n" "--sort <order>\tSpecify sort order as: [+|-]key[,[+|-]key[,...]]\n" + "--module <modulelist>\tSelect by module. This selects the information of blocks whose memory is allocated by the modules that appear in <modulelist>.\n" ); }
int main(int argc, char **argv) { FILE *fin, *fout; - char *buf, *ext_buf; + char *buf, *ext_buf, *mod_buf; int i, count; struct stat st; int opt; @@ -703,10 +748,11 @@ int main(int argc, char **argv) { "name", required_argument, NULL, 3 }, { "cull", required_argument, NULL, 4 }, { "sort", required_argument, NULL, 5 }, + { "module", required_argument, NULL, 6 }, { 0, 0, 0, 0}, };
- while ((opt = getopt_long(argc, argv, "adfmnprstP", longopts, NULL)) != -1) + while ((opt = getopt_long(argc, argv, "adfmnprstPM", longopts, NULL)) != -1) switch (opt) { case 'a': set_single_cmp(compare_ts, SORT_ASC); @@ -738,6 +784,11 @@ int main(int argc, char **argv) case 'n': set_single_cmp(compare_comm, SORT_ASC); break; + case 'M': + filter = filter | FILTER_MODULE; + fc.modules_size = 0; + fc.modules = NULL; + break; case 1: filter = filter | FILTER_PID; fc.pids = parse_nums_list(optarg, &fc.pids_size); @@ -774,6 +825,10 @@ int main(int argc, char **argv) exit(1); } break; + case 6: + filter = filter | FILTER_MODULE; + fc.modules = explode(',', optarg, &fc.modules_size); + break; default: usage(); exit(1); @@ -804,6 +859,8 @@ int main(int argc, char **argv) goto out_ts; if (!check_regcomp(&free_ts_nsec_pattern, "free_ts\s*([0-9]*)\s*ns")) goto out_free_ts; + if (!check_regcomp(&module_pattern, "Page allocated by module (.*)")) + goto out_module;
fstat(fileno(fin), &st); max_size = st.st_size / 100; /* hack ... */ @@ -811,17 +868,18 @@ int main(int argc, char **argv) list = malloc(max_size * sizeof(*list)); buf = malloc(BUF_SIZE); ext_buf = malloc(BUF_SIZE); - if (!list || !buf || !ext_buf) { + mod_buf = malloc(BUF_SIZE); + if (!list || !buf || !ext_buf || !mod_buf) { fprintf(stderr, "Out of memory\n"); goto out_free; }
for ( ; ; ) { - int buf_len = read_block(buf, ext_buf, BUF_SIZE, fin); + int buf_len = read_block(buf, ext_buf, mod_buf, BUF_SIZE, fin);
if (buf_len < 0) break; - if (!add_list(buf, buf_len, ext_buf)) + if (!add_list(buf, buf_len, ext_buf, mod_buf)) goto out_free; }
@@ -848,8 +906,10 @@ int main(int argc, char **argv) for (i = 0; i < count; i++) { if (cull == 0) { fprintf(fout, "%d times, %d pages, ", list[i].num, list[i].page_num); + if (strlen(list[i].module) != 0) + fprintf(fout, "allocated by module %s, ", list[i].module); print_allocator(fout, list[i].allocator); - fprintf(fout, ":\n%s\n", list[i].txt); + fprintf(fout, " :\n%s\n", list[i].txt); } else { fprintf(fout, "%d times, %d pages", @@ -880,6 +940,8 @@ int main(int argc, char **argv) free(buf); if (list) free(list); +out_module: + regfree(&module_pattern); out_free_ts: regfree(&free_ts_nsec_pattern); out_ts:
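The selection rules described in the commit message reduce to a small decision. Blocks allocated by the kernel core (empty module name) are always dropped. -M keeps any module. --module keeps only the modules in its comma-separated list. The C sketch below expresses those rules as the inverse of is_module_filtered() in the patch. keep_block() is a hypothetical name. It scans the list in place instead of using the tool's explode() and match_str_list() helpers.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a block should be kept when module filtering is on.
 *   mod:     module name of the block ("" for kernel-core allocations)
 *   modlist: NULL for -M (any module), or "mod1,mod2,..." for --module
 */
static bool keep_block(const char *mod, const char *modlist)
{
	size_t mlen = strlen(mod);
	const char *p = modlist;

	if (mlen == 0)
		return false;		/* not allocated by a module */
	if (modlist == NULL)
		return true;		/* -M: any module is accepted */

	/* --module: exact match against each comma-separated entry */
	while (*p) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len == mlen && strncmp(p, mod, len) == 0)
			return true;
		if (!comma)
			break;
		p = comma + 1;
	}
	return false;
}
```

The exact-length comparison matters: with --module=mod1,mod2, a block from a module named "mod" is not selected.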
Offering: HULK
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
The --cull option culls items that share the same information, such as pid, command name or stack trace. This patch adds a new cull argument (i.e. module) to cull items by the module that allocated the pages. For example, with the --cull=module option, the items are culled as follows:

  314842 times, 334218 pages, module:
  202508 times, 206389 pages, module: ext4
  16711 times, 16753 pages, module: zram
In the first line, there is no string after "module:", which means those pages are allocated by the kernel core rather than by modules.
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 tools/vm/page_owner_sort.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c index 884cc60be1ab..16ddff316f1b 100644 --- a/tools/vm/page_owner_sort.c +++ b/tools/vm/page_owner_sort.c @@ -56,7 +56,8 @@ enum CULL_BIT { CULL_TGID = 1<<3, CULL_COMM = 1<<4, CULL_STACKTRACE = 1<<5, - CULL_ALLOCATOR = 1<<6 + CULL_ALLOCATOR = 1<<6, + CULL_MODULE = 1 << 7 }; enum ALLOCATOR_BIT { ALLOCATOR_CMA = 1<<1, @@ -67,7 +68,7 @@ enum ALLOCATOR_BIT { enum ARG_TYPE { ARG_TXT, ARG_COMM, ARG_STACKTRACE, ARG_ALLOC_TS, ARG_FREE_TS, ARG_CULL_TIME, ARG_PAGE_NUM, ARG_PID, ARG_TGID, ARG_UNKNOWN, ARG_FREE, - ARG_ALLOCATOR + ARG_ALLOCATOR, ARG_MODULE }; enum SORT_ORDER { SORT_ASC = 1, @@ -211,6 +212,13 @@ static int compare_release(const void *p1, const void *p2) return l1->free_ts_nsec ? 1 : -1; }
+static int compare_module(const void *p1, const void *p2) +{ + const struct block_list *l1 = p1, *l2 = p2; + + return strcmp(l1->module, l2->module); +} + static int compare_cull_condition(const void *p1, const void *p2) { if (cull == 0) @@ -227,6 +235,8 @@ static int compare_cull_condition(const void *p1, const void *p2) return compare_release(p1, p2); if ((cull & CULL_ALLOCATOR) && compare_allocator(p1, p2)) return compare_allocator(p1, p2); + if ((cull & CULL_MODULE) && compare_module(p1, p2)) + return compare_module(p1, p2); return 0; }
@@ -443,6 +453,8 @@ static int get_arg_type(const char *arg) return ARG_ALLOC_TS; else if (!strcmp(arg, "allocator") || !strcmp(arg, "ator")) return ARG_ALLOCATOR; + else if (!strcmp(arg, "module") || !strcmp(arg, "mod")) + return ARG_MODULE; else { return ARG_UNKNOWN; } @@ -601,6 +613,8 @@ static bool parse_cull_args(const char *arg_str) cull |= CULL_UNRELEASE; else if (arg_type == ARG_ALLOCATOR) cull |= CULL_ALLOCATOR; + else if (arg_type == ARG_MODULE) + cull |= CULL_MODULE; else { free_explode(args, size); return false; @@ -920,6 +934,8 @@ int main(int argc, char **argv) fprintf(fout, ", TGID %d", list[i].tgid); if (cull & CULL_COMM || filter & FILTER_COMM) fprintf(fout, ", task_comm_name: %s", list[i].comm); + if (cull & CULL_MODULE || filter & FILTER_MODULE) + fprintf(fout, ", module: %s", list[i].module); if (cull & CULL_ALLOCATOR) { fprintf(fout, ", "); print_allocator(fout, list[i].allocator);
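Culling by module groups blocks that have the same module name and sums their counts, which produces output like the example in the commit message. The C sketch below shows the general sort-then-merge approach on a simplified, hypothetical struct rec. As far as can be seen here, the real tool works on struct block_list and combines several cull keys through compare_cull_condition(). Treat this sketch as an illustration under those simplifying assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified record: one per page_owner block. */
struct rec {
	const char *module;	/* "" for kernel-core allocations */
	int times;
	int pages;
};

static int cmp_module(const void *a, const void *b)
{
	const struct rec *r1 = a, *r2 = b;

	return strcmp(r1->module, r2->module);
}

/*
 * Cull records by module: sort so that equal modules are adjacent, then
 * fold each run into its first element. Returns the number of culled
 * entries, which occupy recs[0..n-1] afterwards.
 */
static size_t cull_by_module(struct rec *recs, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	qsort(recs, n, sizeof(*recs), cmp_module);
	for (size_t i = 1; i < n; i++) {
		if (strcmp(recs[i].module, recs[out].module) == 0) {
			recs[out].times += recs[i].times;
			recs[out].pages += recs[i].pages;
		} else {
			recs[++out] = recs[i];
		}
	}
	return out + 1;
}
```

Because the empty string sorts first, the kernel-core entry (the line with nothing after "module:") comes out first, as in the example output.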
Offering: HULK
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
This patch adds a new argument (i.e. module) for the --sort option, which allows the items to be sorted by module name.
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 tools/vm/page_owner_sort.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/tools/vm/page_owner_sort.c b/tools/vm/page_owner_sort.c
index 16ddff316f1b..b4936e78d7d5 100644
--- a/tools/vm/page_owner_sort.c
+++ b/tools/vm/page_owner_sort.c
@@ -679,6 +679,8 @@ static bool parse_sort_args(const char *arg_str)
 			sc.cmps[i] = compare_txt;
 		else if (arg_type == ARG_ALLOCATOR)
 			sc.cmps[i] = compare_allocator;
+		else if (arg_type == ARG_MODULE)
+			sc.cmps[i] = compare_module;
 		else {
 			free_explode(args, size);
 			sc.size = 0;
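With compare_module available to parse_sort_args(), module name can act as one key in a multi-key sort. Each key carries its own ascending or descending sign. The C sketch below shows that comparator-chain idea with a hypothetical two-field struct blk, module name ascending and then page count descending. It is a simplified illustration, not the tool's sort_condition code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified block record. */
struct blk {
	const char *module;
	int pages;
};

typedef int (*cmp_fn)(const void *, const void *);

static int by_module(const void *a, const void *b)
{
	return strcmp(((const struct blk *)a)->module,
		      ((const struct blk *)b)->module);
}

static int by_pages(const void *a, const void *b)
{
	const struct blk *x = a, *y = b;

	return (x->pages > y->pages) - (x->pages < y->pages);
}

/* Keys applied in order; sign +1 is ascending, -1 is descending. */
static const cmp_fn keys[] = { by_module, by_pages };
static const int signs[] = { +1, -1 };

/* The first key that distinguishes the two blocks decides the order. */
static int multi_cmp(const void *a, const void *b)
{
	for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
		int r = keys[i](a, b);

		if (r)
			return r * signs[i];
	}
	return 0;
}

static void sort_blocks(struct blk *b, size_t n)
{
	qsort(b, n, sizeof(*b), multi_cmp);
}
```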
Offering: HULK
hulk inclusion
category: doc
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GSSR
----------------------------------------
Add descriptions of the -M, --module, --cull=module and --sort=module options added to the page_owner_sort tool.
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/vm/page_owner.rst | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index a73d1ad6db04..a97f746368e2 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -120,7 +120,8 @@ Usage
By default, ``page_owner_sort`` is sorted according to the times of buf. If you want to sort by the page nums of buf, use the ``-m`` parameter. - The detailed parameters are: + The module-related parameters require a kernel built with + CONFIG_PAGE_OWNER_MODULE_STAT. The detailed parameters are:
fundamental function::
@@ -163,6 +164,7 @@ Usage
Filter: -f Filter out the information of blocks whose memory has not been released. + -M Filter out the information of blocks whose memory isn't allocated by modules.
Select: --pid <pidlist> Select by pid. This selects the blocks whose process ID @@ -171,8 +173,10 @@ Usage group ID numbers appear in <tgidlist>. --name <cmdlist> Select by task command name. This selects the blocks whose task command name appear in <cmdlist>. + --module <modulelist> Select by module. This selects the blocks whose + memory is allocated by the modules that appear in <modulelist>.
- <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list, + <pidlist>, <tgidlist>, <cmdlist>, <modulelist> are single arguments in the form of a comma-separated list, which offers a way to specify individual selecting rules.
@@ -196,6 +200,7 @@ STANDARD FORMAT SPECIFIERS ft free_ts timestamp of the page when it was released at alloc_ts timestamp of the page when it was allocated ator allocator memory allocator for pages + mod module the name of the module that the page is allocated by
For --cull option:
@@ -206,3 +211,4 @@ STANDARD FORMAT SPECIFIERS f free whether the page has been released or not st stacktrace stack trace of the page allocation ator allocator memory allocator for pages + mod module the name of the module that the page is allocated by
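The "mod"/"module" specifier is accepted wherever the tool takes a sort key, using the documented form [+|-]key[,[+|-]key[,...]]. The C sketch below shows one way such a specification could be split into fields and signs. The enum, struct sort_key, lookup() and parse_sort_spec() are hypothetical. The key table covers only names visible in these patches ("mod"/"module" and "p"/"pid"), so it is not the tool's full get_arg_type().

```c
#include <string.h>

enum field { F_UNKNOWN, F_MODULE, F_PID };	/* hypothetical subset */

struct sort_key {
	enum field field;
	int sign;		/* +1 ascending (default), -1 descending */
};

/* Map a key name of the given length to a field. */
static enum field lookup(const char *name, size_t len)
{
	static const struct { const char *name; enum field f; } tbl[] = {
		{ "mod", F_MODULE }, { "module", F_MODULE },
		{ "p", F_PID }, { "pid", F_PID },
	};

	for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
		if (strlen(tbl[i].name) == len && !strncmp(tbl[i].name, name, len))
			return tbl[i].f;
	return F_UNKNOWN;
}

/* Parse "[+|-]key[,[+|-]key...]" into keys[]; returns count or -1 on error. */
static int parse_sort_spec(const char *spec, struct sort_key *keys, int max)
{
	int n = 0;
	const char *p = spec;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		int sign = 1;

		if (n == max)
			return -1;
		if (len && (*p == '+' || *p == '-')) {
			sign = (*p == '-') ? -1 : 1;
			p++;
			len--;
		}
		keys[n].field = lookup(p, len);
		if (keys[n].field == F_UNKNOWN)
			return -1;
		keys[n++].sign = sign;
		if (!end)
			break;
		p = end + 1;
	}
	return n;
}
```

For example, "-mod,pid" parses into two keys: module name descending, then pid ascending.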
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/7594 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/F...