[PATCH OLK-6.6 0/3] Print rootfs and tmpfs files charged by memcg

Add support for printing the rootfs and tmpfs files that have pages charged
to a given memory cgroup. The file information can be printed through the
"memory.memfs_files_info" interface or when an OOM is triggered.

Jinjiang Tu (1):
  fs: move {lock, unlock}_mount_hash to fs/mount.h

Liu Shixin (2):
  mm/memcg_memfs_info: show files that having pages charged in mem_cgroup
  config: enable CONFIG_MEMCG_MEMFS_INFO by default

 Documentation/vm/memcg_memfs_info.rst  |  40 ++++
 arch/arm64/configs/openeuler_defconfig |   1 +
 arch/x86/configs/openeuler_defconfig   |   1 +
 fs/mount.h                             |  10 +
 fs/namespace.c                         |  10 -
 include/linux/memcg_memfs_info.h       |  23 ++
 init/Kconfig                           |  10 +
 mm/Makefile                            |   1 +
 mm/memcg_memfs_info.c                  | 318 +++++++++++++++++++++++++
 mm/memcontrol.c                        |  12 +
 10 files changed, 416 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/vm/memcg_memfs_info.rst
 create mode 100644 include/linux/memcg_memfs_info.h
 create mode 100644 mm/memcg_memfs_info.c

-- 
2.25.1

FeedBack: The patch(es) that you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/3528
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/G...

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8PWEP
CVE: NA

--------------------------------------

commit d033cb6784c4 ("mount: make {lock,unlock}_mount_hash() static") moved
{lock, unlock}_mount_hash to fs/namespace.c because the two functions were
only used in that file. The memcg_memfs_info feature needs to reference the
two functions, so move them back to fs/mount.h.

Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/mount.h     | 10 ++++++++++
 fs/namespace.c | 10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 130c07c2f8d2..3c2fbb78cae2 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -123,6 +123,16 @@ static inline void get_mnt_ns(struct mnt_namespace *ns)
 
 extern seqlock_t mount_lock;
 
+static inline void lock_mount_hash(void)
+{
+	write_seqlock(&mount_lock);
+}
+
+static inline void unlock_mount_hash(void)
+{
+	write_sequnlock(&mount_lock);
+}
+
 struct proc_mounts {
 	struct mnt_namespace *ns;
 	struct path root;
diff --git a/fs/namespace.c b/fs/namespace.c
index e157efc54023..a10ff870c862 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -99,16 +99,6 @@ EXPORT_SYMBOL_GPL(fs_kobj);
  */
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
 
-static inline void lock_mount_hash(void)
-{
-	write_seqlock(&mount_lock);
-}
-
-static inline void unlock_mount_hash(void)
-{
-	write_sequnlock(&mount_lock);
-}
-
 static inline struct hlist_head *m_hash(struct vfsmount *mnt, struct dentry *dentry)
 {
 	unsigned long tmp = ((unsigned long)mnt / L1_CACHE_BYTES);
-- 
2.25.1

From: Liu Shixin <liushixin2@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8PWEP
CVE: NA

--------------------------------

Add support for printing the rootfs and tmpfs files that have pages charged
to a given memory cgroup. The file information can be printed through the
"memory.memfs_files_info" interface or when an OOM is triggered.

To avoid flooding the kernel log, the maximum number of files printed on OOM
is limited through the "max_print_files_in_oom" interface. To filter out
small files, the minimum size of the files that are printed is set through
the "size_threshold" interface.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/vm/memcg_memfs_info.rst |  40 ++++
 include/linux/memcg_memfs_info.h      |  23 ++
 init/Kconfig                          |  10 +
 mm/Makefile                           |   1 +
 mm/memcg_memfs_info.c                 | 318 ++++++++++++++++++++++++++
 mm/memcontrol.c                       |  12 +
 6 files changed, 404 insertions(+)
 create mode 100644 Documentation/vm/memcg_memfs_info.rst
 create mode 100644 include/linux/memcg_memfs_info.h
 create mode 100644 mm/memcg_memfs_info.c

diff --git a/Documentation/vm/memcg_memfs_info.rst b/Documentation/vm/memcg_memfs_info.rst
new file mode 100644
index 000000000000..aff432d125e5
--- /dev/null
+++ b/Documentation/vm/memcg_memfs_info.rst
@@ -0,0 +1,40 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+================
+Memcg Memfs Info
+================
+
+Overview
+========
+
+Support to print rootfs files and tmpfs files that having pages charged
+in given memory cgroup. The files infomations can be printed through
+interface "memory.memfs_files_info" or printed when OOM is triggered.
+
+User control
+============
+
+1. /sys/kernel/mm/memcg_memfs_info/enable
+-----------------------------------------
+
+Boolean type. The default value is 0, set it to 1 to enable the feature.
+
+2. /sys/kernel/mm/memcg_memfs_info/max_print_files_in_oom
+---------------------------------------------------------
+
+Unsigned long type. The default value is 500, indicating that the maximum of
+files can be print to console when OOM is triggered.
+
+3. /sys/kernel/mm/memcg_memfs_info/size_threshold
+-------------------------------------------------
+
+Unsigned long type. The default value is 0, indicating that the minimum size of
+files that can be printed.
+
+4. /sys/fs/cgroup/memory/<memory>/memory.memfs_files_info
+---------------------------------------------------------
+
+Outputs the files who use memory in this memory cgroup.
+
+---
+Liu Shixin, Jan 2022
diff --git a/include/linux/memcg_memfs_info.h b/include/linux/memcg_memfs_info.h
new file mode 100644
index 000000000000..b5e3709baa9e
--- /dev/null
+++ b/include/linux/memcg_memfs_info.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef _LINUX_MEMCG_MEMFS_INFO_H
+#define _LINUX_MEMCG_MEMFS_INFO_H
+
+#include <linux/memcontrol.h>
+#include <linux/seq_file.h>
+
+#ifdef CONFIG_MEMCG_MEMFS_INFO
+void mem_cgroup_print_memfs_info(struct mem_cgroup *memcg, char *pathbuf,
+				 struct seq_file *m);
+int mem_cgroup_memfs_files_show(struct seq_file *m, void *v);
+void mem_cgroup_memfs_info_init(void);
+#else
+static inline void mem_cgroup_print_memfs_info(struct mem_cgroup *memcg,
+					       char *pathbuf,
+					       struct seq_file *m)
+{
+}
+static inline void mem_cgroup_memfs_info_init(void)
+{
+}
+#endif
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index 2ee1384c4f81..455a90cea14c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -949,6 +949,16 @@ config MEMCG_V1_RECLAIM
 	depends on MEMCG
 	default n
 
+config MEMCG_MEMFS_INFO
+	bool "Show memfs files that have pages charged in given memory cgroup"
+	depends on MEMCG
+	default n
+	help
+	  Support to print rootfs files and tmpfs files that having pages
+	  charged in given memory cgroup. The files infomations can be printed
+	  through interface "memory.memfs_files_info" or printed when OOM is
+	  triggered.
+
 config MEMCG_KMEM
 	bool
 	depends on MEMCG
diff --git a/mm/Makefile b/mm/Makefile
index 642c6335596d..6921fedacd07 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -140,3 +140,4 @@ obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
 obj-$(CONFIG_SHARE_POOL) += share_pool.o
+obj-$(CONFIG_MEMCG_MEMFS_INFO) += memcg_memfs_info.o
diff --git a/mm/memcg_memfs_info.c b/mm/memcg_memfs_info.c
new file mode 100644
index 000000000000..aa7f7966306f
--- /dev/null
+++ b/mm/memcg_memfs_info.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/memcg_memfs_info.h>
+#include <linux/fs.h>
+#include <linux/sysfs.h>
+#include <linux/kobject.h>
+#include <linux/slab.h>
+#include "../fs/mount.h"
+
+#define SEQ_printf(m, x...) \
+do { \
+	if (m) \
+		seq_printf(m, x); \
+	else \
+		pr_info(x); \
+} while (0)
+
+struct print_files_control {
+	struct mem_cgroup *memcg;
+	struct seq_file *m;
+	unsigned long size_threshold;
+	unsigned long max_print_files;
+
+	char *pathbuf;
+	unsigned long pathbuf_size;
+
+	const char *fs_type_name;
+	struct vfsmount *vfsmnt;
+	unsigned long total_print_files;
+	unsigned long total_files_size;
+};
+
+static bool memfs_enable;
+static unsigned long memfs_size_threshold;
+static unsigned long memfs_max_print_files = 500;
+
+static const char *const fs_type_names[] = {
+	"rootfs",
+	"tmpfs",
+};
+
+static struct vfsmount *memfs_get_vfsmount(struct super_block *sb)
+{
+	struct mount *mnt;
+	struct vfsmount *vfsmnt;
+
+	lock_mount_hash();
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		/*
+		 * There may be multiple mount points for a super_block,
+		 * just need to print one of these mount points to determine
+		 * the file path.
+		 */
+		vfsmnt = mntget(&mnt->mnt);
+		unlock_mount_hash();
+		return vfsmnt;
+	}
+	unlock_mount_hash();
+
+	return NULL;
+}
+
+static unsigned long memfs_count_in_mem_cgroup(struct mem_cgroup *memcg,
+					       struct address_space *mapping)
+{
+	XA_STATE(xas, &mapping->i_pages, 0);
+	unsigned long size = 0;
+	struct page *page, *head;
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, ULONG_MAX) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		if (xa_is_value(page))
+			continue;
+
+		head = compound_head(page);
+		if ((unsigned long)memcg == head->memcg_data)
+			size += PAGE_SIZE;
+	}
+	rcu_read_unlock();
+	return size;
+}
+
+static void memfs_show_file_in_mem_cgroup(void *data, struct inode *inode)
+{
+	struct print_files_control *pfc = data;
+	struct dentry *dentry;
+	unsigned long size;
+	struct path path;
+	char *filepath;
+
+	size = memfs_count_in_mem_cgroup(pfc->memcg, inode->i_mapping);
+	if (!size || size < pfc->size_threshold)
+		return;
+
+	dentry = d_find_alias(inode);
+	if (!dentry)
+		return;
+	path.mnt = pfc->vfsmnt;
+	path.dentry = dentry;
+	filepath = d_absolute_path(&path, pfc->pathbuf, pfc->pathbuf_size);
+	if (!filepath || IS_ERR(filepath))
+		filepath = "(too long)";
+	pfc->total_print_files++;
+	pfc->total_files_size += size;
+	dput(dentry);
+
+	/*
+	 * To prevent excessive logs, limit the amount of data
+	 * that can be output to logs.
+	 */
+	if (!pfc->m && pfc->total_print_files > pfc->max_print_files)
+		return;
+
+	SEQ_printf(pfc->m, "%lukB %llukB %s\n",
+		   size >> 10, inode->i_size >> 10, filepath);
+}
+
+static void memfs_show_files_in_mem_cgroup(struct super_block *sb, void *data)
+{
+	struct print_files_control *pfc = data;
+	struct inode *inode, *toput_inode = NULL;
+
+	if (strncmp(sb->s_type->name,
+		    pfc->fs_type_name, strlen(pfc->fs_type_name)))
+		return;
+
+	pfc->vfsmnt = memfs_get_vfsmount(sb);
+	if (!pfc->vfsmnt)
+		return;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+
+		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+		    (inode->i_mapping->nrpages == 0 && !need_resched())) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+
+		memfs_show_file_in_mem_cgroup(pfc, inode);
+
+		iput(toput_inode);
+		toput_inode = inode;
+
+		cond_resched();
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(toput_inode);
+	mntput(pfc->vfsmnt);
+}
+
+void mem_cgroup_print_memfs_info(struct mem_cgroup *memcg, char *pathbuf,
+				 struct seq_file *m)
+{
+	struct print_files_control pfc = {
+		.memcg = memcg,
+		.m = m,
+		.max_print_files = READ_ONCE(memfs_max_print_files),
+		.size_threshold = READ_ONCE(memfs_size_threshold),
+	};
+	int i;
+
+	if (!memfs_enable || !memcg)
+		return;
+
+	pfc.pathbuf = pathbuf;
+	pfc.pathbuf_size = PATH_MAX;
+
+	for (i = 0; i < ARRAY_SIZE(fs_type_names); i++) {
+		pfc.fs_type_name = fs_type_names[i];
+		pfc.total_print_files = 0;
+		pfc.total_files_size = 0;
+
+		SEQ_printf(m, "Show %s files (memory-size > %lukB):\n",
+			   pfc.fs_type_name, pfc.size_threshold >> 10);
+		SEQ_printf(m, "<memory-size> <file-size> <path>\n");
+		iterate_supers(memfs_show_files_in_mem_cgroup, &pfc);
+
+		SEQ_printf(m, "total files: %lu, total memory-size: %lukB\n",
+			   pfc.total_print_files, pfc.total_files_size >> 10);
+	}
+}
+
+int mem_cgroup_memfs_files_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+	char *pathbuf;
+
+	pathbuf = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (!pathbuf) {
+		SEQ_printf(m, "Show memfs abort: failed to allocate memory\n");
+		return 0;
+	}
+	mem_cgroup_print_memfs_info(memcg, pathbuf, m);
+	kfree(pathbuf);
+	return 0;
+}
+
+static ssize_t memfs_size_threshold_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *buf)
+{
+	return sprintf(buf, "%lu\n", READ_ONCE(memfs_size_threshold));
+}
+
+static ssize_t memfs_size_threshold_store(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  const char *buf, size_t len)
+{
+	unsigned long count;
+	int err;
+
+	err = kstrtoul(buf, 10, &count);
+	if (err)
+		return err;
+
+	WRITE_ONCE(memfs_size_threshold, count);
+	return len;
+}
+
+static struct kobj_attribute memfs_size_threshold_attr = {
+	.attr = {"size_threshold", 0644},
+	.show = &memfs_size_threshold_show,
+	.store = &memfs_size_threshold_store,
+};
+
+static ssize_t memfs_max_print_files_show(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  char *buf)
+{
+	return sprintf(buf, "%lu\n", READ_ONCE(memfs_max_print_files));
+}
+
+static ssize_t memfs_max_print_files_store(struct kobject *kobj,
+					   struct kobj_attribute *attr,
+					   const char *buf, size_t len)
+{
+	unsigned long count;
+	int err;
+
+	err = kstrtoul(buf, 10, &count);
+	if (err)
+		return err;
+
+	WRITE_ONCE(memfs_max_print_files, count);
+	return len;
+}
+
+static struct kobj_attribute memfs_max_print_files_attr = {
+	.attr = {"max_print_files_in_oom", 0644},
+	.show = &memfs_max_print_files_show,
+	.store = &memfs_max_print_files_store,
+};
+
+static ssize_t memfs_enable_show(struct kobject *kobj,
+				 struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", memfs_enable);
+}
+
+static ssize_t memfs_enable_store(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  const char *buf, size_t len)
+{
+	bool enable;
+	int err;
+
+	err = kstrtobool(buf, &enable);
+	if (err)
+		return err;
+
+	memfs_enable = enable;
+	return len;
+}
+
+static struct kobj_attribute memfs_enable_attr = {
+	.attr = {"enable", 0644},
+	.show = &memfs_enable_show,
+	.store = &memfs_enable_store,
+};
+
+static struct attribute *memfs_attr[] = {
+	&memfs_size_threshold_attr.attr,
+	&memfs_max_print_files_attr.attr,
+	&memfs_enable_attr.attr,
+	NULL,
+};
+
+static struct attribute_group memfs_attr_group = {
+	.attrs = memfs_attr,
+};
+
+void mem_cgroup_memfs_info_init(void)
+{
+	struct kobject *memcg_memfs_kobj;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	memcg_memfs_kobj = kobject_create_and_add("memcg_memfs_info", mm_kobj);
+	if (unlikely(!memcg_memfs_kobj)) {
+		pr_err("failed to create memcg_memfs kobject\n");
+		return;
+	}
+
+	if (sysfs_create_group(memcg_memfs_kobj, &memfs_attr_group)) {
+		pr_err("failed to register memcg_memfs group\n");
+		kobject_put(memcg_memfs_kobj);
+	}
+}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2489f59ddd5a..a875e830b7e8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/memcg_memfs_info.h>
 #include <linux/sched/isolation.h>
 #include "internal.h"
 #include <net/sock.h>
@@ -1672,6 +1673,7 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
 	/* Use static buffer, for the caller is holding oom_lock. */
 	static char buf[PAGE_SIZE];
+	static char pathbuf[PATH_MAX];
 	struct seq_buf s;
 
 	lockdep_assert_held(&oom_lock);
@@ -1698,6 +1700,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 	seq_buf_init(&s, buf, sizeof(buf));
 	memory_stat_format(memcg, &s);
 	seq_buf_do_printk(&s, KERN_INFO);
+
+	mem_cgroup_print_memfs_info(memcg, pathbuf, NULL);
 }
 
 /*
@@ -5217,6 +5221,12 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.name = "pressure_level",
 		.seq_show = mem_cgroup_dummy_seq_show,
 	},
+#ifdef CONFIG_MEMCG_MEMFS_INFO
+	{
+		.name = "memfs_files_info",
+		.seq_show = mem_cgroup_memfs_files_show,
+	},
+#endif
 #ifdef CONFIG_NUMA
 	{
 		.name = "numa_stat",
@@ -7588,6 +7598,8 @@ static int __init mem_cgroup_init(void)
 		soft_limit_tree.rb_tree_per_node[node] = rtpn;
 	}
 
+	mem_cgroup_memfs_info_init();
+
 	return 0;
 }
 subsys_initcall(mem_cgroup_init);
-- 
2.25.1
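
Note on the filtering in memfs_show_file_in_mem_cgroup(): the size threshold and the OOM print cap interact in a way that is easy to miss when reading the diff. The following is only a userspace sketch of that accounting, not kernel code. struct filter_state, its oom flag (standing in for pfc->m == NULL) and account_file() are hypothetical names introduced here for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the counters in struct print_files_control. */
struct filter_state {
	bool oom;                       /* true models pfc->m == NULL (OOM path) */
	unsigned long size_threshold;   /* bytes */
	unsigned long max_print_files;  /* cap applied on the OOM path only */
	unsigned long total_print_files;
	unsigned long total_files_size; /* bytes */
};

/*
 * Mirrors the order of checks in the patch: threshold first, then the
 * totals are updated, then the OOM cap decides whether a line is emitted.
 * Returns true when the file would be printed. charged is in bytes.
 */
static bool account_file(struct filter_state *st, unsigned long charged)
{
	if (!charged || charged < st->size_threshold)
		return false;

	st->total_print_files++;
	st->total_files_size += charged;

	/* Files past the cap are still counted in the totals above. */
	if (st->oom && st->total_print_files > st->max_print_files)
		return false;

	printf("%lukB\n", charged >> 10);
	return true;
}
```

For example, with a threshold of 4096 bytes and a cap of 2 on the OOM path, charged sizes of 0, 2048, 4096, 8192 and 16384 bytes produce two printed lines, while the summary would still report 3 files and 28kB. If the intent is for the summary to cover only printed files, the increments would need to move below the cap check.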

From: Liu Shixin <liushixin2@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8PWEP
CVE: NA

--------------------------------

enable CONFIG_MEMCG_MEMFS_INFO by default.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 arch/arm64/configs/openeuler_defconfig | 1 +
 arch/x86/configs/openeuler_defconfig   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 33ba39711884..0491301ae0d1 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -158,6 +158,7 @@ CONFIG_PAGE_COUNTER=y
 # CONFIG_CGROUP_FAVOR_DYNMODS is not set
 CONFIG_MEMCG=y
 CONFIG_MEMCG_V1_RECLAIM=y
+CONFIG_MEMCG_MEMFS_INFO=y
 CONFIG_MEMCG_KMEM=y
 CONFIG_BLK_CGROUP=y
 CONFIG_CGROUP_WRITEBACK=y
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 44040b835333..365aa5405393 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -180,6 +180,7 @@ CONFIG_PAGE_COUNTER=y
 # CONFIG_CGROUP_FAVOR_DYNMODS is not set
 CONFIG_MEMCG=y
 CONFIG_MEMCG_V1_RECLAIM=y
+CONFIG_MEMCG_MEMFS_INFO=y
 CONFIG_MEMCG_KMEM=y
 CONFIG_BLK_CGROUP=y
 CONFIG_CGROUP_WRITEBACK=y
-- 
2.25.1
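
With the option built in and the feature switched on at runtime, a tool reading memory.memfs_files_info sees per-file records in the "%lukB %llukB %s" layout used by SEQ_printf() in the series, mixed with header and summary lines. A minimal userspace sketch of picking out the per-file records follows. struct memfs_entry and parse_memfs_line() are hypothetical names, and the 4096-byte path buffer is an assumption matching the usual PATH_MAX.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one per-file record. */
struct memfs_entry {
	unsigned long charged_kb;   /* memory charged to the cgroup, in kB */
	unsigned long long file_kb; /* file size (i_size), in kB */
	char path[4096];            /* assumed PATH_MAX-sized buffer */
};

/*
 * Parse one line in the "<memory-size>kB <file-size>kB <path>" layout.
 * Returns 0 on success, -1 for header, summary or malformed lines.
 */
static int parse_memfs_line(const char *line, struct memfs_entry *e)
{
	int off = 0;
	size_t n;

	/* %n is only stored if both numbers and both "kB" suffixes matched. */
	if (sscanf(line, "%lukB %llukB %n",
		   &e->charged_kb, &e->file_kb, &off) != 2 || off == 0)
		return -1;

	/* The path is the rest of the line and may contain spaces. */
	n = strcspn(line + off, "\n");
	snprintf(e->path, sizeof(e->path), "%.*s", (int)n, line + off);
	return 0;
}
```

Taking the rest of the line as the path, rather than a single %s token, is a deliberate choice here because tmpfs file names can contain spaces. The fallback string "(too long)" printed by the kernel side parses as an ordinary path.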
participants (2)
- Jinjiang Tu
- patchwork bot