September 2024 - Kernel - mailweb.openeuler.org

[PATCH OLK-5.10 v2] drm/amd/display: Fix null pointer deref in dcn20_resource.c
by Xiaomeng Zhang 06 Sep '24


From: Aurabindo Pillai <aurabindo.pillai(a)amd.com>

mainline inclusion
from mainline-v6.11-rc1
commit ecbf60782662f0a388493685b85a645a0ba1613c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAMMUG
CVE: CVE-2024-43899

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Fixes a hang that's triggered when MPV is run on a DCN401 dGPU:

mpv --hwdec=vaapi --vo=gpu --hwdec-codecs=all

and then enabling fullscreen playback (double click on the video)

The following calltrace will be seen:

[ 181.843989] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 181.843997] #PF: supervisor instruction fetch in kernel mode
[ 181.844003] #PF: error_code(0x0010) - not-present page
[ 181.844009] PGD 0 P4D 0
[ 181.844020] Oops: 0010 [#1] PREEMPT SMP NOPTI
[ 181.844028] CPU: 6 PID: 1892 Comm: gnome-shell Tainted: G W OE 6.5.0-41-generic #41~22.04.2-Ubuntu
[ 181.844038] Hardware name: System manufacturer System Product Name/CROSSHAIR VI HERO, BIOS 6302 10/23/2018
[ 181.844044] RIP: 0010:0x0
[ 181.844079] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 181.844084] RSP: 0018:ffffb593c2b8f7b0 EFLAGS: 00010246
[ 181.844093] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[ 181.844099] RDX: ffffb593c2b8f804 RSI: ffffb593c2b8f7e0 RDI: ffff9e3c8e758400
[ 181.844105] RBP: ffffb593c2b8f7b8 R08: ffffb593c2b8f9c8 R09: ffffb593c2b8f96c
[ 181.844110] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb593c2b8f9c8
[ 181.844115] R13: 0000000000000001 R14: ffff9e3c88000000 R15: 0000000000000005
[ 181.844121] FS: 00007c6e323bb5c0(0000) GS:ffff9e3f85f80000(0000) knlGS:0000000000000000
[ 181.844128] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 181.844134] CR2: ffffffffffffffd6 CR3: 0000000140fbe000 CR4: 00000000003506e0
[ 181.844141] Call Trace:
[ 181.844146] <TASK>
[ 181.844153] ? show_regs+0x6d/0x80
[ 181.844167] ? __die+0x24/0x80
[ 181.844179] ? page_fault_oops+0x99/0x1b0
[ 181.844192] ? do_user_addr_fault+0x31d/0x6b0
[ 181.844204] ? exc_page_fault+0x83/0x1b0
[ 181.844216] ? asm_exc_page_fault+0x27/0x30
[ 181.844237] dcn20_get_dcc_compression_cap+0x23/0x30 [amdgpu]
[ 181.845115] amdgpu_dm_plane_validate_dcc.constprop.0+0xe5/0x180 [amdgpu]
[ 181.845985] amdgpu_dm_plane_fill_plane_buffer_attributes+0x300/0x580 [amdgpu]
[ 181.846848] fill_dc_plane_info_and_addr+0x258/0x350 [amdgpu]
[ 181.847734] fill_dc_plane_attributes+0x162/0x350 [amdgpu]
[ 181.848748] dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu]
[ 181.849791] ? dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu]
[ 181.850840] amdgpu_dm_atomic_check+0xdfe/0x1760 [amdgpu]

Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>

Conflicts:
	drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
	drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c
[The conflicts were due to a file path change]
Signed-off-by: Xiaomeng Zhang <zhangxiaomeng13(a)huawei.com>
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 53ac82693532..2990793e86a2 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -3290,10 +3290,11 @@ bool dcn20_get_dcc_compression_cap(const struct dc *dc,
 		const struct dc_dcc_surface_param *input,
 		struct dc_surface_dcc_cap *output)
 {
-	return dc->res_pool->hubbub->funcs->get_dcc_compression_cap(
-			dc->res_pool->hubbub,
-			input,
-			output);
+	if (dc->res_pool->hubbub->funcs->get_dcc_compression_cap)
+		return dc->res_pool->hubbub->funcs->get_dcc_compression_cap(
+			dc->res_pool->hubbub, input, output);
+
+	return false;
 }
 
 static void dcn20_destroy_resource_pool(struct resource_pool **pool)
-- 
2.34.1
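
For readers following the fix: "RIP: 0010:0x0" together with "supervisor instruction fetch" in the trace is what a call through a NULL function pointer looks like, and the patch guards the optional hubbub hook before calling it. The pattern can be sketched in plain userspace C as below. All type and function names in the sketch (hub, hub_funcs, get_cap, hub_get_cap) are hypothetical stand-ins chosen for illustration and are not the amdgpu structures; only the shape of the check is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real amdgpu structs. */
struct cap_query { int format; };
struct cap_result { bool capable; };

struct hub;

struct hub_funcs {
	/* Optional hook: a given hardware generation may leave this NULL. */
	bool (*get_cap)(struct hub *hub, const struct cap_query *in,
			struct cap_result *out);
};

struct hub {
	const struct hub_funcs *funcs;
};

/* Same shape as the fix: call the hook only when it is populated. */
static bool hub_get_cap(struct hub *hub, const struct cap_query *in,
			struct cap_result *out)
{
	if (hub->funcs->get_cap)
		return hub->funcs->get_cap(hub, in, out);

	return false;	/* no hook: report "not supported" instead of jumping to NULL */
}

/* Example hook used only by the self-test below. */
static bool always_capable(struct hub *hub, const struct cap_query *in,
			   struct cap_result *out)
{
	(void)hub;
	(void)in;
	out->capable = true;
	return true;
}

/* Exercises both cases: hook present and hook missing. */
static void hub_sketch_selftest(void)
{
	static const struct hub_funcs with_hook = { .get_cap = always_capable };
	static const struct hub_funcs without_hook = { .get_cap = NULL };
	struct hub a = { .funcs = &with_hook };
	struct hub b = { .funcs = &without_hook };
	struct cap_query q = { .format = 0 };
	struct cap_result r = { .capable = false };

	assert(hub_get_cap(&a, &q, &r) && r.capable);
	r.capable = false;
	assert(!hub_get_cap(&b, &q, &r) && !r.capable);
}
```

Returning false when the hook is absent makes the caller treat the capability as unsupported, which matches the fallback the patch adds.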


[PATCH OLK-6.6] tools: Add dynamic process-level cgroup memory monitoring tool
by Taoxy2004 06 Sep '24


community inclusion
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I9JQ4D

----------------------------------------------------------------------

This patch introduces a new tool called "probeCgroup" that enables dynamic
monitoring of memory usage at the process level within cgroups. By using
kprobes at relevant cgroup functions, this tool can track memory allocations
and deallocations for individual processes within a cgroup, providing
detailed statistics on memory usage.

The key features of the tool include:
1. Dynamic insertion of kprobes at critical points in the cgroup subsystem.
2. Tracking memory allocation and deallocation events for each process by
   recording page addresses in a hash table.
3. Providing real-time statistics on memory usage at the process level.
4. Providing statistics on memory usage for processes that are OOM.

Signed-off-by: Taoxy2004 <221870066(a)smail.nju.edu.cn>
---
 tools/probeCgroup/Makefile                    |   7 +
 tools/probeCgroup/README.md                   |  29 +
 tools/probeCgroup/probeCgroup.c               | 612 ++++++++++++++++++
 tools/probeCgroup/probeCgroup.h               | 415 ++++++++++++
 tools/probeCgroup/run.sh                      |   8 +
 tools/probeCgroup/scripts/script1.sh          |  10 +
 tools/probeCgroup/scripts/script2.sh          |  14 +
 tools/probeCgroup/scripts/script3.sh          |  11 +
 .../testcases/1_load_unload_test.py           |  24 +
 .../testcases/2_multiple_process_test.py      |  48 ++
 .../testcases/3_multiple_cgroup_test.py       |  55 ++
 tools/probeCgroup/testcases/4_oom_test.py     |  52 ++
 .../testcases/5_multiple_threads_test.py      |  45 ++
 tools/probeCgroup/testcases/cgroup_utils.py   | 114 ++++
 tools/probeCgroup/testcases/mem-allocate.c    |  35 +
 .../testcases/multiple-thread-mem-allocate.c  |  60 ++
 tools/probeCgroup/testcases/run.py            |  32 +
 .../testcases/simple-mem-allocate.c           |  27 +
 18 files changed, 1598 insertions(+)
 create mode 100644 tools/probeCgroup/Makefile
 create mode 100644 tools/probeCgroup/README.md
 create mode 100644 tools/probeCgroup/probeCgroup.c
 create mode 100644 tools/probeCgroup/probeCgroup.h
 create mode 100755 tools/probeCgroup/run.sh
 create mode 100755 tools/probeCgroup/scripts/script1.sh
 create mode 100755 tools/probeCgroup/scripts/script2.sh
 create mode 100755 tools/probeCgroup/scripts/script3.sh
 create mode 100755 tools/probeCgroup/testcases/1_load_unload_test.py
 create mode 100755 tools/probeCgroup/testcases/2_multiple_process_test.py
 create mode 100755 tools/probeCgroup/testcases/3_multiple_cgroup_test.py
 create mode 100755 tools/probeCgroup/testcases/4_oom_test.py
 create mode 100755 tools/probeCgroup/testcases/5_multiple_threads_test.py
 create mode 100644 tools/probeCgroup/testcases/cgroup_utils.py
 create mode 100644 tools/probeCgroup/testcases/mem-allocate.c
 create mode 100644 tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
 create mode 100755 tools/probeCgroup/testcases/run.py
 create mode 100644 tools/probeCgroup/testcases/simple-mem-allocate.c

diff --git a/tools/probeCgroup/Makefile b/tools/probeCgroup/Makefile
new file mode 100644
index 000000000000..606c951e5487
--- /dev/null
+++ b/tools/probeCgroup/Makefile
@@ -0,0 +1,7 @@
+obj-m := probeCgroup.o
+CROSS_COMPILE = ''
+KDIR := /lib/modules/$(shell uname -r)/build
+all:
+	make -C $(KDIR) M=$(PWD) modules
+clean:
+	rm -f *.ko *.o *.mod *.mod.o *.mod.c .*.cmd *.symvers module*
diff --git a/tools/probeCgroup/README.md b/tools/probeCgroup/README.md
new file mode 100644
index 000000000000..ff0b6fc21228
--- /dev/null
+++ b/tools/probeCgroup/README.md
@@ -0,0 +1,29 @@
+# probeCgroup
+
+#### Description
+probeCgroup is a process-level cgroup memory monitoring tool based on dynamic tracing (kprobe/kretprobe) technology. By inserting kprobes and kretprobes at the entry and exit points of relevant cgroup functions, this tool can track the memory usage of individual processes within each cgroup in real time.
+
+#### Software Architecture
+1. Dynamic Tracing : Insert kprobes and kretprobes at critical points in cgroup functions to capture memory allocation and release events.
+2. Hash Table Recording : Record the addresses of pages currently used by each process in a hash table, so that when a page is released, the process it belongs to can be identified.
+3. Real-Time Statistics : Provide real-time statistics showing the memory usage of individual processes within each cgroup.
+
+#### Instruction
+1. Compile and Load the Module
+    a. In the 'probeCgroup' directory, run the 'make' command to compile the module.
+    b. Load the module: 'insmod probeCgroup.ko'.
+    c. View memory statistics: 'cat /proc/cgroup_memory_usage_per_process'.
+    If an OOM (Out of Memory) event occurs in a cgroup, you can see "oom:" followed by the process that experienced the OOM and its memory usage at the time.
+
+2. Automate OOM Scenario
+    In the 'probeCgroup' directory, run './run.sh'. This script will automatically set up an OOM scenario and output the content of '/proc/cgroup_memory_usage_per_process' after execution.
+
+3. Perform More Tests
+    a. After compiling the module, in the 'testcases' directory, run './run.py'.
+    b. This script will perform various tests, including:
+        - Loading and unloading the module
+        - Each cgroup containing multiple processes
+        - Creating multiple cgroups
+        - OOM scenarios
+        - Multithreading
+    c. The tests will take approximately one minute to complete.
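
The "Hash Table Recording" step in the README can be illustrated with a small userspace sketch: each process keeps a chained hash set of the page addresses charged to it, so an uncharge can be attributed to the owner and the per-process counter stays in step. This is a simplified model under stated assumptions, not the module's code. The names page_tracker, tracker_charge and tracker_uncharge are hypothetical, malloc/free stand in for the kernel allocators, locking is omitted, and every entry counts as one page (the patch uses folio_nr_pages() instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NBUCKETS 1023	/* same bucket count the patch passes to HashMap_create() */

struct node {
	unsigned long addr;
	struct node *next;
};

/* One tracker per process: a chained hash set of page addresses plus a counter. */
struct page_tracker {
	struct node *buckets[NBUCKETS];
	long count;
};

/* Returns the link that holds the matching node, or the NULL tail link. */
static struct node **find_slot(struct page_tracker *t, unsigned long addr)
{
	struct node **slot = &t->buckets[addr % NBUCKETS];

	while (*slot && (*slot)->addr != addr)
		slot = &(*slot)->next;
	return slot;
}

/* Charge path: record the page and count it only the first time it is seen. */
static bool tracker_charge(struct page_tracker *t, unsigned long addr)
{
	struct node **slot = find_slot(t, addr);
	struct node *n;

	if (*slot)
		return false;	/* already tracked, do not double count */
	n = malloc(sizeof(*n));
	if (!n)
		return false;
	n->addr = addr;
	n->next = NULL;
	*slot = n;
	t->count++;
	return true;
}

/* Uncharge path: succeeds only if this tracker owns the page. */
static bool tracker_uncharge(struct page_tracker *t, unsigned long addr)
{
	struct node **slot = find_slot(t, addr);
	struct node *n = *slot;

	if (!n)
		return false;
	*slot = n->next;
	free(n);
	t->count--;
	return true;
}

/* Walks through charge, duplicate charge, bucket collision and uncharge. */
static void tracker_selftest(void)
{
	static struct page_tracker t;	/* static storage is zero-initialised */

	assert(tracker_charge(&t, 0x1000));
	assert(!tracker_charge(&t, 0x1000));		/* duplicate ignored */
	assert(tracker_charge(&t, 0x1000 + NBUCKETS));	/* lands in the same bucket */
	assert(t.count == 2);
	assert(!tracker_uncharge(&t, 0x2000));		/* never charged here */
	assert(tracker_uncharge(&t, 0x1000));
	assert(tracker_uncharge(&t, 0x1000 + NBUCKETS));
	assert(t.count == 0);
}
```

Keying the set on the page address is what lets the release path find the owning process without any information from the caller beyond the page itself.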
diff --git a/tools/probeCgroup/probeCgroup.c b/tools/probeCgroup/probeCgroup.c new file mode 100644 index 000000000000..9883cb1e082d --- /dev/null +++ b/tools/probeCgroup/probeCgroup.c @@ -0,0 +1,612 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * probeCgroup.c - A tool used to get memory usage for each process in a cgroup + * + * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + */ + +#include "probeCgroup.h" + +// kretprobe at mem_cgroup_charge +struct charge_data { + struct cgroup *cgrp; + struct mem_cgroup *memcg; + struct task_struct *task; + unsigned long addr; +}; + +static int mem_cgroup_charge_entry_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct charge_data *data; + struct folio *page; + struct mm_struct *mm; + struct mem_cgroup *memcg; + struct cgroup_subsys_state css; + struct cgroup *cgrp; + + if (!current->mm) + return 1; + page = (struct folio *)regs->di; + mm = (struct mm_struct *)regs->si; + if (mm == NULL || page == NULL) + return -1; + memcg = get_mem_cgroup_from_mm(mm); + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + + data = (struct charge_data *)ri->data; + data->memcg = memcg; + data->addr = (unsigned long)page; + data->task = current; + data->cgrp = cgrp; + } + return 0; +} + +NOKPROBE_SYMBOL(mem_cgroup_charge_entry_handler); + +static int mem_cgroup_charge_ret_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + unsigned long retval = regs_return_value(regs); + struct charge_data *data = (struct charge_data *)ri->data; + int id; + struct cgroup_info *cgrp_info; + struct task_info *tsk_info; + + if (data->memcg != NULL && retval == 0) { + id = ((data->memcg)->css).id; + + spin_lock(&lock); + cgrp_info = find_cgroup_info(id); + if (cgrp_info == NULL) { + cgrp_info = create_cgroup_info(data->cgrp, data->memcg); + if (cgrp_info == NULL) { + spin_unlock(&lock); + return -1; + } + add_cgroup_info(cgrp_info); + } + spin_unlock(&lock); + + read_lock(&cgrp_info->cgrp_lock); + 
tsk_info = find_task_info(cgrp_info, data->task->tgid); + read_unlock(&cgrp_info->cgrp_lock); + + // for some cases, task->comm changes over time + if (tsk_info != NULL + && strcmp(data->task->comm, tsk_info->comm) != 0) { + strscpy(tsk_info->comm, data->task->comm, + sizeof(tsk_info->comm)); + } + + if (tsk_info == NULL) { + tsk_info = create_task_info(data->task); + if (tsk_info == NULL) + return -1; + add_task_to_cgroup_info(cgrp_info, tsk_info); + } + + if (HashMap_insert(tsk_info->pages, data->addr)) { + //update counter + spin_lock(&(tsk_info->cnt_lock)); + tsk_info->count += + folio_nr_pages((struct folio *)data->addr); + spin_unlock(&(tsk_info->cnt_lock)); + } + } + + return 0; + +} + +NOKPROBE_SYMBOL(mem_cgroup_charge_ret_handler); + +static struct kretprobe mem_cgroup_charge_kretprobe = { + .handler = mem_cgroup_charge_ret_handler, + .entry_handler = mem_cgroup_charge_entry_handler, + .data_size = sizeof(struct charge_data), + .maxactive = 20, +}; + +static int mem_cgroup_charge_kretprobe_init(void) +{ + int ret; + + mem_cgroup_charge_kretprobe.kp.symbol_name = "__mem_cgroup_charge"; + ret = register_kretprobe(&mem_cgroup_charge_kretprobe); + if (ret < 0) { + pr_err("register_kretprobe failed, returned %d\n", ret); + return ret; + } + pr_info("Planted return probe at %s: %p\n", + mem_cgroup_charge_kretprobe.kp.symbol_name, + mem_cgroup_charge_kretprobe.kp.addr); + return 0; +} + +static void mem_cgroup_charge_kretprobe_exit(void) +{ + unregister_kretprobe(&mem_cgroup_charge_kretprobe); + pr_info("kretprobe at %p unregistered\n", + mem_cgroup_charge_kretprobe.kp.addr); + + /* nmissed > 0 suggests that maxactive was set too low. 
*/ + pr_info("Missed probing %d instances of %s\n", + mem_cgroup_charge_kretprobe.nmissed, + mem_cgroup_charge_kretprobe.kp.symbol_name); +} + +// kretprobe at uncharge_folio + +struct uncharge_data { + struct cgroup *cgrp; + struct mem_cgroup *memcg; + unsigned long addr; + bool isKmem; + int nr_pages; +}; + +static int uncharge_folio_entry_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct uncharge_data *data; + struct folio *page; + struct mem_cgroup *memcg = NULL; + struct cgroup_subsys_state css; + struct cgroup *cgrp; + struct obj_cgroup *objcg; + int nr_pages = 0; + + data = (struct uncharge_data *)ri->data; + page = (struct folio *)regs->di; + if (page == NULL) { + data->memcg = NULL; + return -1; + } + if (page->memcg_data & MEMCG_DATA_KMEM) { // if the page belongs to kmem + if (!folio_test_large(page)) + nr_pages = 1; + else + nr_pages = page->_folio_nr_pages; + // nr_pages = thp_nr_pages(page); + objcg = __folio_objcg(page); + if (objcg != NULL) + memcg = objcg->memcg; + data->isKmem = true; + data->nr_pages = nr_pages; + } else { + memcg = __folio_memcg(page); + data->isKmem = false; + } + + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + + data->memcg = memcg; + data->addr = (unsigned long)page; + data->cgrp = cgrp; + } + return 0; +} + +NOKPROBE_SYMBOL(uncharge_folio_entry_handler); + +static int uncharge_folio_ret_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct uncharge_data *data = (struct uncharge_data *)ri->data; + int id; + struct cgroup_info *cgrp_info; + int ret = -1; + + if (data->memcg != NULL) { + id = ((data->memcg)->css).id; + cgrp_info = find_cgroup_info(id); + if (cgrp_info == NULL) + return -1; + if (data->isKmem) + ret = -1; + else + ret = remove_page_from_cgroup_info(data->addr, cgrp_info); + } + + return ret; +} + +NOKPROBE_SYMBOL(uncharge_folio_ret_handler); + +static struct kretprobe uncharge_folio_kretprobe = { + .handler = uncharge_folio_ret_handler, + 
.entry_handler = uncharge_folio_entry_handler, + .data_size = sizeof(struct uncharge_data), + .maxactive = 20, +}; + +static int uncharge_folio_kretprobe_init(void) +{ + int ret; + + uncharge_folio_kretprobe.kp.symbol_name = "uncharge_folio"; + ret = register_kretprobe(&uncharge_folio_kretprobe); + if (ret < 0) { + pr_err("register_kretprobe failed, returned %d\n", ret); + return ret; + } + pr_info("Planted return probe at %s: %p\n", + uncharge_folio_kretprobe.kp.symbol_name, + uncharge_folio_kretprobe.kp.addr); + return 0; +} + +static void uncharge_folio_kretprobe_exit(void) +{ + unregister_kretprobe(&uncharge_folio_kretprobe); + pr_info("kretprobe at %p unregistered\n", + uncharge_folio_kretprobe.kp.addr); + + /* nmissed > 0 suggests that maxactive was set too low. */ + pr_info("Missed probing %d instances of %s\n", + uncharge_folio_kretprobe.nmissed, + uncharge_folio_kretprobe.kp.symbol_name); +} + +//kprobe at do_exit +static struct kprobe do_exit_kprobe; +static int do_exit_kprobe_pre_handler(struct kprobe *p, struct pt_regs *regs) +{ + struct task_struct *cur = current; + int tgid = cur->tgid; + struct mm_struct *mm = cur->mm; + struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm); + struct cgroup_subsys_state css; + struct cgroup *cgrp; + struct cgroup_info *cgrp_info; + struct task_info *tsk_info; + int id; + + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + id = (memcg->css).id; + cgrp_info = find_cgroup_info(id); + if (cgrp_info != NULL) { + write_lock(&cgrp_info->cgrp_lock); + tsk_info = find_task_info(cgrp_info, tgid); + if (tsk_info != NULL) { + list_del(&tsk_info->list); + write_unlock(&cgrp_info->cgrp_lock); + remove_task_from_cgroup_info(cgrp_info, + tsk_info); + } else { + write_unlock(&cgrp_info->cgrp_lock); + } + return 0; + } + } + return 0; +} + +static void do_exit_kprobe_post_handler(struct kprobe *p, + struct pt_regs *regs, + unsigned long flags) +{ + +} + +static int do_exit_kprobe_init(void) +{ + 
do_exit_kprobe.pre_handler = do_exit_kprobe_pre_handler; + do_exit_kprobe.post_handler = do_exit_kprobe_post_handler; + do_exit_kprobe.symbol_name = "do_exit"; + if (register_kprobe(&do_exit_kprobe)) { + pr_alert("register_kprobe on do_exit failed!\n"); + return -EINVAL; + } + return 0; +} + +static void do_exit_kprobe_exit(void) +{ + unregister_kprobe(&do_exit_kprobe); +} + +//kprobe at mark_oom_victim +static struct kprobe mark_oom_victim_kprobe; + +static int mark_oom_victim_kprobe_pre_handler(struct kprobe *p, + struct pt_regs *regs) +{ + struct task_struct *victim; + int tgid; + struct mm_struct *mm; + struct mem_cgroup *memcg; + struct cgroup_subsys_state css; + struct cgroup *cgrp; + struct cgroup_info *cgrp_info; + struct task_info *tsk_info; + int id; + struct task_info *oom_info; + + victim = (struct task_struct *)regs->di; + tgid = victim->tgid; + mm = victim->mm; + memcg = get_mem_cgroup_from_mm(mm); + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + id = (memcg->css).id; + cgrp_info = find_cgroup_info(id); + if (cgrp_info != NULL) { + read_lock(&cgrp_info->cgrp_lock); + tsk_info = find_task_info(cgrp_info, tgid); + read_unlock(&cgrp_info->cgrp_lock); + if (tsk_info != NULL) { + oom_info = create_oom_task_info(tsk_info); + if (oom_info != NULL) { + add_oom_task_to_cgroup_info(cgrp_info, + oom_info); + } + return 0; + } + } + } + return 0; +} + +static void mark_oom_victim_kprobe_post_handler(struct kprobe *p, + struct pt_regs *regs, + unsigned long flags) +{ + +} + +static int mark_oom_victim_kprobe_init(void) +{ + mark_oom_victim_kprobe.pre_handler = mark_oom_victim_kprobe_pre_handler; + mark_oom_victim_kprobe.post_handler = + mark_oom_victim_kprobe_post_handler; + mark_oom_victim_kprobe.symbol_name = "mark_oom_victim"; + if (register_kprobe(&mark_oom_victim_kprobe)) { + pr_alert("register_kprobe on mark_oom_victim failed!\n"); + return -EINVAL; + } + return 0; +} + +static void mark_oom_victim_kprobe_exit(void) +{ + 
unregister_kprobe(&mark_oom_victim_kprobe); +} + +//kretporbe at cgroup_destroy_locked +struct destroy_data { + struct cgroup *cgrp; +}; + +static int cgroup_destroy_locked_entry_handler(struct kretprobe_instance + *ri, struct pt_regs *regs) +{ + struct destroy_data *data; + + data = (struct destroy_data *)ri->data; + data->cgrp = (struct cgroup *)regs->di; + return 0; +} + +NOKPROBE_SYMBOL(cgroup_destroy_locked_entry_handler); + +static int cgroup_destroy_locked_ret_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct destroy_data *data = (struct destroy_data *)ri->data; + struct cgroup *cgrp = data->cgrp; + struct cgroup_info *cgrp_info = NULL; + unsigned long retval = regs_return_value(regs); + + if (!cgrp) + return -1; + if (retval != 0) + return -1; + list_for_each_entry(cgrp_info, &all_cgroup_info, list) { + if (cgrp_info->cgrp == cgrp) { + spin_lock(&lock); + list_del(&cgrp_info->list); + spin_unlock(&lock); + destroy_cgroup_info(cgrp_info); + return 0; + } + } + return -1; +} + +NOKPROBE_SYMBOL(cgroup_destroy_locked_ret_handler); + +static struct kretprobe cgroup_destroy_locked_kretprobe = { + .handler = cgroup_destroy_locked_ret_handler, + .entry_handler = cgroup_destroy_locked_entry_handler, + .data_size = sizeof(struct destroy_data), + .maxactive = 20, +}; + +static int cgroup_destroy_locked_kretprobe_init(void) +{ + int ret; + + cgroup_destroy_locked_kretprobe.kp.symbol_name = + "cgroup_destroy_locked"; + ret = register_kretprobe(&cgroup_destroy_locked_kretprobe); + if (ret < 0) { + pr_err("register_kretprobe failed, returned %d\n", ret); + return ret; + } + pr_info("Planted return probe at %s: %p\n", + cgroup_destroy_locked_kretprobe.kp.symbol_name, + cgroup_destroy_locked_kretprobe.kp.addr); + return 0; +} + +static void cgroup_destroy_locked_kretprobe_exit(void) +{ + unregister_kretprobe(&cgroup_destroy_locked_kretprobe); + pr_info("kretprobe at %p unregistered\n", + cgroup_destroy_locked_kretprobe.kp.addr); + + /* nmissed > 0 
suggests that maxactive was set too low. */ + pr_info("Missed probing %d instances of %s\n", + cgroup_destroy_locked_kretprobe.nmissed, + cgroup_destroy_locked_kretprobe.kp.symbol_name); +} + +// print the tasks in order of their memory usage +static void print_sorted_tasks_list(struct cgroup_info *cgrp_info, + int type, struct seq_file *m) +{ + struct list_head *cur, *insert_pos; + struct task_info *task, *insert_task; + struct list_head new_list = LIST_HEAD_INIT(new_list); + struct list_head *old_list; + struct task_info *new_task, *next_task; + + if (type == 0) { + if (cgrp_info == NULL) + return; + read_lock(&cgrp_info->cgrp_lock); + old_list = &cgrp_info->tasks_list; + } else { + if (cgrp_info == NULL) + return; + old_list = &cgrp_info->oom_list; + } + + list_for_each_entry_safe(task, insert_task, old_list, list) { + new_task = kmalloc(sizeof(struct task_info), GFP_ATOMIC); + if (!new_task) + return; + new_task->tgid = task->tgid; + strscpy(new_task->comm, task->comm, sizeof(new_task->comm)); + new_task->count = task->count; + new_task->pages = NULL; + INIT_LIST_HEAD(&new_task->list); + + //insertion sort + cur = &new_list; + insert_pos = cur->next; + while (insert_pos != &new_list) { + next_task = + list_entry(insert_pos, struct task_info, list); + if (new_task->count >= next_task->count) + break; + cur = insert_pos; + insert_pos = insert_pos->next; + } + + (&new_task->list)->prev = insert_pos->prev; + (insert_pos->prev)->next = (&new_task->list); + (&new_task->list)->next = insert_pos; + insert_pos->prev = (&new_task->list); + } + if (type == 0) + read_unlock(&cgrp_info->cgrp_lock); + + //print + if (type == 1 && (&new_list) != new_list.next) { + seq_puts(m, "oom:\n"); + seq_printf(m, "%10s %20s %20s\n", "pid", "command", + "memory usage (KB)"); + } + if (type == 0) + seq_printf(m, "%10s %20s %20s\n", "pid", "command", + "memory usage (KB)"); + list_for_each_entry_safe(task, insert_task, &new_list, list) { + seq_printf(m, "%10d %20s %20d\n", task->tgid, 
task->comm, + (task->count) * 4); + } + + list_for_each_entry_safe(task, insert_task, &new_list, list) { + list_del(&task->list); + kfree(task); + } +} + +static struct proc_dir_entry *cgroup_info_read; +#define procfs_file_read "cgroup_memory_usage_per_process" + +void seq_print_tasks(struct cgroup_info *cgroup_info, struct seq_file *m) +{ + if (!cgroup_info) + return; + + print_sorted_tasks_list(cgroup_info, 0, m); +} + +void seq_print_oom_tasks(struct cgroup_info *cgroup_info, struct seq_file *m) +{ + if (!cgroup_info) + return; + + print_sorted_tasks_list(cgroup_info, 1, m); +} + +void seq_print_cgroups(struct seq_file *m) +{ + struct cgroup_info *cgrp, *pos; + + spin_lock(&lock); + list_for_each_entry_safe(cgrp, pos, &all_cgroup_info, list) { + seq_printf(m, "cgroup name : %s\n", cgrp->name); + seq_print_tasks(cgrp, m); + seq_print_oom_tasks(cgrp, m); + seq_puts(m, "\n"); + } + spin_unlock(&lock); +} + +static int memory_usage_show(struct seq_file *m, void *v) +{ + seq_print_cgroups(m); + return 0; +} + +static int __init global_init(void) +{ + int ret = 0; + + cgroup_info_read = + proc_create_single(procfs_file_read, 0, NULL, memory_usage_show); + if (!cgroup_info_read) + return -ENOMEM; + ret = mem_cgroup_charge_kretprobe_init(); + uncharge_folio_kretprobe_init(); + do_exit_kprobe_init(); + mark_oom_victim_kprobe_init(); + cgroup_destroy_locked_kretprobe_init(); + + return ret; +} + +static void __exit global_exit(void) +{ + struct cgroup_info *cgrp_info, *pos; + + mem_cgroup_charge_kretprobe_exit(); + uncharge_folio_kretprobe_exit(); + do_exit_kprobe_exit(); + mark_oom_victim_kprobe_exit(); + cgroup_destroy_locked_kretprobe_exit(); + + remove_proc_entry(procfs_file_read, NULL); + + //release all memory use + list_for_each_entry_safe(cgrp_info, pos, &all_cgroup_info, list) { + list_del(&cgrp_info->list); + destroy_cgroup_info(cgrp_info); + } +} + +module_init(global_init) +module_exit(global_exit) +MODULE_LICENSE("GPL"); diff --git 
a/tools/probeCgroup/probeCgroup.h b/tools/probeCgroup/probeCgroup.h new file mode 100644 index 000000000000..953a6e0aca31 --- /dev/null +++ b/tools/probeCgroup/probeCgroup.h @@ -0,0 +1,415 @@ +/* SPDX-License-Identifier: GPL-2.0*/ +/* + * probeCgroup.h + * + * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/kprobes.h> +#include <linux/ktime.h> +#include <linux/limits.h> +#include <linux/sched.h> +#include <linux/mm_types.h> +#include <linux/memcontrol.h> +#include <linux/cgroup-defs.h> +#include <linux/kernfs.h> +#include <linux/string.h> +#include <linux/list.h> +#include <linux/oom.h> +#include <linux/fs.h> +#include <linux/proc_fs.h> +#include <linux/huge_mm.h> +#include <linux/page-flags.h> +#include <linux/spinlock.h> +#include <linux/rwlock.h> + +static spinlock_t lock; // global lock for the list of cgroup_info + +struct HashNode { + unsigned long addr; + struct HashNode *next; +}; + +struct HashNode *HashNode_create(unsigned long addr) +{ + struct HashNode *node = NULL; + + node = kzalloc(sizeof(struct HashNode), GFP_ATOMIC); + if (node == NULL) + return NULL; + node->addr = addr; + node->next = NULL; + return node; +} + +struct HashBucket { + struct HashNode *head; + spinlock_t bkt_lock; +}; + +void HashBucket_init(struct HashBucket *bkt) +{ + bkt->head = NULL; + spin_lock_init(&bkt->bkt_lock); +} + +bool HashBucket_insert(struct HashBucket *bkt, unsigned long addr) +{ + struct HashNode *new_node; + struct HashNode *node; + struct HashNode *prev; + bool ret = true; + + if (bkt == NULL) + return false; + + prev = NULL; + new_node = NULL; + new_node = HashNode_create(addr); + + spin_lock(&bkt->bkt_lock); + node = bkt->head; + while (node != NULL && node->addr != addr) { + prev = node; + node = node->next; + } + if (node == NULL) { + if (new_node == NULL) { + pr_info("not enough memory for HashNode\n"); + spin_unlock(&bkt->bkt_lock); + return false; + } + if (bkt->head == 
NULL) + bkt->head = new_node; + else + prev->next = new_node; + spin_unlock(&bkt->bkt_lock); + ret = true; + } else { + spin_unlock(&bkt->bkt_lock); + kfree(new_node); + ret = false; + } + + return ret; +} + +bool HashBucket_erase(struct HashBucket *bkt, unsigned long addr) +{ + struct HashNode *node; + struct HashNode *prev; + bool ret = true; + + if (bkt == NULL) + return false; + + spin_lock(&bkt->bkt_lock); + node = bkt->head; + prev = NULL; + while (node != NULL && node->addr != addr) { + prev = node; + node = node->next; + } + if (node == NULL) { + spin_unlock(&bkt->bkt_lock); + ret = false; + } else { + if (bkt->head == node) + bkt->head = node->next; + else + prev->next = node->next; + kfree(node); + spin_unlock(&bkt->bkt_lock); + ret = true; + } + + return ret; +} + +void HashBucket_clear(struct HashBucket *bkt) +{ + struct HashNode *node; + struct HashNode *prev; + + if (bkt == NULL) + return; + + spin_lock(&bkt->bkt_lock); + node = bkt->head; + prev = NULL; + bkt->head = NULL; + while (node != NULL) { + prev = node; + node = node->next; + kfree(prev); + } + spin_unlock(&bkt->bkt_lock); +} + +struct HashMap { + unsigned long size; + struct HashBucket *HashTable; +}; + +unsigned long hash_func(unsigned long addr, unsigned long size) +{ + return addr % size; +} + +struct HashMap *HashMap_create(unsigned long size) +{ + struct HashMap *hm = NULL; + struct HashBucket *ht = NULL; + int i = 0; + + hm = kmalloc(sizeof(struct HashMap), GFP_ATOMIC); + if (hm == NULL) + return NULL; + ht = kmalloc((size * sizeof(struct HashBucket)), GFP_ATOMIC); + if (ht == NULL) { + kfree(hm); + return NULL; + } + for (i = 0; i < size; i++) + HashBucket_init(&(ht[i])); + + hm->size = size; + hm->HashTable = ht; + return hm; +} + +bool HashMap_insert(struct HashMap *hm, unsigned long addr) +{ + unsigned long index; + + if (hm == NULL) + return false; + index = hash_func(addr, hm->size); + if (hm->HashTable == NULL) + return false; + return HashBucket_insert(&(hm->HashTable[index]), 
addr); +} + +bool HashMap_erase(struct HashMap *hm, unsigned long addr) +{ + unsigned long index; + + if (hm == NULL) + return false; + index = hash_func(addr, hm->size); + if (hm->HashTable == NULL) + return false; + return HashBucket_erase(&(hm->HashTable[index]), addr); +} + +void HashMap_clear(struct HashMap *hm) +{ + unsigned long size; + struct HashBucket *ht; + int i; + + if (hm == NULL) + return; + size = hm->size; + ht = hm->HashTable; + if (ht == NULL) + return; + hm->HashTable = NULL; + for (i = 0; i < size; i++) + HashBucket_clear(&(ht[i])); + + kfree(ht); + kfree(hm); +} + +//struct that save the information for each task +struct task_info { + int tgid; + char comm[TASK_COMM_LEN]; + int count; // number of pages + struct HashMap *pages; + struct list_head list; + spinlock_t cnt_lock; +}; + +// struct that save the information for each cgroup +struct cgroup_info { + struct cgroup *cgrp; + struct mem_cgroup *memcg; + int id; + char name[64]; + struct list_head list; + struct list_head tasks_list; + struct list_head oom_list; + rwlock_t cgrp_lock; + unsigned int cached_bytes; +}; + +static LIST_HEAD(all_cgroup_info); // a list that linked all the cgroup_info struct + +static struct task_info *create_task_info(struct task_struct *cur_task) +{ + struct task_info *tsk_info = + kmalloc(sizeof(struct task_info), GFP_ATOMIC); + if (!tsk_info) + return NULL; + + // initialization + tsk_info->tgid = cur_task->tgid; + strscpy(tsk_info->comm, cur_task->comm, sizeof(tsk_info->comm)); + tsk_info->count = 0; + tsk_info->pages = NULL; + tsk_info->pages = HashMap_create(1023); + INIT_LIST_HEAD(&tsk_info->list); + spin_lock_init(&(tsk_info->cnt_lock)); + + return tsk_info; +} + +static int +add_task_to_cgroup_info(struct cgroup_info *cgrp, struct task_info *task) +{ + if (!cgrp || !task) + return -EINVAL; + + write_lock(&cgrp->cgrp_lock); + list_add_tail(&task->list, &cgrp->tasks_list); + write_unlock(&cgrp->cgrp_lock); + return 0; +} + +static int 
+remove_task_from_cgroup_info(struct cgroup_info *cgrp, struct task_info *task) +{ + if (cgrp == NULL || task == NULL) + return -EINVAL; + + HashMap_clear(task->pages); + // kfree(task->pages); + kfree(task); + return 0; +} + +static struct task_info *find_task_info(struct cgroup_info *cgrp, int tgid) +{ + struct task_info *tsk_info, *pos; + + list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) { + if (tsk_info->tgid == tgid) + return tsk_info; + } + return NULL; +} + +static int +remove_page_from_cgroup_info(unsigned long addr, struct cgroup_info *cgrp) +{ + struct task_info *tsk_info, *pos; + + read_lock(&cgrp->cgrp_lock); + list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) { + if (HashMap_erase(tsk_info->pages, addr)) { + spin_lock(&(tsk_info->cnt_lock)); + tsk_info->count -= folio_nr_pages((struct folio *)addr); + spin_unlock(&(tsk_info->cnt_lock)); + read_unlock(&cgrp->cgrp_lock); + return 0; + } + } + read_unlock(&cgrp->cgrp_lock); + return -1; +} + +static struct cgroup_info *create_cgroup_info(struct cgroup *cgrp, + struct mem_cgroup *memcg) +{ + struct cgroup_info *cgrp_info = + kmalloc(sizeof(struct cgroup_info), GFP_ATOMIC); + struct kernfs_node *kn; + + if (!cgrp_info) + return NULL; + + cgrp_info->cgrp = cgrp; + cgrp_info->memcg = memcg; + cgrp_info->id = (memcg->css).id; + kn = cgrp->kn; + strscpy(cgrp_info->name, kn->name, sizeof(cgrp_info->name)); + INIT_LIST_HEAD(&cgrp_info->list); + INIT_LIST_HEAD(&cgrp_info->tasks_list); + INIT_LIST_HEAD(&cgrp_info->oom_list); + rwlock_init(&(cgrp_info->cgrp_lock)); + cgrp_info->cached_bytes = 0; + + return cgrp_info; +} + +static void destroy_cgroup_info(struct cgroup_info *cgrp_info) +{ + struct task_info *task, *tmp; + + if (!cgrp_info) + return; + + write_lock(&cgrp_info->cgrp_lock); + list_for_each_entry_safe(task, tmp, &cgrp_info->tasks_list, list) { + list_del(&task->list); + remove_task_from_cgroup_info(cgrp_info, task); + } + write_unlock(&cgrp_info->cgrp_lock); + 
list_for_each_entry_safe(task, tmp, &cgrp_info->oom_list, list) { + list_del(&task->list); + remove_task_from_cgroup_info(cgrp_info, task); + } + + kfree(cgrp_info); +} + +static int add_cgroup_info(struct cgroup_info *cgrp_info) +{ + if (!cgrp_info) + return -EINVAL; + + list_add_tail(&cgrp_info->list, &all_cgroup_info); + return 0; +} + +static struct cgroup_info *find_cgroup_info(int id) +{ + struct cgroup_info *cgrp_info = NULL; + + list_for_each_entry(cgrp_info, &all_cgroup_info, list) { + if (cgrp_info->id == id) + return cgrp_info; + } + + return NULL; +} + +static struct task_info *create_oom_task_info(struct task_info *tsk_info) +{ + struct task_info *oom_tsk_info = + kmalloc(sizeof(struct task_info), GFP_ATOMIC); + if (!oom_tsk_info) + return NULL; + + oom_tsk_info->tgid = tsk_info->tgid; + strscpy(oom_tsk_info->comm, tsk_info->comm, sizeof(oom_tsk_info->comm)); + oom_tsk_info->count = tsk_info->count; + oom_tsk_info->pages = NULL; + INIT_LIST_HEAD(&oom_tsk_info->list); + + return oom_tsk_info; +} + +static int +add_oom_task_to_cgroup_info(struct cgroup_info *cgrp, + struct task_info *oom_task) +{ + if (!cgrp || !oom_task) + return -EINVAL; + list_add_tail(&oom_task->list, &cgrp->oom_list); + return 0; +} diff --git a/tools/probeCgroup/run.sh b/tools/probeCgroup/run.sh new file mode 100755 index 000000000000..7e1ffefe66d1 --- /dev/null +++ b/tools/probeCgroup/run.sh @@ -0,0 +1,8 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +cd scripts +./script1.sh +./script2.sh +./script3.sh diff --git a/tools/probeCgroup/scripts/script1.sh b/tools/probeCgroup/scripts/script1.sh new file mode 100755 index 000000000000..539e8258afb9 --- /dev/null +++ b/tools/probeCgroup/scripts/script1.sh @@ -0,0 +1,10 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +cd .. 
+make +insmod probeCgroup.ko + +cd testcases +gcc simple-mem-allocate.c -o simple-mem-allocate diff --git a/tools/probeCgroup/scripts/script2.sh b/tools/probeCgroup/scripts/script2.sh new file mode 100755 index 000000000000..2ad515cfb912 --- /dev/null +++ b/tools/probeCgroup/scripts/script2.sh @@ -0,0 +1,14 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +current_dir=$(pwd) +cd /sys/fs/cgroup/memory +mkdir test +cd test +sh -c "echo $$ >> cgroup.procs" +sh -c "echo 5M > memory.limit_in_bytes" +sh -c "echo 0 > memory.swappiness" +cd "$current_dir" +cd ../testcases +./simple-mem-allocate diff --git a/tools/probeCgroup/scripts/script3.sh b/tools/probeCgroup/scripts/script3.sh new file mode 100755 index 000000000000..127eb45de5c9 --- /dev/null +++ b/tools/probeCgroup/scripts/script3.sh @@ -0,0 +1,11 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +cd /proc +cat cgroup_memory_usage_per_process + +cat /sys/fs/cgroup/memory/test/cgroup.procs > /sys/fs/cgroup/memory/cgroup.procs +rmdir /sys/fs/cgroup/memory/test +# cat cgroup_memory_usage_per_process +rmmod probeCgroup diff --git a/tools/probeCgroup/testcases/1_load_unload_test.py b/tools/probeCgroup/testcases/1_load_unload_test.py new file mode 100755 index 000000000000..5389a14a1dac --- /dev/null +++ b/tools/probeCgroup/testcases/1_load_unload_test.py @@ -0,0 +1,24 @@ +#!/usr/bin/env python +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +import os +import subprocess +import time + +def test_module_load_unload(): + try: + subprocess.check_call(['insmod', '../probeCgroup.ko']) + time.sleep(1) + print('loading module successfully!') + subprocess.check_call(['rmmod', 'probeCgroup']) + output = subprocess.check_output(['lsmod']) + assert b'probeCgroup' not in output + print('unloading module successfully!') + except subprocess.CalledProcessError as 
e: + print('Load unload test failed. Insmod failed.') + except AssertionError as e: + print('Load unload test failed. Cannot remove module.') + +if __name__ == '__main__': + test_module_load_unload() \ No newline at end of file diff --git a/tools/probeCgroup/testcases/2_multiple_process_test.py b/tools/probeCgroup/testcases/2_multiple_process_test.py new file mode 100755 index 000000000000..d88c7f7f2952 --- /dev/null +++ b/tools/probeCgroup/testcases/2_multiple_process_test.py @@ -0,0 +1,48 @@ +#!/usr/bin/env python +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage +import os +import subprocess +import time + +def test_multiple_process(num_procs): + subprocess.check_call(['insmod', '../probeCgroup.ko']) + time.sleep(1) + + cgroup_name = 'test' + cgroup_path = create_cgroup(cgroup_name) + + processes = [] + pids = [] + + for i in range(num_procs): + process = subprocess.Popen(['./mem-allocate']) + pid = process.pid + add_process_to_cgroup(cgroup_path, pid) + processes.append(process) + pids.append(pid) + + time.sleep(0.1) + try: + count = 0 + for i in range (2000): + count += check_memory_usage(cgroup_name, pids, False) + time.sleep(0.01) + assert count <= 50, f"Memory read by probeCgroup is not accurate" + + remove_cgroup(cgroup_path, pids) + check_memory_usage(cgroup_name, pids, True) + cleanup(processes) + subprocess.check_call(['rmmod', 'probeCgroup']) + + print('pass multiple process test!') + except AssertionError as e: + print(f"Assertion failed: {e}") + remove_cgroup(cgroup_path, pids) + cleanup(processes) + subprocess.check_call(['rmmod', 'probeCgroup']) + +if __name__ == '__main__': + test_multiple_process(3) \ No newline at end of file diff --git a/tools/probeCgroup/testcases/3_multiple_cgroup_test.py b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py new 
file mode 100755 index 000000000000..592a716df877 --- /dev/null +++ b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py @@ -0,0 +1,55 @@ +#!/usr/bin/env python +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup +import os +import subprocess +import time + +def test_multiple_cgroup(num_procs, num_cgroups): + subprocess.check_call(['insmod', '../probeCgroup.ko']) + time.sleep(1) + + cgroups = [] + processes = {} + pids = {} + for i in range(num_cgroups): + cgroup_name = f'test_{i}' + cgroup_path = create_cgroup(cgroup_name) + cgroups.append((cgroup_name, cgroup_path)) + + for j in range(num_procs): + process = subprocess.Popen(['./mem-allocate']) + pid = process.pid + add_process_to_cgroup(cgroup_path, pid) + if cgroup_path not in processes: + processes[cgroup_path] = [] + processes[cgroup_path].append(process) + if cgroup_path not in pids: + pids[cgroup_path] = [] + pids[cgroup_path].append(pid) + + time.sleep(0.1) + try: + for i in range (100): + for cgroup_name, cgroup_path in cgroups: + check_memory_usage(cgroup_name, pids[cgroup_path], False) + time.sleep(0.01) + + for cgroup_name, cgroup_path in cgroups: + remove_cgroup(cgroup_path, pids[cgroup_path]) + check_memory_usage(cgroup_name, pids[cgroup_path], True) + cleanup(processes[cgroup_path]) + subprocess.check_call(['rmmod', 'probeCgroup']) + + print('pass multiple cgroup test!') + except AssertionError as e: + print(f"Assertion failed: {e}") + for cgroup_name, cgroup_path in cgroups: + remove_cgroup(cgroup_path, pids[cgroup_path]) + cleanup(processes[cgroup_path]) + subprocess.check_call(['rmmod', 'probeCgroup']) + +if __name__ == '__main__': + test_multiple_cgroup(2,2) \ No newline at end of file diff --git a/tools/probeCgroup/testcases/4_oom_test.py b/tools/probeCgroup/testcases/4_oom_test.py new file mode 100755 index 
000000000000..128a258c56f5 --- /dev/null +++ b/tools/probeCgroup/testcases/4_oom_test.py @@ -0,0 +1,52 @@ +#!/usr/bin/env python +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, get_oom_process_memory_usage +import os +import subprocess +import time + +def test_oom(num_procs): + subprocess.check_call(['insmod', '../probeCgroup.ko']) + time.sleep(1) + + cgroup_name = 'test' + cgroup_path = create_cgroup(cgroup_name) + + with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.limit_in_bytes", 'w') as limit_file: + limit_file.write("5M") + with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.swappiness", 'w') as swap_file: + swap_file.write("0") + + processes = [] + pids = [] + for i in range(num_procs): + process = subprocess.Popen(['./simple-mem-allocate']) + pid = process.pid + add_process_to_cgroup(cgroup_path, pid) + processes.append(process) + pids.append(pid) + + time.sleep(6) + + try: + for pid in pids: + memory_usage = get_oom_process_memory_usage(pid, cgroup_name) + assert memory_usage is not None, f"Memory usage(oom) not found for PID {pid}" + assert memory_usage > 0, f"Memory usage should be greater than zero for PID {pid}" + + remove_cgroup(cgroup_path, pids) + check_memory_usage(cgroup_name, pids, True) + cleanup(processes) + subprocess.check_call(['rmmod', 'probeCgroup']) + + print('pass oom test!') + except AssertionError as e: + print(f"Assertion failed: {e}") + remove_cgroup(cgroup_path, pids) + cleanup(processes) + subprocess.check_call(['rmmod', 'probeCgroup']) + +if __name__ == '__main__': + test_oom(1) \ No newline at end of file diff --git a/tools/probeCgroup/testcases/5_multiple_threads_test.py b/tools/probeCgroup/testcases/5_multiple_threads_test.py new file mode 100755 index 000000000000..7e1b86dabe48 --- /dev/null +++ 
b/tools/probeCgroup/testcases/5_multiple_threads_test.py @@ -0,0 +1,45 @@ +#!/usr/bin/env python +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage +import os +import subprocess +import time + +def test_multiple_thread(num_procs): + subprocess.check_call(['insmod', '../probeCgroup.ko']) + time.sleep(1) + + cgroup_name = 'test' + cgroup_path = create_cgroup(cgroup_name) + + processes = [] + pids = [] + for i in range(num_procs): + process = subprocess.Popen(['./multiple-thread-mem-allocate']) + pid = process.pid + add_process_to_cgroup(cgroup_path, pid) + processes.append(process) + pids.append(pid) + + time.sleep(1) + try: + for i in range (200): + check_memory_usage(cgroup_name, pids, False) + time.sleep(0.01) + + remove_cgroup(cgroup_path, pids) + check_memory_usage(cgroup_name, pids, True) + cleanup(processes) + subprocess.check_call(['rmmod', 'probeCgroup']) + + print('pass multiple threads test!') + except AssertionError as e: + print(f"Assertion failed: {e}") + remove_cgroup(cgroup_path, pids) + cleanup(processes) + subprocess.check_call(['rmmod', 'probeCgroup']) + +if __name__ == '__main__': + test_multiple_thread(5) \ No newline at end of file diff --git a/tools/probeCgroup/testcases/cgroup_utils.py b/tools/probeCgroup/testcases/cgroup_utils.py new file mode 100644 index 000000000000..e00c580e724e --- /dev/null +++ b/tools/probeCgroup/testcases/cgroup_utils.py @@ -0,0 +1,114 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +import os +import subprocess +import time + +def create_cgroup(cgroup_name): + cgroup_path = f'/sys/fs/cgroup/memory/{cgroup_name}' + + try: + os.makedirs(cgroup_path) + except FileExistsError: + pass + + return cgroup_path + +def add_process_to_cgroup(cgroup_path, pid): + with 
open(os.path.join(cgroup_path, 'cgroup.procs'), 'w') as procs_file: + procs_file.write(str(pid)) + +def get_process_memory_usage(pid, cgroup_name): + cur_name = '' + with open('/proc/cgroup_memory_usage_per_process', 'r') as file: + for line in file: + parts = line.strip().split() + if len(parts) >= 4 and parts[0] == 'cgroup': + cur_name = parts[3] + if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid: + return int(parts[2]) + return None + +def get_process_kmem_usage(pid, cgroup_name): + cur_name = '' + with open('/proc/cgroup_memory_usage_per_process', 'r') as file: + for line in file: + parts = line.strip().split() + if len(parts) >= 4 and parts[0] == 'cgroup': + cur_name = parts[3] + if len(parts) >= 4 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid: + return int(parts[3]) + return None + +def remove_cgroup(cgroup_path, pids): + for pid in pids: + with open('/sys/fs/cgroup/memory/cgroup.procs', 'w') as backup_file: + backup_file.write(str(pid)) + os.rmdir(cgroup_path) + return + +def check_memory_usage(cgroup_name, pids, delete): + memory_sum = 0 + for pid in pids: + memory_usage = get_process_memory_usage(pid, cgroup_name) + if delete == False: + assert memory_usage is not None, f"Memory usage not found for PID {pid}" + assert memory_usage >= 0, f"Memory usage should be greater than zero for PID {pid}" + memory_sum += memory_usage + else: + assert memory_usage is None, f"Error: Memory usage should not be available for PID {pid} after deleting the cgroup." 
+ if delete == False: + with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.usage_in_bytes", 'r') as file: + content = file.readline().strip() + memory_read = int(content) + memory_sum *= 1024 + delta = abs(memory_read - memory_sum) + # print(f"read: {memory_read}") + # print(f"sum : {memory_sum}") + if (delta > max(memory_read, memory_sum) * 0.1): + return 1 + else: + return 0 + else: + return 0 + +def check_kmem_usage(cgroup_name, pids, delete): + kmem_sum = 0 + for pid in pids: + kmem_usage = get_process_kmem_usage(pid, cgroup_name) + if delete == False: + assert kmem_usage is not None, f"Kmem usage not found for PID {pid}" + assert kmem_usage >= 0, f"Kmem usage should be greater than zero for PID {pid}" + kmem_sum += kmem_usage + else: + assert kmem_usage is None, f"Error: Kmem usage should not be available for PID {pid} after deleting the cgroup." + if delete == False: + with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.kmem.usage_in_bytes", 'r') as file: + content = file.readline().strip() + kmem_read = int(content) + kmem_sum *= 1024 + delta = abs(kmem_read - kmem_sum) + # print(f"kmem read: {kmem_read}") + # print(f"kmem sum : {kmem_sum}") + # assert delta <= max(kmem_read, kmem_sum) * 0.2, f"Kmem read by probeCgroup is not accurate, {kmem_read}, {kmem_sum}" + +def cleanup(processes): + for process in processes: + process.terminate() + process.wait() + +def get_oom_process_memory_usage(pid, cgroup_name): + cur_name = '' + oom = False + with open('/proc/cgroup_memory_usage_per_process', 'r') as file: + for line in file: + parts = line.strip().split() + if len(parts) >= 4 and parts[0] == 'cgroup': + cur_name = parts[3] + oom = False + if len(parts) >= 1 and parts[0] == 'oom:': + oom = True + if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid and oom == True: + return int(parts[2]) + return None \ No newline at end of file diff --git a/tools/probeCgroup/testcases/mem-allocate.c 
b/tools/probeCgroup/testcases/mem-allocate.c new file mode 100644 index 000000000000..e78e37cae61b --- /dev/null +++ b/tools/probeCgroup/testcases/mem-allocate.c @@ -0,0 +1,35 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * mem-allocate.c - The program to test probeCgroup + * + * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#define MB (1024 * 1024) + +char *arr[40]; + +int main(int argc, char *argv[]) +{ + char *p; + int i = 0; + + while (1) { + for (i = 0; i < 40; i++) { + p = (char *)malloc(MB); + memset(p, 0, MB); + arr[i] = p; + usleep(100000); + } + for (int i = 0; i < 40; i++) { + free(arr[i]); + usleep(100000); + } + } + return 0; +} diff --git a/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c new file mode 100644 index 000000000000..55f4c068f55e --- /dev/null +++ b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c @@ -0,0 +1,60 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * multiple-thread-mem-allocate.c - The program to test probeCgroup + * + * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <pthread.h> + +#define MB (1024 * 1024) + +void *memory_test(void *) +{ + char *arr[25]; + char *p; + int i = 0; + int cnt = 0; + + while (1) { + for (i = 0; i < 20; i++) { + p = (char *)malloc(MB); + memset(p, 0, MB); + arr[i] = p; + usleep(10000); + } + for (int i = 0; i < 20; i++) { + free(arr[i]); + usleep(10000); + } + } +} + +int main(int argc, char *argv[]) +{ + pthread_t threads[4]; + int rc; + + // create threads + for (int i = 0; i < 4; i++) { + rc = pthread_create(&threads[i], NULL, memory_test, NULL); + if (rc != 0) { + fprintf(stderr, "Error creating thread: %d\n", rc); + return 1; + } + } + + for (int i = 0; i < 4; i++) { + rc = pthread_join(threads[i], NULL); + if (rc 
!= 0) { + fprintf(stderr, "Error joining thread: %d\n", rc); + return 1; + } + } + + return 0; +} diff --git a/tools/probeCgroup/testcases/run.py b/tools/probeCgroup/testcases/run.py new file mode 100755 index 000000000000..8ffd0ca720d8 --- /dev/null +++ b/tools/probeCgroup/testcases/run.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + +import os +import subprocess +import sys + +def run_tests(directory): + """Run all Python scripts in the given directory.""" + python_files = [f for f in os.listdir(directory) if f.endswith('test.py')] + python_files.sort() + + for filename in python_files: + try: + filepath = os.path.join(directory, filename) + + subprocess.check_call([sys.executable, filepath]) + except subprocess.CalledProcessError as e: + print(f"Error executing {filename}:") + return + except Exception as e: + print(f"Error executing {filename}:") + print(e) + return + +if __name__ == '__main__': + tests_directory = '.' 
+ subprocess.check_call(['gcc', 'mem-allocate.c', '-o', 'mem-allocate']) + subprocess.check_call(['gcc', 'simple-mem-allocate.c', '-o', 'simple-mem-allocate']) + subprocess.check_call(['gcc', 'multiple-thread-mem-allocate.c', '-o', 'multiple-thread-mem-allocate']) + run_tests(tests_directory) \ No newline at end of file diff --git a/tools/probeCgroup/testcases/simple-mem-allocate.c b/tools/probeCgroup/testcases/simple-mem-allocate.c new file mode 100644 index 000000000000..16328b10ba48 --- /dev/null +++ b/tools/probeCgroup/testcases/simple-mem-allocate.c @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * mem-allocate.c - The program to test probeCgroup + * + * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#define MB (1024 * 1024) + +int main(int argc, char *argv[]) +{ + char *p; + int i = 0; + + while (1) { + p = (char *)malloc(MB); + memset(p, 0, MB); + sleep(1); + } + + return 0; +} -- 2.43.0


[PATCH OLK-5.10 v2] drm/amd/display: Fix null pointer deref in dcn20_resource.c
by Xiaomeng Zhang 06 Sep '24

06 Sep '24

From: Aurabindo Pillai <aurabindo.pillai(a)amd.com>

mainline inclusion
from mainline-v6.11-rc1
commit ecbf60782662f0a388493685b85a645a0ba1613c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAMMUG
CVE: CVE-2024-43899

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Fixes a hang that's triggered when MPV is run on a DCN401 dGPU:

mpv --hwdec=vaapi --vo=gpu --hwdec-codecs=all

and then enabling fullscreen playback (double click on the video)

The following calltrace will be seen:

[ 181.843989] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 181.843997] #PF: supervisor instruction fetch in kernel mode
[ 181.844003] #PF: error_code(0x0010) - not-present page
[ 181.844009] PGD 0 P4D 0
[ 181.844020] Oops: 0010 [#1] PREEMPT SMP NOPTI
[ 181.844028] CPU: 6 PID: 1892 Comm: gnome-shell Tainted: G W OE 6.5.0-41-generic #41~22.04.2-Ubuntu
[ 181.844038] Hardware name: System manufacturer System Product Name/CROSSHAIR VI HERO, BIOS 6302 10/23/2018
[ 181.844044] RIP: 0010:0x0
[ 181.844079] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 181.844084] RSP: 0018:ffffb593c2b8f7b0 EFLAGS: 00010246
[ 181.844093] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[ 181.844099] RDX: ffffb593c2b8f804 RSI: ffffb593c2b8f7e0 RDI: ffff9e3c8e758400
[ 181.844105] RBP: ffffb593c2b8f7b8 R08: ffffb593c2b8f9c8 R09: ffffb593c2b8f96c
[ 181.844110] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb593c2b8f9c8
[ 181.844115] R13: 0000000000000001 R14: ffff9e3c88000000 R15: 0000000000000005
[ 181.844121] FS: 00007c6e323bb5c0(0000) GS:ffff9e3f85f80000(0000) knlGS:0000000000000000
[ 181.844128] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 181.844134] CR2: ffffffffffffffd6 CR3: 0000000140fbe000 CR4: 00000000003506e0
[ 181.844141] Call Trace:
[ 181.844146] <TASK>
[ 181.844153] ? show_regs+0x6d/0x80
[ 181.844167] ? __die+0x24/0x80
[ 181.844179] ? page_fault_oops+0x99/0x1b0
[ 181.844192] ? do_user_addr_fault+0x31d/0x6b0
[ 181.844204] ? exc_page_fault+0x83/0x1b0
[ 181.844216] ? asm_exc_page_fault+0x27/0x30
[ 181.844237] dcn20_get_dcc_compression_cap+0x23/0x30 [amdgpu]
[ 181.845115] amdgpu_dm_plane_validate_dcc.constprop.0+0xe5/0x180 [amdgpu]
[ 181.845985] amdgpu_dm_plane_fill_plane_buffer_attributes+0x300/0x580 [amdgpu]
[ 181.846848] fill_dc_plane_info_and_addr+0x258/0x350 [amdgpu]
[ 181.847734] fill_dc_plane_attributes+0x162/0x350 [amdgpu]
[ 181.848748] dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu]
[ 181.849791] ? dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu]
[ 181.850840] amdgpu_dm_atomic_check+0xdfe/0x1760 [amdgpu]

Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>

Conflicts:
	drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
	drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c
[The conflicts were due to file path changed]
Signed-off-by: Xiaomeng Zhang <zhangxiaomeng13(a)huawei.com>
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index d587f807dfd7..94bf438c7310 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -2178,10 +2178,11 @@ bool dcn20_get_dcc_compression_cap(const struct dc *dc,
 		const struct dc_dcc_surface_param *input,
 		struct dc_surface_dcc_cap *output)
 {
-	return dc->res_pool->hubbub->funcs->get_dcc_compression_cap(
-		dc->res_pool->hubbub,
-		input,
-		output);
+	if (dc->res_pool->hubbub->funcs->get_dcc_compression_cap)
+		return dc->res_pool->hubbub->funcs->get_dcc_compression_cap(
+			dc->res_pool->hubbub, input, output);
+
+	return false;
 }

 static void dcn20_destroy_resource_pool(struct resource_pool **pool)
--
2.34.1
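
The change in this patch is a guard around an optional function-pointer hook. A minimal userspace sketch of that pattern is given here for illustration only; `struct hub`, `struct hub_funcs`, `get_cap`, `demo_get_cap` and `hub_get_cap` are hypothetical stand-ins, not the amdgpu types, and the `assert()` precondition is an addition of the sketch that the patch itself does not make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hub;

/* Hypothetical ops table: hardware variants may leave get_cap unset. */
struct hub_funcs {
	bool (*get_cap)(struct hub *hub, int input, int *output);
};

struct hub {
	const struct hub_funcs *funcs;
};

/* Example hook implementation used only to exercise the wrapper. */
bool demo_get_cap(struct hub *hub, int input, int *output)
{
	(void)hub;
	*output = input * 2;
	return true;
}

/*
 * Call the hook only when it is populated. When it is NULL, report
 * "capability not available" instead of jumping to address 0.
 */
bool hub_get_cap(struct hub *hub, int input, int *output)
{
	assert(hub && hub->funcs);	/* sketch-only precondition */

	if (hub->funcs->get_cap)
		return hub->funcs->get_cap(hub, input, output);

	return false;
}
```

Returning false when the hook is absent mirrors the patch: the caller sees the capability as unsupported, whereas the unguarded call is what produces the `RIP: 0010:0x0` instruction fetch in the trace.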


[PATCH OLK-5.10 v2] drm/amd/display: Fix null pointer deref in dcn20_resource.c
by Xiaomeng Zhang 06 Sep '24

06 Sep '24

From: Aurabindo Pillai <aurabindo.pillai(a)amd.com>

mainline inclusion
from mainline-v6.11-rc1
commit ecbf60782662f0a388493685b85a645a0ba1613c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAMMUG
CVE: CVE-2024-43899

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Fixes a hang that's triggered when MPV is run on a DCN401 dGPU:

mpv --hwdec=vaapi --vo=gpu --hwdec-codecs=all

and then enabling fullscreen playback (double click on the video)

The following calltrace will be seen:

[ 181.843989] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 181.843997] #PF: supervisor instruction fetch in kernel mode
[ 181.844003] #PF: error_code(0x0010) - not-present page
[ 181.844009] PGD 0 P4D 0
[ 181.844020] Oops: 0010 [#1] PREEMPT SMP NOPTI
[ 181.844028] CPU: 6 PID: 1892 Comm: gnome-shell Tainted: G W OE 6.5.0-41-generic #41~22.04.2-Ubuntu
[ 181.844038] Hardware name: System manufacturer System Product Name/CROSSHAIR VI HERO, BIOS 6302 10/23/2018
[ 181.844044] RIP: 0010:0x0
[ 181.844079] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 181.844084] RSP: 0018:ffffb593c2b8f7b0 EFLAGS: 00010246
[ 181.844093] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[ 181.844099] RDX: ffffb593c2b8f804 RSI: ffffb593c2b8f7e0 RDI: ffff9e3c8e758400
[ 181.844105] RBP: ffffb593c2b8f7b8 R08: ffffb593c2b8f9c8 R09: ffffb593c2b8f96c
[ 181.844110] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb593c2b8f9c8
[ 181.844115] R13: 0000000000000001 R14: ffff9e3c88000000 R15: 0000000000000005
[ 181.844121] FS: 00007c6e323bb5c0(0000) GS:ffff9e3f85f80000(0000) knlGS:0000000000000000
[ 181.844128] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 181.844134] CR2: ffffffffffffffd6 CR3: 0000000140fbe000 CR4: 00000000003506e0
[ 181.844141] Call Trace:
[ 181.844146] <TASK>
[ 181.844153] ? show_regs+0x6d/0x80
[ 181.844167] ? __die+0x24/0x80
[ 181.844179] ? page_fault_oops+0x99/0x1b0
[ 181.844192] ? do_user_addr_fault+0x31d/0x6b0
[ 181.844204] ? exc_page_fault+0x83/0x1b0
[ 181.844216] ? asm_exc_page_fault+0x27/0x30
[ 181.844237] dcn20_get_dcc_compression_cap+0x23/0x30 [amdgpu]
[ 181.845115] amdgpu_dm_plane_validate_dcc.constprop.0+0xe5/0x180 [amdgpu]
[ 181.845985] amdgpu_dm_plane_fill_plane_buffer_attributes+0x300/0x580 [amdgpu]
[ 181.846848] fill_dc_plane_info_and_addr+0x258/0x350 [amdgpu]
[ 181.847734] fill_dc_plane_attributes+0x162/0x350 [amdgpu]
[ 181.848748] dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu]
[ 181.849791] ? dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu]
[ 181.850840] amdgpu_dm_atomic_check+0xdfe/0x1760 [amdgpu]

Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>

Conflicts:
	drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
	drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c
[The conflicts were due to file path changed]
Signed-off-by: Xiaomeng Zhang <zhangxiaomeng13(a)huawei.com>
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 53ac82693532..2990793e86a2 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -3290,10 +3290,11 @@ bool dcn20_get_dcc_compression_cap(const struct dc *dc,
 		const struct dc_dcc_surface_param *input,
 		struct dc_surface_dcc_cap *output)
 {
-	return dc->res_pool->hubbub->funcs->get_dcc_compression_cap(
-		dc->res_pool->hubbub,
-		input,
-		output);
+	if (dc->res_pool->hubbub->funcs->get_dcc_compression_cap)
+		return dc->res_pool->hubbub->funcs->get_dcc_compression_cap(
+			dc->res_pool->hubbub, input, output);
+
+	return false;
 }

 static void dcn20_destroy_resource_pool(struct resource_pool **pool)
--
2.34.1


[PATCH OLK-5.10] cifs: Fix pages leak when cifs_writedata allocate fails in cifs_writedata_direct_alloc()
by Zizhi Wo 05 Sep '24

05 Sep '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAOH1I

--------------------------------

The function cifs_writedata_alloc() first allocates pages and then
executes cifs_writedata_direct_alloc(). If the subsequent allocation of
wdata fails, the previously allocated pages are not released, leading to
a memory leak:

[root@fedora debug]# cat kmemleak
unreferenced object 0xff110001287aa000 (size 8192):
  comm "kworker/u220:9", pid 1604, jiffies 4294753971 (age 19.114s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  backtrace:
    [<00000000c80fa56d>] kmemleak_alloc+0x65/0xd0
    [<0000000032ff36c6>] __kmalloc+0x641/0xa50
    [<00000000da9a4c31>] cifs_writedata_alloc+0x1d/0x30
    [<00000000337073aa>] cifs_writepages+0x3e0/0x1990
    [<0000000095bfb1fd>] do_writepages+0x31/0xb0
    [<00000000321368a2>] __writeback_single_inode+0x4f/0x550
    [<00000000f200b315>] writeback_sb_inodes+0x24a/0x7a0
    [<00000000badd6d82>] __writeback_inodes_wb+0x88/0x120
    [<000000000632fc4a>] wb_writeback+0x3aa/0x4c0
    [<00000000c184a6e4>] wb_workfn+0x507/0x770
    [<000000005d8f6f94>] process_one_work+0x226/0x540
    [<000000000c3dd58a>] worker_thread+0x1b3/0x680
    [<00000000457537ac>] kthread+0x159/0x1c0
    [<000000000ef58c85>] ret_from_fork+0x1f/0x30

This issue can be avoided by promptly using kvfree.

Fixes: 8e7360f67e75 ("CIFS: Add support for direct pages in wdata")
Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com>
---
 fs/cifs/cifssmb.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 95992c93bbe3..1a6c9ac25615 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -2119,12 +2119,17 @@ cifs_writev_complete(struct work_struct *work)
 struct cifs_writedata *
 cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
 {
+	struct cifs_writedata *wdata = NULL;
+
 	struct page **pages =
 		kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
-	if (pages)
-		return cifs_writedata_direct_alloc(pages, complete);
+	if (pages) {
+		wdata = cifs_writedata_direct_alloc(pages, complete);
+		if (!wdata)
+			kvfree(pages);
+	}

-	return NULL;
+	return wdata;
 }

 struct cifs_writedata *
--
2.39.2
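
The leak fixed by this patch comes from a two-stage allocation in which the second stage takes ownership of the first only on success. A minimal userspace sketch of the same shape follows, using `calloc`/`malloc`/`free` in place of the kernel allocators. All names (`struct wdata`, `pages_alloc`, `wdata_direct_alloc`, `wdata_alloc`, `wdata_free`) are hypothetical, and `fail_wdata_alloc` and `live_page_arrays` are knobs invented so that the failure path and the leak can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct wdata {
	void **pages;
};

/* Hypothetical test knob: force the second-stage allocation to fail. */
bool fail_wdata_alloc;
/* Number of page arrays currently allocated; non-zero after teardown means a leak. */
int live_page_arrays;

static void **pages_alloc(size_t nr)
{
	void **pages = calloc(nr, sizeof(*pages));

	if (pages)
		live_page_arrays++;
	return pages;
}

static void pages_free(void **pages)
{
	free(pages);
	live_page_arrays--;
}

/* Second stage: takes ownership of pages only when it succeeds. */
static struct wdata *wdata_direct_alloc(void **pages)
{
	struct wdata *w;

	if (fail_wdata_alloc)
		return NULL;

	w = malloc(sizeof(*w));
	if (w)
		w->pages = pages;
	return w;
}

/* First stage plus cleanup: free the page array if ownership was never taken. */
struct wdata *wdata_alloc(size_t nr)
{
	struct wdata *w = NULL;
	void **pages = pages_alloc(nr);

	if (pages) {
		w = wdata_direct_alloc(pages);
		if (!w)
			pages_free(pages);
	}
	return w;
}

void wdata_free(struct wdata *w)
{
	assert(w);
	pages_free(w->pages);
	free(w);
}
```

With `fail_wdata_alloc` set, `wdata_alloc()` returns NULL and `live_page_arrays` stays at zero; dropping the `pages_free()` call in the failure branch would leave it at one, which corresponds to the unreferenced object reported by kmemleak.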


[PATCH openEuler-1.0-LTS V3] cifs: Fix pages leak when cifs_writedata allocate fails in cifs_writedata_direct_alloc()
by Zizhi Wo 05 Sep '24

05 Sep '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAOH1I

--------------------------------

The function cifs_writedata_alloc() first allocates pages and then
executes cifs_writedata_direct_alloc(). If the subsequent allocation of
wdata fails, the previously allocated pages are not released, leading to
a memory leak:

[root@fedora debug]# cat kmemleak
unreferenced object 0xff110001287aa000 (size 8192):
  comm "kworker/u220:9", pid 1604, jiffies 4294753971 (age 19.114s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  backtrace:
    [<00000000c80fa56d>] kmemleak_alloc+0x65/0xd0
    [<0000000032ff36c6>] __kmalloc+0x641/0xa50
    [<00000000da9a4c31>] cifs_writedata_alloc+0x1d/0x30
    [<00000000337073aa>] cifs_writepages+0x3e0/0x1990
    [<0000000095bfb1fd>] do_writepages+0x31/0xb0
    [<00000000321368a2>] __writeback_single_inode+0x4f/0x550
    [<00000000f200b315>] writeback_sb_inodes+0x24a/0x7a0
    [<00000000badd6d82>] __writeback_inodes_wb+0x88/0x120
    [<000000000632fc4a>] wb_writeback+0x3aa/0x4c0
    [<00000000c184a6e4>] wb_workfn+0x507/0x770
    [<000000005d8f6f94>] process_one_work+0x226/0x540
    [<000000000c3dd58a>] worker_thread+0x1b3/0x680
    [<00000000457537ac>] kthread+0x159/0x1c0
    [<000000000ef58c85>] ret_from_fork+0x1f/0x30

This issue can be avoided by promptly using kvfree.

Fixes: 8e7360f67e75 ("CIFS: Add support for direct pages in wdata")
Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com>
---
 fs/cifs/cifssmb.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index cb70f0c6aa1b..15b039820a26 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -2109,12 +2109,17 @@ cifs_writev_complete(struct work_struct *work)
 struct cifs_writedata *
 cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
 {
+	struct cifs_writedata *wdata = NULL;
+
 	struct page **pages =
 		kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
-	if (pages)
-		return cifs_writedata_direct_alloc(pages, complete);
+	if (pages) {
+		wdata = cifs_writedata_direct_alloc(pages, complete);
+		if (!wdata)
+			kvfree(pages);
+	}

-	return NULL;
+	return wdata;
 }

 struct cifs_writedata *
--
2.39.2


[PATCH OLK-6.6] ext4: Track data blocks freeing operation in journal
by Zhihao Cheng 05 Sep '24


Offering: HULK
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z

--------------------------------

Since commit 7f6416dcd4a3f ("ext4: implement writeback iomap path"),
ordered mode is no longer used in the iomap framework, which lets
ext4_mb_clear_bb() free data blocks immediately. This may cause stale data
to be read from a truncated file after a power cut. The details are as
follows:

           P1                                   P2
vfs_truncate(file A)
 ext4_setattr
  EXT4_I(inode)->i_disksize = attr->ia_size // record in journal
  ext4_truncate
   ext4_mb_clear_bb
    mb_free_blocks // free block i
                                       vfs_write(file B) // get block i and writeback
                  >> powercut <<

On the next mount, the inode size and extent tree are stale (their
pre-truncate state), while block i contains file B's data.

Fix the problem by tracking freed data blocks in the journal for the
iomap/non-writeback case.

Fixes: 7f6416dcd4a3 ("ext4: implement writeback iomap path")
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
---
 fs/ext4/ext4_jbd2.c | 9 ++-------
 fs/ext4/ext4_jbd2.h | 7 +++++++
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 94c8073b49e7..cac0eb5068d3 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -11,18 +11,13 @@ int ext4_inode_journal_mode(struct inode *inode)
 {
 	if (EXT4_JOURNAL(inode) == NULL)
 		return EXT4_INODE_WRITEBACK_DATA_MODE;	/* writeback */
-	/*
-	 * Ordered mode is no longer needed for the inode that use the
-	 * iomap path, always use writeback mode.
-	 */
-	if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP))
-		return EXT4_INODE_WRITEBACK_DATA_MODE;	/* writeback */
 	/* We do not support data journalling with delayed allocation */
 	if (!S_ISREG(inode->i_mode) ||
 	    ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
 	    test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
 	    (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
-	    !test_opt(inode->i_sb, DELALLOC))) {
+	    !test_opt(inode->i_sb, DELALLOC) &&
+	    !ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP))) {
 		/* We do not support data journalling for encrypted data */
 		if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode))
 			return EXT4_INODE_ORDERED_DATA_MODE;  /* ordered */
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 0c77697d5e90..c52d1caf6622 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -467,6 +467,13 @@ static inline int ext4_should_journal_data(struct inode *inode)
 
 static inline int ext4_should_order_data(struct inode *inode)
 {
+	/*
+	 * Ordered mode is no longer needed for the inode that use the
+	 * iomap path, always use writeback mode.
+	 */
+	if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP))
+		return 0; /* writeback */
+
 	return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE;
 }
 
--
2.31.1
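
The effect of moving the iomap check can be modelled outside the kernel. The sketch below is a simplified, hypothetical model, not ext4 code: all names ending in _sketch are invented, the data-journalling branch is left out, and the writeback predicate is assumed to test the mode bits the same way the ordered predicate in the diff does. It only shows that, after the patch, an iomap inode on an ordered-mode mount is neither "ordered" nor "writeback" as far as these two predicates are concerned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode bits, loosely mirroring the EXT4_INODE_*_DATA_MODE flags. */
enum mode_sketch {
	MODE_ORDERED_SKETCH   = 0x02,
	MODE_WRITEBACK_SKETCH = 0x04,
};

/* Hypothetical, heavily reduced inode state. */
struct inode_sketch {
	bool has_journal;     /* stands in for EXT4_JOURNAL(inode) != NULL */
	bool buffered_iomap;  /* stands in for EXT4_STATE_BUFFERED_IOMAP */
	bool mount_ordered;   /* stands in for a data=ordered mount */
};

/* After the patch: the iomap state no longer forces writeback mode here. */
static int inode_journal_mode_sketch(const struct inode_sketch *inode)
{
	assert(inode != NULL);
	if (!inode->has_journal)
		return MODE_WRITEBACK_SKETCH;
	return inode->mount_ordered ? MODE_ORDERED_SKETCH
				    : MODE_WRITEBACK_SKETCH;
}

/* After the patch: the iomap check sits in the "should order" predicate. */
static int should_order_data_sketch(const struct inode_sketch *inode)
{
	assert(inode != NULL);
	if (inode->buffered_iomap)
		return 0;
	return inode_journal_mode_sketch(inode) & MODE_ORDERED_SKETCH;
}

/* Assumed shape of the writeback predicate (not shown in the diff). */
static int should_writeback_data_sketch(const struct inode_sketch *inode)
{
	return inode_journal_mode_sketch(inode) & MODE_WRITEBACK_SKETCH;
}
```

Under these assumptions, a caller that keys off "not writeback" would now take its journalled path for iomap inodes, which matches the commit message's stated goal of tracking freed data blocks in the journal.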


[openeuler:OLK-6.6 13795/13866] mm/shmem.c:1641:57: error: call to '__compiletime_assert_745' declared with 'error' attribute: BUILD_BUG failed
by kernel test robot 05 Sep '24


tree:   https://gitee.com/openeuler/kernel.git OLK-6.6
head:   552667f6edabe9e4bc2c86f990f53749bebfcbed
commit: c7fcbe1041758d0dedc32502609a73a22884d7b8 [13795/13866] mm: shmem: Merge shmem_alloc_hugefolio() with shmem_alloc_folio()
config: x86_64-randconfig-014-20240905 (https://download.01.org/0day-ci/archive/20240905/202409051939.TTEs2Xg7-lkp@…)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240905/202409051939.TTEs2Xg7-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409051939.TTEs2Xg7-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/shmem.c:1660:6: warning: variable 'folio' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    1660 |         if (!shmem_prepare_alloc(&gfp))
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/shmem.c:1688:7: note: uninitialized use occurs here
    1688 |         if (!folio)
         |              ^~~~~
   mm/shmem.c:1660:2: note: remove the 'if' if its condition is always false
    1660 |         if (!shmem_prepare_alloc(&gfp))
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1661 |                 goto no_mem;
         |                 ~~~~~~~~~~~
   mm/shmem.c:1653:21: note: initialize the variable 'folio' to silence this warning
    1653 |         struct folio *folio;
         |                            ^
         |                             = NULL
>> mm/shmem.c:1641:57: error: call to '__compiletime_assert_745' declared with 'error' attribute: BUILD_BUG failed
    1641 |         folio = vma_alloc_folio(gfp, order, &pvma, 0, order == HPAGE_PMD_ORDER);
         |                                                                ^
   include/linux/huge_mm.h:110:26: note: expanded from macro 'HPAGE_PMD_ORDER'
     110 | #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
         |                          ^
   include/linux/huge_mm.h:106:28: note: expanded from macro 'HPAGE_PMD_SHIFT'
     106 | #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
         |                            ^
   include/linux/build_bug.h:59:21: note: expanded from macro 'BUILD_BUG'
      59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
         |                     ^
   note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:439:2: note: expanded from macro '_compiletime_assert'
     439 |         __compiletime_assert(condition, msg, prefix, suffix)
         |         ^
   include/linux/compiler_types.h:432:4: note: expanded from macro '__compiletime_assert'
     432 |                         prefix ## suffix();                             \
         |                         ^
   <scratch space>:61:1: note: expanded from here
      61 | __compiletime_assert_745
         | ^
>> mm/shmem.c:1641:57: error: call to '__compiletime_assert_745' declared with 'error' attribute: BUILD_BUG failed
   include/linux/huge_mm.h:110:26: note: expanded from macro 'HPAGE_PMD_ORDER'
     110 | #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
         |                          ^
   include/linux/huge_mm.h:106:28: note: expanded from macro 'HPAGE_PMD_SHIFT'
     106 | #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
         |                            ^
   include/linux/build_bug.h:59:21: note: expanded from macro 'BUILD_BUG'
      59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
         |                     ^
   note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:439:2: note: expanded from macro '_compiletime_assert'
     439 |         __compiletime_assert(condition, msg, prefix, suffix)
         |         ^
   include/linux/compiler_types.h:432:4: note: expanded from macro '__compiletime_assert'
     432 |                         prefix ## suffix();                             \
         |                         ^
   <scratch space>:61:1: note: expanded from here
      61 | __compiletime_assert_745
         | ^
   1 warning and 2 errors generated.


vim +1641 mm/shmem.c

  1633	
  1634	static struct folio *shmem_alloc_folio(gfp_t gfp, int order,
  1635			struct shmem_inode_info *info, pgoff_t index)
  1636	{
  1637		struct vm_area_struct pvma;
  1638		struct folio *folio;
  1639	
  1640		shmem_pseudo_vma_init(&pvma, info, index);
> 1641		folio = vma_alloc_folio(gfp, order, &pvma, 0, order == HPAGE_PMD_ORDER);
  1642		shmem_pseudo_vma_destroy(&pvma);
  1643	
  1644		return folio;
  1645	}
  1646	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
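
The -Wsometimes-uninitialized warning in the report above comes from a goto that jumps past the only assignment to 'folio' and lands on a test of it. A minimal user-space sketch of that control flow follows. It is not the mm/shmem.c code: the names ending in _sketch are hypothetical, malloc() stands in for the real allocation, and the fix shown is only the one clang itself suggests (initialize to NULL), which may or may not be what the tree ends up doing. It does not address the separate BUILD_BUG error on HPAGE_PMD_ORDER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct folio. */
struct folio_sketch {
	int order;
};

/* Hypothetical stand-in for the check that guards the early exit. */
static bool prepare_alloc_sketch(bool ok)
{
	return ok;
}

static struct folio_sketch *alloc_folio_sketch(bool prepare_ok, int order)
{
	/*
	 * Initialized to NULL so that the early "goto no_mem" below reaches
	 * the !folio test with a defined value. Leaving this uninitialized
	 * reproduces the pattern clang warns about.
	 */
	struct folio_sketch *folio = NULL;

	assert(order >= 0);
	if (!prepare_alloc_sketch(prepare_ok))
		goto no_mem;

	folio = malloc(sizeof(*folio));
	if (folio)
		folio->order = order;

no_mem:
	if (!folio)
		return NULL;
	return folio;
}
```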


[PATCH openEuler-23.09] tools: Add dynamic process-level cgroup memory monitoring tool
by Taoxy2004 05 Sep '24


community inclusion
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I9JQ4D

----------------------------------------------------------------------

This patch introduces a new tool called "probeCgroup" that enables dynamic
monitoring of memory usage at the process level within cgroups. By using
kprobes at relevant cgroup functions, this tool can track memory
allocations and deallocations for individual processes within a cgroup,
providing detailed statistics on memory usage.

The key features of the tool include:
1. Dynamic insertion of kprobes at critical points in the cgroup subsystem.
2. Tracking memory allocation and deallocation events for each process by
   recording page addresses in a hash table.
3. Providing real-time statistics on memory usage at the process level.
4. Providing statistics on memory usage for processes that are OOM.

Signed-off-by: Taoxy2004 <221870066(a)smail.nju.edu.cn>
---
 tools/probeCgroup/Makefile                    |   7 +
 tools/probeCgroup/README.md                   |  29 +
 tools/probeCgroup/probeCgroup.c               | 612 ++++++++++++++++++
 tools/probeCgroup/probeCgroup.h               | 415 ++++++++++++
 tools/probeCgroup/run.sh                      |   8 +
 tools/probeCgroup/scripts/script1.sh          |  10 +
 tools/probeCgroup/scripts/script2.sh          |  14 +
 tools/probeCgroup/scripts/script3.sh          |  11 +
 .../testcases/1_load_unload_test.py           |  24 +
 .../testcases/2_multiple_process_test.py      |  48 ++
 .../testcases/3_multiple_cgroup_test.py       |  55 ++
 tools/probeCgroup/testcases/4_oom_test.py     |  52 ++
 .../testcases/5_multiple_threads_test.py      |  45 ++
 tools/probeCgroup/testcases/cgroup_utils.py   | 114 ++++
 tools/probeCgroup/testcases/mem-allocate.c    |  35 +
 .../testcases/multiple-thread-mem-allocate.c  |  60 ++
 tools/probeCgroup/testcases/run.py            |  32 +
 .../testcases/simple-mem-allocate.c           |  27 +
 18 files changed, 1598 insertions(+)
 create mode 100644 tools/probeCgroup/Makefile
 create mode 100644 tools/probeCgroup/README.md
 create mode 100644 tools/probeCgroup/probeCgroup.c
 create mode 100644 tools/probeCgroup/probeCgroup.h
 create mode 100755 tools/probeCgroup/run.sh
 create mode 100755 tools/probeCgroup/scripts/script1.sh
 create mode 100755 tools/probeCgroup/scripts/script2.sh
 create mode 100755 tools/probeCgroup/scripts/script3.sh
 create mode 100755 tools/probeCgroup/testcases/1_load_unload_test.py
 create mode 100755 tools/probeCgroup/testcases/2_multiple_process_test.py
 create mode 100755 tools/probeCgroup/testcases/3_multiple_cgroup_test.py
 create mode 100755 tools/probeCgroup/testcases/4_oom_test.py
 create mode 100755 tools/probeCgroup/testcases/5_multiple_threads_test.py
 create mode 100644 tools/probeCgroup/testcases/cgroup_utils.py
 create mode 100644 tools/probeCgroup/testcases/mem-allocate.c
 create mode 100644 tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
 create mode 100755 tools/probeCgroup/testcases/run.py
 create mode 100644 tools/probeCgroup/testcases/simple-mem-allocate.c

diff --git a/tools/probeCgroup/Makefile b/tools/probeCgroup/Makefile
new file mode 100644
index 000000000000..606c951e5487
--- /dev/null
+++ b/tools/probeCgroup/Makefile
@@ -0,0 +1,7 @@
+obj-m := probeCgroup.o
+CROSS_COMPILE = ''
+KDIR := /lib/modules/$(shell uname -r)/build
+all:
+	make -C $(KDIR) M=$(PWD) modules
+clean:
+	rm -f *.ko *.o *.mod *.mod.o *.mod.c .*.cmd *.symvers module*
diff --git a/tools/probeCgroup/README.md b/tools/probeCgroup/README.md
new file mode 100644
index 000000000000..ff0b6fc21228
--- /dev/null
+++ b/tools/probeCgroup/README.md
@@ -0,0 +1,29 @@
+# probeCgroup
+
+#### Description
+probeCgroup is a process-level cgroup memory monitoring tool based on dynamic tracing (kprobe/kretprobe) technology. By inserting kprobes and kretprobes at the entry and exit points of relevant cgroup functions, this tool can track the memory usage of individual processes within each cgroup in real time.
+
+#### Software Architecture
+1. Dynamic Tracing : Insert kprobes and kretprobes at critical points in cgroup functions to capture memory allocation and release events.
+2. Hash Table Recording : Record the addresses of pages currently used by each process in a hash table, so that when a page is released, the process it belongs to can be identified.
+3. Real-Time Statistics : Provide real-time statistics showing the memory usage of individual processes within each cgroup.
+
+#### Instruction
+1. Compile and Load the Module
+    a. In the 'probeCgroup' directory, run the 'make' command to compile the module.
+    b. Load the module: 'insmod probeCgroup.ko'.
+    c. View memory statistics: 'cat /proc/cgroup_memory_usage_per_process'.
+    If an OOM (Out of Memory) event occurs in a cgroup, you can see "oom:" followed by the process that experienced the OOM and its memory usage at the time.
+
+2. Automate OOM Scenario
+    In the 'probeCgroup' directory, run './run.sh'. This script will automatically set up an OOM scenario and output the content of '/proc/cgroup_memory_usage_per_process' after execution.
+
+3. Perform More Tests
+    a. After compiling the module, in the 'testcases' directory, run './run.py'.
+    b. This script will perform various tests, including:
+        - Loading and unloading the module
+        - Each cgroup containing multiple processes
+        - Creating multiple cgroups
+        - OOM scenarios
+        - Multithreading
+    c. The tests will take approximately one minute to complete.
diff --git a/tools/probeCgroup/probeCgroup.c b/tools/probeCgroup/probeCgroup.c new file mode 100644 index 000000000000..9883cb1e082d --- /dev/null +++ b/tools/probeCgroup/probeCgroup.c @@ -0,0 +1,612 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * probeCgroup.c - A tool used to get memory usage for each process in a cgroup + * + * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + */ + +#include "probeCgroup.h" + +// kretprobe at mem_cgroup_charge +struct charge_data { + struct cgroup *cgrp; + struct mem_cgroup *memcg; + struct task_struct *task; + unsigned long addr; +}; + +static int mem_cgroup_charge_entry_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct charge_data *data; + struct folio *page; + struct mm_struct *mm; + struct mem_cgroup *memcg; + struct cgroup_subsys_state css; + struct cgroup *cgrp; + + if (!current->mm) + return 1; + page = (struct folio *)regs->di; + mm = (struct mm_struct *)regs->si; + if (mm == NULL || page == NULL) + return -1; + memcg = get_mem_cgroup_from_mm(mm); + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + + data = (struct charge_data *)ri->data; + data->memcg = memcg; + data->addr = (unsigned long)page; + data->task = current; + data->cgrp = cgrp; + } + return 0; +} + +NOKPROBE_SYMBOL(mem_cgroup_charge_entry_handler); + +static int mem_cgroup_charge_ret_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + unsigned long retval = regs_return_value(regs); + struct charge_data *data = (struct charge_data *)ri->data; + int id; + struct cgroup_info *cgrp_info; + struct task_info *tsk_info; + + if (data->memcg != NULL && retval == 0) { + id = ((data->memcg)->css).id; + + spin_lock(&lock); + cgrp_info = find_cgroup_info(id); + if (cgrp_info == NULL) { + cgrp_info = create_cgroup_info(data->cgrp, data->memcg); + if (cgrp_info == NULL) { + spin_unlock(&lock); + return -1; + } + add_cgroup_info(cgrp_info); + } + spin_unlock(&lock); + + read_lock(&cgrp_info->cgrp_lock); + 
tsk_info = find_task_info(cgrp_info, data->task->tgid); + read_unlock(&cgrp_info->cgrp_lock); + + // for some cases, task->comm changes over time + if (tsk_info != NULL + && strcmp(data->task->comm, tsk_info->comm) != 0) { + strscpy(tsk_info->comm, data->task->comm, + sizeof(tsk_info->comm)); + } + + if (tsk_info == NULL) { + tsk_info = create_task_info(data->task); + if (tsk_info == NULL) + return -1; + add_task_to_cgroup_info(cgrp_info, tsk_info); + } + + if (HashMap_insert(tsk_info->pages, data->addr)) { + //update counter + spin_lock(&(tsk_info->cnt_lock)); + tsk_info->count += + folio_nr_pages((struct folio *)data->addr); + spin_unlock(&(tsk_info->cnt_lock)); + } + } + + return 0; + +} + +NOKPROBE_SYMBOL(mem_cgroup_charge_ret_handler); + +static struct kretprobe mem_cgroup_charge_kretprobe = { + .handler = mem_cgroup_charge_ret_handler, + .entry_handler = mem_cgroup_charge_entry_handler, + .data_size = sizeof(struct charge_data), + .maxactive = 20, +}; + +static int mem_cgroup_charge_kretprobe_init(void) +{ + int ret; + + mem_cgroup_charge_kretprobe.kp.symbol_name = "__mem_cgroup_charge"; + ret = register_kretprobe(&mem_cgroup_charge_kretprobe); + if (ret < 0) { + pr_err("register_kretprobe failed, returned %d\n", ret); + return ret; + } + pr_info("Planted return probe at %s: %p\n", + mem_cgroup_charge_kretprobe.kp.symbol_name, + mem_cgroup_charge_kretprobe.kp.addr); + return 0; +} + +static void mem_cgroup_charge_kretprobe_exit(void) +{ + unregister_kretprobe(&mem_cgroup_charge_kretprobe); + pr_info("kretprobe at %p unregistered\n", + mem_cgroup_charge_kretprobe.kp.addr); + + /* nmissed > 0 suggests that maxactive was set too low. 
*/ + pr_info("Missed probing %d instances of %s\n", + mem_cgroup_charge_kretprobe.nmissed, + mem_cgroup_charge_kretprobe.kp.symbol_name); +} + +// kretprobe at uncharge_folio + +struct uncharge_data { + struct cgroup *cgrp; + struct mem_cgroup *memcg; + unsigned long addr; + bool isKmem; + int nr_pages; +}; + +static int uncharge_folio_entry_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct uncharge_data *data; + struct folio *page; + struct mem_cgroup *memcg = NULL; + struct cgroup_subsys_state css; + struct cgroup *cgrp; + struct obj_cgroup *objcg; + int nr_pages = 0; + + data = (struct uncharge_data *)ri->data; + page = (struct folio *)regs->di; + if (page == NULL) { + data->memcg = NULL; + return -1; + } + if (page->memcg_data & MEMCG_DATA_KMEM) { // if the page belongs to kmem + if (!folio_test_large(page)) + nr_pages = 1; + else + nr_pages = page->_folio_nr_pages; + // nr_pages = thp_nr_pages(page); + objcg = __folio_objcg(page); + if (objcg != NULL) + memcg = objcg->memcg; + data->isKmem = true; + data->nr_pages = nr_pages; + } else { + memcg = __folio_memcg(page); + data->isKmem = false; + } + + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + + data->memcg = memcg; + data->addr = (unsigned long)page; + data->cgrp = cgrp; + } + return 0; +} + +NOKPROBE_SYMBOL(uncharge_folio_entry_handler); + +static int uncharge_folio_ret_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct uncharge_data *data = (struct uncharge_data *)ri->data; + int id; + struct cgroup_info *cgrp_info; + int ret = -1; + + if (data->memcg != NULL) { + id = ((data->memcg)->css).id; + cgrp_info = find_cgroup_info(id); + if (cgrp_info == NULL) + return -1; + if (data->isKmem) + ret = -1; + else + ret = remove_page_from_cgroup_info(data->addr, cgrp_info); + } + + return ret; +} + +NOKPROBE_SYMBOL(uncharge_folio_ret_handler); + +static struct kretprobe uncharge_folio_kretprobe = { + .handler = uncharge_folio_ret_handler, + 
.entry_handler = uncharge_folio_entry_handler, + .data_size = sizeof(struct uncharge_data), + .maxactive = 20, +}; + +static int uncharge_folio_kretprobe_init(void) +{ + int ret; + + uncharge_folio_kretprobe.kp.symbol_name = "uncharge_folio"; + ret = register_kretprobe(&uncharge_folio_kretprobe); + if (ret < 0) { + pr_err("register_kretprobe failed, returned %d\n", ret); + return ret; + } + pr_info("Planted return probe at %s: %p\n", + uncharge_folio_kretprobe.kp.symbol_name, + uncharge_folio_kretprobe.kp.addr); + return 0; +} + +static void uncharge_folio_kretprobe_exit(void) +{ + unregister_kretprobe(&uncharge_folio_kretprobe); + pr_info("kretprobe at %p unregistered\n", + uncharge_folio_kretprobe.kp.addr); + + /* nmissed > 0 suggests that maxactive was set too low. */ + pr_info("Missed probing %d instances of %s\n", + uncharge_folio_kretprobe.nmissed, + uncharge_folio_kretprobe.kp.symbol_name); +} + +//kprobe at do_exit +static struct kprobe do_exit_kprobe; +static int do_exit_kprobe_pre_handler(struct kprobe *p, struct pt_regs *regs) +{ + struct task_struct *cur = current; + int tgid = cur->tgid; + struct mm_struct *mm = cur->mm; + struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm); + struct cgroup_subsys_state css; + struct cgroup *cgrp; + struct cgroup_info *cgrp_info; + struct task_info *tsk_info; + int id; + + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + id = (memcg->css).id; + cgrp_info = find_cgroup_info(id); + if (cgrp_info != NULL) { + write_lock(&cgrp_info->cgrp_lock); + tsk_info = find_task_info(cgrp_info, tgid); + if (tsk_info != NULL) { + list_del(&tsk_info->list); + write_unlock(&cgrp_info->cgrp_lock); + remove_task_from_cgroup_info(cgrp_info, + tsk_info); + } else { + write_unlock(&cgrp_info->cgrp_lock); + } + return 0; + } + } + return 0; +} + +static void do_exit_kprobe_post_handler(struct kprobe *p, + struct pt_regs *regs, + unsigned long flags) +{ + +} + +static int do_exit_kprobe_init(void) +{ + 
do_exit_kprobe.pre_handler = do_exit_kprobe_pre_handler; + do_exit_kprobe.post_handler = do_exit_kprobe_post_handler; + do_exit_kprobe.symbol_name = "do_exit"; + if (register_kprobe(&do_exit_kprobe)) { + pr_alert("register_kprobe on do_exit failed!\n"); + return -EINVAL; + } + return 0; +} + +static void do_exit_kprobe_exit(void) +{ + unregister_kprobe(&do_exit_kprobe); +} + +//kprobe at mark_oom_victim +static struct kprobe mark_oom_victim_kprobe; + +static int mark_oom_victim_kprobe_pre_handler(struct kprobe *p, + struct pt_regs *regs) +{ + struct task_struct *victim; + int tgid; + struct mm_struct *mm; + struct mem_cgroup *memcg; + struct cgroup_subsys_state css; + struct cgroup *cgrp; + struct cgroup_info *cgrp_info; + struct task_info *tsk_info; + int id; + struct task_info *oom_info; + + victim = (struct task_struct *)regs->di; + tgid = victim->tgid; + mm = victim->mm; + memcg = get_mem_cgroup_from_mm(mm); + if (memcg != NULL) { + css = memcg->css; + cgrp = css.cgroup; + id = (memcg->css).id; + cgrp_info = find_cgroup_info(id); + if (cgrp_info != NULL) { + read_lock(&cgrp_info->cgrp_lock); + tsk_info = find_task_info(cgrp_info, tgid); + read_unlock(&cgrp_info->cgrp_lock); + if (tsk_info != NULL) { + oom_info = create_oom_task_info(tsk_info); + if (oom_info != NULL) { + add_oom_task_to_cgroup_info(cgrp_info, + oom_info); + } + return 0; + } + } + } + return 0; +} + +static void mark_oom_victim_kprobe_post_handler(struct kprobe *p, + struct pt_regs *regs, + unsigned long flags) +{ + +} + +static int mark_oom_victim_kprobe_init(void) +{ + mark_oom_victim_kprobe.pre_handler = mark_oom_victim_kprobe_pre_handler; + mark_oom_victim_kprobe.post_handler = + mark_oom_victim_kprobe_post_handler; + mark_oom_victim_kprobe.symbol_name = "mark_oom_victim"; + if (register_kprobe(&mark_oom_victim_kprobe)) { + pr_alert("register_kprobe on mark_oom_victim failed!\n"); + return -EINVAL; + } + return 0; +} + +static void mark_oom_victim_kprobe_exit(void) +{ + 
unregister_kprobe(&mark_oom_victim_kprobe); +} + +//kretporbe at cgroup_destroy_locked +struct destroy_data { + struct cgroup *cgrp; +}; + +static int cgroup_destroy_locked_entry_handler(struct kretprobe_instance + *ri, struct pt_regs *regs) +{ + struct destroy_data *data; + + data = (struct destroy_data *)ri->data; + data->cgrp = (struct cgroup *)regs->di; + return 0; +} + +NOKPROBE_SYMBOL(cgroup_destroy_locked_entry_handler); + +static int cgroup_destroy_locked_ret_handler(struct kretprobe_instance *ri, + struct pt_regs *regs) +{ + struct destroy_data *data = (struct destroy_data *)ri->data; + struct cgroup *cgrp = data->cgrp; + struct cgroup_info *cgrp_info = NULL; + unsigned long retval = regs_return_value(regs); + + if (!cgrp) + return -1; + if (retval != 0) + return -1; + list_for_each_entry(cgrp_info, &all_cgroup_info, list) { + if (cgrp_info->cgrp == cgrp) { + spin_lock(&lock); + list_del(&cgrp_info->list); + spin_unlock(&lock); + destroy_cgroup_info(cgrp_info); + return 0; + } + } + return -1; +} + +NOKPROBE_SYMBOL(cgroup_destroy_locked_ret_handler); + +static struct kretprobe cgroup_destroy_locked_kretprobe = { + .handler = cgroup_destroy_locked_ret_handler, + .entry_handler = cgroup_destroy_locked_entry_handler, + .data_size = sizeof(struct destroy_data), + .maxactive = 20, +}; + +static int cgroup_destroy_locked_kretprobe_init(void) +{ + int ret; + + cgroup_destroy_locked_kretprobe.kp.symbol_name = + "cgroup_destroy_locked"; + ret = register_kretprobe(&cgroup_destroy_locked_kretprobe); + if (ret < 0) { + pr_err("register_kretprobe failed, returned %d\n", ret); + return ret; + } + pr_info("Planted return probe at %s: %p\n", + cgroup_destroy_locked_kretprobe.kp.symbol_name, + cgroup_destroy_locked_kretprobe.kp.addr); + return 0; +} + +static void cgroup_destroy_locked_kretprobe_exit(void) +{ + unregister_kretprobe(&cgroup_destroy_locked_kretprobe); + pr_info("kretprobe at %p unregistered\n", + cgroup_destroy_locked_kretprobe.kp.addr); + + /* nmissed > 0 
suggests that maxactive was set too low. */ + pr_info("Missed probing %d instances of %s\n", + cgroup_destroy_locked_kretprobe.nmissed, + cgroup_destroy_locked_kretprobe.kp.symbol_name); +} + +// print the tasks in order of their memory usage +static void print_sorted_tasks_list(struct cgroup_info *cgrp_info, + int type, struct seq_file *m) +{ + struct list_head *cur, *insert_pos; + struct task_info *task, *insert_task; + struct list_head new_list = LIST_HEAD_INIT(new_list); + struct list_head *old_list; + struct task_info *new_task, *next_task; + + if (type == 0) { + if (cgrp_info == NULL) + return; + read_lock(&cgrp_info->cgrp_lock); + old_list = &cgrp_info->tasks_list; + } else { + if (cgrp_info == NULL) + return; + old_list = &cgrp_info->oom_list; + } + + list_for_each_entry_safe(task, insert_task, old_list, list) { + new_task = kmalloc(sizeof(struct task_info), GFP_ATOMIC); + if (!new_task) + return; + new_task->tgid = task->tgid; + strscpy(new_task->comm, task->comm, sizeof(new_task->comm)); + new_task->count = task->count; + new_task->pages = NULL; + INIT_LIST_HEAD(&new_task->list); + + //insertion sort + cur = &new_list; + insert_pos = cur->next; + while (insert_pos != &new_list) { + next_task = + list_entry(insert_pos, struct task_info, list); + if (new_task->count >= next_task->count) + break; + cur = insert_pos; + insert_pos = insert_pos->next; + } + + (&new_task->list)->prev = insert_pos->prev; + (insert_pos->prev)->next = (&new_task->list); + (&new_task->list)->next = insert_pos; + insert_pos->prev = (&new_task->list); + } + if (type == 0) + read_unlock(&cgrp_info->cgrp_lock); + + //print + if (type == 1 && (&new_list) != new_list.next) { + seq_puts(m, "oom:\n"); + seq_printf(m, "%10s %20s %20s\n", "pid", "command", + "memory usage (KB)"); + } + if (type == 0) + seq_printf(m, "%10s %20s %20s\n", "pid", "command", + "memory usage (KB)"); + list_for_each_entry_safe(task, insert_task, &new_list, list) { + seq_printf(m, "%10d %20s %20d\n", task->tgid, 
task->comm, + (task->count) * 4); + } + + list_for_each_entry_safe(task, insert_task, &new_list, list) { + list_del(&task->list); + kfree(task); + } +} + +static struct proc_dir_entry *cgroup_info_read; +#define procfs_file_read "cgroup_memory_usage_per_process" + +void seq_print_tasks(struct cgroup_info *cgroup_info, struct seq_file *m) +{ + if (!cgroup_info) + return; + + print_sorted_tasks_list(cgroup_info, 0, m); +} + +void seq_print_oom_tasks(struct cgroup_info *cgroup_info, struct seq_file *m) +{ + if (!cgroup_info) + return; + + print_sorted_tasks_list(cgroup_info, 1, m); +} + +void seq_print_cgroups(struct seq_file *m) +{ + struct cgroup_info *cgrp, *pos; + + spin_lock(&lock); + list_for_each_entry_safe(cgrp, pos, &all_cgroup_info, list) { + seq_printf(m, "cgroup name : %s\n", cgrp->name); + seq_print_tasks(cgrp, m); + seq_print_oom_tasks(cgrp, m); + seq_puts(m, "\n"); + } + spin_unlock(&lock); +} + +static int memory_usage_show(struct seq_file *m, void *v) +{ + seq_print_cgroups(m); + return 0; +} + +static int __init global_init(void) +{ + int ret = 0; + + cgroup_info_read = + proc_create_single(procfs_file_read, 0, NULL, memory_usage_show); + if (!cgroup_info_read) + return -ENOMEM; + ret = mem_cgroup_charge_kretprobe_init(); + uncharge_folio_kretprobe_init(); + do_exit_kprobe_init(); + mark_oom_victim_kprobe_init(); + cgroup_destroy_locked_kretprobe_init(); + + return ret; +} + +static void __exit global_exit(void) +{ + struct cgroup_info *cgrp_info, *pos; + + mem_cgroup_charge_kretprobe_exit(); + uncharge_folio_kretprobe_exit(); + do_exit_kprobe_exit(); + mark_oom_victim_kprobe_exit(); + cgroup_destroy_locked_kretprobe_exit(); + + remove_proc_entry(procfs_file_read, NULL); + + //release all memory use + list_for_each_entry_safe(cgrp_info, pos, &all_cgroup_info, list) { + list_del(&cgrp_info->list); + destroy_cgroup_info(cgrp_info); + } +} + +module_init(global_init) +module_exit(global_exit) +MODULE_LICENSE("GPL"); diff --git 
a/tools/probeCgroup/probeCgroup.h b/tools/probeCgroup/probeCgroup.h new file mode 100644 index 000000000000..953a6e0aca31 --- /dev/null +++ b/tools/probeCgroup/probeCgroup.h @@ -0,0 +1,415 @@ +/* SPDX-License-Identifier: GPL-2.0*/ +/* + * probeCgroup.h + * + * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn> + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/kprobes.h> +#include <linux/ktime.h> +#include <linux/limits.h> +#include <linux/sched.h> +#include <linux/mm_types.h> +#include <linux/memcontrol.h> +#include <linux/cgroup-defs.h> +#include <linux/kernfs.h> +#include <linux/string.h> +#include <linux/list.h> +#include <linux/oom.h> +#include <linux/fs.h> +#include <linux/proc_fs.h> +#include <linux/huge_mm.h> +#include <linux/page-flags.h> +#include <linux/spinlock.h> +#include <linux/rwlock.h> + +static spinlock_t lock; // global lock for the list of cgroup_info + +struct HashNode { + unsigned long addr; + struct HashNode *next; +}; + +struct HashNode *HashNode_create(unsigned long addr) +{ + struct HashNode *node = NULL; + + node = kzalloc(sizeof(struct HashNode), GFP_ATOMIC); + if (node == NULL) + return NULL; + node->addr = addr; + node->next = NULL; + return node; +} + +struct HashBucket { + struct HashNode *head; + spinlock_t bkt_lock; +}; + +void HashBucket_init(struct HashBucket *bkt) +{ + bkt->head = NULL; + spin_lock_init(&bkt->bkt_lock); +} + +bool HashBucket_insert(struct HashBucket *bkt, unsigned long addr) +{ + struct HashNode *new_node; + struct HashNode *node; + struct HashNode *prev; + bool ret = true; + + if (bkt == NULL) + return false; + + prev = NULL; + new_node = NULL; + new_node = HashNode_create(addr); + + spin_lock(&bkt->bkt_lock); + node = bkt->head; + while (node != NULL && node->addr != addr) { + prev = node; + node = node->next; + } + if (node == NULL) { + if (new_node == NULL) { + pr_info("not enough memory for HashNode\n"); + spin_unlock(&bkt->bkt_lock); + return false; + } + if (bkt->head == 
NULL)
+			bkt->head = new_node;
+		else
+			prev->next = new_node;
+		spin_unlock(&bkt->bkt_lock);
+		ret = true;
+	} else {
+		spin_unlock(&bkt->bkt_lock);
+		kfree(new_node);
+		ret = false;
+	}
+
+	return ret;
+}
+
+bool HashBucket_erase(struct HashBucket *bkt, unsigned long addr)
+{
+	struct HashNode *node;
+	struct HashNode *prev;
+	bool ret = true;
+
+	if (bkt == NULL)
+		return false;
+
+	spin_lock(&bkt->bkt_lock);
+	node = bkt->head;
+	prev = NULL;
+	while (node != NULL && node->addr != addr) {
+		prev = node;
+		node = node->next;
+	}
+	if (node == NULL) {
+		spin_unlock(&bkt->bkt_lock);
+		ret = false;
+	} else {
+		if (bkt->head == node)
+			bkt->head = node->next;
+		else
+			prev->next = node->next;
+		kfree(node);
+		spin_unlock(&bkt->bkt_lock);
+		ret = true;
+	}
+
+	return ret;
+}
+
+void HashBucket_clear(struct HashBucket *bkt)
+{
+	struct HashNode *node;
+	struct HashNode *prev;
+
+	if (bkt == NULL)
+		return;
+
+	spin_lock(&bkt->bkt_lock);
+	node = bkt->head;
+	prev = NULL;
+	bkt->head = NULL;
+	while (node != NULL) {
+		prev = node;
+		node = node->next;
+		kfree(prev);
+	}
+	spin_unlock(&bkt->bkt_lock);
+}
+
+struct HashMap {
+	unsigned long size;
+	struct HashBucket *HashTable;
+};
+
+unsigned long hash_func(unsigned long addr, unsigned long size)
+{
+	return addr % size;
+}
+
+struct HashMap *HashMap_create(unsigned long size)
+{
+	struct HashMap *hm = NULL;
+	struct HashBucket *ht = NULL;
+	int i = 0;
+
+	hm = kmalloc(sizeof(struct HashMap), GFP_ATOMIC);
+	if (hm == NULL)
+		return NULL;
+	ht = kmalloc((size * sizeof(struct HashBucket)), GFP_ATOMIC);
+	if (ht == NULL) {
+		kfree(hm);
+		return NULL;
+	}
+	for (i = 0; i < size; i++)
+		HashBucket_init(&(ht[i]));
+
+	hm->size = size;
+	hm->HashTable = ht;
+	return hm;
+}
+
+bool HashMap_insert(struct HashMap *hm, unsigned long addr)
+{
+	unsigned long index;
+
+	if (hm == NULL)
+		return false;
+	index = hash_func(addr, hm->size);
+	if (hm->HashTable == NULL)
+		return false;
+	return HashBucket_insert(&(hm->HashTable[index]), addr);
+}
+
+bool HashMap_erase(struct HashMap *hm, unsigned long addr)
+{
+	unsigned long index;
+
+	if (hm == NULL)
+		return false;
+	index = hash_func(addr, hm->size);
+	if (hm->HashTable == NULL)
+		return false;
+	return HashBucket_erase(&(hm->HashTable[index]), addr);
+}
+
+void HashMap_clear(struct HashMap *hm)
+{
+	unsigned long size;
+	struct HashBucket *ht;
+	int i;
+
+	if (hm == NULL)
+		return;
+	size = hm->size;
+	ht = hm->HashTable;
+	if (ht == NULL)
+		return;
+	hm->HashTable = NULL;
+	for (i = 0; i < size; i++)
+		HashBucket_clear(&(ht[i]));
+
+	kfree(ht);
+	kfree(hm);
+}
+
+//struct that save the information for each task
+struct task_info {
+	int tgid;
+	char comm[TASK_COMM_LEN];
+	int count; // number of pages
+	struct HashMap *pages;
+	struct list_head list;
+	spinlock_t cnt_lock;
+};
+
+// struct that save the information for each cgroup
+struct cgroup_info {
+	struct cgroup *cgrp;
+	struct mem_cgroup *memcg;
+	int id;
+	char name[64];
+	struct list_head list;
+	struct list_head tasks_list;
+	struct list_head oom_list;
+	rwlock_t cgrp_lock;
+	unsigned int cached_bytes;
+};
+
+static LIST_HEAD(all_cgroup_info); // a list that linked all the cgroup_info struct
+
+static struct task_info *create_task_info(struct task_struct *cur_task)
+{
+	struct task_info *tsk_info =
+		kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+	if (!tsk_info)
+		return NULL;
+
+	// initialization
+	tsk_info->tgid = cur_task->tgid;
+	strscpy(tsk_info->comm, cur_task->comm, sizeof(tsk_info->comm));
+	tsk_info->count = 0;
+	tsk_info->pages = NULL;
+	tsk_info->pages = HashMap_create(1023);
+	INIT_LIST_HEAD(&tsk_info->list);
+	spin_lock_init(&(tsk_info->cnt_lock));
+
+	return tsk_info;
+}
+
+static int
+add_task_to_cgroup_info(struct cgroup_info *cgrp, struct task_info *task)
+{
+	if (!cgrp || !task)
+		return -EINVAL;
+
+	write_lock(&cgrp->cgrp_lock);
+	list_add_tail(&task->list, &cgrp->tasks_list);
+	write_unlock(&cgrp->cgrp_lock);
+	return 0;
+}
+
+static int
+remove_task_from_cgroup_info(struct cgroup_info *cgrp, struct task_info *task)
+{
+	if (cgrp == NULL || task == NULL)
+		return -EINVAL;
+
+	HashMap_clear(task->pages);
+	// kfree(task->pages);
+	kfree(task);
+	return 0;
+}
+
+static struct task_info *find_task_info(struct cgroup_info *cgrp, int tgid)
+{
+	struct task_info *tsk_info, *pos;
+
+	list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) {
+		if (tsk_info->tgid == tgid)
+			return tsk_info;
+	}
+	return NULL;
+}
+
+static int
+remove_page_from_cgroup_info(unsigned long addr, struct cgroup_info *cgrp)
+{
+	struct task_info *tsk_info, *pos;
+
+	read_lock(&cgrp->cgrp_lock);
+	list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) {
+		if (HashMap_erase(tsk_info->pages, addr)) {
+			spin_lock(&(tsk_info->cnt_lock));
+			tsk_info->count -= folio_nr_pages((struct folio *)addr);
+			spin_unlock(&(tsk_info->cnt_lock));
+			read_unlock(&cgrp->cgrp_lock);
+			return 0;
+		}
+	}
+	read_unlock(&cgrp->cgrp_lock);
+	return -1;
+}
+
+static struct cgroup_info *create_cgroup_info(struct cgroup *cgrp,
+					      struct mem_cgroup *memcg)
+{
+	struct cgroup_info *cgrp_info =
+		kmalloc(sizeof(struct cgroup_info), GFP_ATOMIC);
+	struct kernfs_node *kn;
+
+	if (!cgrp_info)
+		return NULL;
+
+	cgrp_info->cgrp = cgrp;
+	cgrp_info->memcg = memcg;
+	cgrp_info->id = (memcg->css).id;
+	kn = cgrp->kn;
+	strscpy(cgrp_info->name, kn->name, sizeof(cgrp_info->name));
+	INIT_LIST_HEAD(&cgrp_info->list);
+	INIT_LIST_HEAD(&cgrp_info->tasks_list);
+	INIT_LIST_HEAD(&cgrp_info->oom_list);
+	rwlock_init(&(cgrp_info->cgrp_lock));
+	cgrp_info->cached_bytes = 0;
+
+	return cgrp_info;
+}
+
+static void destroy_cgroup_info(struct cgroup_info *cgrp_info)
+{
+	struct task_info *task, *tmp;
+
+	if (!cgrp_info)
+		return;
+
+	write_lock(&cgrp_info->cgrp_lock);
+	list_for_each_entry_safe(task, tmp, &cgrp_info->tasks_list, list) {
+		list_del(&task->list);
+		remove_task_from_cgroup_info(cgrp_info, task);
+	}
+	write_unlock(&cgrp_info->cgrp_lock);
+	list_for_each_entry_safe(task, tmp, &cgrp_info->oom_list, list) {
+		list_del(&task->list);
+		remove_task_from_cgroup_info(cgrp_info, task);
+	}
+
+	kfree(cgrp_info);
+}
+
+static int add_cgroup_info(struct cgroup_info *cgrp_info)
+{
+	if (!cgrp_info)
+		return -EINVAL;
+
+	list_add_tail(&cgrp_info->list, &all_cgroup_info);
+	return 0;
+}
+
+static struct cgroup_info *find_cgroup_info(int id)
+{
+	struct cgroup_info *cgrp_info = NULL;
+
+	list_for_each_entry(cgrp_info, &all_cgroup_info, list) {
+		if (cgrp_info->id == id)
+			return cgrp_info;
+	}
+
+	return NULL;
+}
+
+static struct task_info *create_oom_task_info(struct task_info *tsk_info)
+{
+	struct task_info *oom_tsk_info =
+		kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+	if (!oom_tsk_info)
+		return NULL;
+
+	oom_tsk_info->tgid = tsk_info->tgid;
+	strscpy(oom_tsk_info->comm, tsk_info->comm, sizeof(oom_tsk_info->comm));
+	oom_tsk_info->count = tsk_info->count;
+	oom_tsk_info->pages = NULL;
+	INIT_LIST_HEAD(&oom_tsk_info->list);
+
+	return oom_tsk_info;
+}
+
+static int
+add_oom_task_to_cgroup_info(struct cgroup_info *cgrp,
+			    struct task_info *oom_task)
+{
+	if (!cgrp || !oom_task)
+		return -EINVAL;
+	list_add_tail(&oom_task->list, &cgrp->oom_list);
+	return 0;
+}
diff --git a/tools/probeCgroup/run.sh b/tools/probeCgroup/run.sh
new file mode 100755
index 000000000000..7e1ffefe66d1
--- /dev/null
+++ b/tools/probeCgroup/run.sh
@@ -0,0 +1,8 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd scripts
+./script1.sh
+./script2.sh
+./script3.sh
diff --git a/tools/probeCgroup/scripts/script1.sh b/tools/probeCgroup/scripts/script1.sh
new file mode 100755
index 000000000000..539e8258afb9
--- /dev/null
+++ b/tools/probeCgroup/scripts/script1.sh
@@ -0,0 +1,10 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd ..
+make
+insmod probeCgroup.ko
+
+cd testcases
+gcc simple-mem-allocate.c -o simple-mem-allocate
diff --git a/tools/probeCgroup/scripts/script2.sh b/tools/probeCgroup/scripts/script2.sh
new file mode 100755
index 000000000000..2ad515cfb912
--- /dev/null
+++ b/tools/probeCgroup/scripts/script2.sh
@@ -0,0 +1,14 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+current_dir=$(pwd)
+cd /sys/fs/cgroup/memory
+mkdir test
+cd test
+sh -c "echo $$ >> cgroup.procs"
+sh -c "echo 5M > memory.limit_in_bytes"
+sh -c "echo 0 > memory.swappiness"
+cd "$current_dir"
+cd ../testcases
+./simple-mem-allocate
diff --git a/tools/probeCgroup/scripts/script3.sh b/tools/probeCgroup/scripts/script3.sh
new file mode 100755
index 000000000000..127eb45de5c9
--- /dev/null
+++ b/tools/probeCgroup/scripts/script3.sh
@@ -0,0 +1,11 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd /proc
+cat cgroup_memory_usage_per_process
+
+cat /sys/fs/cgroup/memory/test/cgroup.procs > /sys/fs/cgroup/memory/cgroup.procs
+rmdir /sys/fs/cgroup/memory/test
+# cat cgroup_memory_usage_per_process
+rmmod probeCgroup
diff --git a/tools/probeCgroup/testcases/1_load_unload_test.py b/tools/probeCgroup/testcases/1_load_unload_test.py
new file mode 100755
index 000000000000..5389a14a1dac
--- /dev/null
+++ b/tools/probeCgroup/testcases/1_load_unload_test.py
@@ -0,0 +1,24 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import time
+
+def test_module_load_unload():
+    try:
+        subprocess.check_call(['insmod', '../probeCgroup.ko'])
+        time.sleep(1)
+        print('loading module successfully!')
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+        output = subprocess.check_output(['lsmod'])
+        assert b'probeCgroup' not in output
+        print('unloading module successfully!')
+    except subprocess.CalledProcessError as e:
+        print('Load unload test failed. Insmod failed.')
+    except AssertionError as e:
+        print('Load unload test failed. Cannot remove module.')
+
+if __name__ == '__main__':
+    test_module_load_unload()
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/2_multiple_process_test.py b/tools/probeCgroup/testcases/2_multiple_process_test.py
new file mode 100755
index 000000000000..d88c7f7f2952
--- /dev/null
+++ b/tools/probeCgroup/testcases/2_multiple_process_test.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage
+import os
+import subprocess
+import time
+
+def test_multiple_process(num_procs):
+    subprocess.check_call(['insmod', '../probeCgroup.ko'])
+    time.sleep(1)
+
+    cgroup_name = 'test'
+    cgroup_path = create_cgroup(cgroup_name)
+
+    processes = []
+    pids = []
+
+    for i in range(num_procs):
+        process = subprocess.Popen(['./mem-allocate'])
+        pid = process.pid
+        add_process_to_cgroup(cgroup_path, pid)
+        processes.append(process)
+        pids.append(pid)
+
+    time.sleep(0.1)
+    try:
+        count = 0
+        for i in range (2000):
+            count += check_memory_usage(cgroup_name, pids, False)
+            time.sleep(0.01)
+        assert count <= 50, f"Memory read by probeCgroup is not accurate"
+
+        remove_cgroup(cgroup_path, pids)
+        check_memory_usage(cgroup_name, pids, True)
+        cleanup(processes)
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+        print('pass multiple process test!')
+    except AssertionError as e:
+        print(f"Assertion failed: {e}")
+        remove_cgroup(cgroup_path, pids)
+        cleanup(processes)
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+    test_multiple_process(3)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/3_multiple_cgroup_test.py b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py
new file mode 100755
index 000000000000..592a716df877
--- /dev/null
+++ b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup
+import os
+import subprocess
+import time
+
+def test_multiple_cgroup(num_procs, num_cgroups):
+    subprocess.check_call(['insmod', '../probeCgroup.ko'])
+    time.sleep(1)
+
+    cgroups = []
+    processes = {}
+    pids = {}
+    for i in range(num_cgroups):
+        cgroup_name = f'test_{i}'
+        cgroup_path = create_cgroup(cgroup_name)
+        cgroups.append((cgroup_name, cgroup_path))
+
+        for j in range(num_procs):
+            process = subprocess.Popen(['./mem-allocate'])
+            pid = process.pid
+            add_process_to_cgroup(cgroup_path, pid)
+            if cgroup_path not in processes:
+                processes[cgroup_path] = []
+            processes[cgroup_path].append(process)
+            if cgroup_path not in pids:
+                pids[cgroup_path] = []
+            pids[cgroup_path].append(pid)
+
+    time.sleep(0.1)
+    try:
+        for i in range (100):
+            for cgroup_name, cgroup_path in cgroups:
+                check_memory_usage(cgroup_name, pids[cgroup_path], False)
+            time.sleep(0.01)
+
+        for cgroup_name, cgroup_path in cgroups:
+            remove_cgroup(cgroup_path, pids[cgroup_path])
+            check_memory_usage(cgroup_name, pids[cgroup_path], True)
+            cleanup(processes[cgroup_path])
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+        print('pass multiple cgroup test!')
+    except AssertionError as e:
+        print(f"Assertion failed: {e}")
+        for cgroup_name, cgroup_path in cgroups:
+            remove_cgroup(cgroup_path, pids[cgroup_path])
+            cleanup(processes[cgroup_path])
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+    test_multiple_cgroup(2,2)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/4_oom_test.py b/tools/probeCgroup/testcases/4_oom_test.py
new file mode 100755
index 000000000000..128a258c56f5
--- /dev/null
+++ b/tools/probeCgroup/testcases/4_oom_test.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, get_oom_process_memory_usage
+import os
+import subprocess
+import time
+
+def test_oom(num_procs):
+    subprocess.check_call(['insmod', '../probeCgroup.ko'])
+    time.sleep(1)
+
+    cgroup_name = 'test'
+    cgroup_path = create_cgroup(cgroup_name)
+
+    with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.limit_in_bytes", 'w') as limit_file:
+        limit_file.write("5M")
+    with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.swappiness", 'w') as swap_file:
+        swap_file.write("0")
+
+    processes = []
+    pids = []
+    for i in range(num_procs):
+        process = subprocess.Popen(['./simple-mem-allocate'])
+        pid = process.pid
+        add_process_to_cgroup(cgroup_path, pid)
+        processes.append(process)
+        pids.append(pid)
+
+    time.sleep(6)
+
+    try:
+        for pid in pids:
+            memory_usage = get_oom_process_memory_usage(pid, cgroup_name)
+            assert memory_usage is not None, f"Memory usage(oom) not found for PID {pid}"
+            assert memory_usage > 0, f"Memory usage should be greater than zero for PID {pid}"
+
+        remove_cgroup(cgroup_path, pids)
+        check_memory_usage(cgroup_name, pids, True)
+        cleanup(processes)
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+        print('pass oom test!')
+    except AssertionError as e:
+        print(f"Assertion failed: {e}")
+        remove_cgroup(cgroup_path, pids)
+        cleanup(processes)
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+    test_oom(1)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/5_multiple_threads_test.py b/tools/probeCgroup/testcases/5_multiple_threads_test.py
new file mode 100755
index 000000000000..7e1b86dabe48
--- /dev/null
+++ b/tools/probeCgroup/testcases/5_multiple_threads_test.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage
+import os
+import subprocess
+import time
+
+def test_multiple_thread(num_procs):
+    subprocess.check_call(['insmod', '../probeCgroup.ko'])
+    time.sleep(1)
+
+    cgroup_name = 'test'
+    cgroup_path = create_cgroup(cgroup_name)
+
+    processes = []
+    pids = []
+    for i in range(num_procs):
+        process = subprocess.Popen(['./multiple-thread-mem-allocate'])
+        pid = process.pid
+        add_process_to_cgroup(cgroup_path, pid)
+        processes.append(process)
+        pids.append(pid)
+
+    time.sleep(1)
+    try:
+        for i in range (200):
+            check_memory_usage(cgroup_name, pids, False)
+            time.sleep(0.01)
+
+        remove_cgroup(cgroup_path, pids)
+        check_memory_usage(cgroup_name, pids, True)
+        cleanup(processes)
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+        print('pass multiple threads test!')
+    except AssertionError as e:
+        print(f"Assertion failed: {e}")
+        remove_cgroup(cgroup_path, pids)
+        cleanup(processes)
+        subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+    test_multiple_thread(5)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/cgroup_utils.py b/tools/probeCgroup/testcases/cgroup_utils.py
new file mode 100644
index 000000000000..e00c580e724e
--- /dev/null
+++ b/tools/probeCgroup/testcases/cgroup_utils.py
@@ -0,0 +1,114 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import time
+
+def create_cgroup(cgroup_name):
+    cgroup_path = f'/sys/fs/cgroup/memory/{cgroup_name}'
+
+    try:
+        os.makedirs(cgroup_path)
+    except FileExistsError:
+        pass
+
+    return cgroup_path
+
+def add_process_to_cgroup(cgroup_path, pid):
+    with open(os.path.join(cgroup_path, 'cgroup.procs'), 'w') as procs_file:
+        procs_file.write(str(pid))
+
+def get_process_memory_usage(pid, cgroup_name):
+    cur_name = ''
+    with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+        for line in file:
+            parts = line.strip().split()
+            if len(parts) >= 4 and parts[0] == 'cgroup':
+                cur_name = parts[3]
+            if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid:
+                return int(parts[2])
+    return None
+
+def get_process_kmem_usage(pid, cgroup_name):
+    cur_name = ''
+    with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+        for line in file:
+            parts = line.strip().split()
+            if len(parts) >= 4 and parts[0] == 'cgroup':
+                cur_name = parts[3]
+            if len(parts) >= 4 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid:
+                return int(parts[3])
+    return None
+
+def remove_cgroup(cgroup_path, pids):
+    for pid in pids:
+        with open('/sys/fs/cgroup/memory/cgroup.procs', 'w') as backup_file:
+            backup_file.write(str(pid))
+    os.rmdir(cgroup_path)
+    return
+
+def check_memory_usage(cgroup_name, pids, delete):
+    memory_sum = 0
+    for pid in pids:
+        memory_usage = get_process_memory_usage(pid, cgroup_name)
+        if delete == False:
+            assert memory_usage is not None, f"Memory usage not found for PID {pid}"
+            assert memory_usage >= 0, f"Memory usage should be greater than zero for PID {pid}"
+            memory_sum += memory_usage
+        else:
+            assert memory_usage is None, f"Error: Memory usage should not be available for PID {pid} after deleting the cgroup."
+    if delete == False:
+        with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.usage_in_bytes", 'r') as file:
+            content = file.readline().strip()
+            memory_read = int(content)
+        memory_sum *= 1024
+        delta = abs(memory_read - memory_sum)
+        # print(f"read: {memory_read}")
+        # print(f"sum : {memory_sum}")
+        if (delta > max(memory_read, memory_sum) * 0.1):
+            return 1
+        else:
+            return 0
+    else:
+        return 0
+
+def check_kmem_usage(cgroup_name, pids, delete):
+    kmem_sum = 0
+    for pid in pids:
+        kmem_usage = get_process_kmem_usage(pid, cgroup_name)
+        if delete == False:
+            assert kmem_usage is not None, f"Kmem usage not found for PID {pid}"
+            assert kmem_usage >= 0, f"Kmem usage should be greater than zero for PID {pid}"
+            kmem_sum += kmem_usage
+        else:
+            assert kmem_usage is None, f"Error: Kmem usage should not be available for PID {pid} after deleting the cgroup."
+    if delete == False:
+        with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.kmem.usage_in_bytes", 'r') as file:
+            content = file.readline().strip()
+            kmem_read = int(content)
+        kmem_sum *= 1024
+        delta = abs(kmem_read - kmem_sum)
+        # print(f"kmem read: {kmem_read}")
+        # print(f"kmem sum : {kmem_sum}")
+        # assert delta <= max(kmem_read, kmem_sum) * 0.2, f"Kmem read by probeCgroup is not accurate, {kmem_read}, {kmem_sum}"
+
+def cleanup(processes):
+    for process in processes:
+        process.terminate()
+        process.wait()
+
+def get_oom_process_memory_usage(pid, cgroup_name):
+    cur_name = ''
+    oom = False
+    with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+        for line in file:
+            parts = line.strip().split()
+            if len(parts) >= 4 and parts[0] == 'cgroup':
+                cur_name = parts[3]
+                oom = False
+            if len(parts) >= 1 and parts[0] == 'oom:':
+                oom = True
+            if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid and oom == True:
+                return int(parts[2])
+    return None
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/mem-allocate.c b/tools/probeCgroup/testcases/mem-allocate.c
new file mode 100644
index 000000000000..e78e37cae61b
--- /dev/null
+++ b/tools/probeCgroup/testcases/mem-allocate.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define MB (1024 * 1024)
+
+char *arr[40];
+
+int main(int argc, char *argv[])
+{
+	char *p;
+	int i = 0;
+
+	while (1) {
+		for (i = 0; i < 40; i++) {
+			p = (char *)malloc(MB);
+			memset(p, 0, MB);
+			arr[i] = p;
+			usleep(100000);
+		}
+		for (int i = 0; i < 40; i++) {
+			free(arr[i]);
+			usleep(100000);
+		}
+	}
+	return 0;
+}
diff --git a/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
new file mode 100644
index 000000000000..55f4c068f55e
--- /dev/null
+++ b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * multiple-thread-mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <pthread.h>
+
+#define MB (1024 * 1024)
+
+void *memory_test(void *)
+{
+	char *arr[25];
+	char *p;
+	int i = 0;
+	int cnt = 0;
+
+	while (1) {
+		for (i = 0; i < 20; i++) {
+			p = (char *)malloc(MB);
+			memset(p, 0, MB);
+			arr[i] = p;
+			usleep(10000);
+		}
+		for (int i = 0; i < 20; i++) {
+			free(arr[i]);
+			usleep(10000);
+		}
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	pthread_t threads[4];
+	int rc;
+
+	// create threads
+	for (int i = 0; i < 4; i++) {
+		rc = pthread_create(&threads[i], NULL, memory_test, NULL);
+		if (rc != 0) {
+			fprintf(stderr, "Error creating thread: %d\n", rc);
+			return 1;
+		}
+	}
+
+	for (int i = 0; i < 4; i++) {
+		rc = pthread_join(threads[i], NULL);
+		if (rc != 0) {
+			fprintf(stderr, "Error joining thread: %d\n", rc);
+			return 1;
+		}
+	}
+
+	return 0;
+}
diff --git a/tools/probeCgroup/testcases/run.py b/tools/probeCgroup/testcases/run.py
new file mode 100755
index 000000000000..8ffd0ca720d8
--- /dev/null
+++ b/tools/probeCgroup/testcases/run.py
@@ -0,0 +1,32 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import sys
+
+def run_tests(directory):
+    """Run all Python scripts in the given directory."""
+    python_files = [f for f in os.listdir(directory) if f.endswith('test.py')]
+    python_files.sort()
+
+    for filename in python_files:
+        try:
+            filepath = os.path.join(directory, filename)
+
+            subprocess.check_call([sys.executable, filepath])
+        except subprocess.CalledProcessError as e:
+            print(f"Error executing {filename}:")
+            return
+        except Exception as e:
+            print(f"Error executing {filename}:")
+            print(e)
+            return
+
+if __name__ == '__main__':
+    tests_directory = '.'
+    subprocess.check_call(['gcc', 'mem-allocate.c', '-o', 'mem-allocate'])
+    subprocess.check_call(['gcc', 'simple-mem-allocate.c', '-o', 'simple-mem-allocate'])
+    subprocess.check_call(['gcc', 'multiple-thread-mem-allocate.c', '-o', 'multiple-thread-mem-allocate'])
+    run_tests(tests_directory)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/simple-mem-allocate.c b/tools/probeCgroup/testcases/simple-mem-allocate.c
new file mode 100644
index 000000000000..16328b10ba48
--- /dev/null
+++ b/tools/probeCgroup/testcases/simple-mem-allocate.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define MB (1024 * 1024)
+
+int main(int argc, char *argv[])
+{
+	char *p;
+	int i = 0;
+
+	while (1) {
+		p = (char *)malloc(MB);
+		memset(p, 0, MB);
+		sleep(1);
+	}
+
+	return 0;
+}
-- 
2.43.0


openEuler Kernel SIG biweekly meeting
by openEuler conference 05 Sep '24


Hello! The Kernel SIG invites you to a WeLink meeting (automatically recorded) at 14:00 on 2024-09-06.
Subject: openEuler Kernel SIG biweekly meeting
Agenda:
1. Progress update
2. Call for topics (to propose a new topic, reply to this email or add it directly to the meeting minutes board)
Meeting link: https://meeting.huaweicloud.com:36443/#/j/989033877
Meeting minutes: https://etherpad.openeuler.org/p/Kernel-meetings
More information: https://www.openeuler.org/en/ (Chinese: https://www.openeuler.org/zh/)
