Kernel

[PATCH OLK-5.10] Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
by Kaixiong Yu 05 Sep '24
From: Jan Kara <jack(a)suse.cz>
stable inclusion
from stable-v5.10.222
commit 145faa3d03688cbb7bbaaecbd84c01539852942c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAD6H2
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
commit 30139c702048f1097342a31302cbd3d478f50c63 upstream.
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Kaixiong Yu <yukaixiong(a)huawei.com>
---
mm/page-writeback.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0d7cc65c6367..31bc5904bbf8 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1541,7 +1541,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
--
2.25.1
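
The overflow described in the commit message can be reproduced in user space. What follows is a minimal sketch, not kernel code: `uint32_t` stands in for `unsigned long` on a 32-bit architecture, and the function names and sample values are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/*
 * Illustration only: uint32_t plays the role of a 32-bit unsigned long.
 * Both helpers compute wb_thresh * bg_thresh / thresh, mirroring the
 * expression in wb_dirty_limits().
 */

/* Product taken in 32 bits: it wraps modulo 2^32 before the division. */
static uint32_t bg_thresh_truncated(uint32_t wb_thresh, uint32_t bg_thresh,
				    uint32_t thresh)
{
	uint32_t product = (uint32_t)(wb_thresh * bg_thresh);

	return thresh ? product / thresh : 0;
}

/* Product widened to 64 bits first, as the restored (u64) cast does. */
static uint32_t bg_thresh_widened(uint32_t wb_thresh, uint32_t bg_thresh,
				  uint32_t thresh)
{
	uint64_t product = (uint64_t)wb_thresh * bg_thresh;

	return thresh ? (uint32_t)(product / thresh) : 0;
}
```

With wb_thresh = 100000, bg_thresh = 50000 and thresh = 200000, the product is 5,000,000,000, which exceeds 1<<32. The widened helper returns 25000, while the truncated one returns 3525 because the product wrapped to 705,032,704 before the division.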

[PATCH openEuler-23.09] Add dynamic kprobe-based process-level cgroup memory monitoring tool
by Taoxy2004 05 Sep '24
---
tools/probeCgroup:
This patch introduces a new tool called "probeCgroup" that enables dynamic
monitoring of memory usage at the process level within cgroups. By using
kprobes at relevant cgroup functions, this tool can track memory allocations
and deallocations for individual processes within a cgroup, providing detailed
statistics on memory usage.
The key features of the tool include:
1. Dynamic insertion of kprobes at critical points in the cgroup subsystem.
2. Tracking memory allocation and deallocation events for each process by recording page addresses in a hash table.
3. Providing real-time statistics on memory usage at the process level.
4. Providing memory usage statistics for processes selected as OOM victims.
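
The hash-table bookkeeping in feature 2 can be sketched in user space as follows. This is an illustration under simplifying assumptions, not the module's code: it uses `malloc` instead of kernel allocators, takes no locks, counts one page per address (the patch adds `folio_nr_pages()`), and the names `page_set`, `proc_entry` and `uncharge` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 1023	/* bucket count the patch passes to HashMap_create() */

struct node {
	unsigned long addr;
	struct node *next;
};

/* Set of page addresses charged to one process, plus a running count. */
struct page_set {
	struct node *buckets[NBUCKETS];
	long count;
};

struct proc_entry {
	int tgid;
	struct page_set pages;
};

/* Record a charged page; returns false if it is already present. */
static bool page_set_insert(struct page_set *s, unsigned long addr)
{
	struct node **pp = &s->buckets[addr % NBUCKETS];
	struct node *n;

	for (; *pp; pp = &(*pp)->next)
		if ((*pp)->addr == addr)
			return false;
	n = malloc(sizeof(*n));
	if (!n)
		return false;
	n->addr = addr;
	n->next = NULL;
	*pp = n;	/* append at the end of the chain */
	s->count++;
	return true;
}

/* Forget a page; returns true if this set owned it. */
static bool page_set_erase(struct page_set *s, unsigned long addr)
{
	struct node **pp = &s->buckets[addr % NBUCKETS];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->addr == addr) {
			struct node *victim = *pp;

			*pp = victim->next;
			free(victim);
			s->count--;
			return true;
		}
	}
	return false;
}

/* On uncharge, find the process that owns the page; -1 if untracked. */
static int uncharge(struct proc_entry *procs, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (page_set_erase(&procs[i].pages, addr))
			return procs[i].tgid;
	return -1;
}
```

The point of storing addresses per process is that an uncharge event carries only the page, not the task that originally charged it, so the owner has to be recovered by lookup.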
Signed-off-by: Taoxy2004 <221870066(a)smail.nju.edu.cn>
---
tools/probeCgroup/Makefile | 7 +
tools/probeCgroup/README.md | 29 +
tools/probeCgroup/probeCgroup.c | 612 ++++++++++++++++++
tools/probeCgroup/probeCgroup.h | 415 ++++++++++++
tools/probeCgroup/run.sh | 8 +
tools/probeCgroup/scripts/script1.sh | 10 +
tools/probeCgroup/scripts/script2.sh | 14 +
tools/probeCgroup/scripts/script3.sh | 11 +
.../testcases/1_load_unload_test.py | 24 +
.../testcases/2_multiple_process_test.py | 48 ++
.../testcases/3_multiple_cgroup_test.py | 55 ++
tools/probeCgroup/testcases/4_oom_test.py | 52 ++
.../testcases/5_multiple_threads_test.py | 45 ++
tools/probeCgroup/testcases/cgroup_utils.py | 115 ++++
tools/probeCgroup/testcases/mem-allocate.c | 35 +
.../testcases/multiple-thread-mem-allocate.c | 60 ++
tools/probeCgroup/testcases/run.py | 32 +
.../testcases/simple-mem-allocate.c | 27 +
18 files changed, 1599 insertions(+)
create mode 100644 tools/probeCgroup/Makefile
create mode 100644 tools/probeCgroup/README.md
create mode 100644 tools/probeCgroup/probeCgroup.c
create mode 100644 tools/probeCgroup/probeCgroup.h
create mode 100755 tools/probeCgroup/run.sh
create mode 100755 tools/probeCgroup/scripts/script1.sh
create mode 100755 tools/probeCgroup/scripts/script2.sh
create mode 100755 tools/probeCgroup/scripts/script3.sh
create mode 100755 tools/probeCgroup/testcases/1_load_unload_test.py
create mode 100755 tools/probeCgroup/testcases/2_multiple_process_test.py
create mode 100755 tools/probeCgroup/testcases/3_multiple_cgroup_test.py
create mode 100755 tools/probeCgroup/testcases/4_oom_test.py
create mode 100755 tools/probeCgroup/testcases/5_multiple_threads_test.py
create mode 100644 tools/probeCgroup/testcases/cgroup_utils.py
create mode 100644 tools/probeCgroup/testcases/mem-allocate.c
create mode 100644 tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
create mode 100755 tools/probeCgroup/testcases/run.py
create mode 100644 tools/probeCgroup/testcases/simple-mem-allocate.c
diff --git a/tools/probeCgroup/Makefile b/tools/probeCgroup/Makefile
new file mode 100644
index 000000000000..606c951e5487
--- /dev/null
+++ b/tools/probeCgroup/Makefile
@@ -0,0 +1,7 @@
+obj-m := probeCgroup.o
+CROSS_COMPILE = ''
+KDIR := /lib/modules/$(shell uname -r)/build
+all:
+ make -C $(KDIR) M=$(PWD) modules
+clean:
+ rm -f *.ko *.o *.mod *.mod.o *.mod.c .*.cmd *.symvers module*
diff --git a/tools/probeCgroup/README.md b/tools/probeCgroup/README.md
new file mode 100644
index 000000000000..ff0b6fc21228
--- /dev/null
+++ b/tools/probeCgroup/README.md
@@ -0,0 +1,29 @@
+# probeCgroup
+
+#### Description
+probeCgroup is a process-level cgroup memory monitoring tool based on dynamic tracing (kprobe/kretprobe) technology. By inserting kprobes and kretprobes at the entry and exit points of relevant cgroup functions, this tool can track the memory usage of individual processes within each cgroup in real time.
+
+#### Software Architecture
+1. Dynamic Tracing: Insert kprobes and kretprobes at critical points in cgroup functions to capture memory allocation and release events.
+2. Hash Table Recording: Record the addresses of pages currently used by each process in a hash table, so that when a page is released, the process it belongs to can be identified.
+3. Real-Time Statistics: Provide real-time statistics showing the memory usage of individual processes within each cgroup.
+
+#### Instructions
+1. Compile and Load the Module
+ a. In the 'probeCgroup' directory, run the 'make' command to compile the module.
+ b. Load the module: 'insmod probeCgroup.ko'.
+ c. View memory statistics: 'cat /proc/cgroup_memory_usage_per_process'.
+ If an OOM (Out of Memory) event occurs in a cgroup, you can see "oom:" followed by the process that experienced the OOM and its memory usage at the time.
+
+2. Automate OOM Scenario
+ In the 'probeCgroup' directory, run './run.sh'. This script will automatically set up an OOM scenario and output the content of '/proc/cgroup_memory_usage_per_process' after execution.
+
+3. Perform More Tests
+ a. After compiling the module, in the 'testcases' directory, run './run.py'.
+ b. This script will perform various tests, including:
+ - Loading and unloading the module
+ - Each cgroup containing multiple processes
+ - Creating multiple cgroups
+ - OOM scenarios
+ - Multithreading
+ c. The tests will take approximately one minute to complete.
diff --git a/tools/probeCgroup/probeCgroup.c b/tools/probeCgroup/probeCgroup.c
new file mode 100644
index 000000000000..9883cb1e082d
--- /dev/null
+++ b/tools/probeCgroup/probeCgroup.c
@@ -0,0 +1,612 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * probeCgroup.c - A tool used to get memory usage for each process in a cgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include "probeCgroup.h"
+
+// kretprobe at mem_cgroup_charge
+struct charge_data {
+ struct cgroup *cgrp;
+ struct mem_cgroup *memcg;
+ struct task_struct *task;
+ unsigned long addr;
+};
+
+static int mem_cgroup_charge_entry_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct charge_data *data;
+ struct folio *page;
+ struct mm_struct *mm;
+ struct mem_cgroup *memcg;
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+
+ if (!current->mm)
+ return 1;
+ page = (struct folio *)regs->di;
+ mm = (struct mm_struct *)regs->si;
+ if (mm == NULL || page == NULL)
+ return -1;
+ memcg = get_mem_cgroup_from_mm(mm);
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+
+ data = (struct charge_data *)ri->data;
+ data->memcg = memcg;
+ data->addr = (unsigned long)page;
+ data->task = current;
+ data->cgrp = cgrp;
+ }
+ return 0;
+}
+
+NOKPROBE_SYMBOL(mem_cgroup_charge_entry_handler);
+
+static int mem_cgroup_charge_ret_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ unsigned long retval = regs_return_value(regs);
+ struct charge_data *data = (struct charge_data *)ri->data;
+ int id;
+ struct cgroup_info *cgrp_info;
+ struct task_info *tsk_info;
+
+ if (data->memcg != NULL && retval == 0) {
+ id = ((data->memcg)->css).id;
+
+ spin_lock(&lock);
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info == NULL) {
+ cgrp_info = create_cgroup_info(data->cgrp, data->memcg);
+ if (cgrp_info == NULL) {
+ spin_unlock(&lock);
+ return -1;
+ }
+ add_cgroup_info(cgrp_info);
+ }
+ spin_unlock(&lock);
+
+ read_lock(&cgrp_info->cgrp_lock);
+ tsk_info = find_task_info(cgrp_info, data->task->tgid);
+ read_unlock(&cgrp_info->cgrp_lock);
+
+ // for some cases, task->comm changes over time
+ if (tsk_info != NULL
+ && strcmp(data->task->comm, tsk_info->comm) != 0) {
+ strscpy(tsk_info->comm, data->task->comm,
+ sizeof(tsk_info->comm));
+ }
+
+ if (tsk_info == NULL) {
+ tsk_info = create_task_info(data->task);
+ if (tsk_info == NULL)
+ return -1;
+ add_task_to_cgroup_info(cgrp_info, tsk_info);
+ }
+
+ if (HashMap_insert(tsk_info->pages, data->addr)) {
+ //update counter
+ spin_lock(&(tsk_info->cnt_lock));
+ tsk_info->count +=
+ folio_nr_pages((struct folio *)data->addr);
+ spin_unlock(&(tsk_info->cnt_lock));
+ }
+ }
+
+ return 0;
+
+}
+
+NOKPROBE_SYMBOL(mem_cgroup_charge_ret_handler);
+
+static struct kretprobe mem_cgroup_charge_kretprobe = {
+ .handler = mem_cgroup_charge_ret_handler,
+ .entry_handler = mem_cgroup_charge_entry_handler,
+ .data_size = sizeof(struct charge_data),
+ .maxactive = 20,
+};
+
+static int mem_cgroup_charge_kretprobe_init(void)
+{
+ int ret;
+
+ mem_cgroup_charge_kretprobe.kp.symbol_name = "__mem_cgroup_charge";
+ ret = register_kretprobe(&mem_cgroup_charge_kretprobe);
+ if (ret < 0) {
+ pr_err("register_kretprobe failed, returned %d\n", ret);
+ return ret;
+ }
+ pr_info("Planted return probe at %s: %p\n",
+ mem_cgroup_charge_kretprobe.kp.symbol_name,
+ mem_cgroup_charge_kretprobe.kp.addr);
+ return 0;
+}
+
+static void mem_cgroup_charge_kretprobe_exit(void)
+{
+ unregister_kretprobe(&mem_cgroup_charge_kretprobe);
+ pr_info("kretprobe at %p unregistered\n",
+ mem_cgroup_charge_kretprobe.kp.addr);
+
+ /* nmissed > 0 suggests that maxactive was set too low. */
+ pr_info("Missed probing %d instances of %s\n",
+ mem_cgroup_charge_kretprobe.nmissed,
+ mem_cgroup_charge_kretprobe.kp.symbol_name);
+}
+
+// kretprobe at uncharge_folio
+
+struct uncharge_data {
+ struct cgroup *cgrp;
+ struct mem_cgroup *memcg;
+ unsigned long addr;
+ bool isKmem;
+ int nr_pages;
+};
+
+static int uncharge_folio_entry_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct uncharge_data *data;
+ struct folio *page;
+ struct mem_cgroup *memcg = NULL;
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+ struct obj_cgroup *objcg;
+ int nr_pages = 0;
+
+ data = (struct uncharge_data *)ri->data;
+ page = (struct folio *)regs->di;
+ if (page == NULL) {
+ data->memcg = NULL;
+ return -1;
+ }
+ if (page->memcg_data & MEMCG_DATA_KMEM) { // if the page belongs to kmem
+ if (!folio_test_large(page))
+ nr_pages = 1;
+ else
+ nr_pages = page->_folio_nr_pages;
+ // nr_pages = thp_nr_pages(page);
+ objcg = __folio_objcg(page);
+ if (objcg != NULL)
+ memcg = objcg->memcg;
+ data->isKmem = true;
+ data->nr_pages = nr_pages;
+ } else {
+ memcg = __folio_memcg(page);
+ data->isKmem = false;
+ }
+
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+
+ data->memcg = memcg;
+ data->addr = (unsigned long)page;
+ data->cgrp = cgrp;
+ }
+ return 0;
+}
+
+NOKPROBE_SYMBOL(uncharge_folio_entry_handler);
+
+static int uncharge_folio_ret_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct uncharge_data *data = (struct uncharge_data *)ri->data;
+ int id;
+ struct cgroup_info *cgrp_info;
+ int ret = -1;
+
+ if (data->memcg != NULL) {
+ id = ((data->memcg)->css).id;
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info == NULL)
+ return -1;
+ if (data->isKmem)
+ ret = -1;
+ else
+ ret = remove_page_from_cgroup_info(data->addr, cgrp_info);
+ }
+
+ return ret;
+}
+
+NOKPROBE_SYMBOL(uncharge_folio_ret_handler);
+
+static struct kretprobe uncharge_folio_kretprobe = {
+ .handler = uncharge_folio_ret_handler,
+ .entry_handler = uncharge_folio_entry_handler,
+ .data_size = sizeof(struct uncharge_data),
+ .maxactive = 20,
+};
+
+static int uncharge_folio_kretprobe_init(void)
+{
+ int ret;
+
+ uncharge_folio_kretprobe.kp.symbol_name = "uncharge_folio";
+ ret = register_kretprobe(&uncharge_folio_kretprobe);
+ if (ret < 0) {
+ pr_err("register_kretprobe failed, returned %d\n", ret);
+ return ret;
+ }
+ pr_info("Planted return probe at %s: %p\n",
+ uncharge_folio_kretprobe.kp.symbol_name,
+ uncharge_folio_kretprobe.kp.addr);
+ return 0;
+}
+
+static void uncharge_folio_kretprobe_exit(void)
+{
+ unregister_kretprobe(&uncharge_folio_kretprobe);
+ pr_info("kretprobe at %p unregistered\n",
+ uncharge_folio_kretprobe.kp.addr);
+
+ /* nmissed > 0 suggests that maxactive was set too low. */
+ pr_info("Missed probing %d instances of %s\n",
+ uncharge_folio_kretprobe.nmissed,
+ uncharge_folio_kretprobe.kp.symbol_name);
+}
+
+//kprobe at do_exit
+static struct kprobe do_exit_kprobe;
+static int do_exit_kprobe_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct task_struct *cur = current;
+ int tgid = cur->tgid;
+ struct mm_struct *mm = cur->mm;
+ struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+ struct cgroup_info *cgrp_info;
+ struct task_info *tsk_info;
+ int id;
+
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+ id = (memcg->css).id;
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info != NULL) {
+ write_lock(&cgrp_info->cgrp_lock);
+ tsk_info = find_task_info(cgrp_info, tgid);
+ if (tsk_info != NULL) {
+ list_del(&tsk_info->list);
+ write_unlock(&cgrp_info->cgrp_lock);
+ remove_task_from_cgroup_info(cgrp_info,
+ tsk_info);
+ } else {
+ write_unlock(&cgrp_info->cgrp_lock);
+ }
+ return 0;
+ }
+ }
+ return 0;
+}
+
+static void do_exit_kprobe_post_handler(struct kprobe *p,
+ struct pt_regs *regs,
+ unsigned long flags)
+{
+
+}
+
+static int do_exit_kprobe_init(void)
+{
+ do_exit_kprobe.pre_handler = do_exit_kprobe_pre_handler;
+ do_exit_kprobe.post_handler = do_exit_kprobe_post_handler;
+ do_exit_kprobe.symbol_name = "do_exit";
+ if (register_kprobe(&do_exit_kprobe)) {
+ pr_alert("register_kprobe on do_exit failed!\n");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static void do_exit_kprobe_exit(void)
+{
+ unregister_kprobe(&do_exit_kprobe);
+}
+
+//kprobe at mark_oom_victim
+static struct kprobe mark_oom_victim_kprobe;
+
+static int mark_oom_victim_kprobe_pre_handler(struct kprobe *p,
+ struct pt_regs *regs)
+{
+ struct task_struct *victim;
+ int tgid;
+ struct mm_struct *mm;
+ struct mem_cgroup *memcg;
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+ struct cgroup_info *cgrp_info;
+ struct task_info *tsk_info;
+ int id;
+ struct task_info *oom_info;
+
+ victim = (struct task_struct *)regs->di;
+ tgid = victim->tgid;
+ mm = victim->mm;
+ memcg = get_mem_cgroup_from_mm(mm);
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+ id = (memcg->css).id;
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info != NULL) {
+ read_lock(&cgrp_info->cgrp_lock);
+ tsk_info = find_task_info(cgrp_info, tgid);
+ read_unlock(&cgrp_info->cgrp_lock);
+ if (tsk_info != NULL) {
+ oom_info = create_oom_task_info(tsk_info);
+ if (oom_info != NULL) {
+ add_oom_task_to_cgroup_info(cgrp_info,
+ oom_info);
+ }
+ return 0;
+ }
+ }
+ }
+ return 0;
+}
+
+static void mark_oom_victim_kprobe_post_handler(struct kprobe *p,
+ struct pt_regs *regs,
+ unsigned long flags)
+{
+
+}
+
+static int mark_oom_victim_kprobe_init(void)
+{
+ mark_oom_victim_kprobe.pre_handler = mark_oom_victim_kprobe_pre_handler;
+ mark_oom_victim_kprobe.post_handler =
+ mark_oom_victim_kprobe_post_handler;
+ mark_oom_victim_kprobe.symbol_name = "mark_oom_victim";
+ if (register_kprobe(&mark_oom_victim_kprobe)) {
+ pr_alert("register_kprobe on mark_oom_victim failed!\n");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static void mark_oom_victim_kprobe_exit(void)
+{
+ unregister_kprobe(&mark_oom_victim_kprobe);
+}
+
+// kretprobe at cgroup_destroy_locked
+struct destroy_data {
+ struct cgroup *cgrp;
+};
+
+static int cgroup_destroy_locked_entry_handler(struct kretprobe_instance
+ *ri, struct pt_regs *regs)
+{
+ struct destroy_data *data;
+
+ data = (struct destroy_data *)ri->data;
+ data->cgrp = (struct cgroup *)regs->di;
+ return 0;
+}
+
+NOKPROBE_SYMBOL(cgroup_destroy_locked_entry_handler);
+
+static int cgroup_destroy_locked_ret_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct destroy_data *data = (struct destroy_data *)ri->data;
+ struct cgroup *cgrp = data->cgrp;
+ struct cgroup_info *cgrp_info = NULL;
+ unsigned long retval = regs_return_value(regs);
+
+ if (!cgrp)
+ return -1;
+ if (retval != 0)
+ return -1;
+ list_for_each_entry(cgrp_info, &all_cgroup_info, list) {
+ if (cgrp_info->cgrp == cgrp) {
+ spin_lock(&lock);
+ list_del(&cgrp_info->list);
+ spin_unlock(&lock);
+ destroy_cgroup_info(cgrp_info);
+ return 0;
+ }
+ }
+ return -1;
+}
+
+NOKPROBE_SYMBOL(cgroup_destroy_locked_ret_handler);
+
+static struct kretprobe cgroup_destroy_locked_kretprobe = {
+ .handler = cgroup_destroy_locked_ret_handler,
+ .entry_handler = cgroup_destroy_locked_entry_handler,
+ .data_size = sizeof(struct destroy_data),
+ .maxactive = 20,
+};
+
+static int cgroup_destroy_locked_kretprobe_init(void)
+{
+ int ret;
+
+ cgroup_destroy_locked_kretprobe.kp.symbol_name =
+ "cgroup_destroy_locked";
+ ret = register_kretprobe(&cgroup_destroy_locked_kretprobe);
+ if (ret < 0) {
+ pr_err("register_kretprobe failed, returned %d\n", ret);
+ return ret;
+ }
+ pr_info("Planted return probe at %s: %p\n",
+ cgroup_destroy_locked_kretprobe.kp.symbol_name,
+ cgroup_destroy_locked_kretprobe.kp.addr);
+ return 0;
+}
+
+static void cgroup_destroy_locked_kretprobe_exit(void)
+{
+ unregister_kretprobe(&cgroup_destroy_locked_kretprobe);
+ pr_info("kretprobe at %p unregistered\n",
+ cgroup_destroy_locked_kretprobe.kp.addr);
+
+ /* nmissed > 0 suggests that maxactive was set too low. */
+ pr_info("Missed probing %d instances of %s\n",
+ cgroup_destroy_locked_kretprobe.nmissed,
+ cgroup_destroy_locked_kretprobe.kp.symbol_name);
+}
+
+// print the tasks in order of their memory usage
+static void print_sorted_tasks_list(struct cgroup_info *cgrp_info,
+ int type, struct seq_file *m)
+{
+ struct list_head *cur, *insert_pos;
+ struct task_info *task, *insert_task;
+ struct list_head new_list = LIST_HEAD_INIT(new_list);
+ struct list_head *old_list;
+ struct task_info *new_task, *next_task;
+
+ if (type == 0) {
+ if (cgrp_info == NULL)
+ return;
+ read_lock(&cgrp_info->cgrp_lock);
+ old_list = &cgrp_info->tasks_list;
+ } else {
+ if (cgrp_info == NULL)
+ return;
+ old_list = &cgrp_info->oom_list;
+ }
+
+ list_for_each_entry_safe(task, insert_task, old_list, list) {
+ new_task = kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+ if (!new_task)
+ break; /* still unlock and free below */
+ new_task->tgid = task->tgid;
+ strscpy(new_task->comm, task->comm, sizeof(new_task->comm));
+ new_task->count = task->count;
+ new_task->pages = NULL;
+ INIT_LIST_HEAD(&new_task->list);
+
+ //insertion sort
+ cur = &new_list;
+ insert_pos = cur->next;
+ while (insert_pos != &new_list) {
+ next_task =
+ list_entry(insert_pos, struct task_info, list);
+ if (new_task->count >= next_task->count)
+ break;
+ cur = insert_pos;
+ insert_pos = insert_pos->next;
+ }
+
+ (&new_task->list)->prev = insert_pos->prev;
+ (insert_pos->prev)->next = (&new_task->list);
+ (&new_task->list)->next = insert_pos;
+ insert_pos->prev = (&new_task->list);
+ }
+ if (type == 0)
+ read_unlock(&cgrp_info->cgrp_lock);
+
+ //print
+ if (type == 1 && (&new_list) != new_list.next) {
+ seq_puts(m, "oom:\n");
+ seq_printf(m, "%10s %20s %20s\n", "pid", "command",
+ "memory usage (KB)");
+ }
+ if (type == 0)
+ seq_printf(m, "%10s %20s %20s\n", "pid", "command",
+ "memory usage (KB)");
+ list_for_each_entry_safe(task, insert_task, &new_list, list) {
+ seq_printf(m, "%10d %20s %20d\n", task->tgid, task->comm,
+ (task->count) * 4);
+ }
+
+ list_for_each_entry_safe(task, insert_task, &new_list, list) {
+ list_del(&task->list);
+ kfree(task);
+ }
+}
+
+static struct proc_dir_entry *cgroup_info_read;
+#define procfs_file_read "cgroup_memory_usage_per_process"
+
+void seq_print_tasks(struct cgroup_info *cgroup_info, struct seq_file *m)
+{
+ if (!cgroup_info)
+ return;
+
+ print_sorted_tasks_list(cgroup_info, 0, m);
+}
+
+void seq_print_oom_tasks(struct cgroup_info *cgroup_info, struct seq_file *m)
+{
+ if (!cgroup_info)
+ return;
+
+ print_sorted_tasks_list(cgroup_info, 1, m);
+}
+
+void seq_print_cgroups(struct seq_file *m)
+{
+ struct cgroup_info *cgrp, *pos;
+
+ spin_lock(&lock);
+ list_for_each_entry_safe(cgrp, pos, &all_cgroup_info, list) {
+ seq_printf(m, "cgroup name : %s\n", cgrp->name);
+ seq_print_tasks(cgrp, m);
+ seq_print_oom_tasks(cgrp, m);
+ seq_puts(m, "\n");
+ }
+ spin_unlock(&lock);
+}
+
+static int memory_usage_show(struct seq_file *m, void *v)
+{
+ seq_print_cgroups(m);
+ return 0;
+}
+
+static int __init global_init(void)
+{
+ int ret = 0;
+
+ cgroup_info_read =
+ proc_create_single(procfs_file_read, 0, NULL, memory_usage_show);
+ if (!cgroup_info_read)
+ return -ENOMEM;
+ ret = mem_cgroup_charge_kretprobe_init();
+ uncharge_folio_kretprobe_init();
+ do_exit_kprobe_init();
+ mark_oom_victim_kprobe_init();
+ cgroup_destroy_locked_kretprobe_init();
+
+ return ret;
+}
+
+static void __exit global_exit(void)
+{
+ struct cgroup_info *cgrp_info, *pos;
+
+ mem_cgroup_charge_kretprobe_exit();
+ uncharge_folio_kretprobe_exit();
+ do_exit_kprobe_exit();
+ mark_oom_victim_kprobe_exit();
+ cgroup_destroy_locked_kretprobe_exit();
+
+ remove_proc_entry(procfs_file_read, NULL);
+
+ // release all allocated memory
+ list_for_each_entry_safe(cgrp_info, pos, &all_cgroup_info, list) {
+ list_del(&cgrp_info->list);
+ destroy_cgroup_info(cgrp_info);
+ }
+}
+
+module_init(global_init)
+module_exit(global_exit)
+MODULE_LICENSE("GPL");
diff --git a/tools/probeCgroup/probeCgroup.h b/tools/probeCgroup/probeCgroup.h
new file mode 100644
index 000000000000..953a6e0aca31
--- /dev/null
+++ b/tools/probeCgroup/probeCgroup.h
@@ -0,0 +1,415 @@
+/* SPDX-License-Identifier: GPL-2.0*/
+/*
+ * probeCgroup.h
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kprobes.h>
+#include <linux/ktime.h>
+#include <linux/limits.h>
+#include <linux/sched.h>
+#include <linux/mm_types.h>
+#include <linux/memcontrol.h>
+#include <linux/cgroup-defs.h>
+#include <linux/kernfs.h>
+#include <linux/string.h>
+#include <linux/list.h>
+#include <linux/oom.h>
+#include <linux/fs.h>
+#include <linux/proc_fs.h>
+#include <linux/huge_mm.h>
+#include <linux/page-flags.h>
+#include <linux/spinlock.h>
+#include <linux/rwlock.h>
+
+static spinlock_t lock; // global lock for the list of cgroup_info
+
+struct HashNode {
+ unsigned long addr;
+ struct HashNode *next;
+};
+
+struct HashNode *HashNode_create(unsigned long addr)
+{
+ struct HashNode *node = NULL;
+
+ node = kzalloc(sizeof(struct HashNode), GFP_ATOMIC);
+ if (node == NULL)
+ return NULL;
+ node->addr = addr;
+ node->next = NULL;
+ return node;
+}
+
+struct HashBucket {
+ struct HashNode *head;
+ spinlock_t bkt_lock;
+};
+
+void HashBucket_init(struct HashBucket *bkt)
+{
+ bkt->head = NULL;
+ spin_lock_init(&bkt->bkt_lock);
+}
+
+bool HashBucket_insert(struct HashBucket *bkt, unsigned long addr)
+{
+ struct HashNode *new_node;
+ struct HashNode *node;
+ struct HashNode *prev;
+ bool ret = true;
+
+ if (bkt == NULL)
+ return false;
+
+ prev = NULL;
+ new_node = NULL;
+ new_node = HashNode_create(addr);
+
+ spin_lock(&bkt->bkt_lock);
+ node = bkt->head;
+ while (node != NULL && node->addr != addr) {
+ prev = node;
+ node = node->next;
+ }
+ if (node == NULL) {
+ if (new_node == NULL) {
+ pr_info("not enough memory for HashNode\n");
+ spin_unlock(&bkt->bkt_lock);
+ return false;
+ }
+ if (bkt->head == NULL)
+ bkt->head = new_node;
+ else
+ prev->next = new_node;
+ spin_unlock(&bkt->bkt_lock);
+ ret = true;
+ } else {
+ spin_unlock(&bkt->bkt_lock);
+ kfree(new_node);
+ ret = false;
+ }
+
+ return ret;
+}
+
+bool HashBucket_erase(struct HashBucket *bkt, unsigned long addr)
+{
+ struct HashNode *node;
+ struct HashNode *prev;
+ bool ret = true;
+
+ if (bkt == NULL)
+ return false;
+
+ spin_lock(&bkt->bkt_lock);
+ node = bkt->head;
+ prev = NULL;
+ while (node != NULL && node->addr != addr) {
+ prev = node;
+ node = node->next;
+ }
+ if (node == NULL) {
+ spin_unlock(&bkt->bkt_lock);
+ ret = false;
+ } else {
+ if (bkt->head == node)
+ bkt->head = node->next;
+ else
+ prev->next = node->next;
+ kfree(node);
+ spin_unlock(&bkt->bkt_lock);
+ ret = true;
+ }
+
+ return ret;
+}
+
+void HashBucket_clear(struct HashBucket *bkt)
+{
+ struct HashNode *node;
+ struct HashNode *prev;
+
+ if (bkt == NULL)
+ return;
+
+ spin_lock(&bkt->bkt_lock);
+ node = bkt->head;
+ prev = NULL;
+ bkt->head = NULL;
+ while (node != NULL) {
+ prev = node;
+ node = node->next;
+ kfree(prev);
+ }
+ spin_unlock(&bkt->bkt_lock);
+}
+
+struct HashMap {
+ unsigned long size;
+ struct HashBucket *HashTable;
+};
+
+unsigned long hash_func(unsigned long addr, unsigned long size)
+{
+ return addr % size;
+}
+
+struct HashMap *HashMap_create(unsigned long size)
+{
+ struct HashMap *hm = NULL;
+ struct HashBucket *ht = NULL;
+ int i = 0;
+
+ hm = kmalloc(sizeof(struct HashMap), GFP_ATOMIC);
+ if (hm == NULL)
+ return NULL;
+ ht = kmalloc((size * sizeof(struct HashBucket)), GFP_ATOMIC);
+ if (ht == NULL) {
+ kfree(hm);
+ return NULL;
+ }
+ for (i = 0; i < size; i++)
+ HashBucket_init(&(ht[i]));
+
+ hm->size = size;
+ hm->HashTable = ht;
+ return hm;
+}
+
+bool HashMap_insert(struct HashMap *hm, unsigned long addr)
+{
+ unsigned long index;
+
+ if (hm == NULL)
+ return false;
+ index = hash_func(addr, hm->size);
+ if (hm->HashTable == NULL)
+ return false;
+ return HashBucket_insert(&(hm->HashTable[index]), addr);
+}
+
+bool HashMap_erase(struct HashMap *hm, unsigned long addr)
+{
+ unsigned long index;
+
+ if (hm == NULL)
+ return false;
+ index = hash_func(addr, hm->size);
+ if (hm->HashTable == NULL)
+ return false;
+ return HashBucket_erase(&(hm->HashTable[index]), addr);
+}
+
+void HashMap_clear(struct HashMap *hm)
+{
+ unsigned long size;
+ struct HashBucket *ht;
+ int i;
+
+ if (hm == NULL)
+ return;
+ size = hm->size;
+ ht = hm->HashTable;
+ if (ht == NULL)
+ return;
+ hm->HashTable = NULL;
+ for (i = 0; i < size; i++)
+ HashBucket_clear(&(ht[i]));
+
+ kfree(ht);
+ kfree(hm);
+}
+
+// struct that saves the information for each task
+struct task_info {
+ int tgid;
+ char comm[TASK_COMM_LEN];
+ int count; // number of pages
+ struct HashMap *pages;
+ struct list_head list;
+ spinlock_t cnt_lock;
+};
+
+// struct that saves the information for each cgroup
+struct cgroup_info {
+ struct cgroup *cgrp;
+ struct mem_cgroup *memcg;
+ int id;
+ char name[64];
+ struct list_head list;
+ struct list_head tasks_list;
+ struct list_head oom_list;
+ rwlock_t cgrp_lock;
+ unsigned int cached_bytes;
+};
+
+static LIST_HEAD(all_cgroup_info); // list linking all cgroup_info structs
+
+static struct task_info *create_task_info(struct task_struct *cur_task)
+{
+ struct task_info *tsk_info =
+ kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+ if (!tsk_info)
+ return NULL;
+
+ // initialization
+ tsk_info->tgid = cur_task->tgid;
+ strscpy(tsk_info->comm, cur_task->comm, sizeof(tsk_info->comm));
+ tsk_info->count = 0;
+ tsk_info->pages = NULL;
+ tsk_info->pages = HashMap_create(1023);
+ INIT_LIST_HEAD(&tsk_info->list);
+ spin_lock_init(&(tsk_info->cnt_lock));
+
+ return tsk_info;
+}
+
+static int
+add_task_to_cgroup_info(struct cgroup_info *cgrp, struct task_info *task)
+{
+ if (!cgrp || !task)
+ return -EINVAL;
+
+ write_lock(&cgrp->cgrp_lock);
+ list_add_tail(&task->list, &cgrp->tasks_list);
+ write_unlock(&cgrp->cgrp_lock);
+ return 0;
+}
+
+static int
+remove_task_from_cgroup_info(struct cgroup_info *cgrp, struct task_info *task)
+{
+ if (cgrp == NULL || task == NULL)
+ return -EINVAL;
+
+ HashMap_clear(task->pages);
+ // kfree(task->pages);
+ kfree(task);
+ return 0;
+}
+
+static struct task_info *find_task_info(struct cgroup_info *cgrp, int tgid)
+{
+ struct task_info *tsk_info, *pos;
+
+ list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) {
+ if (tsk_info->tgid == tgid)
+ return tsk_info;
+ }
+ return NULL;
+}
+
+static int
+remove_page_from_cgroup_info(unsigned long addr, struct cgroup_info *cgrp)
+{
+ struct task_info *tsk_info, *pos;
+
+ read_lock(&cgrp->cgrp_lock);
+ list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) {
+ if (HashMap_erase(tsk_info->pages, addr)) {
+ spin_lock(&(tsk_info->cnt_lock));
+ tsk_info->count -= folio_nr_pages((struct folio *)addr);
+ spin_unlock(&(tsk_info->cnt_lock));
+ read_unlock(&cgrp->cgrp_lock);
+ return 0;
+ }
+ }
+ read_unlock(&cgrp->cgrp_lock);
+ return -1;
+}
+
+static struct cgroup_info *create_cgroup_info(struct cgroup *cgrp,
+ struct mem_cgroup *memcg)
+{
+ struct cgroup_info *cgrp_info =
+ kmalloc(sizeof(struct cgroup_info), GFP_ATOMIC);
+ struct kernfs_node *kn;
+
+ if (!cgrp_info)
+ return NULL;
+
+ cgrp_info->cgrp = cgrp;
+ cgrp_info->memcg = memcg;
+ cgrp_info->id = (memcg->css).id;
+ kn = cgrp->kn;
+ strscpy(cgrp_info->name, kn->name, sizeof(cgrp_info->name));
+ INIT_LIST_HEAD(&cgrp_info->list);
+ INIT_LIST_HEAD(&cgrp_info->tasks_list);
+ INIT_LIST_HEAD(&cgrp_info->oom_list);
+ rwlock_init(&(cgrp_info->cgrp_lock));
+ cgrp_info->cached_bytes = 0;
+
+ return cgrp_info;
+}
+
+static void destroy_cgroup_info(struct cgroup_info *cgrp_info)
+{
+ struct task_info *task, *tmp;
+
+ if (!cgrp_info)
+ return;
+
+ write_lock(&cgrp_info->cgrp_lock);
+ list_for_each_entry_safe(task, tmp, &cgrp_info->tasks_list, list) {
+ list_del(&task->list);
+ remove_task_from_cgroup_info(cgrp_info, task);
+ }
+ write_unlock(&cgrp_info->cgrp_lock);
+ list_for_each_entry_safe(task, tmp, &cgrp_info->oom_list, list) {
+ list_del(&task->list);
+ remove_task_from_cgroup_info(cgrp_info, task);
+ }
+
+ kfree(cgrp_info);
+}
+
+static int add_cgroup_info(struct cgroup_info *cgrp_info)
+{
+ if (!cgrp_info)
+ return -EINVAL;
+
+ list_add_tail(&cgrp_info->list, &all_cgroup_info);
+ return 0;
+}
+
+static struct cgroup_info *find_cgroup_info(int id)
+{
+ struct cgroup_info *cgrp_info = NULL;
+
+ list_for_each_entry(cgrp_info, &all_cgroup_info, list) {
+ if (cgrp_info->id == id)
+ return cgrp_info;
+ }
+
+ return NULL;
+}
+
+static struct task_info *create_oom_task_info(struct task_info *tsk_info)
+{
+ struct task_info *oom_tsk_info =
+ kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+ if (!oom_tsk_info)
+ return NULL;
+
+ oom_tsk_info->tgid = tsk_info->tgid;
+ strscpy(oom_tsk_info->comm, tsk_info->comm, sizeof(oom_tsk_info->comm));
+ oom_tsk_info->count = tsk_info->count;
+ oom_tsk_info->pages = NULL;
+ INIT_LIST_HEAD(&oom_tsk_info->list);
+
+ return oom_tsk_info;
+}
+
+static int
+add_oom_task_to_cgroup_info(struct cgroup_info *cgrp,
+ struct task_info *oom_task)
+{
+ if (!cgrp || !oom_task)
+ return -EINVAL;
+ list_add_tail(&oom_task->list, &cgrp->oom_list);
+ return 0;
+}
diff --git a/tools/probeCgroup/run.sh b/tools/probeCgroup/run.sh
new file mode 100755
index 000000000000..7e1ffefe66d1
--- /dev/null
+++ b/tools/probeCgroup/run.sh
@@ -0,0 +1,8 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd scripts
+./script1.sh
+./script2.sh
+./script3.sh
diff --git a/tools/probeCgroup/scripts/script1.sh b/tools/probeCgroup/scripts/script1.sh
new file mode 100755
index 000000000000..539e8258afb9
--- /dev/null
+++ b/tools/probeCgroup/scripts/script1.sh
@@ -0,0 +1,10 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd ..
+make
+insmod probeCgroup.ko
+
+cd testcases
+gcc simple-mem-allocate.c -o simple-mem-allocate
diff --git a/tools/probeCgroup/scripts/script2.sh b/tools/probeCgroup/scripts/script2.sh
new file mode 100755
index 000000000000..2ad515cfb912
--- /dev/null
+++ b/tools/probeCgroup/scripts/script2.sh
@@ -0,0 +1,14 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+current_dir=$(pwd)
+cd /sys/fs/cgroup/memory
+mkdir test
+cd test
+sh -c "echo $$ >> cgroup.procs"
+sh -c "echo 5M > memory.limit_in_bytes"
+sh -c "echo 0 > memory.swappiness"
+cd "$current_dir"
+cd ../testcases
+./simple-mem-allocate
diff --git a/tools/probeCgroup/scripts/script3.sh b/tools/probeCgroup/scripts/script3.sh
new file mode 100755
index 000000000000..127eb45de5c9
--- /dev/null
+++ b/tools/probeCgroup/scripts/script3.sh
@@ -0,0 +1,11 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd /proc
+cat cgroup_memory_usage_per_process
+
+cat /sys/fs/cgroup/memory/test/cgroup.procs > /sys/fs/cgroup/memory/cgroup.procs
+rmdir /sys/fs/cgroup/memory/test
+# cat cgroup_memory_usage_per_process
+rmmod probeCgroup
diff --git a/tools/probeCgroup/testcases/1_load_unload_test.py b/tools/probeCgroup/testcases/1_load_unload_test.py
new file mode 100755
index 000000000000..5389a14a1dac
--- /dev/null
+++ b/tools/probeCgroup/testcases/1_load_unload_test.py
@@ -0,0 +1,24 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import time
+
+def test_module_load_unload():
+ try:
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+ print('loading module successfully!')
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+ output = subprocess.check_output(['lsmod'])
+ assert b'probeCgroup' not in output
+ print('unloading module successfully!')
+ except subprocess.CalledProcessError as e:
+ print('Load unload test failed:', e)
+ except AssertionError as e:
+ print('Load unload test failed. Cannot remove module.')
+
+if __name__ == '__main__':
+ test_module_load_unload()
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/2_multiple_process_test.py b/tools/probeCgroup/testcases/2_multiple_process_test.py
new file mode 100755
index 000000000000..d88c7f7f2952
--- /dev/null
+++ b/tools/probeCgroup/testcases/2_multiple_process_test.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage
+import os
+import subprocess
+import time
+
+def test_multiple_process(num_procs):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroup_name = 'test'
+ cgroup_path = create_cgroup(cgroup_name)
+
+ processes = []
+ pids = []
+
+ for i in range(num_procs):
+ process = subprocess.Popen(['./mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ processes.append(process)
+ pids.append(pid)
+
+ time.sleep(0.1)
+ try:
+ count = 0
+ for i in range (2000):
+ count += check_memory_usage(cgroup_name, pids, False)
+ time.sleep(0.01)
+ assert count <= 50, "Memory read by probeCgroup is not accurate"
+
+ remove_cgroup(cgroup_path, pids)
+ check_memory_usage(cgroup_name, pids, True)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass multiple process test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ remove_cgroup(cgroup_path, pids)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_multiple_process(3)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/3_multiple_cgroup_test.py b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py
new file mode 100755
index 000000000000..592a716df877
--- /dev/null
+++ b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup
+import os
+import subprocess
+import time
+
+def test_multiple_cgroup(num_procs, num_cgroups):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroups = []
+ processes = {}
+ pids = {}
+ for i in range(num_cgroups):
+ cgroup_name = f'test_{i}'
+ cgroup_path = create_cgroup(cgroup_name)
+ cgroups.append((cgroup_name, cgroup_path))
+
+ for j in range(num_procs):
+ process = subprocess.Popen(['./mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ if cgroup_path not in processes:
+ processes[cgroup_path] = []
+ processes[cgroup_path].append(process)
+ if cgroup_path not in pids:
+ pids[cgroup_path] = []
+ pids[cgroup_path].append(pid)
+
+ time.sleep(0.1)
+ try:
+ for i in range (100):
+ for cgroup_name, cgroup_path in cgroups:
+ check_memory_usage(cgroup_name, pids[cgroup_path], False)
+ time.sleep(0.01)
+
+ for cgroup_name, cgroup_path in cgroups:
+ remove_cgroup(cgroup_path, pids[cgroup_path])
+ check_memory_usage(cgroup_name, pids[cgroup_path], True)
+ cleanup(processes[cgroup_path])
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass multiple cgroup test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ for cgroup_name, cgroup_path in cgroups:
+ remove_cgroup(cgroup_path, pids[cgroup_path])
+ cleanup(processes[cgroup_path])
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_multiple_cgroup(2,2)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/4_oom_test.py b/tools/probeCgroup/testcases/4_oom_test.py
new file mode 100755
index 000000000000..128a258c56f5
--- /dev/null
+++ b/tools/probeCgroup/testcases/4_oom_test.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, get_oom_process_memory_usage
+import os
+import subprocess
+import time
+
+def test_oom(num_procs):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroup_name = 'test'
+ cgroup_path = create_cgroup(cgroup_name)
+
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.limit_in_bytes", 'w') as limit_file:
+ limit_file.write("5M")
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.swappiness", 'w') as swap_file:
+ swap_file.write("0")
+
+ processes = []
+ pids = []
+ for i in range(num_procs):
+ process = subprocess.Popen(['./simple-mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ processes.append(process)
+ pids.append(pid)
+
+ time.sleep(6)
+
+ try:
+ for pid in pids:
+ memory_usage = get_oom_process_memory_usage(pid, cgroup_name)
+ assert memory_usage is not None, f"Memory usage(oom) not found for PID {pid}"
+ assert memory_usage > 0, f"Memory usage should be greater than zero for PID {pid}"
+
+ remove_cgroup(cgroup_path, pids)
+ check_memory_usage(cgroup_name, pids, True)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass oom test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ remove_cgroup(cgroup_path, pids)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_oom(1)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/5_multiple_threads_test.py b/tools/probeCgroup/testcases/5_multiple_threads_test.py
new file mode 100755
index 000000000000..7e1b86dabe48
--- /dev/null
+++ b/tools/probeCgroup/testcases/5_multiple_threads_test.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage
+import os
+import subprocess
+import time
+
+def test_multiple_thread(num_procs):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroup_name = 'test'
+ cgroup_path = create_cgroup(cgroup_name)
+
+ processes = []
+ pids = []
+ for i in range(num_procs):
+ process = subprocess.Popen(['./multiple-thread-mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ processes.append(process)
+ pids.append(pid)
+
+ time.sleep(1)
+ try:
+ for i in range (200):
+ check_memory_usage(cgroup_name, pids, False)
+ time.sleep(0.01)
+
+ remove_cgroup(cgroup_path, pids)
+ check_memory_usage(cgroup_name, pids, True)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass multiple threads test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ remove_cgroup(cgroup_path, pids)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_multiple_thread(5)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/cgroup_utils.py b/tools/probeCgroup/testcases/cgroup_utils.py
new file mode 100644
index 000000000000..f70c68c1f188
--- /dev/null
+++ b/tools/probeCgroup/testcases/cgroup_utils.py
@@ -0,0 +1,115 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import time
+
+def create_cgroup(cgroup_name):
+ cgroup_path = f'/sys/fs/cgroup/memory/{cgroup_name}'
+
+ try:
+ os.makedirs(cgroup_path)
+ except FileExistsError:
+ pass
+
+ return cgroup_path
+
+def add_process_to_cgroup(cgroup_path, pid):
+ with open(os.path.join(cgroup_path, 'cgroup.procs'), 'w') as procs_file:
+ procs_file.write(str(pid))
+
+def get_process_memory_usage(pid, cgroup_name):
+ cur_name = ''
+ with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+ for line in file:
+ parts = line.strip().split()
+ if len(parts) >= 4 and parts[0] == 'cgroup':
+ cur_name = parts[3]
+ if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid:
+ return int(parts[2])
+ return None
+
+def get_process_kmem_usage(pid, cgroup_name):
+ cur_name = ''
+ with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+ for line in file:
+ parts = line.strip().split()
+ if len(parts) >= 4 and parts[0] == 'cgroup':
+ cur_name = parts[3]
+ if len(parts) >= 4 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid:
+ return int(parts[3])
+ return None
+
+def remove_cgroup(cgroup_path, pids):
+ for pid in pids:
+ with open('/sys/fs/cgroup/memory/cgroup.procs', 'w') as backup_file:
+ backup_file.write(str(pid))
+ os.rmdir(cgroup_path)
+ return
+
+def check_memory_usage(cgroup_name, pids, delete):
+ memory_sum = 0
+ for pid in pids:
+ memory_usage = get_process_memory_usage(pid, cgroup_name)
+ if delete == False:
+ assert memory_usage is not None, f"Memory usage not found for PID {pid}"
+ assert memory_usage >= 0, f"Memory usage should be non-negative for PID {pid}"
+ memory_sum += memory_usage
+ else:
+ assert memory_usage is None, f"Error: Memory usage should not be available for PID {pid} after deleting the cgroup."
+ if delete == False:
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.usage_in_bytes", 'r') as file:
+ content = file.readline().strip()
+ memory_read = int(content)
+ memory_sum *= 1024
+ delta = abs(memory_read - memory_sum)
+ # print(f"read: {memory_read}")
+ # print(f"sum : {memory_sum}")
+ if (delta > max(memory_read, memory_sum) * 0.1):
+ return 1
+ else:
+ return 0
+ else:
+ return 0
+
+def check_kmem_usage(cgroup_name, pids, delete):
+ kmem_sum = 0
+ for pid in pids:
+ kmem_usage = get_process_kmem_usage(pid, cgroup_name)
+ if delete == False:
+ assert kmem_usage is not None, f"Kmem usage not found for PID {pid}"
+ assert kmem_usage >= 0, f"Kmem usage should be non-negative for PID {pid}"
+ kmem_sum += kmem_usage
+ else:
+ assert kmem_usage is None, f"Error: Kmem usage should not be available for PID {pid} after deleting the cgroup."
+ if delete == False:
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.kmem.usage_in_bytes", 'r') as file:
+ content = file.readline().strip()
+ kmem_read = int(content)
+ kmem_sum *= 1024
+ delta = abs(kmem_read - kmem_sum)
+ # print(f"kmem read: {kmem_read}")
+ # print(f"kmem sum : {kmem_sum}")
+ # assert delta <= max(kmem_read, kmem_sum) * 0.2, f"Kmem read by probeCgroup is not accurate, {kmem_read}, {kmem_sum}"
+
+def cleanup(processes):
+ for process in processes:
+ process.terminate()
+ process.wait()
+
+def get_oom_process_memory_usage(pid, cgroup_name):
+ # Return the usage listed for this PID in the cgroup's "oom:" section, if any
+ cur_name = ''
+ oom = False
+ with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+ for line in file:
+ parts = line.strip().split()
+ if len(parts) >= 4 and parts[0] == 'cgroup':
+ cur_name = parts[3]
+ oom = False
+ if len(parts) >= 1 and parts[0] == 'oom:':
+ oom = True
+ if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid and oom == True:
+ return int(parts[2])
+ return None
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/mem-allocate.c b/tools/probeCgroup/testcases/mem-allocate.c
new file mode 100644
index 000000000000..e78e37cae61b
--- /dev/null
+++ b/tools/probeCgroup/testcases/mem-allocate.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define MB (1024 * 1024)
+
+char *arr[40];
+
+int main(int argc, char *argv[])
+{
+ char *p;
+ int i = 0;
+
+ while (1) {
+ for (i = 0; i < 40; i++) {
+ p = (char *)malloc(MB);
+ memset(p, 0, MB);
+ arr[i] = p;
+ usleep(100000);
+ }
+ for (int i = 0; i < 40; i++) {
+ free(arr[i]);
+ usleep(100000);
+ }
+ }
+ return 0;
+}
diff --git a/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
new file mode 100644
index 000000000000..55f4c068f55e
--- /dev/null
+++ b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * multiple-thread-mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <pthread.h>
+
+#define MB (1024 * 1024)
+
+void *memory_test(void *arg)
+{
+ char *arr[25];
+ char *p;
+ int i = 0;
+ int cnt = 0;
+
+ while (1) {
+ for (i = 0; i < 20; i++) {
+ p = (char *)malloc(MB);
+ memset(p, 0, MB);
+ arr[i] = p;
+ usleep(10000);
+ }
+ for (int i = 0; i < 20; i++) {
+ free(arr[i]);
+ usleep(10000);
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ pthread_t threads[4];
+ int rc;
+
+ // create threads
+ for (int i = 0; i < 4; i++) {
+ rc = pthread_create(&threads[i], NULL, memory_test, NULL);
+ if (rc != 0) {
+ fprintf(stderr, "Error creating thread: %d\n", rc);
+ return 1;
+ }
+ }
+
+ for (int i = 0; i < 4; i++) {
+ rc = pthread_join(threads[i], NULL);
+ if (rc != 0) {
+ fprintf(stderr, "Error joining thread: %d\n", rc);
+ return 1;
+ }
+ }
+
+ return 0;
+}
diff --git a/tools/probeCgroup/testcases/run.py b/tools/probeCgroup/testcases/run.py
new file mode 100755
index 000000000000..8ffd0ca720d8
--- /dev/null
+++ b/tools/probeCgroup/testcases/run.py
@@ -0,0 +1,32 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import sys
+
+def run_tests(directory):
+ """Run all Python scripts in the given directory."""
+ python_files = [f for f in os.listdir(directory) if f.endswith('test.py')]
+ python_files.sort()
+
+ for filename in python_files:
+ try:
+ filepath = os.path.join(directory, filename)
+
+ subprocess.check_call([sys.executable, filepath])
+ except subprocess.CalledProcessError as e:
+ print(f"Error executing {filename}: {e}")
+ return
+ except Exception as e:
+ print(f"Error executing {filename}:")
+ print(e)
+ return
+
+if __name__ == '__main__':
+ tests_directory = '.'
+ subprocess.check_call(['gcc', 'mem-allocate.c', '-o', 'mem-allocate'])
+ subprocess.check_call(['gcc', 'simple-mem-allocate.c', '-o', 'simple-mem-allocate'])
+ subprocess.check_call(['gcc', 'multiple-thread-mem-allocate.c', '-o', 'multiple-thread-mem-allocate'])
+ run_tests(tests_directory)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/simple-mem-allocate.c b/tools/probeCgroup/testcases/simple-mem-allocate.c
new file mode 100644
index 000000000000..16328b10ba48
--- /dev/null
+++ b/tools/probeCgroup/testcases/simple-mem-allocate.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * simple-mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define MB (1024 * 1024)
+
+int main(int argc, char *argv[])
+{
+ char *p;
+ int i = 0;
+
+ while (1) {
+ p = (char *)malloc(MB);
+ memset(p, 0, MB);
+ sleep(1);
+ }
+
+ return 0;
+}
--
2.43.0
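
The three lookup helpers in cgroup_utils.py repeat the same scan of /proc/cgroup_memory_usage_per_process. A minimal sketch of a single parser follows. It is not part of the patch: parse_usage, mismatch and SAMPLE are hypothetical names, and the sample text is made up. The layout is only inferred from how cgroup_utils.py splits each line (cgroup name in the fourth field of a "cgroup" line, a "pid" header line, an "oom:" marker, then pid in field 0 and usage in field 2, which check_memory_usage treats as KiB). The real layout is defined by probeCgroup.c.

```python
# Made-up sample; the real file layout comes from probeCgroup.c.
SAMPLE = """\
cgroup id 12 test
pid comm mem(KB)
4321 mem-allocate 2048
oom:
4400 simple-mem-allo 5100
"""


def parse_usage(text):
    """Return {cgroup_name: {"live": {pid: kib}, "oom": {pid: kib}}}."""
    result = {}
    current = None
    section = "live"
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cgroup" and len(parts) >= 4:
            # A new cgroup block starts; its entries are live until "oom:".
            current = result.setdefault(parts[3], {"live": {}, "oom": {}})
            section = "live"
        elif parts[0] == "oom:":
            section = "oom"
        elif parts[0] == "pid":
            continue  # column header
        elif current is not None and len(parts) >= 3 and parts[0].isdigit():
            current[section][int(parts[0])] = int(parts[2])
    return result


def mismatch(read_bytes, sum_kib, tolerance=0.1):
    """Mirror check_memory_usage: True when the two values differ by more
    than `tolerance` of the larger one."""
    sum_bytes = sum_kib * 1024
    delta = abs(read_bytes - sum_bytes)
    return delta > max(read_bytes, sum_bytes) * tolerance


if __name__ == "__main__":
    print(parse_usage(SAMPLE))
```

With one parser, get_process_memory_usage and get_oom_process_memory_usage would reduce to dictionary lookups on the "live" and "oom" sections.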
Chen Ridong (1):
cgroup/cpuset: fix panic caused by partcmd_update
Waiman Long (3):
cgroup/cpuset: Optimize isolated partition only
generate_sched_domains() calls
cgroup/cpuset: Fix remote root partition creation problem
cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if
cpus.exclusive not set
kernel/cgroup/cpuset.c | 68 +++++++++++++++++++++++++++++++++---------
1 file changed, 54 insertions(+), 14 deletions(-)
--
2.34.1

[PATCH openEuler-23.09] Add dynamic kprobe-based process-level cgroup memory monitoring tool
by Taoxy2004 05 Sep '24
---
tools/probeCgroup:
This patch introduces a new tool called "probeCgroup" that enables dynamic
monitoring of memory usage at the process level within cgroups. By using
kprobes at relevant cgroup functions, this tool can track memory allocations
and deallocations for individual processes within a cgroup, providing detailed
statistics on memory usage.
The key features of the tool include:
1. Dynamic insertion of kprobes at critical points in the cgroup subsystem.
2. Tracking memory allocation and deallocation events for each process by recording page addresses in a hash table.
3. Providing real-time statistics on memory usage at the process level.
4. Providing statistics on memory usage for processes selected as OOM victims.
Signed-off-by: Taoxy2004 <221870066(a)smail.nju.edu.cn>
---
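
The bookkeeping described above (record each charged page address per process, then find the owner again when the page is uncharged) can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the module's code: PageTracker and its methods are hypothetical names. Unlike the module, which calls folio_nr_pages() again at uncharge time, the model stores the page count alongside the address.

```python
from collections import defaultdict


class PageTracker:
    """Userspace model of per-process page bookkeeping inside one cgroup."""

    def __init__(self):
        # tgid -> {page address: number of base pages in that folio}
        self.pages = defaultdict(dict)

    def charge(self, tgid, addr, nr_pages=1):
        """Record a successful charge; ignore an address already tracked."""
        owned = self.pages[tgid]
        if addr in owned:
            return False
        owned[addr] = nr_pages
        return True

    def uncharge(self, addr):
        """Find the process that owns addr, drop the page, return its tgid
        (or None when no tracked process owns it)."""
        for tgid, owned in self.pages.items():
            if addr in owned:
                del owned[addr]
                return tgid
        return None

    def count(self, tgid):
        """Base pages currently attributed to tgid."""
        return sum(self.pages.get(tgid, {}).values())

    def exit_task(self, tgid):
        """Forget a process that has exited."""
        self.pages.pop(tgid, None)


if __name__ == "__main__":
    tracker = PageTracker()
    tracker.charge(100, 0x1000)
    tracker.charge(100, 0x2000, nr_pages=512)
    tracker.charge(200, 0x3000)
    tracker.uncharge(0x2000)
    print(tracker.count(100), tracker.count(200))
```

The uncharge path scans every process in the cgroup, as remove_page_from_cgroup_info() does, so its cost grows with the number of tracked processes.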
tools/probeCgroup/Makefile | 7 +
tools/probeCgroup/README.md | 29 +
tools/probeCgroup/probeCgroup.c | 612 ++++++++++++++++++
tools/probeCgroup/probeCgroup.h | 415 ++++++++++++
tools/probeCgroup/run.sh | 8 +
tools/probeCgroup/scripts/script1.sh | 10 +
tools/probeCgroup/scripts/script2.sh | 14 +
tools/probeCgroup/scripts/script3.sh | 11 +
.../testcases/1_load_unload_test.py | 24 +
.../testcases/2_multiple_process_test.py | 48 ++
.../testcases/3_multiple_cgroup_test.py | 55 ++
tools/probeCgroup/testcases/4_oom_test.py | 52 ++
.../testcases/5_multiple_threads_test.py | 45 ++
tools/probeCgroup/testcases/cgroup_utils.py | 115 ++++
tools/probeCgroup/testcases/mem-allocate.c | 35 +
.../testcases/multiple-thread-mem-allocate.c | 60 ++
tools/probeCgroup/testcases/run.py | 32 +
.../testcases/simple-mem-allocate.c | 27 +
18 files changed, 1599 insertions(+)
create mode 100644 tools/probeCgroup/Makefile
create mode 100644 tools/probeCgroup/README.md
create mode 100644 tools/probeCgroup/probeCgroup.c
create mode 100644 tools/probeCgroup/probeCgroup.h
create mode 100755 tools/probeCgroup/run.sh
create mode 100755 tools/probeCgroup/scripts/script1.sh
create mode 100755 tools/probeCgroup/scripts/script2.sh
create mode 100755 tools/probeCgroup/scripts/script3.sh
create mode 100755 tools/probeCgroup/testcases/1_load_unload_test.py
create mode 100755 tools/probeCgroup/testcases/2_multiple_process_test.py
create mode 100755 tools/probeCgroup/testcases/3_multiple_cgroup_test.py
create mode 100755 tools/probeCgroup/testcases/4_oom_test.py
create mode 100755 tools/probeCgroup/testcases/5_multiple_threads_test.py
create mode 100644 tools/probeCgroup/testcases/cgroup_utils.py
create mode 100644 tools/probeCgroup/testcases/mem-allocate.c
create mode 100644 tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
create mode 100755 tools/probeCgroup/testcases/run.py
create mode 100644 tools/probeCgroup/testcases/simple-mem-allocate.c
diff --git a/tools/probeCgroup/Makefile b/tools/probeCgroup/Makefile
new file mode 100644
index 000000000000..606c951e5487
--- /dev/null
+++ b/tools/probeCgroup/Makefile
@@ -0,0 +1,7 @@
+obj-m := probeCgroup.o
+CROSS_COMPILE = ''
+KDIR := /lib/modules/$(shell uname -r)/build
+all:
+ $(MAKE) -C $(KDIR) M=$(PWD) modules
+clean:
+ rm -f *.ko *.o *.mod *.mod.o *.mod.c .*.cmd *.symvers module*
diff --git a/tools/probeCgroup/README.md b/tools/probeCgroup/README.md
new file mode 100644
index 000000000000..ff0b6fc21228
--- /dev/null
+++ b/tools/probeCgroup/README.md
@@ -0,0 +1,29 @@
+# probeCgroup
+
+#### Description
+probeCgroup is a process-level cgroup memory monitoring tool based on dynamic tracing (kprobe/kretprobe) technology. By inserting kprobes and kretprobes at the entry and exit points of relevant cgroup functions, this tool can track the memory usage of individual processes within each cgroup in real time.
+
+#### Software Architecture
+1. Dynamic Tracing: Insert kprobes and kretprobes at critical points in cgroup functions to capture memory allocation and release events.
+2. Hash Table Recording: Record the addresses of pages currently used by each process in a hash table, so that when a page is released, the process it belongs to can be identified.
+3. Real-Time Statistics: Provide real-time statistics showing the memory usage of individual processes within each cgroup.
+
+#### Instructions
+1. Compile and Load the Module
+ a. In the 'probeCgroup' directory, run the 'make' command to compile the module.
+ b. Load the module: 'insmod probeCgroup.ko'.
+ c. View memory statistics: 'cat /proc/cgroup_memory_usage_per_process'.
+ If an OOM (Out of Memory) event occurs in a cgroup, you can see "oom:" followed by the process that experienced the OOM and its memory usage at the time.
+
+2. Automate OOM Scenario
+ In the 'probeCgroup' directory, run './run.sh'. This script will automatically set up an OOM scenario and output the content of '/proc/cgroup_memory_usage_per_process' after execution.
+
+3. Perform More Tests
+ a. After compiling the module, in the 'testcases' directory, run './run.py'.
+ b. This script will perform various tests, including:
+ - Loading and unloading the module
+ - Each cgroup containing multiple processes
+ - Creating multiple cgroups
+ - OOM scenarios
+ - Multithreading
+ c. The tests will take approximately one minute to complete.
diff --git a/tools/probeCgroup/probeCgroup.c b/tools/probeCgroup/probeCgroup.c
new file mode 100644
index 000000000000..9883cb1e082d
--- /dev/null
+++ b/tools/probeCgroup/probeCgroup.c
@@ -0,0 +1,612 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * probeCgroup.c - A tool used to get memory usage for each process in a cgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include "probeCgroup.h"
+
+// kretprobe at mem_cgroup_charge
+struct charge_data {
+ struct cgroup *cgrp;
+ struct mem_cgroup *memcg;
+ struct task_struct *task;
+ unsigned long addr;
+};
+
+static int mem_cgroup_charge_entry_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct charge_data *data;
+ struct folio *page;
+ struct mm_struct *mm;
+ struct mem_cgroup *memcg;
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+
+ if (!current->mm)
+ return 1;
+ page = (struct folio *)regs->di;
+ mm = (struct mm_struct *)regs->si;
+ if (mm == NULL || page == NULL)
+ return -1;
+ memcg = get_mem_cgroup_from_mm(mm);
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+
+ data = (struct charge_data *)ri->data;
+ data->memcg = memcg;
+ data->addr = (unsigned long)page;
+ data->task = current;
+ data->cgrp = cgrp;
+ }
+ return 0;
+}
+
+NOKPROBE_SYMBOL(mem_cgroup_charge_entry_handler);
+
+static int mem_cgroup_charge_ret_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ unsigned long retval = regs_return_value(regs);
+ struct charge_data *data = (struct charge_data *)ri->data;
+ int id;
+ struct cgroup_info *cgrp_info;
+ struct task_info *tsk_info;
+
+ if (data->memcg != NULL && retval == 0) {
+ id = ((data->memcg)->css).id;
+
+ spin_lock(&lock);
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info == NULL) {
+ cgrp_info = create_cgroup_info(data->cgrp, data->memcg);
+ if (cgrp_info == NULL) {
+ spin_unlock(&lock);
+ return -1;
+ }
+ add_cgroup_info(cgrp_info);
+ }
+ spin_unlock(&lock);
+
+ read_lock(&cgrp_info->cgrp_lock);
+ tsk_info = find_task_info(cgrp_info, data->task->tgid);
+ read_unlock(&cgrp_info->cgrp_lock);
+
+ // for some cases, task->comm changes over time
+ if (tsk_info != NULL
+ && strcmp(data->task->comm, tsk_info->comm) != 0) {
+ strscpy(tsk_info->comm, data->task->comm,
+ sizeof(tsk_info->comm));
+ }
+
+ if (tsk_info == NULL) {
+ tsk_info = create_task_info(data->task);
+ if (tsk_info == NULL)
+ return -1;
+ add_task_to_cgroup_info(cgrp_info, tsk_info);
+ }
+
+ if (HashMap_insert(tsk_info->pages, data->addr)) {
+ //update counter
+ spin_lock(&(tsk_info->cnt_lock));
+ tsk_info->count +=
+ folio_nr_pages((struct folio *)data->addr);
+ spin_unlock(&(tsk_info->cnt_lock));
+ }
+ }
+
+ return 0;
+
+}
+
+NOKPROBE_SYMBOL(mem_cgroup_charge_ret_handler);
+
+static struct kretprobe mem_cgroup_charge_kretprobe = {
+ .handler = mem_cgroup_charge_ret_handler,
+ .entry_handler = mem_cgroup_charge_entry_handler,
+ .data_size = sizeof(struct charge_data),
+ .maxactive = 20,
+};
+
+static int mem_cgroup_charge_kretprobe_init(void)
+{
+ int ret;
+
+ mem_cgroup_charge_kretprobe.kp.symbol_name = "__mem_cgroup_charge";
+ ret = register_kretprobe(&mem_cgroup_charge_kretprobe);
+ if (ret < 0) {
+ pr_err("register_kretprobe failed, returned %d\n", ret);
+ return ret;
+ }
+ pr_info("Planted return probe at %s: %p\n",
+ mem_cgroup_charge_kretprobe.kp.symbol_name,
+ mem_cgroup_charge_kretprobe.kp.addr);
+ return 0;
+}
+
+static void mem_cgroup_charge_kretprobe_exit(void)
+{
+ unregister_kretprobe(&mem_cgroup_charge_kretprobe);
+ pr_info("kretprobe at %p unregistered\n",
+ mem_cgroup_charge_kretprobe.kp.addr);
+
+ /* nmissed > 0 suggests that maxactive was set too low. */
+ pr_info("Missed probing %d instances of %s\n",
+ mem_cgroup_charge_kretprobe.nmissed,
+ mem_cgroup_charge_kretprobe.kp.symbol_name);
+}
+
+// kretprobe at uncharge_folio
+
+struct uncharge_data {
+ struct cgroup *cgrp;
+ struct mem_cgroup *memcg;
+ unsigned long addr;
+ bool isKmem;
+ int nr_pages;
+};
+
+static int uncharge_folio_entry_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct uncharge_data *data;
+ struct folio *page;
+ struct mem_cgroup *memcg = NULL;
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+ struct obj_cgroup *objcg;
+ int nr_pages = 0;
+
+ data = (struct uncharge_data *)ri->data;
+ page = (struct folio *)regs->di;
+ if (page == NULL) {
+ data->memcg = NULL;
+ return -1;
+ }
+ if (page->memcg_data & MEMCG_DATA_KMEM) { // if the page belongs to kmem
+ if (!folio_test_large(page))
+ nr_pages = 1;
+ else
+ nr_pages = page->_folio_nr_pages;
+ // nr_pages = thp_nr_pages(page);
+ objcg = __folio_objcg(page);
+ if (objcg != NULL)
+ memcg = objcg->memcg;
+ data->isKmem = true;
+ data->nr_pages = nr_pages;
+ } else {
+ memcg = __folio_memcg(page);
+ data->isKmem = false;
+ }
+
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+
+ data->memcg = memcg;
+ data->addr = (unsigned long)page;
+ data->cgrp = cgrp;
+ }
+ return 0;
+}
+
+NOKPROBE_SYMBOL(uncharge_folio_entry_handler);
+
+static int uncharge_folio_ret_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct uncharge_data *data = (struct uncharge_data *)ri->data;
+ int id;
+ struct cgroup_info *cgrp_info;
+ int ret = -1;
+
+ if (data->memcg != NULL) {
+ id = ((data->memcg)->css).id;
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info == NULL)
+ return -1;
+ if (data->isKmem)
+ ret = -1;
+ else
+ ret = remove_page_from_cgroup_info(data->addr, cgrp_info);
+ }
+
+ return ret;
+}
+
+NOKPROBE_SYMBOL(uncharge_folio_ret_handler);
+
+static struct kretprobe uncharge_folio_kretprobe = {
+ .handler = uncharge_folio_ret_handler,
+ .entry_handler = uncharge_folio_entry_handler,
+ .data_size = sizeof(struct uncharge_data),
+ .maxactive = 20,
+};
+
+static int uncharge_folio_kretprobe_init(void)
+{
+ int ret;
+
+ uncharge_folio_kretprobe.kp.symbol_name = "uncharge_folio";
+ ret = register_kretprobe(&uncharge_folio_kretprobe);
+ if (ret < 0) {
+ pr_err("register_kretprobe failed, returned %d\n", ret);
+ return ret;
+ }
+ pr_info("Planted return probe at %s: %p\n",
+ uncharge_folio_kretprobe.kp.symbol_name,
+ uncharge_folio_kretprobe.kp.addr);
+ return 0;
+}
+
+static void uncharge_folio_kretprobe_exit(void)
+{
+ unregister_kretprobe(&uncharge_folio_kretprobe);
+ pr_info("kretprobe at %p unregistered\n",
+ uncharge_folio_kretprobe.kp.addr);
+
+ /* nmissed > 0 suggests that maxactive was set too low. */
+ pr_info("Missed probing %d instances of %s\n",
+ uncharge_folio_kretprobe.nmissed,
+ uncharge_folio_kretprobe.kp.symbol_name);
+}
+
+//kprobe at do_exit
+static struct kprobe do_exit_kprobe;
+static int do_exit_kprobe_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct task_struct *cur = current;
+ int tgid = cur->tgid;
+ struct mm_struct *mm = cur->mm;
+ struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+ struct cgroup_info *cgrp_info;
+ struct task_info *tsk_info;
+ int id;
+
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+ id = (memcg->css).id;
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info != NULL) {
+ write_lock(&cgrp_info->cgrp_lock);
+ tsk_info = find_task_info(cgrp_info, tgid);
+ if (tsk_info != NULL) {
+ list_del(&tsk_info->list);
+ write_unlock(&cgrp_info->cgrp_lock);
+ remove_task_from_cgroup_info(cgrp_info,
+ tsk_info);
+ } else {
+ write_unlock(&cgrp_info->cgrp_lock);
+ }
+ }
+ mem_cgroup_put(memcg);
+ }
+ return 0;
+}
+
+static void do_exit_kprobe_post_handler(struct kprobe *p,
+ struct pt_regs *regs,
+ unsigned long flags)
+{
+
+}
+
+static int do_exit_kprobe_init(void)
+{
+ do_exit_kprobe.pre_handler = do_exit_kprobe_pre_handler;
+ do_exit_kprobe.post_handler = do_exit_kprobe_post_handler;
+ do_exit_kprobe.symbol_name = "do_exit";
+ if (register_kprobe(&do_exit_kprobe)) {
+ pr_alert("register_kprobe on do_exit failed!\n");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static void do_exit_kprobe_exit(void)
+{
+ unregister_kprobe(&do_exit_kprobe);
+}
+
+//kprobe at mark_oom_victim
+static struct kprobe mark_oom_victim_kprobe;
+
+static int mark_oom_victim_kprobe_pre_handler(struct kprobe *p,
+ struct pt_regs *regs)
+{
+ struct task_struct *victim;
+ int tgid;
+ struct mm_struct *mm;
+ struct mem_cgroup *memcg;
+ struct cgroup_subsys_state css;
+ struct cgroup *cgrp;
+ struct cgroup_info *cgrp_info;
+ struct task_info *tsk_info;
+ int id;
+ struct task_info *oom_info;
+
+ victim = (struct task_struct *)regs->di;
+ tgid = victim->tgid;
+ mm = victim->mm;
+ memcg = get_mem_cgroup_from_mm(mm);
+ if (memcg != NULL) {
+ css = memcg->css;
+ cgrp = css.cgroup;
+ id = (memcg->css).id;
+ cgrp_info = find_cgroup_info(id);
+ if (cgrp_info != NULL) {
+ read_lock(&cgrp_info->cgrp_lock);
+ tsk_info = find_task_info(cgrp_info, tgid);
+ read_unlock(&cgrp_info->cgrp_lock);
+ if (tsk_info != NULL) {
+ oom_info = create_oom_task_info(tsk_info);
+ if (oom_info != NULL) {
+ add_oom_task_to_cgroup_info(cgrp_info,
+ oom_info);
+ }
+ }
+ }
+ mem_cgroup_put(memcg);
+ }
+ return 0;
+}
+
+static void mark_oom_victim_kprobe_post_handler(struct kprobe *p,
+ struct pt_regs *regs,
+ unsigned long flags)
+{
+
+}
+
+static int mark_oom_victim_kprobe_init(void)
+{
+ mark_oom_victim_kprobe.pre_handler = mark_oom_victim_kprobe_pre_handler;
+ mark_oom_victim_kprobe.post_handler =
+ mark_oom_victim_kprobe_post_handler;
+ mark_oom_victim_kprobe.symbol_name = "mark_oom_victim";
+ if (register_kprobe(&mark_oom_victim_kprobe)) {
+ pr_alert("register_kprobe on mark_oom_victim failed!\n");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static void mark_oom_victim_kprobe_exit(void)
+{
+ unregister_kprobe(&mark_oom_victim_kprobe);
+}
+
+//kretprobe at cgroup_destroy_locked
+struct destroy_data {
+ struct cgroup *cgrp;
+};
+
+static int cgroup_destroy_locked_entry_handler(struct kretprobe_instance
+ *ri, struct pt_regs *regs)
+{
+ struct destroy_data *data;
+
+ data = (struct destroy_data *)ri->data;
+ data->cgrp = (struct cgroup *)regs->di;
+ return 0;
+}
+
+NOKPROBE_SYMBOL(cgroup_destroy_locked_entry_handler);
+
+static int cgroup_destroy_locked_ret_handler(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct destroy_data *data = (struct destroy_data *)ri->data;
+ struct cgroup *cgrp = data->cgrp;
+ struct cgroup_info *cgrp_info = NULL;
+ unsigned long retval = regs_return_value(regs);
+
+ if (!cgrp)
+ return -1;
+ if (retval != 0)
+ return -1;
+ list_for_each_entry(cgrp_info, &all_cgroup_info, list) {
+ if (cgrp_info->cgrp == cgrp) {
+ spin_lock(&lock);
+ list_del(&cgrp_info->list);
+ spin_unlock(&lock);
+ destroy_cgroup_info(cgrp_info);
+ return 0;
+ }
+ }
+ return -1;
+}
+
+NOKPROBE_SYMBOL(cgroup_destroy_locked_ret_handler);
+
+static struct kretprobe cgroup_destroy_locked_kretprobe = {
+ .handler = cgroup_destroy_locked_ret_handler,
+ .entry_handler = cgroup_destroy_locked_entry_handler,
+ .data_size = sizeof(struct destroy_data),
+ .maxactive = 20,
+};
+
+static int cgroup_destroy_locked_kretprobe_init(void)
+{
+ int ret;
+
+ cgroup_destroy_locked_kretprobe.kp.symbol_name =
+ "cgroup_destroy_locked";
+ ret = register_kretprobe(&cgroup_destroy_locked_kretprobe);
+ if (ret < 0) {
+ pr_err("register_kretprobe failed, returned %d\n", ret);
+ return ret;
+ }
+ pr_info("Planted return probe at %s: %p\n",
+ cgroup_destroy_locked_kretprobe.kp.symbol_name,
+ cgroup_destroy_locked_kretprobe.kp.addr);
+ return 0;
+}
+
+static void cgroup_destroy_locked_kretprobe_exit(void)
+{
+ unregister_kretprobe(&cgroup_destroy_locked_kretprobe);
+ pr_info("kretprobe at %p unregistered\n",
+ cgroup_destroy_locked_kretprobe.kp.addr);
+
+ /* nmissed > 0 suggests that maxactive was set too low. */
+ pr_info("Missed probing %d instances of %s\n",
+ cgroup_destroy_locked_kretprobe.nmissed,
+ cgroup_destroy_locked_kretprobe.kp.symbol_name);
+}
+
+// print the tasks in order of their memory usage
+static void print_sorted_tasks_list(struct cgroup_info *cgrp_info,
+ int type, struct seq_file *m)
+{
+ struct list_head *cur, *insert_pos;
+ struct task_info *task, *insert_task;
+ struct list_head new_list = LIST_HEAD_INIT(new_list);
+ struct list_head *old_list;
+ struct task_info *new_task, *next_task;
+
+ if (type == 0) {
+ if (cgrp_info == NULL)
+ return;
+ read_lock(&cgrp_info->cgrp_lock);
+ old_list = &cgrp_info->tasks_list;
+ } else {
+ if (cgrp_info == NULL)
+ return;
+ old_list = &cgrp_info->oom_list;
+ }
+
+ list_for_each_entry_safe(task, insert_task, old_list, list) {
+ new_task = kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+ if (!new_task)
+ break; /* still unlock, print and free the partial copy below */
+ new_task->tgid = task->tgid;
+ strscpy(new_task->comm, task->comm, sizeof(new_task->comm));
+ new_task->count = task->count;
+ new_task->pages = NULL;
+ INIT_LIST_HEAD(&new_task->list);
+
+ //insertion sort
+ cur = &new_list;
+ insert_pos = cur->next;
+ while (insert_pos != &new_list) {
+ next_task =
+ list_entry(insert_pos, struct task_info, list);
+ if (new_task->count >= next_task->count)
+ break;
+ cur = insert_pos;
+ insert_pos = insert_pos->next;
+ }
+
+ (&new_task->list)->prev = insert_pos->prev;
+ (insert_pos->prev)->next = (&new_task->list);
+ (&new_task->list)->next = insert_pos;
+ insert_pos->prev = (&new_task->list);
+ }
+ if (type == 0)
+ read_unlock(&cgrp_info->cgrp_lock);
+
+ //print
+ if (type == 1 && !list_empty(&new_list)) {
+ seq_puts(m, "oom:\n");
+ seq_printf(m, "%10s %20s %20s\n", "pid", "command",
+ "memory usage (KB)");
+ }
+ if (type == 0)
+ seq_printf(m, "%10s %20s %20s\n", "pid", "command",
+ "memory usage (KB)");
+ list_for_each_entry_safe(task, insert_task, &new_list, list) {
+ seq_printf(m, "%10d %20s %20d\n", task->tgid, task->comm,
+ (task->count) * 4);
+ }
+
+ list_for_each_entry_safe(task, insert_task, &new_list, list) {
+ list_del(&task->list);
+ kfree(task);
+ }
+}
+
+static struct proc_dir_entry *cgroup_info_read;
+#define procfs_file_read "cgroup_memory_usage_per_process"
+
+void seq_print_tasks(struct cgroup_info *cgroup_info, struct seq_file *m)
+{
+ if (!cgroup_info)
+ return;
+
+ print_sorted_tasks_list(cgroup_info, 0, m);
+}
+
+void seq_print_oom_tasks(struct cgroup_info *cgroup_info, struct seq_file *m)
+{
+ if (!cgroup_info)
+ return;
+
+ print_sorted_tasks_list(cgroup_info, 1, m);
+}
+
+void seq_print_cgroups(struct seq_file *m)
+{
+ struct cgroup_info *cgrp, *pos;
+
+ spin_lock(&lock);
+ list_for_each_entry_safe(cgrp, pos, &all_cgroup_info, list) {
+ seq_printf(m, "cgroup name : %s\n", cgrp->name);
+ seq_print_tasks(cgrp, m);
+ seq_print_oom_tasks(cgrp, m);
+ seq_puts(m, "\n");
+ }
+ spin_unlock(&lock);
+}
+
+static int memory_usage_show(struct seq_file *m, void *v)
+{
+ seq_print_cgroups(m);
+ return 0;
+}
+
+static int __init global_init(void)
+{
+ int ret = 0;
+
+ cgroup_info_read =
+ proc_create_single(procfs_file_read, 0, NULL, memory_usage_show);
+ if (!cgroup_info_read)
+ return -ENOMEM;
+ ret = mem_cgroup_charge_kretprobe_init();
+ uncharge_folio_kretprobe_init();
+ do_exit_kprobe_init();
+ mark_oom_victim_kprobe_init();
+ cgroup_destroy_locked_kretprobe_init();
+
+ return ret;
+}
+
+static void __exit global_exit(void)
+{
+ struct cgroup_info *cgrp_info, *pos;
+
+ mem_cgroup_charge_kretprobe_exit();
+ uncharge_folio_kretprobe_exit();
+ do_exit_kprobe_exit();
+ mark_oom_victim_kprobe_exit();
+ cgroup_destroy_locked_kretprobe_exit();
+
+ remove_proc_entry(procfs_file_read, NULL);
+
+ //release all memory still in use
+ list_for_each_entry_safe(cgrp_info, pos, &all_cgroup_info, list) {
+ list_del(&cgrp_info->list);
+ destroy_cgroup_info(cgrp_info);
+ }
+}
+
+module_init(global_init)
+module_exit(global_exit)
+MODULE_LICENSE("GPL");
diff --git a/tools/probeCgroup/probeCgroup.h b/tools/probeCgroup/probeCgroup.h
new file mode 100644
index 000000000000..953a6e0aca31
--- /dev/null
+++ b/tools/probeCgroup/probeCgroup.h
@@ -0,0 +1,415 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * probeCgroup.h
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kprobes.h>
+#include <linux/ktime.h>
+#include <linux/limits.h>
+#include <linux/sched.h>
+#include <linux/mm_types.h>
+#include <linux/memcontrol.h>
+#include <linux/cgroup-defs.h>
+#include <linux/kernfs.h>
+#include <linux/string.h>
+#include <linux/list.h>
+#include <linux/oom.h>
+#include <linux/fs.h>
+#include <linux/proc_fs.h>
+#include <linux/huge_mm.h>
+#include <linux/page-flags.h>
+#include <linux/spinlock.h>
+#include <linux/rwlock.h>
+
+static DEFINE_SPINLOCK(lock); // global lock for the list of cgroup_info
+
+struct HashNode {
+ unsigned long addr;
+ struct HashNode *next;
+};
+
+struct HashNode *HashNode_create(unsigned long addr)
+{
+ struct HashNode *node = NULL;
+
+ node = kzalloc(sizeof(struct HashNode), GFP_ATOMIC);
+ if (node == NULL)
+ return NULL;
+ node->addr = addr;
+ node->next = NULL;
+ return node;
+}
+
+struct HashBucket {
+ struct HashNode *head;
+ spinlock_t bkt_lock;
+};
+
+void HashBucket_init(struct HashBucket *bkt)
+{
+ bkt->head = NULL;
+ spin_lock_init(&bkt->bkt_lock);
+}
+
+bool HashBucket_insert(struct HashBucket *bkt, unsigned long addr)
+{
+ struct HashNode *new_node;
+ struct HashNode *node;
+ struct HashNode *prev;
+ bool ret = true;
+
+ if (bkt == NULL)
+ return false;
+
+ prev = NULL;
+ new_node = NULL;
+ new_node = HashNode_create(addr);
+
+ spin_lock(&bkt->bkt_lock);
+ node = bkt->head;
+ while (node != NULL && node->addr != addr) {
+ prev = node;
+ node = node->next;
+ }
+ if (node == NULL) {
+ if (new_node == NULL) {
+ pr_info("not enough memory for HashNode\n");
+ spin_unlock(&bkt->bkt_lock);
+ return false;
+ }
+ if (bkt->head == NULL)
+ bkt->head = new_node;
+ else
+ prev->next = new_node;
+ spin_unlock(&bkt->bkt_lock);
+ ret = true;
+ } else {
+ spin_unlock(&bkt->bkt_lock);
+ kfree(new_node);
+ ret = false;
+ }
+
+ return ret;
+}
+
+bool HashBucket_erase(struct HashBucket *bkt, unsigned long addr)
+{
+ struct HashNode *node;
+ struct HashNode *prev;
+ bool ret = true;
+
+ if (bkt == NULL)
+ return false;
+
+ spin_lock(&bkt->bkt_lock);
+ node = bkt->head;
+ prev = NULL;
+ while (node != NULL && node->addr != addr) {
+ prev = node;
+ node = node->next;
+ }
+ if (node == NULL) {
+ spin_unlock(&bkt->bkt_lock);
+ ret = false;
+ } else {
+ if (bkt->head == node)
+ bkt->head = node->next;
+ else
+ prev->next = node->next;
+ kfree(node);
+ spin_unlock(&bkt->bkt_lock);
+ ret = true;
+ }
+
+ return ret;
+}
+
+void HashBucket_clear(struct HashBucket *bkt)
+{
+ struct HashNode *node;
+ struct HashNode *prev;
+
+ if (bkt == NULL)
+ return;
+
+ spin_lock(&bkt->bkt_lock);
+ node = bkt->head;
+ prev = NULL;
+ bkt->head = NULL;
+ while (node != NULL) {
+ prev = node;
+ node = node->next;
+ kfree(prev);
+ }
+ spin_unlock(&bkt->bkt_lock);
+}
+
+struct HashMap {
+ unsigned long size;
+ struct HashBucket *HashTable;
+};
+
+unsigned long hash_func(unsigned long addr, unsigned long size)
+{
+ return addr % size;
+}
+
+struct HashMap *HashMap_create(unsigned long size)
+{
+ struct HashMap *hm = NULL;
+ struct HashBucket *ht = NULL;
+ int i = 0;
+
+ hm = kmalloc(sizeof(struct HashMap), GFP_ATOMIC);
+ if (hm == NULL)
+ return NULL;
+ ht = kmalloc((size * sizeof(struct HashBucket)), GFP_ATOMIC);
+ if (ht == NULL) {
+ kfree(hm);
+ return NULL;
+ }
+ for (i = 0; i < size; i++)
+ HashBucket_init(&(ht[i]));
+
+ hm->size = size;
+ hm->HashTable = ht;
+ return hm;
+}
+
+bool HashMap_insert(struct HashMap *hm, unsigned long addr)
+{
+ unsigned long index;
+
+ if (hm == NULL)
+ return false;
+ index = hash_func(addr, hm->size);
+ if (hm->HashTable == NULL)
+ return false;
+ return HashBucket_insert(&(hm->HashTable[index]), addr);
+}
+
+bool HashMap_erase(struct HashMap *hm, unsigned long addr)
+{
+ unsigned long index;
+
+ if (hm == NULL)
+ return false;
+ index = hash_func(addr, hm->size);
+ if (hm->HashTable == NULL)
+ return false;
+ return HashBucket_erase(&(hm->HashTable[index]), addr);
+}
+
+void HashMap_clear(struct HashMap *hm)
+{
+ unsigned long size;
+ struct HashBucket *ht;
+ int i;
+
+ if (hm == NULL)
+ return;
+ size = hm->size;
+ ht = hm->HashTable;
+ if (ht == NULL)
+ return;
+ hm->HashTable = NULL;
+ for (i = 0; i < size; i++)
+ HashBucket_clear(&(ht[i]));
+
+ kfree(ht);
+ kfree(hm);
+}
+
+// struct that saves the information for each task
+struct task_info {
+ int tgid;
+ char comm[TASK_COMM_LEN];
+ int count; // number of pages
+ struct HashMap *pages;
+ struct list_head list;
+ spinlock_t cnt_lock;
+};
+
+// struct that saves the information for each cgroup
+struct cgroup_info {
+ struct cgroup *cgrp;
+ struct mem_cgroup *memcg;
+ int id;
+ char name[64];
+ struct list_head list;
+ struct list_head tasks_list;
+ struct list_head oom_list;
+ rwlock_t cgrp_lock;
+ unsigned int cached_bytes;
+};
+
+static LIST_HEAD(all_cgroup_info); // a list that links all the cgroup_info structs
+
+static struct task_info *create_task_info(struct task_struct *cur_task)
+{
+ struct task_info *tsk_info =
+ kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+ if (!tsk_info)
+ return NULL;
+
+ // initialization
+ tsk_info->tgid = cur_task->tgid;
+ strscpy(tsk_info->comm, cur_task->comm, sizeof(tsk_info->comm));
+ tsk_info->count = 0;
+ tsk_info->pages = NULL;
+ tsk_info->pages = HashMap_create(1023);
+ INIT_LIST_HEAD(&tsk_info->list);
+ spin_lock_init(&(tsk_info->cnt_lock));
+
+ return tsk_info;
+}
+
+static int
+add_task_to_cgroup_info(struct cgroup_info *cgrp, struct task_info *task)
+{
+ if (!cgrp || !task)
+ return -EINVAL;
+
+ write_lock(&cgrp->cgrp_lock);
+ list_add_tail(&task->list, &cgrp->tasks_list);
+ write_unlock(&cgrp->cgrp_lock);
+ return 0;
+}
+
+static int
+remove_task_from_cgroup_info(struct cgroup_info *cgrp, struct task_info *task)
+{
+ if (cgrp == NULL || task == NULL)
+ return -EINVAL;
+
+ HashMap_clear(task->pages);
+ // kfree(task->pages);
+ kfree(task);
+ return 0;
+}
+
+static struct task_info *find_task_info(struct cgroup_info *cgrp, int tgid)
+{
+ struct task_info *tsk_info, *pos;
+
+ list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) {
+ if (tsk_info->tgid == tgid)
+ return tsk_info;
+ }
+ return NULL;
+}
+
+static int
+remove_page_from_cgroup_info(unsigned long addr, struct cgroup_info *cgrp)
+{
+ struct task_info *tsk_info, *pos;
+
+ read_lock(&cgrp->cgrp_lock);
+ list_for_each_entry_safe(tsk_info, pos, &cgrp->tasks_list, list) {
+ if (HashMap_erase(tsk_info->pages, addr)) {
+ spin_lock(&(tsk_info->cnt_lock));
+ tsk_info->count -= folio_nr_pages((struct folio *)addr);
+ spin_unlock(&(tsk_info->cnt_lock));
+ read_unlock(&cgrp->cgrp_lock);
+ return 0;
+ }
+ }
+ read_unlock(&cgrp->cgrp_lock);
+ return -1;
+}
+
+static struct cgroup_info *create_cgroup_info(struct cgroup *cgrp,
+ struct mem_cgroup *memcg)
+{
+ struct cgroup_info *cgrp_info =
+ kmalloc(sizeof(struct cgroup_info), GFP_ATOMIC);
+ struct kernfs_node *kn;
+
+ if (!cgrp_info)
+ return NULL;
+
+ cgrp_info->cgrp = cgrp;
+ cgrp_info->memcg = memcg;
+ cgrp_info->id = (memcg->css).id;
+ kn = cgrp->kn;
+ strscpy(cgrp_info->name, kn->name, sizeof(cgrp_info->name));
+ INIT_LIST_HEAD(&cgrp_info->list);
+ INIT_LIST_HEAD(&cgrp_info->tasks_list);
+ INIT_LIST_HEAD(&cgrp_info->oom_list);
+ rwlock_init(&(cgrp_info->cgrp_lock));
+ cgrp_info->cached_bytes = 0;
+
+ return cgrp_info;
+}
+
+static void destroy_cgroup_info(struct cgroup_info *cgrp_info)
+{
+ struct task_info *task, *tmp;
+
+ if (!cgrp_info)
+ return;
+
+ write_lock(&cgrp_info->cgrp_lock);
+ list_for_each_entry_safe(task, tmp, &cgrp_info->tasks_list, list) {
+ list_del(&task->list);
+ remove_task_from_cgroup_info(cgrp_info, task);
+ }
+ write_unlock(&cgrp_info->cgrp_lock);
+ list_for_each_entry_safe(task, tmp, &cgrp_info->oom_list, list) {
+ list_del(&task->list);
+ remove_task_from_cgroup_info(cgrp_info, task);
+ }
+
+ kfree(cgrp_info);
+}
+
+static int add_cgroup_info(struct cgroup_info *cgrp_info)
+{
+ if (!cgrp_info)
+ return -EINVAL;
+
+ list_add_tail(&cgrp_info->list, &all_cgroup_info);
+ return 0;
+}
+
+static struct cgroup_info *find_cgroup_info(int id)
+{
+ struct cgroup_info *cgrp_info = NULL;
+
+ list_for_each_entry(cgrp_info, &all_cgroup_info, list) {
+ if (cgrp_info->id == id)
+ return cgrp_info;
+ }
+
+ return NULL;
+}
+
+static struct task_info *create_oom_task_info(struct task_info *tsk_info)
+{
+ struct task_info *oom_tsk_info =
+ kmalloc(sizeof(struct task_info), GFP_ATOMIC);
+ if (!oom_tsk_info)
+ return NULL;
+
+ oom_tsk_info->tgid = tsk_info->tgid;
+ strscpy(oom_tsk_info->comm, tsk_info->comm, sizeof(oom_tsk_info->comm));
+ oom_tsk_info->count = tsk_info->count;
+ oom_tsk_info->pages = NULL;
+ INIT_LIST_HEAD(&oom_tsk_info->list);
+
+ return oom_tsk_info;
+}
+
+static int
+add_oom_task_to_cgroup_info(struct cgroup_info *cgrp,
+ struct task_info *oom_task)
+{
+ if (!cgrp || !oom_task)
+ return -EINVAL;
+ list_add_tail(&oom_task->list, &cgrp->oom_list);
+ return 0;
+}
diff --git a/tools/probeCgroup/run.sh b/tools/probeCgroup/run.sh
new file mode 100755
index 000000000000..7e1ffefe66d1
--- /dev/null
+++ b/tools/probeCgroup/run.sh
@@ -0,0 +1,8 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd scripts
+./script1.sh
+./script2.sh
+./script3.sh
diff --git a/tools/probeCgroup/scripts/script1.sh b/tools/probeCgroup/scripts/script1.sh
new file mode 100755
index 000000000000..539e8258afb9
--- /dev/null
+++ b/tools/probeCgroup/scripts/script1.sh
@@ -0,0 +1,10 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd ..
+make
+insmod probeCgroup.ko
+
+cd testcases
+gcc simple-mem-allocate.c -o simple-mem-allocate
diff --git a/tools/probeCgroup/scripts/script2.sh b/tools/probeCgroup/scripts/script2.sh
new file mode 100755
index 000000000000..2ad515cfb912
--- /dev/null
+++ b/tools/probeCgroup/scripts/script2.sh
@@ -0,0 +1,14 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+current_dir=$(pwd)
+cd /sys/fs/cgroup/memory
+mkdir test
+cd test
+sh -c "echo $$ >> cgroup.procs"
+sh -c "echo 5M > memory.limit_in_bytes"
+sh -c "echo 0 > memory.swappiness"
+cd "$current_dir"
+cd ../testcases
+./simple-mem-allocate
diff --git a/tools/probeCgroup/scripts/script3.sh b/tools/probeCgroup/scripts/script3.sh
new file mode 100755
index 000000000000..127eb45de5c9
--- /dev/null
+++ b/tools/probeCgroup/scripts/script3.sh
@@ -0,0 +1,11 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+cd /proc
+cat cgroup_memory_usage_per_process
+
+cat /sys/fs/cgroup/memory/test/cgroup.procs > /sys/fs/cgroup/memory/cgroup.procs
+rmdir /sys/fs/cgroup/memory/test
+# cat cgroup_memory_usage_per_process
+rmmod probeCgroup
diff --git a/tools/probeCgroup/testcases/1_load_unload_test.py b/tools/probeCgroup/testcases/1_load_unload_test.py
new file mode 100755
index 000000000000..5389a14a1dac
--- /dev/null
+++ b/tools/probeCgroup/testcases/1_load_unload_test.py
@@ -0,0 +1,24 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import time
+
+def test_module_load_unload():
+ try:
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+ print('loading module successfully!')
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+ output = subprocess.check_output(['lsmod'])
+ assert b'probeCgroup' not in output
+ print('unloading module successfully!')
+ except subprocess.CalledProcessError as e:
+ print('Load unload test failed. insmod, rmmod or lsmod returned an error.')
+ except AssertionError as e:
+ print('Load unload test failed. Cannot remove module.')
+
+if __name__ == '__main__':
+ test_module_load_unload()
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/2_multiple_process_test.py b/tools/probeCgroup/testcases/2_multiple_process_test.py
new file mode 100755
index 000000000000..d88c7f7f2952
--- /dev/null
+++ b/tools/probeCgroup/testcases/2_multiple_process_test.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage
+import os
+import subprocess
+import time
+
+def test_multiple_process(num_procs):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroup_name = 'test'
+ cgroup_path = create_cgroup(cgroup_name)
+
+ processes = []
+ pids = []
+
+ for i in range(num_procs):
+ process = subprocess.Popen(['./mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ processes.append(process)
+ pids.append(pid)
+
+ time.sleep(0.1)
+ try:
+ count = 0
+ for i in range (2000):
+ count += check_memory_usage(cgroup_name, pids, False)
+ time.sleep(0.01)
+ assert count <= 50, "Memory read by probeCgroup is not accurate"
+
+ remove_cgroup(cgroup_path, pids)
+ check_memory_usage(cgroup_name, pids, True)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass multiple process test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ remove_cgroup(cgroup_path, pids)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_multiple_process(3)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/3_multiple_cgroup_test.py b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py
new file mode 100755
index 000000000000..592a716df877
--- /dev/null
+++ b/tools/probeCgroup/testcases/3_multiple_cgroup_test.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup
+import os
+import subprocess
+import time
+
+def test_multiple_cgroup(num_procs, num_cgroups):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroups = []
+ processes = {}
+ pids = {}
+ for i in range(num_cgroups):
+ cgroup_name = f'test_{i}'
+ cgroup_path = create_cgroup(cgroup_name)
+ cgroups.append((cgroup_name, cgroup_path))
+
+ for j in range(num_procs):
+ process = subprocess.Popen(['./mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ if cgroup_path not in processes:
+ processes[cgroup_path] = []
+ processes[cgroup_path].append(process)
+ if cgroup_path not in pids:
+ pids[cgroup_path] = []
+ pids[cgroup_path].append(pid)
+
+ time.sleep(0.1)
+ try:
+ for i in range (100):
+ for cgroup_name, cgroup_path in cgroups:
+ check_memory_usage(cgroup_name, pids[cgroup_path], False)
+ time.sleep(0.01)
+
+ for cgroup_name, cgroup_path in cgroups:
+ remove_cgroup(cgroup_path, pids[cgroup_path])
+ check_memory_usage(cgroup_name, pids[cgroup_path], True)
+ cleanup(processes[cgroup_path])
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass multiple cgroup test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ for cgroup_name, cgroup_path in cgroups:
+ remove_cgroup(cgroup_path, pids[cgroup_path])
+ cleanup(processes[cgroup_path])
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_multiple_cgroup(2,2)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/4_oom_test.py b/tools/probeCgroup/testcases/4_oom_test.py
new file mode 100755
index 000000000000..128a258c56f5
--- /dev/null
+++ b/tools/probeCgroup/testcases/4_oom_test.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, get_oom_process_memory_usage
+import os
+import subprocess
+import time
+
+def test_oom(num_procs):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroup_name = 'test'
+ cgroup_path = create_cgroup(cgroup_name)
+
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.limit_in_bytes", 'w') as limit_file:
+ limit_file.write("5M")
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.swappiness", 'w') as swap_file:
+ swap_file.write("0")
+
+ processes = []
+ pids = []
+ for i in range(num_procs):
+ process = subprocess.Popen(['./simple-mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ processes.append(process)
+ pids.append(pid)
+
+ time.sleep(6)
+
+ try:
+ for pid in pids:
+ memory_usage = get_oom_process_memory_usage(pid, cgroup_name)
+ assert memory_usage is not None, f"Memory usage(oom) not found for PID {pid}"
+ assert memory_usage > 0, f"Memory usage should be greater than zero for PID {pid}"
+
+ remove_cgroup(cgroup_path, pids)
+ check_memory_usage(cgroup_name, pids, True)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass oom test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ remove_cgroup(cgroup_path, pids)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_oom(1)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/5_multiple_threads_test.py b/tools/probeCgroup/testcases/5_multiple_threads_test.py
new file mode 100755
index 000000000000..7e1b86dabe48
--- /dev/null
+++ b/tools/probeCgroup/testcases/5_multiple_threads_test.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+from cgroup_utils import create_cgroup, add_process_to_cgroup, get_process_memory_usage, remove_cgroup, check_memory_usage, cleanup, check_kmem_usage
+import os
+import subprocess
+import time
+
+def test_multiple_thread(num_procs):
+ subprocess.check_call(['insmod', '../probeCgroup.ko'])
+ time.sleep(1)
+
+ cgroup_name = 'test'
+ cgroup_path = create_cgroup(cgroup_name)
+
+ processes = []
+ pids = []
+ for i in range(num_procs):
+ process = subprocess.Popen(['./multiple-thread-mem-allocate'])
+ pid = process.pid
+ add_process_to_cgroup(cgroup_path, pid)
+ processes.append(process)
+ pids.append(pid)
+
+ time.sleep(1)
+ try:
+ for i in range (200):
+ check_memory_usage(cgroup_name, pids, False)
+ time.sleep(0.01)
+
+ remove_cgroup(cgroup_path, pids)
+ check_memory_usage(cgroup_name, pids, True)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+ print('pass multiple threads test!')
+ except AssertionError as e:
+ print(f"Assertion failed: {e}")
+ remove_cgroup(cgroup_path, pids)
+ cleanup(processes)
+ subprocess.check_call(['rmmod', 'probeCgroup'])
+
+if __name__ == '__main__':
+ test_multiple_thread(5)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/cgroup_utils.py b/tools/probeCgroup/testcases/cgroup_utils.py
new file mode 100644
index 000000000000..f70c68c1f188
--- /dev/null
+++ b/tools/probeCgroup/testcases/cgroup_utils.py
@@ -0,0 +1,115 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import time
+
+def create_cgroup(cgroup_name):
+ cgroup_path = f'/sys/fs/cgroup/memory/{cgroup_name}'
+
+ try:
+ os.makedirs(cgroup_path)
+ except FileExistsError:
+ pass
+
+ return cgroup_path
+
+def add_process_to_cgroup(cgroup_path, pid):
+ with open(os.path.join(cgroup_path, 'cgroup.procs'), 'w') as procs_file:
+ procs_file.write(str(pid))
+
+def get_process_memory_usage(pid, cgroup_name):
+ cur_name = ''
+ with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+ for line in file:
+ parts = line.strip().split()
+ if len(parts) >= 4 and parts[0] == 'cgroup':
+ cur_name = parts[3]
+ if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid:
+ return int(parts[2])
+ return None
+
+def get_process_kmem_usage(pid, cgroup_name):
+ cur_name = ''
+ with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+ for line in file:
+ parts = line.strip().split()
+ if len(parts) >= 4 and parts[0] == 'cgroup':
+ cur_name = parts[3]
+ if len(parts) >= 4 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid:
+ return int(parts[3])
+ return None
+
+def remove_cgroup(cgroup_path, pids):
+ for pid in pids:
+ with open('/sys/fs/cgroup/memory/cgroup.procs', 'w') as backup_file:
+ backup_file.write(str(pid))
+ os.rmdir(cgroup_path)
+ return
+
+def check_memory_usage(cgroup_name, pids, delete):
+ memory_sum = 0
+ for pid in pids:
+ memory_usage = get_process_memory_usage(pid, cgroup_name)
+ if delete == False:
+ assert memory_usage is not None, f"Memory usage not found for PID {pid}"
+ assert memory_usage >= 0, f"Memory usage should not be negative for PID {pid}"
+ memory_sum += memory_usage
+ else:
+ assert memory_usage is None, f"Error: Memory usage should not be available for PID {pid} after deleting the cgroup."
+ if delete == False:
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.usage_in_bytes", 'r') as file:
+ content = file.readline().strip()
+ memory_read = int(content)
+ memory_sum *= 1024
+ delta = abs(memory_read - memory_sum)
+ # print(f"read: {memory_read}")
+ # print(f"sum : {memory_sum}")
+ if (delta > max(memory_read, memory_sum) * 0.1):
+ return 1
+ else:
+ return 0
+ else:
+ return 0
+
+def check_kmem_usage(cgroup_name, pids, delete):
+ kmem_sum = 0
+ for pid in pids:
+ kmem_usage = get_process_kmem_usage(pid, cgroup_name)
+ if delete == False:
+ assert kmem_usage is not None, f"Kmem usage not found for PID {pid}"
+ assert kmem_usage >= 0, f"Kmem usage should not be negative for PID {pid}"
+ kmem_sum += kmem_usage
+ else:
+ assert kmem_usage is None, f"Error: Kmem usage should not be available for PID {pid} after deleting the cgroup."
+ if delete == False:
+ with open(f"/sys/fs/cgroup/memory/{cgroup_name}/memory.kmem.usage_in_bytes", 'r') as file:
+ content = file.readline().strip()
+ kmem_read = int(content)
+ kmem_sum *= 1024
+ delta = abs(kmem_read - kmem_sum)
+ # print(f"kmem read: {kmem_read}")
+ # print(f"kmem sum : {kmem_sum}")
+ # assert delta <= max(kmem_read, kmem_sum) * 0.2, f"Kmem read by probeCgroup is not accurate, {kmem_read}, {kmem_sum}"
+
+def cleanup(processes):
+ for process in processes:
+ process.terminate()
+ process.wait()
+
+def get_oom_process_memory_usage(pid, cgroup_name):
+ # Return the memory usage listed for this PID in the cgroup's "oom:" section
+ cur_name = ''
+ oom = False
+ with open('/proc/cgroup_memory_usage_per_process', 'r') as file:
+ for line in file:
+ parts = line.strip().split()
+ if len(parts) >= 4 and parts[0] == 'cgroup':
+ cur_name = parts[3]
+ oom = False
+ if len(parts) >= 1 and parts[0] == 'oom:':
+ oom = True
+ if len(parts) >= 3 and parts[0] != 'cgroup' and cur_name == cgroup_name and parts[0] != 'pid' and int(parts[0]) == pid and oom == True:
+ return int(parts[2])
+ return None
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/mem-allocate.c b/tools/probeCgroup/testcases/mem-allocate.c
new file mode 100644
index 000000000000..e78e37cae61b
--- /dev/null
+++ b/tools/probeCgroup/testcases/mem-allocate.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define MB (1024 * 1024)
+
+char *arr[40];
+
+int main(int argc, char *argv[])
+{
+ char *p;
+ int i = 0;
+
+ while (1) {
+ for (i = 0; i < 40; i++) {
+ p = (char *)malloc(MB);
+ memset(p, 0, MB);
+ arr[i] = p;
+ usleep(100000);
+ }
+ for (int i = 0; i < 40; i++) {
+ free(arr[i]);
+ usleep(100000);
+ }
+ }
+ return 0;
+}
diff --git a/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
new file mode 100644
index 000000000000..55f4c068f55e
--- /dev/null
+++ b/tools/probeCgroup/testcases/multiple-thread-mem-allocate.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * multiple-thread-mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <pthread.h>
+
+#define MB (1024 * 1024)
+
+void *memory_test(void *arg)
+{
+ char *arr[25];
+ char *p;
+ int i = 0;
+ int cnt = 0;
+
+ while (1) {
+ for (i = 0; i < 20; i++) {
+ p = (char *)malloc(MB);
+ memset(p, 0, MB);
+ arr[i] = p;
+ usleep(10000);
+ }
+ for (int i = 0; i < 20; i++) {
+ free(arr[i]);
+ usleep(10000);
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ pthread_t threads[4];
+ int rc;
+
+ // create threads
+ for (int i = 0; i < 4; i++) {
+ rc = pthread_create(&threads[i], NULL, memory_test, NULL);
+ if (rc != 0) {
+ fprintf(stderr, "Error creating thread: %d\n", rc);
+ return 1;
+ }
+ }
+
+ for (int i = 0; i < 4; i++) {
+ rc = pthread_join(threads[i], NULL);
+ if (rc != 0) {
+ fprintf(stderr, "Error joining thread: %d\n", rc);
+ return 1;
+ }
+ }
+
+ return 0;
+}
diff --git a/tools/probeCgroup/testcases/run.py b/tools/probeCgroup/testcases/run.py
new file mode 100755
index 000000000000..8ffd0ca720d8
--- /dev/null
+++ b/tools/probeCgroup/testcases/run.py
@@ -0,0 +1,32 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+
+import os
+import subprocess
+import sys
+
+def run_tests(directory):
+ """Run all Python scripts in the given directory."""
+ python_files = [f for f in os.listdir(directory) if f.endswith('test.py')]
+ python_files.sort()
+
+ for filename in python_files:
+ try:
+ filepath = os.path.join(directory, filename)
+
+ subprocess.check_call([sys.executable, filepath])
+ except subprocess.CalledProcessError as e:
+ print(f"Error executing {filename}:")
+ return
+ except Exception as e:
+ print(f"Error executing {filename}:")
+ print(e)
+ return
+
+if __name__ == '__main__':
+ tests_directory = '.'
+ subprocess.check_call(['gcc', 'mem-allocate.c', '-o', 'mem-allocate'])
+ subprocess.check_call(['gcc', 'simple-mem-allocate.c', '-o', 'simple-mem-allocate'])
+ subprocess.check_call(['gcc', '-pthread', 'multiple-thread-mem-allocate.c', '-o', 'multiple-thread-mem-allocate'])
+ run_tests(tests_directory)
\ No newline at end of file
diff --git a/tools/probeCgroup/testcases/simple-mem-allocate.c b/tools/probeCgroup/testcases/simple-mem-allocate.c
new file mode 100644
index 000000000000..16328b10ba48
--- /dev/null
+++ b/tools/probeCgroup/testcases/simple-mem-allocate.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * simple-mem-allocate.c - The program to test probeCgroup
+ *
+ * Copyright (C) Taoxy2004 <221870066(a)smail.nju.edu.cn>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define MB (1024 * 1024)
+
+int main(int argc, char *argv[])
+{
+ char *p;
+ int i = 0;
+
+ while (1) {
+ p = (char *)malloc(MB);
+ memset(p, 0, MB);
+ sleep(1);
+ }
+
+ return 0;
+}
--
2.43.0
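The accuracy check in check_memory_usage() above compares the cgroup's memory.usage_in_bytes against the sum of the per-process figures and reports a failure when they differ by more than 10% of the larger value. A minimal standalone sketch of that comparison follows. The helper name usage_mismatch is hypothetical, and the KiB unit of the per-process figures is an assumption inferred from the `*= 1024` scaling in the test.

```python
def usage_mismatch(read_bytes: int, summed_kib: int, tolerance: float = 0.1) -> bool:
    """Return True when the two figures differ by more than `tolerance`.

    read_bytes : value read from memory.usage_in_bytes
    summed_kib : sum of the per-process figures (assumed to be in KiB)
    """
    # Bring both figures to bytes, as the test does with `memory_sum *= 1024`.
    summed_bytes = summed_kib * 1024
    delta = abs(read_bytes - summed_bytes)
    # The allowed difference is relative to the larger of the two values.
    return delta > max(read_bytes, summed_bytes) * tolerance


if __name__ == "__main__":
    print(usage_mismatch(100 * 1024, 95))  # 5% apart, within tolerance
    print(usage_mismatch(100 * 1024, 80))  # 20% apart, reported as a mismatch
```

Measuring the difference against the larger figure keeps the check symmetric: it does not matter which of the two values overshoots.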

[PATCH openEuler-22.03-LTS-SP1 0/2] nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()
by Kaixiong Yu 04 Sep '24
fix CVE-2023-52907
Fedor Pchelkin (1):
nfc: pn533: initialize struct pn533_out_arg properly
Minsuk Kang (1):
nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()
drivers/nfc/pn533/usb.c | 45 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 3 deletions(-)
--
2.25.1

[PATCH openEuler-1.0-LTS 0/2] nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()
by Kaixiong Yu 04 Sep '24
fix CVE-2023-52907
Fedor Pchelkin (1):
nfc: pn533: initialize struct pn533_out_arg properly
Minsuk Kang (1):
nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()
drivers/nfc/pn533/usb.c | 45 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 3 deletions(-)
--
2.25.1

[PATCH openEuler-1.0-LTS] md/raid5: avoid BUG_ON() while continue reshape after reassembling
by Li Nan 04 Sep '24
From: Yu Kuai <yukuai3(a)huawei.com>
stable inclusion
from stable-v4.19.320
commit 2c92f8c1c456d556f15cbf51667b385026b2e6a0
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAMNBN
CVE: CVE-2024-43914
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 305a5170dc5cf3d395bb4c4e9239bca6d0b54b49 ]
Currently, mdadm supports --revert-reshape to abort the reshape while
reassembling, as in the test 07revert-grow. However, the following BUG_ON()
can be triggered by the test:
kernel BUG at drivers/md/raid5.c:6278!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
irq event stamp: 158985
CPU: 6 PID: 891 Comm: md0_reshape Not tainted 6.9.0-03335-g7592a0b0049a #94
RIP: 0010:reshape_request+0x3f1/0xe60
Call Trace:
<TASK>
raid5_sync_request+0x43d/0x550
md_do_sync+0xb7a/0x2110
md_thread+0x294/0x2b0
kthread+0x147/0x1c0
ret_from_fork+0x59/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
The root cause is that --revert-reshape updates raid_disks from 5 to 4
while the reshape position is still set. After reassembling the array,
the reshape position is read from the superblock, and during reshape the
check of 'writepos', which is calculated from the old reshape position,
fails.
Fix this panic the easy way first, by converting the BUG_ON() to
WARN_ON() and stopping the reshape if a check fails.
Note that mdadm must fix --revert-reshape as well, and md/raid should
probably enhance metadata validation too. However, that means
reassembly will fail, and there must be user tools to fix the wrong
metadata.
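The change of behaviour described above (warn and stop the reshape instead of crashing) can be modelled in user space. The following is only an illustrative sketch of the backwards-reshape checks, not kernel code: the function name backward_stripe_addr, the warn_on helper and the MAX_SECTOR constant are hypothetical stand-ins, and a 64-bit sector_t and a power-of-two reshape_sectors are assumed.

```python
import warnings

# Stand-in for the kernel's MaxSector, assuming a 64-bit sector_t.
MAX_SECTOR = 2**64 - 1


def warn_on(condition: bool, message: str) -> bool:
    """Mimic WARN_ON(): emit a warning and hand the condition back."""
    if condition:
        warnings.warn(message, RuntimeWarning)
    return condition


def backward_stripe_addr(writepos, reshape_sectors, reshape_progress,
                         dev_sectors, sector_nr):
    """Return the stripe address, or MAX_SECTOR if a sanity check fails."""
    # Previously BUG_ON(writepos < reshape_sectors).
    if warn_on(writepos < reshape_sectors, "writepos < reshape_sectors"):
        return MAX_SECTOR
    writepos -= reshape_sectors

    # Previously BUG_ON(conf->reshape_progress == 0).
    if warn_on(reshape_progress == 0, "reshape_progress == 0"):
        return MAX_SECTOR

    stripe_addr = writepos
    # Round dev_sectors down to a multiple of reshape_sectors
    # (valid only when reshape_sectors is a power of two).
    aligned = dev_sectors & ~(reshape_sectors - 1)
    if warn_on(aligned - reshape_sectors - stripe_addr != sector_nr,
               "stripe address does not match sector_nr"):
        return MAX_SECTOR
    return stripe_addr
```

With inconsistent metadata the caller receives the sentinel and can stop the reshape, whereas the BUG_ON() variant would have halted the kernel thread at the first failed check.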
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Signed-off-by: Song Liu <song(a)kernel.org>
Link: https://lore.kernel.org/r/20240611132251.1967786-13-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Li Nan <linan122(a)huawei.com>
---
drivers/md/raid5.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1c2e2ff162dc..b2b35cdabac5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5817,7 +5817,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
safepos = conf->reshape_safe;
sector_div(safepos, data_disks);
if (mddev->reshape_backwards) {
- BUG_ON(writepos < reshape_sectors);
+ if (WARN_ON(writepos < reshape_sectors))
+ return MaxSector;
+
writepos -= reshape_sectors;
readpos += reshape_sectors;
safepos += reshape_sectors;
@@ -5835,14 +5837,18 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
* to set 'stripe_addr' which is where we will write to.
*/
if (mddev->reshape_backwards) {
- BUG_ON(conf->reshape_progress == 0);
+ if (WARN_ON(conf->reshape_progress == 0))
+ return MaxSector;
+
stripe_addr = writepos;
- BUG_ON((mddev->dev_sectors &
- ~((sector_t)reshape_sectors - 1))
- - reshape_sectors - stripe_addr
- != sector_nr);
+ if (WARN_ON((mddev->dev_sectors &
+ ~((sector_t)reshape_sectors - 1)) -
+ reshape_sectors - stripe_addr != sector_nr))
+ return MaxSector;
} else {
- BUG_ON(writepos != sector_nr + reshape_sectors);
+ if (WARN_ON(writepos != sector_nr + reshape_sectors))
+ return MaxSector;
+
stripe_addr = sector_nr;
}
--
2.39.2

[PATCH OLK-5.10] md/raid5: avoid BUG_ON() while continue reshape after reassembling
by Li Nan 04 Sep '24
From: Yu Kuai <yukuai3(a)huawei.com>
stable inclusion
from stable-v5.10.224
commit c384dd4f1fb3b14a2fd199360701cc163ea88705
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAMNBN
CVE: CVE-2024-43914
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 305a5170dc5cf3d395bb4c4e9239bca6d0b54b49 ]
Currently, mdadm supports --revert-reshape to abort the reshape while
reassembling, as in the test 07revert-grow. However, the following BUG_ON()
can be triggered by the test:
kernel BUG at drivers/md/raid5.c:6278!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
irq event stamp: 158985
CPU: 6 PID: 891 Comm: md0_reshape Not tainted 6.9.0-03335-g7592a0b0049a #94
RIP: 0010:reshape_request+0x3f1/0xe60
Call Trace:
<TASK>
raid5_sync_request+0x43d/0x550
md_do_sync+0xb7a/0x2110
md_thread+0x294/0x2b0
kthread+0x147/0x1c0
ret_from_fork+0x59/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
The root cause is that --revert-reshape updates raid_disks from 5 to 4
while the reshape position is still set. After reassembling the array,
the reshape position is read from the superblock, and during reshape the
check of 'writepos', which is calculated from the old reshape position,
fails.
Fix this panic the easy way first, by converting the BUG_ON() to
WARN_ON() and stopping the reshape if a check fails.
Note that mdadm must fix --revert-reshape as well, and md/raid should
probably enhance metadata validation too. However, that means
reassembly will fail, and there must be user tools to fix the wrong
metadata.
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Signed-off-by: Song Liu <song(a)kernel.org>
Link: https://lore.kernel.org/r/20240611132251.1967786-13-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Li Nan <linan122(a)huawei.com>
---
drivers/md/raid5.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3cb90d7e88d9..126b9ecfe750 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5997,7 +5997,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
safepos = conf->reshape_safe;
sector_div(safepos, data_disks);
if (mddev->reshape_backwards) {
- BUG_ON(writepos < reshape_sectors);
+ if (WARN_ON(writepos < reshape_sectors))
+ return MaxSector;
+
writepos -= reshape_sectors;
readpos += reshape_sectors;
safepos += reshape_sectors;
@@ -6015,14 +6017,18 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
* to set 'stripe_addr' which is where we will write to.
*/
if (mddev->reshape_backwards) {
- BUG_ON(conf->reshape_progress == 0);
+ if (WARN_ON(conf->reshape_progress == 0))
+ return MaxSector;
+
stripe_addr = writepos;
- BUG_ON((mddev->dev_sectors &
- ~((sector_t)reshape_sectors - 1))
- - reshape_sectors - stripe_addr
- != sector_nr);
+ if (WARN_ON((mddev->dev_sectors &
+ ~((sector_t)reshape_sectors - 1)) -
+ reshape_sectors - stripe_addr != sector_nr))
+ return MaxSector;
} else {
- BUG_ON(writepos != sector_nr + reshape_sectors);
+ if (WARN_ON(writepos != sector_nr + reshape_sectors))
+ return MaxSector;
+
stripe_addr = sector_nr;
}
--
2.39.2

[PATCH openEuler-22.03-LTS-SP1] md/raid5: avoid BUG_ON() while continue reshape after reassembling
by Li Nan 04 Sep '24
From: Yu Kuai <yukuai3(a)huawei.com>
stable inclusion
from stable-v5.10.224
commit c384dd4f1fb3b14a2fd199360701cc163ea88705
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAMNBN
CVE: CVE-2024-43914
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 305a5170dc5cf3d395bb4c4e9239bca6d0b54b49 ]
Currently, mdadm supports --revert-reshape to abort the reshape while
reassembling, as in the test 07revert-grow. However, the following BUG_ON()
can be triggered by the test:
kernel BUG at drivers/md/raid5.c:6278!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
irq event stamp: 158985
CPU: 6 PID: 891 Comm: md0_reshape Not tainted 6.9.0-03335-g7592a0b0049a #94
RIP: 0010:reshape_request+0x3f1/0xe60
Call Trace:
<TASK>
raid5_sync_request+0x43d/0x550
md_do_sync+0xb7a/0x2110
md_thread+0x294/0x2b0
kthread+0x147/0x1c0
ret_from_fork+0x59/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
The root cause is that --revert-reshape updates raid_disks from 5 to 4
while the reshape position is still set. After reassembling the array,
the reshape position is read from the superblock, and during reshape the
check of 'writepos', which is calculated from the old reshape position,
fails.
Fix this panic the easy way first, by converting the BUG_ON() to
WARN_ON() and stopping the reshape if a check fails.
Note that mdadm must fix --revert-reshape as well, and md/raid should
probably enhance metadata validation too. However, that means
reassembly will fail, and there must be user tools to fix the wrong
metadata.
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Signed-off-by: Song Liu <song(a)kernel.org>
Link: https://lore.kernel.org/r/20240611132251.1967786-13-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Li Nan <linan122(a)huawei.com>
---
drivers/md/raid5.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3cb90d7e88d9..126b9ecfe750 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5997,7 +5997,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
safepos = conf->reshape_safe;
sector_div(safepos, data_disks);
if (mddev->reshape_backwards) {
- BUG_ON(writepos < reshape_sectors);
+ if (WARN_ON(writepos < reshape_sectors))
+ return MaxSector;
+
writepos -= reshape_sectors;
readpos += reshape_sectors;
safepos += reshape_sectors;
@@ -6015,14 +6017,18 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
* to set 'stripe_addr' which is where we will write to.
*/
if (mddev->reshape_backwards) {
- BUG_ON(conf->reshape_progress == 0);
+ if (WARN_ON(conf->reshape_progress == 0))
+ return MaxSector;
+
stripe_addr = writepos;
- BUG_ON((mddev->dev_sectors &
- ~((sector_t)reshape_sectors - 1))
- - reshape_sectors - stripe_addr
- != sector_nr);
+ if (WARN_ON((mddev->dev_sectors &
+ ~((sector_t)reshape_sectors - 1)) -
+ reshape_sectors - stripe_addr != sector_nr))
+ return MaxSector;
} else {
- BUG_ON(writepos != sector_nr + reshape_sectors);
+ if (WARN_ON(writepos != sector_nr + reshape_sectors))
+ return MaxSector;
+
stripe_addr = sector_nr;
}
--
2.39.2
From: Denis Arefev <arefev(a)swemel.ru>
stable inclusion
from stable-v6.6.44
commit 90d41ebe0cd4635f6410471efc1dd71b33e894cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAKQ33
CVE: CVE-2024-43817
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit e269d79c7d35aa3808b1f3c1737d63dab504ddc8 ]
Two missing checks in virtio_net_hdr_to_skb() allowed syzbot
to crash kernels again:
1. After the skb_segment function the buffer may become non-linear
(nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere
the __skb_linearize function will not be executed, then the buffer will
remain non-linear. Then the condition (offset >= skb_headlen(skb))
becomes true, which causes WARN_ON_ONCE in skb_checksum_help.
2. The struct sk_buff and struct virtio_net_hdr members must be
mathematically related:
(gso_size) must be greater than (needed), otherwise WARN_ON_ONCE triggers.
(remainder) must be greater than (needed), otherwise WARN_ON_ONCE triggers.
(remainder) may be 0 if the division leaves no remainder.
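The relations listed in point 2 can be written out as a small predicate. This is a user-space sketch of the arithmetic as the commit message states it, not the kernel implementation: the function name gso_fields_consistent is hypothetical, and the inputs are assumed to be plain integers already converted to CPU byte order.

```python
def gso_fields_consistent(skb_len: int, gso_size: int, needed: int) -> bool:
    """Check the relations between packet length, gso_size and `needed`."""
    if gso_size == 0:
        # The added check only applies when a GSO size is set.
        return True
    # Quotient and remainder of the packet length split into GSO segments.
    ret, remainder = divmod(skb_len, gso_size)
    return (
        ret != 0                        # at least one full segment
        and gso_size > needed           # a segment must cover the headers
        and (remainder > needed or remainder == 0)  # so must a trailing partial one
    )


if __name__ == "__main__":
    print(gso_fields_consistent(3000, 1000, 54))  # exact multiple
    print(gso_fields_consistent(3010, 1000, 54))  # 10-byte tail, shorter than `needed`
```

A packet that fails the predicate corresponds to the case where the patch returns -EINVAL before the checksum helper is ever reached.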
offset+2 (4191) > skb_headlen() (1116)
WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
Modules linked in:
CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef
RSP: 0018:ffffc90003a9f338 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff888025125780 RCX: ffffffff814db209
RDX: ffff888015393b80 RSI: ffffffff814db216 RDI: 0000000000000001
RBP: ffff8880251257f4 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000045c
R13: 000000000000105f R14: ffff8880251257f0 R15: 000000000000105d
FS: 0000555555c24380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000f000 CR3: 0000000023151000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777
ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584
ip_finish_output_gso net/ipv4/ip_output.c:286 [inline]
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295
ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433
dst_output include/net/dst.h:451 [inline]
ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129
iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline]
sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3545 [inline]
dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561
__dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
packet_xmit+0x257/0x380 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3087 [inline]
packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Found by Linux Verification Center (linuxtesting.org) with Syzkaller
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
Message-Id: <20240613095448.27118-1-arefev(a)swemel.ru>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
---
include/linux/virtio_net.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 6c395a2..c824c52 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -56,6 +56,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
unsigned int thlen = 0;
unsigned int p_off = 0;
unsigned int ip_proto;
+ u64 ret, remainder, gso_size;
if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -98,6 +99,16 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
+ if (hdr->gso_size) {
+ gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
+ ret = div64_u64_rem(skb->len, gso_size, &remainder);
+ if (!(ret && (hdr->gso_size > needed) &&
+ ((remainder > needed) || (remainder == 0)))) {
+ return -EINVAL;
+ }
+ skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
+ }
+
if (!pskb_may_pull(skb, needed))
return -EINVAL;
--
2.9.5