Fix the 'cgroup/cpuset: Prevent UAF in proc_cpuset_show()' issue with community patches
Chen Ridong (3):
  Revert "cgroup: fix uaf when proc_cpuset_show"
  cgroup/cpuset: Prevent UAF in proc_cpuset_show()
  cgroup: add cgroup_root_ext to keep kabi

Yafang Shao (1):
  cgroup: Make operations on the cgroup root_list RCU safe

 include/linux/cgroup-defs.h     |  9 +++++++++
 kernel/cgroup/cgroup-internal.h |  3 ++-
 kernel/cgroup/cgroup-v1.c       |  6 ++++--
 kernel/cgroup/cgroup.c          | 17 ++++++++++-------
 kernel/cgroup/cpuset.c          | 33 +++++++--------------------------
 5 files changed, 32 insertions(+), 36 deletions(-)
Offering: HULK
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA9YQ9
--------------------------------
Revert this fix to stay consistent with mainline; the LTS patch is backported in its place.

This reverts commit 46daffa40e98c1c7b1221f19ddb701a05b348862.
Fixes: 46daffa40e98 ("[Huawei] cgroup: fix uaf when proc_cpuset_show")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 24 ------------------------
 1 file changed, 24 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ba8332982d51..5e8bdb459315 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3661,7 +3661,6 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 	char *buf;
 	struct cgroup_subsys_state *css;
 	int retval;
-	struct cgroup *root_cgroup = NULL;
 
 	retval = -ENOMEM;
 	buf = kmalloc(PATH_MAX, GFP_KERNEL);
@@ -3669,32 +3668,9 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 		goto out;
 
 	css = task_get_css(tsk, cpuset_cgrp_id);
-	rcu_read_lock();
-	/*
-	 * When the cpuset subsystem is mounted on the legacy hierarchy,
-	 * the top_cpuset.css->cgroup does not hold a reference count of
-	 * cgroup_root.cgroup. This makes accessing css->cgroup very
-	 * dangerous because when the cpuset subsystem is remounted to the
-	 * default hierarchy, the cgroup_root.cgroup that css->cgroup points
-	 * to will be released, leading to a UAF issue. To avoid this problem,
-	 * get the reference count of top_cpuset.css->cgroup first.
-	 *
-	 * This is ugly!!
-	 */
-	if (css == &top_cpuset.css) {
-		root_cgroup = css->cgroup;
-		if (!css_tryget_online(&root_cgroup->self)) {
-			rcu_read_unlock();
-			retval = -EBUSY;
-			goto out_free;
-		}
-	}
-	rcu_read_unlock();
 	retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
 				current->nsproxy->cgroup_ns);
 	css_put(css);
-	if (root_cgroup)
-		css_put(&root_cgroup->self);
 	if (retval >= PATH_MAX)
 		retval = -ENAMETOOLONG;
 	if (retval < 0)
From: Yafang Shao <laoar.shao@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit d23b5c577715892c87533b13923306acc6243f93
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
----------------------------------------------------------------------
[ Upstream commit d23b5c577715892c87533b13923306acc6243f93 ]
At present, when we perform operations on the cgroup root_list, we must hold the cgroup_mutex, which is a relatively heavyweight lock. In reality, we can make operations on this list RCU-safe, eliminating the need to hold the cgroup_mutex during traversal. Modifications to the list only occur in the cgroup root setup and destroy paths, which should be infrequent in a production environment. In contrast, traversal may occur frequently. Therefore, making it RCU-safe would be beneficial.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
[Backport] cgroup: Move rcu_head up near the top of cgroup_root
mainline inclusion
from mainline-v6.8-rc1
commit a7fb0423c201ba12815877a0b5a68a6a1710b23a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
----------------------------------------------------------------------
commit a7fb0423c201ba12815877a0b5a68a6a1710b23a upstream.
Commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") adds a new rcu_head to the cgroup_root structure and kvfree_rcu() for freeing the cgroup_root.
The current implementation of kvfree_rcu(), however, has the limitation that the offset of the rcu_head structure within the larger data structure must be less than 4096 or the compilation will fail. See the macro definition of __is_kvfree_rcu_offset() in include/linux/rcupdate.h for more information.
By putting rcu_head below the large cgroup structure, any change to the cgroup structure that makes it larger runs the risk of causing a build failure under certain configurations. Commit 77070eeb8821 ("cgroup: Avoid false cacheline sharing of read mostly rstat_cpu") happens to be the last straw that breaks it. Fix this problem by moving the rcu_head structure up before the cgroup structure.
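The offset constraint described above can be illustrated in plain userspace C. This is only a sketch: the `toy_*` types, the 8192-byte payload size, and the `TOY_KVFREE_OFFSET_LIMIT` name are hypothetical stand-ins, not the kernel's definitions. It shows how placing the head after a large member pushes its offset past 4096, while placing it before keeps the offset small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; mirrors the 4096-byte limit described in the text. */
#define TOY_KVFREE_OFFSET_LIMIT 4096

/* Userspace stand-ins, not the kernel's rcu_head or cgroup. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_cgroup {
	char payload[8192];	/* assumed size, chosen to exceed the limit */
};

/* Head placed after the large member: offset grows with toy_cgroup. */
struct toy_root_bad {
	int hierarchy_id;
	struct toy_cgroup cgrp;
	struct toy_rcu_head rcu;
};

/* Head placed before the large member: offset stays small. */
struct toy_root_good {
	int hierarchy_id;
	struct toy_rcu_head rcu;
	struct toy_cgroup cgrp;
};

/* Compile-time check in the spirit of the kernel's offset test. */
static_assert(offsetof(struct toy_root_good, rcu) < TOY_KVFREE_OFFSET_LIMIT,
	      "rcu head must stay below the offset limit");

/* Runtime form of the same check, returning 1 when the offset is usable. */
static int toy_offset_ok(size_t offset)
{
	return offset < TOY_KVFREE_OFFSET_LIMIT;
}
```

Adding the same `static_assert` for `struct toy_root_bad` would fail to compile, which is the userspace analogue of the build failure the commit message describes.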
Fixes: d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/lkml/20231207143806.114e0a74@canb.auug.org.au/
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	include/linux/cgroup-defs.h
	kernel/cgroup/cgroup.c
[The context in cgroup-defs.h differs because the wait_queue_head_t wait
member was already merged. The context in cgroup.c differs in some comments.]
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 include/linux/cgroup-defs.h     |  7 ++++---
 kernel/cgroup/cgroup-internal.h |  3 ++-
 kernel/cgroup/cgroup.c          | 14 +++++++-------
 3 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 47263cecb12f..2a3eed623aaa 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -526,6 +526,10 @@ struct cgroup_root {
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
+	/* A list running through the active hierarchies */
+	struct list_head root_list;
+	struct rcu_head rcu;	/* Must be near the top */
+
 	/* The root cgroup. Root is destroyed on its release. */
 	struct cgroup cgrp;
 
@@ -538,9 +542,6 @@ struct cgroup_root {
 	/* Wait while cgroups are being destroyed */
 	wait_queue_head_t wait;
 
-	/* A list running through the active hierarchies */
-	struct list_head root_list;
-
 	/* Hierarchy-specific flags */
 	unsigned int flags;
 
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 241a2d8c16e7..5e23b75f226f 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -173,7 +173,8 @@ extern struct list_head cgroup_roots;
 
 /* iterate across the hierarchies */
 #define for_each_root(root)						\
-	list_for_each_entry((root), &cgroup_roots, root_list)
+	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
+				lockdep_is_held(&cgroup_mutex))
 
 /**
  * for_each_subsys - iterate all enabled cgroup subsystems
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1cd8deb876db..842b7bf6ce21 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1303,7 +1303,7 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
 
 void cgroup_free_root(struct cgroup_root *root)
 {
-	kfree(root);
+	kfree_rcu(root, rcu);
 }
 
 static void cgroup_destroy_root(struct cgroup_root *root)
@@ -1336,7 +1336,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	spin_unlock_irq(&css_set_lock);
 
 	if (!list_empty(&root->root_list)) {
-		list_del(&root->root_list);
+		list_del_rcu(&root->root_list);
 		cgroup_root_count--;
 	}
 
@@ -1382,7 +1382,6 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
 	}
 	rcu_read_unlock();
 
-	BUG_ON(!res);
 	return res;
 }
 
@@ -1392,7 +1391,6 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 {
 	struct cgroup *res = NULL;
 
-	lockdep_assert_held(&cgroup_mutex);
 	lockdep_assert_held(&css_set_lock);
 
 	if (cset == &init_css_set) {
@@ -1418,7 +1416,9 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 
 /*
  * Return the cgroup for "task" from the given hierarchy. Must be
- * called with cgroup_mutex and css_set_lock held.
+ * called with css_set_lock held to prevent task's groups from being modified.
+ * Must be called with either cgroup_mutex or rcu read lock to prevent the
+ * cgroup root from being destroyed.
  */
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
 				     struct cgroup_root *root)
@@ -1957,7 +1957,7 @@ void init_cgroup_root(struct cgroup_fs_context *ctx)
 	struct cgroup_root *root = ctx->root;
 	struct cgroup *cgrp = &root->cgrp;
 
-	INIT_LIST_HEAD(&root->root_list);
+	INIT_LIST_HEAD_RCU(&root->root_list);
 	atomic_set(&root->nr_cgrps, 1);
 	cgrp->root = root;
 	init_cgroup_housekeeping(cgrp);
@@ -2040,7 +2040,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	 * care of subsystems' refcounts, which are explicitly dropped in
 	 * the failure exit path.
 	 */
-	list_add(&root->root_list, &cgroup_roots);
+	list_add_rcu(&root->root_list, &cgroup_roots);
 	cgroup_root_count++;
 
 	/*
mainline inclusion
from mainline-v6.11-rc1
commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
----------------------------------------------------------------------
commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a upstream.
A UAF can happen when /proc/cpuset is read as reported in [1].

This can be reproduced by the following methods:
1. Add an mdelay(1000) before acquiring the cgroup_lock in the
   cgroup_path_ns function.
2. Run repeatedly:
   $ cat /proc/<pid>/cpuset
3. Run repeatedly:
   $ mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/
   $ umount /sys/fs/cgroup/cpuset/

The race that causes this bug can be shown as below:

(umount)                |  (cat /proc/<pid>/cpuset)
css_release             |  proc_cpuset_show
css_release_work_fn     |  css = task_get_css(tsk, cpuset_cgrp_id);
css_free_rwork_fn       |  cgroup_path_ns(css->cgroup, ...);
cgroup_destroy_root     |  mutex_lock(&cgroup_mutex);
rebind_subsystems       |
cgroup_free_root        |
                        |  // cgrp was freed, UAF
                        |  cgroup_path_ns_locked(cgrp,..);
When the cpuset is initialized, the root node top_cpuset.css.cgrp will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated &cgroup_root.cgrp. When the umount operation is executed, top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.
The problem is that when rebinding to cgrp_dfl_root, there are cases where the cgroup_root allocated by setting up the root for cgroup v1 is cached. This could lead to a Use-After-Free (UAF) if it is subsequently freed. The descendant cgroups of cgroup v1 can only be freed after the css is released. However, the css of the root will never be released, yet the cgroup_root should be freed when it is unmounted. This means that obtaining a reference to the css of the root does not guarantee that css.cgrp->root will not be freed.
Fix this problem by using rcu_read_lock() in proc_cpuset_show(). Since commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe"), cgroup_root is freed with kfree_rcu(), so css->cgroup won't be freed during the critical section. To call cgroup_path_ns_locked(), css_set_lock is needed, so it is safe to replace task_get_css() with task_css().
[1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd
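The reasoning behind the fix (a free that is deferred until readers leave their critical section) can be sketched with a toy, single-threaded model in userspace C. Everything here is a hypothetical stand-in: the `toy_*` names are not kernel APIs, there is no real concurrency, and the model waits for all readers, whereas real RCU only waits for readers that were already running when the free was queued.

```c
#include <stdlib.h>

/* Toy stand-in for cgroup_root; not the kernel structure. */
struct toy_root {
	int hierarchy_id;
};

static int toy_readers;			/* readers currently in a critical section */
static struct toy_root *toy_pending;	/* object waiting to be freed (one at most) */
static int toy_freed_count;		/* how many deferred frees have completed */

/* Free the pending object only once no reader is active. */
static void toy_reclaim(void)
{
	if (toy_readers == 0 && toy_pending) {
		free(toy_pending);
		toy_pending = NULL;
		toy_freed_count++;
	}
}

/* Loose analogue of rcu_read_lock(). */
static void toy_read_lock(void)
{
	toy_readers++;
}

/* Loose analogue of rcu_read_unlock(); may complete a deferred free. */
static void toy_read_unlock(void)
{
	toy_readers--;
	toy_reclaim();
}

/* Loose analogue of kfree_rcu(): queue the free instead of doing it now. */
static void toy_free_deferred(struct toy_root *root)
{
	toy_pending = root;
	toy_reclaim();
}

/* A reader that is "raced" by a free while inside its critical section. */
static int toy_demo(void)
{
	struct toy_root *root = malloc(sizeof(*root));
	int seen;

	if (!root)
		return -1;
	root->hierarchy_id = 7;

	toy_read_lock();
	toy_free_deferred(root);	/* the "umount" side queues the free */
	seen = root->hierarchy_id;	/* still valid: the free is deferred */
	toy_read_unlock();		/* the object is actually freed here */

	return seen;
}
```

With a plain `free()` in place of `toy_free_deferred()`, the read of `root->hierarchy_id` would be a use-after-free, which corresponds to the situation before the root was freed through kfree_rcu().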
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	kernel/cgroup/cpuset.c
[The error return of cgroup_path_ns_locked() changed, so the context does
not match.]
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5e8bdb459315..3467abdd1d72 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -22,6 +22,7 @@
  * distribution for more details.
  */
 
+#include "cgroup-internal.h"
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
 #include <linux/cpuset.h>
@@ -3667,10 +3668,14 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 	if (!buf)
 		goto out;
 
-	css = task_get_css(tsk, cpuset_cgrp_id);
-	retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
-				current->nsproxy->cgroup_ns);
-	css_put(css);
+	rcu_read_lock();
+	spin_lock_irq(&css_set_lock);
+	css = task_css(tsk, cpuset_cgrp_id);
+	retval = cgroup_path_ns_locked(css->cgroup, buf, PATH_MAX,
+				       current->nsproxy->cgroup_ns);
+	spin_unlock_irq(&css_set_lock);
+	rcu_read_unlock();
+
 	if (retval >= PATH_MAX)
 		retval = -ENAMETOOLONG;
 	if (retval < 0)
Offering: HULK
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV
--------------------------------
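Commit 1e8c422d6363 ("[Backport] cgroup: Make operations on the cgroup root_list RCU safe") added an rcu member to struct cgroup_root, which broke KABI. To preserve KABI, move the rcu_head into a new wrapper, struct cgroup_root_ext, and leave the layout of struct cgroup_root unchanged.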
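The wrapper approach can be sketched in userspace C as follows. This is a minimal illustration under stated assumptions: the `toy_*` types and helpers are hypothetical, `calloc()`/`free()` stand in for the kernel allocator, and `toy_container_of` is a simplified version of the kernel's container_of() without its type checking. The inner structure keeps its layout, callers keep using a pointer to it, and the outer structure is recovered only at free time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Userspace stand-ins, not the kernel definitions. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

/* Layout that must not change (the ABI-frozen structure). */
struct toy_root {
	int hierarchy_id;
};

/* Wrapper that carries the extra member outside the frozen layout. */
struct toy_root_ext {
	struct toy_rcu_head rcu;
	struct toy_root root;
};

/* Allocate the wrapper, but hand out a pointer to the inner structure. */
static struct toy_root *toy_root_alloc(void)
{
	struct toy_root_ext *ext = calloc(1, sizeof(*ext));

	return ext ? &ext->root : NULL;
}

/* Recover the wrapper from the inner pointer. */
static struct toy_root_ext *toy_root_to_ext(struct toy_root *root)
{
	return toy_container_of(root, struct toy_root_ext, root);
}

/* Free through the wrapper, since that is what was allocated. */
static void toy_root_free(struct toy_root *root)
{
	if (root)
		free(toy_root_to_ext(root));
}
```

One consequence of this pattern is that every inner pointer passed to the free helper must have come from the wrapper allocation. A statically allocated inner structure has no enclosing wrapper to recover.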
Fixes: 1e8c422d6363 ("[Backport] cgroup: Make operations on the cgroup root_list RCU safe")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 include/linux/cgroup-defs.h | 16 ++++++++++++----
 kernel/cgroup/cgroup-v1.c   |  6 ++++--
 kernel/cgroup/cgroup.c      |  5 ++++-
 3 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 2a3eed623aaa..b3ab93dc5f35 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -526,10 +526,6 @@ struct cgroup_root {
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
-	/* A list running through the active hierarchies */
-	struct list_head root_list;
-	struct rcu_head rcu;	/* Must be near the top */
-
 	/* The root cgroup. Root is destroyed on its release. */
 	struct cgroup cgrp;
 
@@ -542,6 +538,9 @@ struct cgroup_root {
 	/* Wait while cgroups are being destroyed */
 	wait_queue_head_t wait;
 
+	/* A list running through the active hierarchies */
+	struct list_head root_list;
+
 	/* Hierarchy-specific flags */
 	unsigned int flags;
 
@@ -557,6 +556,15 @@ struct cgroup_root {
 	KABI_RESERVE(4)
 };
 
+/*
+ * To keep kabi uncharged, add cgroup_root_ext, add rcu_head to make operations
+ * on the cgroup root_list RCU safe
+ */
+struct cgroup_root_ext {
+	struct rcu_head rcu;	/* Must be near the top */
+	struct cgroup_root root;
+};
+
 /*
  * struct cftype: handler definitions for cgroup control files
  *
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 647d0891cff6..248e5e0fbe4f 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1145,6 +1145,7 @@ static int cgroup1_root_to_use(struct fs_context *fc)
 {
 	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
 	struct cgroup_root *root;
+	struct cgroup_root_ext *root_ext;
 	struct cgroup_subsys *ss;
 	int i, ret;
 
@@ -1217,10 +1218,11 @@ static int cgroup1_root_to_use(struct fs_context *fc)
 	if (ctx->ns != &init_cgroup_ns)
 		return -EPERM;
 
-	root = kzalloc(sizeof(*root), GFP_KERNEL);
-	if (!root)
+	root_ext = kzalloc(sizeof(struct cgroup_root_ext), GFP_KERNEL);
+	if (!root_ext)
 		return -ENOMEM;
 
+	root = &root_ext->root;
 	ctx->root = root;
 	init_cgroup_root(ctx);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 842b7bf6ce21..ce4ad748d2c9 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1303,7 +1303,10 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
 
 void cgroup_free_root(struct cgroup_root *root)
 {
-	kfree_rcu(root, rcu);
+	struct cgroup_root_ext *root_ext;
+
+	root_ext = container_of(root, struct cgroup_root_ext, root);
+	kfree_rcu(root_ext, rcu);
 }
 
 static void cgroup_destroy_root(struct cgroup_root *root)
Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been successfully converted to a pull request.
Pull request link: https://gitee.com/openeuler/kernel/pulls/13136
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/3...