[PATCH OLK-5.10 0/4] Fix 'cgroup/cpuset: Prevent UAF in proc_cpuset_show()' issue with community patches

Chen Ridong

11 Nov 2024

9:30 p.m.

Fix 'cgroup/cpuset: Prevent UAF in proc_cpuset_show()' issue with community patches

Chen Ridong (3):
  Revert "cgroup: fix uaf when proc_cpuset_show"
  cgroup/cpuset: Prevent UAF in proc_cpuset_show()
  cgroup: add cgroup_root_ext to keep kabi

Yafang Shao (1):
  cgroup: Make operations on the cgroup root_list RCU safe

 include/linux/cgroup-defs.h     |  9 +++++++++
 kernel/cgroup/cgroup-internal.h |  3 ++-
 kernel/cgroup/cgroup-v1.c       |  6 ++++--
 kernel/cgroup/cgroup.c          | 17 ++++++++++-------
 kernel/cgroup/cpuset.c          | 33 +++++++--------------------------
 5 files changed, 32 insertions(+), 36 deletions(-)

--
2.34.1


Chen Ridong

11 Nov

9:30 p.m.

New subject: [PATCH OLK-5.10 1/4] Revert "cgroup: fix uaf when proc_cpuset_show"

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV

--------------------------------

To stay consistent with mainline and backport the LTS patch.

This reverts commit d08e914af3b1c29bae1677c038b2669a3e3ebb40.

Fixes: d08e914af3b1 ("cgroup: fix uaf when proc_cpuset_show")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f127b7569c36..038efca71f28 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3887,7 +3887,6 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 	char *buf;
 	struct cgroup_subsys_state *css;
 	int retval;
-	struct cgroup *root_cgroup = NULL;
 
 	retval = -ENOMEM;
 	buf = kmalloc(PATH_MAX, GFP_KERNEL);
@@ -3895,32 +3894,9 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 		goto out;
 
 	css = task_get_css(tsk, cpuset_cgrp_id);
-	rcu_read_lock();
-	/*
-	 * When the cpuset subsystem is mounted on the legacy hierarchy,
-	 * the top_cpuset.css->cgroup does not hold a reference count of
-	 * cgroup_root.cgroup. This makes accessing css->cgroup very
-	 * dangerous because when the cpuset subsystem is remounted to the
-	 * default hierarchy, the cgroup_root.cgroup that css->cgroup points
-	 * to will be released, leading to a UAF issue. To avoid this problem,
-	 * get the reference count of top_cpuset.css->cgroup first.
-	 *
-	 * This is ugly!!
-	 */
-	if (css == &top_cpuset.css) {
-		root_cgroup = css->cgroup;
-		if (!css_tryget_online(&root_cgroup->self)) {
-			rcu_read_unlock();
-			retval = -EBUSY;
-			goto out_free;
-		}
-	}
-	rcu_read_unlock();
 	retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
 				current->nsproxy->cgroup_ns);
 	css_put(css);
-	if (root_cgroup)
-		css_put(&root_cgroup->self);
 	if (retval >= PATH_MAX)
 		retval = -ENAMETOOLONG;
 	if (retval < 0)
--
2.34.1

Chen Ridong

9:30 p.m.

New subject: [PATCH OLK-5.10 2/4] cgroup: Make operations on the cgroup root_list RCU safe

From: Yafang Shao <laoar.shao@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit d23b5c577715892c87533b13923306acc6243f93
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

----------------------------------------------------------------------

[ Upstream commit d23b5c577715892c87533b13923306acc6243f93 ]

At present, when we perform operations on the cgroup root_list, we must
hold the cgroup_mutex, which is a relatively heavyweight lock. In reality,
we can make operations on this list RCU-safe, eliminating the need to hold
the cgroup_mutex during traversal. Modifications to the list only occur in
the cgroup root setup and destroy paths, which should be infrequent in a
production environment. In contrast, traversal may occur frequently.
Therefore, making it RCU-safe would be beneficial.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Ridong <chenridong@huawei.com>

[Backport] cgroup: Move rcu_head up near the top of cgroup_root

mainline inclusion
from mainline-v6.8-rc1
commit a7fb0423c201ba12815877a0b5a68a6a1710b23a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

----------------------------------------------------------------------

commit a7fb0423c201ba12815877a0b5a68a6a1710b23a upstream.

Commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU
safe") adds a new rcu_head to the cgroup_root structure and kvfree_rcu()
for freeing the cgroup_root.

The current implementation of kvfree_rcu(), however, has the limitation
that the offset of the rcu_head structure within the larger data
structure must be less than 4096 or the compilation will fail. See the
macro definition of __is_kvfree_rcu_offset() in include/linux/rcupdate.h
for more information.

By putting rcu_head below the large cgroup structure, any change to the
cgroup structure that makes it larger runs the risk of causing build
failure under certain configurations. Commit 77070eeb8821 ("cgroup: Avoid
false cacheline sharing of read mostly rstat_cpu") happens to be the last
straw that breaks it. Fix this problem by moving the rcu_head structure up
before the cgroup structure.

Fixes: d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/lkml/20231207143806.114e0a74@canb.auug.org.au/
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	include/linux/cgroup-defs.h
	kernel/cgroup/cgroup.c
[The context in cgroup-defs.h differs because "wait_queue_head_t wait" was
already merged. The context in cgroup.c differs in some comments.]
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 include/linux/cgroup-defs.h     |  7 ++++---
 kernel/cgroup/cgroup-internal.h |  3 ++-
 kernel/cgroup/cgroup.c          | 14 +++++++-------
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 09f2d58d119b..196d801d74b6 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -526,6 +526,10 @@ struct cgroup_root {
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
+	/* A list running through the active hierarchies */
+	struct list_head root_list;
+	struct rcu_head rcu;	/* Must be near the top */
+
 	/* The root cgroup. Root is destroyed on its release. */
 	struct cgroup cgrp;
 
@@ -538,9 +542,6 @@ struct cgroup_root {
 	/* Wait while cgroups are being destroyed */
 	wait_queue_head_t wait;
 
-	/* A list running through the active hierarchies */
-	struct list_head root_list;
-
 	/* Hierarchy-specific flags */
 	unsigned int flags;
 
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 096cee0e111a..aabc2a89d6b5 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -173,7 +173,8 @@ extern struct list_head cgroup_roots;
 
 /* iterate across the hierarchies */
 #define for_each_root(root)						\
-	list_for_each_entry((root), &cgroup_roots, root_list)
+	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
+				lockdep_is_held(&cgroup_mutex))
 
 /**
  * for_each_subsys - iterate all enabled cgroup subsystems
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 69b6bbaf28a3..48997f20636c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1310,7 +1310,7 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
 
 void cgroup_free_root(struct cgroup_root *root)
 {
-	kfree(root);
+	kfree_rcu(root, rcu);
 }
 
 static void cgroup_destroy_root(struct cgroup_root *root)
@@ -1343,7 +1343,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	spin_unlock_irq(&css_set_lock);
 
 	if (!list_empty(&root->root_list)) {
-		list_del(&root->root_list);
+		list_del_rcu(&root->root_list);
 		cgroup_root_count--;
 	}
 
@@ -1389,7 +1389,6 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
 	}
 	rcu_read_unlock();
 
-	BUG_ON(!res);
 	return res;
 }
 
@@ -1399,7 +1398,6 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 {
 	struct cgroup *res = NULL;
 
-	lockdep_assert_held(&cgroup_mutex);
 	lockdep_assert_held(&css_set_lock);
 
 	if (cset == &init_css_set) {
@@ -1425,7 +1423,9 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 
 /*
  * Return the cgroup for "task" from the given hierarchy. Must be
- * called with cgroup_mutex and css_set_lock held.
+ * called with css_set_lock held to prevent task's groups from being modified.
+ * Must be called with either cgroup_mutex or rcu read lock to prevent the
+ * cgroup root from being destroyed.
  */
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
 				     struct cgroup_root *root)
@@ -1964,7 +1964,7 @@ void init_cgroup_root(struct cgroup_fs_context *ctx)
 	struct cgroup_root *root = ctx->root;
 	struct cgroup *cgrp = &root->cgrp;
 
-	INIT_LIST_HEAD(&root->root_list);
+	INIT_LIST_HEAD_RCU(&root->root_list);
 	atomic_set(&root->nr_cgrps, 1);
 	cgrp->root = root;
 	init_cgroup_housekeeping(cgrp);
@@ -2047,7 +2047,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	 * care of subsystems' refcounts, which are explicitly dropped in
 	 * the failure exit path.
 	 */
-	list_add(&root->root_list, &cgroup_roots);
+	list_add_rcu(&root->root_list, &cgroup_roots);
 	cgroup_root_count++;
 
 	/*
--
2.34.1
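
The offset limitation described in the second commit message can be illustrated with a small user-space sketch. This is not kernel code: every demo_* name is hypothetical, the 8192-byte payload is an exaggerated stand-in for a large struct cgroup, and DEMO_OFFSET_OK only mirrors the "offset must be less than 4096" rule that the message attributes to __is_kvfree_rcu_offset().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

/* Hypothetical stand-in for a large embedded struct (size is made up). */
struct demo_big {
	char payload[8192];
};

/* rcu_head placed after the large member: offset grows with demo_big. */
struct demo_root_bad {
	int hierarchy_id;
	struct demo_big cgrp;
	struct demo_rcu_head rcu;
};

/* rcu_head placed before the large member: offset stays small. */
struct demo_root_good {
	int hierarchy_id;
	struct demo_rcu_head rcu;
	struct demo_big cgrp;
};

/* Assumption: mirrors the "offset < 4096" rule from the commit message. */
#define DEMO_OFFSET_OK(type, member) (offsetof(type, member) < 4096)

/* The good layout passes at compile time, as the kernel build would. */
_Static_assert(DEMO_OFFSET_OK(struct demo_root_good, rcu),
	       "rcu_head must sit within the first 4096 bytes");

int demo_good_layout_ok(void)
{
	return DEMO_OFFSET_OK(struct demo_root_good, rcu);
}

int demo_bad_layout_ok(void)
{
	return DEMO_OFFSET_OK(struct demo_root_bad, rcu);
}
```

In this sketch the bad layout fails the check only because the stand-in payload is large enough; in the kernel the failure depended on the configuration, which is why moving rcu_head above the cgroup member is the more robust placement.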

Chen Ridong

9:30 p.m.

New subject: [PATCH OLK-5.10 3/4] cgroup/cpuset: Prevent UAF in proc_cpuset_show()

mainline inclusion
from mainline-v6.11-rc1
commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

----------------------------------------------------------------------

commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a upstream.

A UAF can happen when /proc/cpuset is read, as reported in [1].

This can be reproduced by the following steps:
1. Add an mdelay(1000) before acquiring the cgroup_lock in the
   cgroup_path_ns function.
2. Run $cat /proc/<pid>/cpuset repeatedly.
3. Run $mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/ and
   $umount /sys/fs/cgroup/cpuset/ repeatedly.

The race that causes this bug is shown below:

(umount)                |       (cat /proc/<pid>/cpuset)
css_release             |       proc_cpuset_show
css_release_work_fn     |       css = task_get_css(tsk, cpuset_cgrp_id);
css_free_rwork_fn       |       cgroup_path_ns(css->cgroup, ...);
cgroup_destroy_root     |       mutex_lock(&cgroup_mutex);
rebind_subsystems       |
cgroup_free_root        |
                        |       // cgrp was freed, UAF
                        |       cgroup_path_ns_locked(cgrp,..);

When the cpuset is initialized, the root node top_cpuset.css.cgrp
will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will
allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated
&cgroup_root.cgrp. When the umount operation is executed,
top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.

The problem is that when rebinding to cgrp_dfl_root, there are cases
where the cgroup_root allocated by setting up the root for cgroup v1
is cached. This could lead to a Use-After-Free (UAF) if it is
subsequently freed. The descendant cgroups of cgroup v1 can only be
freed after the css is released. However, the css of the root will never
be released, yet the cgroup_root should be freed when it is unmounted.
This means that obtaining a reference to the css of the root does
not guarantee that css.cgrp->root will not be freed.

Fix this problem by using rcu_read_lock in proc_cpuset_show().
As cgroup_root is freed with kfree_rcu after commit d23b5c577715
("cgroup: Make operations on the cgroup root_list RCU safe"),
css->cgroup won't be freed during the critical section.
To call cgroup_path_ns_locked, css_set_lock is needed, so it is safe to
replace task_get_css with task_css.

[1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd

Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	kernel/cgroup/cpuset.c
[The error return of cgroup_path_ns_locked changed, so the context does
not match.]
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 038efca71f28..7ecff06d2026 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -22,6 +22,7 @@
  * distribution for more details.
  */
 
+#include "cgroup-internal.h"
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
 #include <linux/cpuset.h>
@@ -3893,10 +3894,14 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 	if (!buf)
 		goto out;
 
-	css = task_get_css(tsk, cpuset_cgrp_id);
-	retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
-				current->nsproxy->cgroup_ns);
-	css_put(css);
+	rcu_read_lock();
+	spin_lock_irq(&css_set_lock);
+	css = task_css(tsk, cpuset_cgrp_id);
+	retval = cgroup_path_ns_locked(css->cgroup, buf, PATH_MAX,
+				       current->nsproxy->cgroup_ns);
+	spin_unlock_irq(&css_set_lock);
+	rcu_read_unlock();
+
 	if (retval >= PATH_MAX)
 		retval = -ENAMETOOLONG;
 	if (retval < 0)
--
2.34.1
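
The reasoning behind the fix (a free that is deferred until readers have left their critical section) can be modelled with a deliberately simplified, single-threaded toy. This is an assumption-laden illustration, not the kernel's RCU: all toy_* names are hypothetical, there is no concurrency, and "freeing" only sets a flag so the outcome can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a v1 cgroup_root; 'freed' replaces a real kfree(). */
struct toy_root {
	const char *name;
	int freed;
};

static int toy_readers;               /* readers currently inside a section */
static struct toy_root *toy_pending;  /* object whose free has been deferred */

/* Run the deferred free only once no reader is inside a section. */
static void toy_reclaim(void)
{
	if (toy_readers == 0 && toy_pending != NULL) {
		toy_pending->freed = 1;
		toy_pending = NULL;
	}
}

/* Toy counterpart of rcu_read_lock(). */
void toy_read_lock(void)
{
	toy_readers++;
}

/* Toy counterpart of rcu_read_unlock(); may trigger the deferred free. */
void toy_read_unlock(void)
{
	toy_readers--;
	toy_reclaim();
}

/* Toy counterpart of kfree_rcu(): queue the free instead of doing it now. */
void toy_free_deferred(struct toy_root *root)
{
	toy_pending = root;
	toy_reclaim();
}

/* Returns 1 if the object stayed valid inside the section and was freed after. */
int toy_demo(void)
{
	struct toy_root root = { "cpuset-v1", 0 };
	int valid_inside;

	toy_read_lock();              /* reader side, like proc_cpuset_show() */
	toy_free_deferred(&root);     /* umount side requests the free */
	valid_inside = !root.freed;   /* still safe to dereference here */
	toy_read_unlock();            /* last reader leaves, free runs */

	return valid_inside && root.freed;
}
```

The toy shows only the ordering guarantee the commit message relies on: with a plain kfree() in cgroup_free_root() the object could vanish while the reader still holds a pointer to it, whereas a deferred free waits until the reader has left.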

Chen Ridong

9:30 p.m.

New subject: [PATCH OLK-5.10 4/4] cgroup: add cgroup_root_ext to keep kabi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA7HMV

--------------------------------

Commit 02ebfdc9f2bd ("[Backport] cgroup: Make operations on the cgroup
root_list RCU safe") added an rcu member to struct cgroup_root, which broke
KABI. To keep KABI, add cgroup_root_ext.

Fixes: 02ebfdc9f2bd ("[Backport] cgroup: Make operations on the cgroup root_list RCU safe")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 include/linux/cgroup-defs.h | 16 ++++++++++++----
 kernel/cgroup/cgroup-v1.c   |  6 ++++--
 kernel/cgroup/cgroup.c      |  5 ++++-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 196d801d74b6..36103ca580dc 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -526,10 +526,6 @@ struct cgroup_root {
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
-	/* A list running through the active hierarchies */
-	struct list_head root_list;
-	struct rcu_head rcu;	/* Must be near the top */
-
 	/* The root cgroup. Root is destroyed on its release. */
 	struct cgroup cgrp;
 
@@ -542,6 +538,9 @@ struct cgroup_root {
 	/* Wait while cgroups are being destroyed */
 	wait_queue_head_t wait;
 
+	/* A list running through the active hierarchies */
+	struct list_head root_list;
+
 	/* Hierarchy-specific flags */
 	unsigned int flags;
 
@@ -557,6 +556,15 @@ struct cgroup_root {
 	KABI_RESERVE(4)
 };
 
+/*
+ * To keep kabi unchanged, add cgroup_root_ext, add rcu_head to make operations
+ * on the cgroup root_list RCU safe
+ */
+struct cgroup_root_ext {
+	struct rcu_head rcu;	/* Must be near the top */
+	struct cgroup_root root;
+};
+
 /*
  * struct cftype: handler definitions for cgroup control files
  *
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 879813117556..c8500c3a9340 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1143,6 +1143,7 @@ static int cgroup1_root_to_use(struct fs_context *fc)
 {
 	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
 	struct cgroup_root *root;
+	struct cgroup_root_ext *root_ext;
 	struct cgroup_subsys *ss;
 	int i, ret;
 
@@ -1215,10 +1216,11 @@ static int cgroup1_root_to_use(struct fs_context *fc)
 	if (ctx->ns != &init_cgroup_ns)
 		return -EPERM;
 
-	root = kzalloc(sizeof(*root), GFP_KERNEL);
-	if (!root)
+	root_ext = kzalloc(sizeof(struct cgroup_root_ext), GFP_KERNEL);
+	if (!root_ext)
 		return -ENOMEM;
 
+	root = &root_ext->root;
 	ctx->root = root;
 	init_cgroup_root(ctx);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 48997f20636c..34647f8d6778 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1310,7 +1310,10 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
 
 void cgroup_free_root(struct cgroup_root *root)
 {
-	kfree_rcu(root, rcu);
+	struct cgroup_root_ext *root_ext;
+
+	root_ext = container_of(root, struct cgroup_root_ext, root);
+	kfree_rcu(root_ext, rcu);
 }
 
 static void cgroup_destroy_root(struct cgroup_root *root)
--
2.34.1
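
The wrapper technique used in this patch (leave the ABI-visible struct untouched, embed it in a larger private struct, and recover the wrapper with container_of) can be sketched in user space as follows. All demo_* names are hypothetical, demo_container_of is a simplified version of the kernel macro without its type checking, and the free here is immediate rather than RCU-deferred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified user-space container_of (no type checking, unlike the kernel's). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rcu_head. */
struct demo_rcu_head {
	void *next;
	void (*func)(void *head);
};

/* ABI-visible struct: its layout must not change. */
struct demo_root {
	int hierarchy_id;
	unsigned int flags;
};

/* Private wrapper that carries the extra member in front of the frozen struct. */
struct demo_root_ext {
	struct demo_rcu_head rcu;
	struct demo_root root;
};

/* Allocate the wrapper but hand out only the embedded, ABI-stable struct. */
struct demo_root *demo_root_alloc(void)
{
	struct demo_root_ext *ext = calloc(1, sizeof(*ext));

	return ext ? &ext->root : NULL;
}

/* Recover the wrapper from a pointer to the embedded struct. */
struct demo_root_ext *demo_root_to_ext(struct demo_root *root)
{
	return demo_container_of(root, struct demo_root_ext, root);
}

/* Free the whole wrapper; the patch uses kfree_rcu(root_ext, rcu) instead. */
void demo_root_free(struct demo_root *root)
{
	free(demo_root_to_ext(root));
}

/* Returns 1 if the wrapper is recovered exactly from the embedded pointer. */
int demo_roundtrip(void)
{
	struct demo_root_ext *ext = calloc(1, sizeof(*ext));
	int ok;

	if (!ext)
		return 0;
	ok = (demo_root_to_ext(&ext->root) == ext);
	demo_root_free(&ext->root);
	return ok;
}
```

One consequence of this design is worth keeping in mind when reading the patch: the container_of step is only valid for roots that were really allocated as the wrapper type, so any root passed to the free path has to come from the matching allocation path.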

patchwork bot

9:45 p.m.

Feedback: The patch(es) which you have sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/13164
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/X...
