Chen Ridong (1): cgroup: fix uaf when proc_cpuset_show
 kernel/cgroup/cpuset.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA9YQ9
--------------------------------
We found a refcount UAF bug as follows:
BUG: KASAN: use-after-free in cgroup_path_ns+0x112/0x150
Read of size 8 at addr ffff8882a4b242b8 by task atop/19903
CPU: 27 PID: 19903 Comm: atop Kdump: loaded Tainted: GF
Call Trace:
 dump_stack+0x7d/0xa7
 print_address_description.constprop.0+0x19/0x170
 ? cgroup_path_ns+0x112/0x150
 __kasan_report.cold+0x6c/0x84
 ? print_unreferenced+0x390/0x3b0
 ? cgroup_path_ns+0x112/0x150
 kasan_report+0x3a/0x50
 cgroup_path_ns+0x112/0x150
 proc_cpuset_show+0x164/0x530
 proc_single_show+0x10f/0x1c0
 seq_read_iter+0x405/0x1020
 ? aa_path_link+0x2e0/0x2e0
 seq_read+0x324/0x500
 ? seq_read_iter+0x1020/0x1020
 ? common_file_perm+0x2a1/0x4a0
 ? fsnotify_unmount_inodes+0x380/0x380
 ? bpf_lsm_file_permission_wrapper+0xa/0x30
 ? security_file_permission+0x53/0x460
 vfs_read+0x122/0x420
 ksys_read+0xed/0x1c0
 ? __ia32_sys_pwrite64+0x1e0/0x1e0
 ? __audit_syscall_exit+0x741/0xa70
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x67/0xcc
This is also reported by: https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd
This can be reproduced as follows:
1. Add an mdelay(1000) before acquiring the cgroup_lock in the cgroup_path_ns function.
2. Repeatedly run:
   $ cat /proc/<pid>/cpuset
3. Repeatedly run:
   $ mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/
   $ umount /sys/fs/cgroup/cpuset/
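Step 2 of the reproducer can also be driven from a small userspace program instead of a shell loop. The following is only a sketch: the helper names read_once() and hammer() are hypothetical, and the path to read (for example /proc/<pid>/cpuset) is supplied by the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, read it to EOF, return bytes read or -1. */
static long read_once(const char *path)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	close(fd);
	return n < 0 ? -1 : total;
}

/*
 * Hypothetical helper: re-open and re-read `path` `iterations` times.
 * Each iteration enters the file's show handler again, which for
 * /proc/<pid>/cpuset is proc_cpuset_show. Returns the number of
 * iterations that completed without an error.
 */
static int hammer(const char *path, int iterations)
{
	int ok = 0;

	for (int i = 0; i < iterations; i++)
		if (read_once(path) >= 0)
			ok++;
	return ok;
}
```

Calling hammer("/proc/self/cpuset", 100000) while the mount/umount loop from step 3 runs in another shell should exercise the same window as the cat loop. The file is re-opened on every iteration so that each read starts a fresh seq_file pass.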
The race that causes this bug is shown below:
(umount)		|	(cat /proc/<pid>/cpuset)
css_release		|	proc_cpuset_show
css_release_work_fn	|	css = task_get_css(tsk, cpuset_cgrp_id);
css_free_rwork_fn	|	cgroup_path_ns(css->cgroup, ...);
cgroup_destroy_root	|	mutex_lock(&cgroup_mutex);
rebind_subsystems	|
cgroup_free_root	|
			|	// cgrp was freed, UAF
			|	cgroup_path_ns_locked(cgrp,..);
When the cpuset is initialized, the root node top_cpuset.css.cgrp will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated &cgroup_root.cgrp. When the umount operation is executed, top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.
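The rebinding described above can be illustrated with a toy userspace model. Everything here is hypothetical (struct model_root, struct model_css, model_mount(), model_umount()); it is not kernel code, and "freeing" is modelled with an alive flag so that the example stays well defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cgroup_root and its embedded cgroup. */
struct model_root {
	bool alive;	/* false once the root has been "freed" */
};

/* Hypothetical stand-in for top_cpuset.css. */
struct model_css {
	struct model_root *root;	/* plays the role of css.cgrp */
};

static struct model_root dfl_root = { .alive = true };	/* like cgrp_dfl_root */
static struct model_root v1_root;			/* root set up by a v1 mount */
static struct model_css top_css = { .root = &dfl_root };

/* mount -t cgroup -o cpuset: set up a v1 root and rebind the css to it. */
static void model_mount(void)
{
	v1_root.alive = true;
	top_css.root = &v1_root;
}

/* umount: rebind the css to the default root, then "free" the v1 root. */
static void model_umount(void)
{
	assert(top_css.root == &v1_root);
	top_css.root = &dfl_root;
	v1_root.alive = false;
}
```

A reader that copies top_css.root after model_mount() and uses the copy after model_umount() is looking at a root that is no longer alive, even though top_css itself is still valid. This is the shape of the stale pointer in the race diagram.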
The problem is that when rebinding to cgrp_dfl_root, a concurrent reader may still hold a cached pointer to the cgroup_root that was allocated when the cgroup v1 root was set up. If that cgroup_root is subsequently freed, using the cached pointer is a Use-After-Free (UAF). The descendant cgroups of cgroup v1 can only be freed after the css is released. However, the css of the root will never be released, yet the cgroup_root should be freed when it is unmounted. This means that obtaining a reference to the css of the root does not guarantee that css.cgrp->root will not be freed.
To solve this issue, proc_cpuset_show now takes a reference on the root cgroup to ensure that css.cgrp->root will not be freed prematurely. This is a temporary solution; suggestions for a better one are welcome.
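The fix relies on "tryget" semantics: a reference can only be taken while the object is still live, and the attempt fails once teardown has begun. The sketch below is a simplified userspace model of that idea using C11 atomics. The names model_ref, model_tryget() and model_put() are hypothetical, and this is not how css_tryget_online() is implemented in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter; a count of 0 means teardown has begun. */
struct model_ref {
	atomic_int count;
};

/* Take a reference only if the count is still non-zero. */
static bool model_tryget(struct model_ref *ref)
{
	int old = atomic_load(&ref->count);

	while (old > 0) {
		/* On failure, `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool model_put(struct model_ref *ref)
{
	int old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
	return old == 1;
}
```

In this model, a reader that succeeds in model_tryget() holds the count above zero until its matching model_put(), so the owner cannot observe the last put and free the object in between. A reader that fails must back off, which corresponds to the -EBUSY path in the patch.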
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7d4805b27dae..7ad3094360d5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -5089,6 +5089,7 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 	char *buf;
 	struct cgroup_subsys_state *css;
 	int retval;
+	struct cgroup *root_cgroup = NULL;
 
 	retval = -ENOMEM;
 	buf = kmalloc(PATH_MAX, GFP_KERNEL);
@@ -5096,9 +5097,32 @@ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 		goto out;
 
 	css = task_get_css(tsk, cpuset_cgrp_id);
+	rcu_read_lock();
+	/*
+	 * When the cpuset subsystem is mounted on the legacy hierarchy,
+	 * the top_cpuset.css->cgroup does not hold a reference count of
+	 * cgroup_root.cgroup. This makes accessing css->cgroup very
+	 * dangerous because when the cpuset subsystem is remounted to the
+	 * default hierarchy, the cgroup_root.cgroup that css->cgroup points
+	 * to will be released, leading to a UAF issue. To avoid this problem,
+	 * get the reference count of top_cpuset.css->cgroup first.
+	 *
+	 * This is ugly!!
+	 */
+	if (css == &top_cpuset.css) {
+		root_cgroup = css->cgroup;
+		if (!css_tryget_online(&root_cgroup->self)) {
+			rcu_read_unlock();
+			retval = -EBUSY;
+			goto out_free;
+		}
+	}
+	rcu_read_unlock();
 	retval = cgroup_path_ns(css->cgroup, buf, PATH_MAX,
 				current->nsproxy->cgroup_ns);
 	css_put(css);
+	if (root_cgroup)
+		css_put(&root_cgroup->self);
 	if (retval >= PATH_MAX)
 		retval = -ENAMETOOLONG;
 	if (retval < 0)
FeedBack:
The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/9709
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/V...