Fix NULL pointer dereference

Liu Kai (5):
  xSched/cgroup: remove legacy cftypes due to cgroup v1 incompatibility
  xSched: fix incorrect error returns in vstream allocation
  xSched: prevent HBM allocation for non-AI tasks
  xSched/dmem: memory allocation size must larger than 0
  xSched: fix NULL pointer dereference in cfs_rq during class switch

 include/linux/xsched.h  |  7 ++++++-
 kernel/xsched/cgroup.c  |  2 --
 kernel/xsched/vstream.c | 27 +++++++++------------------
 3 files changed, 15 insertions(+), 21 deletions(-)

--
2.34.1
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

The xcu subsystem does not support cgroup v1, so the legacy cftypes are no
longer applicable. Remove them to prevent potential conflicts and align
with the cgroup v2-only design.

Fixes: 43bbefc53356 ("xsched: Add XCU control group implementation and its backend in xsched CFS")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/cgroup.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/xsched/cgroup.c b/kernel/xsched/cgroup.c
index 0127bf4fa270..8d22aa78ea81 100644
--- a/kernel/xsched/cgroup.c
+++ b/kernel/xsched/cgroup.c
@@ -930,7 +930,5 @@ struct cgroup_subsys xcu_cgrp_subsys = {
 #ifdef CONFIG_CGROUP_FREEZER
 	.legacy_cftypes = files,
 	.legacy_name = "freezer",
-#else
-	.legacy_cftypes = xcu_cg_files,
 #endif
 };
--
2.34.1
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

Replace hardcoded -EINVAL returns with the actual error codes generated
during execution. This ensures that callers receive precise failure
reasons, such as distinguishing between an invalid argument and a missing
device (-ENODEV) in vstream_hbm_alloc, or preserving the original failure
reason from xsched_xse_init().

Fixes: 8dde1f2e6bf6 ("xsched: Introduce vstream management")
Fixes: dd2bb45851e5 ("xSched/dmem: introduce xsched_dmem_alloc()")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/vstream.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/xsched/vstream.c b/kernel/xsched/vstream.c
index bf9c047dbc79..6ece03d3ff22 100644
--- a/kernel/xsched/vstream.c
+++ b/kernel/xsched/vstream.c
@@ -245,7 +245,7 @@ static int alloc_ctx_from_vstream(struct vstream_info *vstream_info,
 		XSCHED_ERR("Fail to initialize XSE for context @ %s\n",
 			   __func__);
 		kfree(*ctx);
-		return -EINVAL;
+		return ret;
 	}
 
 	list_add(&(*ctx)->ctx_node, &xcu->ctx_list);
@@ -415,10 +415,8 @@ static int sqcq_alloc(struct vstream_args *arg)
 	vstream->task_type = arg->task_type;
 
 	ret = vstream_bind_to_xcu(vstream);
-	if (ret < 0) {
-		ret = -EINVAL;
+	if (ret != 0)
 		goto out_err_vstream_free;
-	}
 
 	/* Allocates vstream's SQ and CQ memory on a XCU for processing. */
 	params.group = vstream->xcu->group;
@@ -645,7 +643,7 @@ static int vstream_hbm_alloc(struct vstream_args *arg)
 
 	xcu_found = xcu_find(XCU_TYPE_XPU, arg->dev_id, arg->channel_id);
 	if (!xcu_found)
-		return -EINVAL;
+		return -ENODEV;
 
 	/* it will either allocate or find a context */
 	mutex_lock(&xcu_found->ctx_list_lock);
--
2.34.1
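For readers following the error-code change outside the kernel tree, the difference between flattening and propagating errors can be sketched in plain userspace C. This is only an illustration under stated assumptions: `init_entity()`, `alloc_ctx_flattened()` and `alloc_ctx_propagated()` are hypothetical names, and `init_entity()` merely stands in for a lower-level helper such as xsched_xse_init(); none of this is the xSched code itself.

```c
#include <errno.h>

/* Hypothetical stand-in for a lower-level helper that can fail in
 * more than one way (for example xsched_xse_init() in the patch). */
static int init_entity(int device_present, int have_memory)
{
	if (!device_present)
		return -ENODEV;
	if (!have_memory)
		return -ENOMEM;
	return 0;
}

/* Old pattern: every failure is flattened into -EINVAL, so the
 * caller cannot tell a missing device from an allocation failure. */
static int alloc_ctx_flattened(int device_present, int have_memory)
{
	if (init_entity(device_present, have_memory) != 0)
		return -EINVAL;
	return 0;
}

/* New pattern: the callee's error code is returned unchanged. */
static int alloc_ctx_propagated(int device_present, int have_memory)
{
	int ret = init_entity(device_present, have_memory);

	if (ret != 0)
		return ret;
	return 0;
}
```

With a missing device, the flattened variant reports -EINVAL while the propagated variant reports -ENODEV, which is the distinction the commit message describes.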
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

In the vstream_hbm_alloc() path, the previous implementation would attempt
to automatically allocate a new context if one was not found for the
current task. However, HBM (High Bandwidth Memory) allocation should only
be permitted for existing AI tasks that already have a valid context.

This patch simplifies the control flow by removing the automatic context
creation logic. Now, if a valid context is not found for the current task,
the function will directly return -ENOENT, effectively preventing invalid
HBM region allocations.

Fixes: dd2bb45851e5 ("xSched/dmem: introduce xsched_dmem_alloc()")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/vstream.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/kernel/xsched/vstream.c b/kernel/xsched/vstream.c
index 6ece03d3ff22..1a30c205b366 100644
--- a/kernel/xsched/vstream.c
+++ b/kernel/xsched/vstream.c
@@ -633,7 +633,6 @@ int vstream_kick(struct vstream_args *arg)
 #ifdef CONFIG_CGROUP_DMEM
 static int vstream_hbm_alloc(struct vstream_args *arg)
 {
-	vstream_info_t vstream_info;
 	struct xsched_cu *xcu_found;
 	struct xsched_context *ctx;
 	int ret = 0;
@@ -648,22 +647,13 @@ static int vstream_hbm_alloc(struct vstream_args *arg)
 	/* it will either allocate or find a context */
 	mutex_lock(&xcu_found->ctx_list_lock);
 	ctx = ctx_find_by_tgid_and_xcu(current->tgid, xcu_found);
-	if (ctx) {
+	if (ctx)
 		kref_get(&ctx->kref);
-	} else {
-		vstream_info.tgid = current->tgid;
-		vstream_info.xcu = xcu_found;
-		vstream_info.dev_id = arg->dev_id;
-		vstream_info.channel_id = arg->channel_id;
-		vstream_info.fd = arg->fd;
-
-		ret = alloc_ctx_from_vstream(&vstream_info, &ctx);
-	}
 	mutex_unlock(&xcu_found->ctx_list_lock);
 
-	if (ret != 0) {
+	if (!ctx) {
 		XSCHED_ERR("Failed to find a context for HBM alloc");
-		return ret;
+		return -ENOENT;
 	}
 
 	ret = xsched_dmem_alloc(ctx, arg);
--
2.34.1
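The lookup-only behaviour introduced here can be modelled in a few lines of userspace C. The sketch below is an assumption-laden illustration, not the kernel code: `struct ctx`, `ctx_find()` and `hbm_alloc_lookup_only()` are hypothetical names, a plain integer replaces the kref, and the ctx_list_lock taken by the real function is omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified context: one entry per task group. */
struct ctx {
	int tgid;
	int refcount;
	struct ctx *next;
};

/* Find an existing context for tgid; never creates one. */
static struct ctx *ctx_find(struct ctx *head, int tgid)
{
	for (struct ctx *c = head; c != NULL; c = c->next)
		if (c->tgid == tgid)
			return c;
	return NULL;
}

/*
 * Lookup-only allocation path: take a reference on an existing
 * context, or fail with -ENOENT without creating anything.
 */
static int hbm_alloc_lookup_only(struct ctx *head, int tgid,
				 struct ctx **out)
{
	struct ctx *c = ctx_find(head, tgid);

	if (c == NULL)
		return -ENOENT;

	c->refcount++;	/* stands in for kref_get(&ctx->kref) */
	*out = c;
	return 0;
}
```

A task group with no entry in the list gets -ENOENT and leaves every reference count untouched, which mirrors the intent of refusing HBM allocation to tasks that never set up a context.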
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

Add a sanity check to ensure the requested memory allocation size is
greater than zero in vstream_hbm_alloc(). Attempting to allocate a
zero-sized HBM region is invalid and should be rejected early.

This patch returns -EINVAL if arg->vm_args.size is zero, preventing
potential issues in the subsequent memory allocation logic.

Fixes: dd2bb45851e5 ("xSched/dmem: introduce xsched_dmem_alloc()")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/vstream.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/xsched/vstream.c b/kernel/xsched/vstream.c
index 1a30c205b366..e8173a7e10e2 100644
--- a/kernel/xsched/vstream.c
+++ b/kernel/xsched/vstream.c
@@ -640,6 +640,9 @@ static int vstream_hbm_alloc(struct vstream_args *arg)
 	if (!dmem_cgroup_enabled())
 		return -EPERM;
 
+	if (arg->vm_args.size == 0)
+		return -EINVAL;
+
 	xcu_found = xcu_find(XCU_TYPE_XPU, arg->dev_id, arg->channel_id);
 	if (!xcu_found)
 		return -ENODEV;
--
2.34.1
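Taken together with the earlier vstream.c patches, the checks at the top of vstream_hbm_alloc() now appear to run in a fixed order, each with its own error code. The following userspace sketch restates that order; `struct hbm_request` and `hbm_request_check()` are hypothetical, and the boolean fields only stand in for the results of dmem_cgroup_enabled(), xcu_find() and the context lookup.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request; only `size` mirrors a field named in the patch. */
struct hbm_request {
	size_t size;
	bool dmem_enabled;	/* stands in for dmem_cgroup_enabled() */
	bool device_found;	/* stands in for xcu_find() != NULL */
	bool ctx_found;		/* stands in for the context lookup */
};

/* Validate a request in the order the patched function checks it. */
static int hbm_request_check(const struct hbm_request *req)
{
	if (!req->dmem_enabled)
		return -EPERM;	/* feature disabled */
	if (req->size == 0)
		return -EINVAL;	/* zero-sized region rejected early */
	if (!req->device_found)
		return -ENODEV;	/* no matching device */
	if (!req->ctx_found)
		return -ENOENT;	/* task has no existing context */
	return 0;
}
```

Placing the size check before the device lookup means a malformed request is rejected without touching any device or context state.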
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

A race condition exists when switching the scheduling class of an xcu
group while tasks are being enqueued. The issue occurs due to the
following interleaving:

  mkdir -p group -> create an xcu group
  echo cfs > group/xcu.sched_class
  echo pid > group/cgroup.procs
  -> xcu_sched_class_write
  -> mutex_lock(&xcg_mutex);
  -> xcu_cg_set_sched_class
  -> xcg_perxcu_cfs_rq_deinit
  -> mutex_lock(&xcu->xcu_lock);
  -> cfs_rq = NULL
  -> mutex_unlock(&xcu->xcu_lock);
  -> vstream_kick
  -> mutex_lock(&xcu->xcu_lock);
  -> enqueue_ctx
  -> NULL pointer dereference
  -> mutex_unlock(&xcu->xcu_lock);
  -> xcu_cfs_cg_init
  -> alloc cfs_rq
  -> mutex_unlock(&xcg_mutex);

To prevent this, add a sanity check in xse_integrity_check() to ensure
that the cfs_rq associated with the xse is valid when the scheduling
class is CFS. This returns -EINVAL early if the cfs_rq is temporarily
NULL during the transition, avoiding the kernel crash.

Fixes: 43bbefc53356 ("xsched: Add XCU control group implementation and its backend in xsched CFS")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 include/linux/xsched.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/xsched.h b/include/linux/xsched.h
index c3617386cb8d..fde7b3c8ee7d 100644
--- a/include/linux/xsched.h
+++ b/include/linux/xsched.h
@@ -65,7 +65,6 @@ extern struct list_head xsched_class_list;
 #define for_each_vstream_in_ctx(vs, ctx) \
 	list_for_each_entry((vs), &((ctx)->vstream_list), ctx_node)
 
-
 /* Manages xsched RT-like class linked list based runqueue.
  *
  * Now RT-like class runqueue structs is identical
@@ -354,6 +353,12 @@ static inline int xse_integrity_check(struct xsched_entity *xse)
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_XCU_SCHED_CFS
+	/* The cfs_rq of xse may be NULL in some scenarios */
+	if (xse->class == &fair_xsched_class && !xsched_cfs_rq_of(xse))
+		return -EINVAL;
+#endif
+
 #ifdef CONFIG_CGROUP_XCU
 	if (xse->is_group && !xse_this_cfs_rq(xse)) {
 		// Can only be in the free process
--
2.34.1
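The guard added to xse_integrity_check() can be illustrated with a small userspace model of the window in which the runqueue pointer is NULL. Everything in this sketch is hypothetical (`struct entity`, `entity_integrity_check()`, `enqueue()`, the `CLASS_*` values) and it deliberately ignores locking; it only shows why checking the pointer before the enqueue path dereferences it turns a crash into an -EINVAL return.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical scheduling classes for the sketch. */
enum sched_class_id { CLASS_RT, CLASS_CFS };

struct cfs_rq {
	int nr_running;
};

struct entity {
	enum sched_class_id cls;
	struct cfs_rq *cfs_rq;	/* NULL while a class switch is in flight */
};

/* Reject a CFS entity whose runqueue has been torn down and not yet
 * re-created, instead of letting the caller dereference NULL. */
static int entity_integrity_check(const struct entity *e)
{
	if (e == NULL)
		return -EINVAL;
	if (e->cls == CLASS_CFS && e->cfs_rq == NULL)
		return -EINVAL;
	return 0;
}

/* Enqueue path: only touches cfs_rq after the check has passed. */
static int enqueue(struct entity *e)
{
	int ret = entity_integrity_check(e);

	if (ret != 0)
		return ret;
	if (e->cls == CLASS_CFS)
		e->cfs_rq->nr_running++;
	return 0;
}
```

In this model a kick that arrives between the teardown and the re-allocation of the runqueue fails with -EINVAL, and the same call succeeds once the pointer is valid again.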
Feedback:
The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/22745
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/DVV...
participants (2)
- Liu Kai
- patchwork bot