[PATCH v2 OLK-6.6 0/5] Fix NULL pointer dereference
Fix NULL pointer dereference

Liu Kai (5):
  xSched/cgroup: remove legacy cftypes due to cgroup v1 incompatibility
  xSched: fix incorrect error returns in vstream allocation
  xSched: prevent HBM allocation for non-AI tasks
  xSched/dmem: memory allocation size must larger than 0
  xSched: fix NULL pointer dereference in cfs_rq during class switch

 include/linux/xsched.h  | 30 ++++++++++++++++++++++++------
 kernel/xsched/cgroup.c  |  3 +--
 kernel/xsched/core.c    | 12 ++++--------
 kernel/xsched/vstream.c | 27 +++++++++------------------
 4 files changed, 38 insertions(+), 34 deletions(-)

-- 
2.34.1
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

The xcu subsystem does not support cgroup v1, so the legacy cftypes are
no longer applicable. Remove them to prevent potential conflicts and
align with the cgroup v2-only design.

Fixes: 43bbefc53356 ("xsched: Add XCU control group implementation and its backend in xsched CFS")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/cgroup.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/xsched/cgroup.c b/kernel/xsched/cgroup.c
index 0127bf4fa270..8d22aa78ea81 100644
--- a/kernel/xsched/cgroup.c
+++ b/kernel/xsched/cgroup.c
@@ -930,7 +930,5 @@ struct cgroup_subsys xcu_cgrp_subsys = {
 #ifdef CONFIG_CGROUP_FREEZER
 	.legacy_cftypes = files,
 	.legacy_name = "freezer",
-#else
-	.legacy_cftypes = xcu_cg_files,
 #endif
 };
-- 
2.34.1
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

Replace hardcoded -EINVAL returns with the actual error codes generated
during execution. This ensures that callers receive precise failure
reasons, such as distinguishing between an invalid argument and a
missing device (-ENODEV) in vstream_hbm_alloc, or preserving the
original failure reason from xsched_xse_init().

Fixes: 8dde1f2e6bf6 ("xsched: Introduce vstream management")
Fixes: dd2bb45851e5 ("xSched/dmem: introduce xsched_dmem_alloc()")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/vstream.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/xsched/vstream.c b/kernel/xsched/vstream.c
index bf9c047dbc79..6ece03d3ff22 100644
--- a/kernel/xsched/vstream.c
+++ b/kernel/xsched/vstream.c
@@ -245,7 +245,7 @@ static int alloc_ctx_from_vstream(struct vstream_info *vstream_info,
 		XSCHED_ERR("Fail to initialize XSE for context @ %s\n",
 			   __func__);
 		kfree(*ctx);
-		return -EINVAL;
+		return ret;
 	}
 
 	list_add(&(*ctx)->ctx_node, &xcu->ctx_list);
@@ -415,10 +415,8 @@ static int sqcq_alloc(struct vstream_args *arg)
 	vstream->task_type = arg->task_type;
 
 	ret = vstream_bind_to_xcu(vstream);
-	if (ret < 0) {
-		ret = -EINVAL;
+	if (ret != 0)
 		goto out_err_vstream_free;
-	}
 
 	/* Allocates vstream's SQ and CQ memory on a XCU for processing. */
 	params.group = vstream->xcu->group;
@@ -645,7 +643,7 @@ static int vstream_hbm_alloc(struct vstream_args *arg)
 
 	xcu_found = xcu_find(XCU_TYPE_XPU, arg->dev_id, arg->channel_id);
 	if (!xcu_found)
-		return -EINVAL;
+		return -ENODEV;
 
 	/* it will either allocate or find a context */
 	mutex_lock(&xcu_found->ctx_list_lock);
-- 
2.34.1
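For readers outside the tree, here is a minimal user-space sketch of the
error-propagation pattern this patch adopts. Every demo_* name is a
hypothetical stand-in, not an xSched function: demo_init() plays the role
of a lower-level step such as xsched_xse_init(), and demo_find() plays the
role of a lookup such as xcu_find(). It only illustrates the idea and is
not the kernel code.

```c
#include <assert.h>	/* used by callers that want to check the results */
#include <errno.h>
#include <stddef.h>

/* Hypothetical lower-level init step; fails with -ENOMEM on request. */
static int demo_init(int simulate_oom)
{
	return simulate_oom ? -ENOMEM : 0;
}

/* Hypothetical device lookup; returns NULL when dev_id is not in the table. */
static const int *demo_find(const int *table, size_t n, int dev_id)
{
	for (size_t i = 0; i < n; i++)
		if (table[i] == dev_id)
			return &table[i];
	return NULL;
}

/*
 * Reports a missing device as -ENODEV and passes the callee's error code
 * through unchanged instead of rewriting it to -EINVAL.
 */
static int demo_alloc(const int *table, size_t n, int dev_id, int simulate_oom)
{
	int ret;

	if (!demo_find(table, n, dev_id))
		return -ENODEV;

	ret = demo_init(simulate_oom);
	if (ret != 0)
		return ret;	/* the original failure reason survives */

	return 0;
}
```

With this shape a caller can tell "no such device" apart from "out of
memory" without reading the kernel log.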
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

In the vstream_hbm_alloc() path, the previous implementation would
attempt to automatically allocate a new context if one was not found
for the current task. However, HBM (High Bandwidth Memory) allocation
should only be permitted for existing AI tasks that already have a
valid context.

This patch simplifies the control flow by removing the automatic
context creation logic. Now, if a valid context is not found for the
current task, the function will directly return -ENOENT, effectively
preventing invalid HBM region allocations.

Fixes: dd2bb45851e5 ("xSched/dmem: introduce xsched_dmem_alloc()")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/vstream.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/kernel/xsched/vstream.c b/kernel/xsched/vstream.c
index 6ece03d3ff22..1a30c205b366 100644
--- a/kernel/xsched/vstream.c
+++ b/kernel/xsched/vstream.c
@@ -633,7 +633,6 @@ int vstream_kick(struct vstream_args *arg)
 #ifdef CONFIG_CGROUP_DMEM
 static int vstream_hbm_alloc(struct vstream_args *arg)
 {
-	vstream_info_t vstream_info;
 	struct xsched_cu *xcu_found;
 	struct xsched_context *ctx;
 	int ret = 0;
@@ -648,22 +647,13 @@ static int vstream_hbm_alloc(struct vstream_args *arg)
 	/* it will either allocate or find a context */
 	mutex_lock(&xcu_found->ctx_list_lock);
 	ctx = ctx_find_by_tgid_and_xcu(current->tgid, xcu_found);
-	if (ctx) {
+	if (ctx)
 		kref_get(&ctx->kref);
-	} else {
-		vstream_info.tgid = current->tgid;
-		vstream_info.xcu = xcu_found;
-		vstream_info.dev_id = arg->dev_id;
-		vstream_info.channel_id = arg->channel_id;
-		vstream_info.fd = arg->fd;
-
-		ret = alloc_ctx_from_vstream(&vstream_info, &ctx);
-	}
 	mutex_unlock(&xcu_found->ctx_list_lock);
 
-	if (ret != 0) {
+	if (!ctx) {
 		XSCHED_ERR("Failed to find a context for HBM alloc");
-		return ret;
+		return -ENOENT;
 	}
 
 	ret = xsched_dmem_alloc(ctx, arg);
-- 
2.34.1
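The lookup-only behaviour described above can be illustrated with a small
user-space sketch. struct demo_ctx and the demo_* functions are
hypothetical simplifications: the plain integer refcount stands in for
kref_get(), and the ctx_list_lock locking of the real code is omitted for
brevity.

```c
#include <assert.h>	/* used by callers that want to check the results */
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified context: only the fields the sketch needs. */
struct demo_ctx {
	int tgid;
	int refcount;
};

/* Linear lookup by tgid; returns NULL when the task has no context. */
static struct demo_ctx *demo_ctx_find(struct demo_ctx *list, size_t n, int tgid)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].tgid == tgid)
			return &list[i];
	return NULL;
}

/* Takes a reference on an existing context; never creates a new one. */
static int demo_hbm_alloc(struct demo_ctx *list, size_t n, int tgid)
{
	struct demo_ctx *ctx = demo_ctx_find(list, n, tgid);

	if (!ctx)
		return -ENOENT;	/* no auto-creation for unknown tasks */

	ctx->refcount++;	/* plays the role of kref_get() */
	return 0;
}
```

A task without a context gets -ENOENT and the list is left untouched; only
tasks that already own a context have their reference count raised.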
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

Add a sanity check to ensure the requested memory allocation size is
greater than zero in vstream_hbm_alloc(). Attempting to allocate a
zero-sized HBM region is invalid and should be rejected early.

This patch returns -EINVAL if arg->vm_args.size is zero, preventing
potential issues in the subsequent memory allocation logic.

Fixes: dd2bb45851e5 ("xSched/dmem: introduce xsched_dmem_alloc()")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/xsched/vstream.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/xsched/vstream.c b/kernel/xsched/vstream.c
index 1a30c205b366..e8173a7e10e2 100644
--- a/kernel/xsched/vstream.c
+++ b/kernel/xsched/vstream.c
@@ -640,6 +640,9 @@ static int vstream_hbm_alloc(struct vstream_args *arg)
 	if (!dmem_cgroup_enabled())
 		return -EPERM;
 
+	if (arg->vm_args.size == 0)
+		return -EINVAL;
+
 	xcu_found = xcu_find(XCU_TYPE_XPU, arg->dev_id, arg->channel_id);
 	if (!xcu_found)
 		return -ENODEV;
-- 
2.34.1
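The order of the early checks after this patch can be sketched in user
space as follows. demo_validate() and its boolean parameters are
hypothetical; they merely model "dmem cgroup enabled", "requested size"
and "device found" so that the resulting error precedence is visible.

```c
#include <assert.h>	/* used by callers that want to check the results */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the early-exit checks: permission first, then the
 * argument sanity check, then the device lookup.
 */
static int demo_validate(bool dmem_enabled, size_t size, bool dev_present)
{
	if (!dmem_enabled)
		return -EPERM;

	if (size == 0)
		return -EINVAL;	/* rejected before any lookup is attempted */

	if (!dev_present)
		return -ENODEV;

	return 0;
}
```

Because the size check sits before the lookup, a zero-sized request on a
missing device is reported as an invalid argument rather than as a missing
device.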
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9206

----------------------------------------

A race condition exists when switching the scheduling class of an xcu
group while tasks are being enqueued. The issue occurs due to the
following interleaving:

  mkdir -p group -> create an xcu group
  echo cfs > group/xcu.sched_class
  echo pid > group/cgroup.procs

  -> xcu_sched_class_write
    -> mutex_lock(&xcg_mutex);
    -> xcu_cg_set_sched_class
      -> xcg_perxcu_cfs_rq_deinit
        -> mutex_lock(&xcu->xcu_lock);
        -> cfs_rq = NULL
        -> mutex_unlock(&xcu->xcu_lock);
                                      -> vstream_kick
                                        -> mutex_lock(&xcu->xcu_lock);
                                        -> enqueue_ctx
                                          -> NULL pointer dereference
                                        -> mutex_unlock(&xcu->xcu_lock);
      -> xcu_cfs_cg_init
        -> alloc cfs_rq
    -> mutex_unlock(&xcg_mutex);

To prevent this, add a sanity check in xse_integrity_check() to ensure
that the cfs_rq associated with the xse is valid when the scheduling
class is CFS. This returns -EINVAL early if the cfs_rq is temporarily
NULL during the transition, avoiding the kernel crash.

Fixes: 43bbefc53356 ("xsched: Add XCU control group implementation and its backend in xsched CFS")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 include/linux/xsched.h | 30 ++++++++++++++++++++++++------
 kernel/xsched/cgroup.c |  1 +
 kernel/xsched/core.c   | 12 ++++--------
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/linux/xsched.h b/include/linux/xsched.h
index c3617386cb8d..e963f740536c 100644
--- a/include/linux/xsched.h
+++ b/include/linux/xsched.h
@@ -65,7 +65,6 @@ extern struct list_head xsched_class_list;
 #define for_each_vstream_in_ctx(vs, ctx) \
 	list_for_each_entry((vs), &((ctx)->vstream_list), ctx_node)
 
-
 /* Manages xsched RT-like class linked list based runqueue.
  *
  * Now RT-like class runqueue structs is identical
@@ -302,8 +301,6 @@ struct xsched_group {
 
 #define parent_xse_of(__xse) (&(xse_parent_grp_xcu((__xse))->xse))
 
-#define xsched_cfs_rq_of(xse) (xse_parent_grp_xcu((xse))->cfs_rq)
-
 #define xsched_group_cfs_rq(__xg, __id) ((__xg)->perxcu_priv[(__id)].cfs_rq)
 
 #define for_each_xse(__xse) \
@@ -313,6 +310,15 @@ struct xsched_group {
 #define for_each_xsched_group(__xg) \
 	for (; (__xg) != root_xcg; (__xg) = (__xg)->parent)
 
+static inline struct xsched_rq_cfs *xsched_cfs_rq_of(struct xsched_entity *xse)
+{
+	/* detach from group */
+	if (unlikely(!xse->parent_grp))
+		return xse->cfs.cfs_rq;
+
+	return xse_parent_grp_xcu((xse))->cfs_rq;
+}
+
 /**
  * Only group xsched entities are permitted to call.
  *
@@ -334,12 +340,15 @@ static inline bool xsched_entity_throttled(struct xsched_entity *xse)
 	return cfs_rq->throttled;
 }
 
-#else
-
-#define xsched_cfs_rq_of(xse) (&((xse)->xcu->xrq.cfs))
+#else /* !CONFIG_CGROUP_XCU */
 
 #define for_each_xse(__xse) for (; (__xse); (__xse) = NULL)
 
+static inline struct xsched_rq_cfs *xsched_cfs_rq_of(struct xsched_entity *xse)
+{
+	return &(xse->xcu->xrq.cfs);
+}
+
 #endif /* CONFIG_CGROUP_XCU */
 
 static inline int xse_integrity_check(struct xsched_entity *xse)
@@ -354,6 +363,15 @@ static inline int xse_integrity_check(struct xsched_entity *xse)
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_XCU_SCHED_CFS
+	/* The cfs_rq of xse may be NULL in some scenarios */
+	if (xse->class == &fair_xsched_class && !xsched_cfs_rq_of(xse)) {
+		XSCHED_ERR("the cfs_rq of this xse [%d] can't be NULL @ %s\n",
+			   xse->tgid, __func__);
+		return -EINVAL;
+	}
+#endif
+
 #ifdef CONFIG_CGROUP_XCU
 	if (xse->is_group && !xse_this_cfs_rq(xse)) {
 		// Can only be in the free process
diff --git a/kernel/xsched/cgroup.c b/kernel/xsched/cgroup.c
index 8d22aa78ea81..48b759f41655 100644
--- a/kernel/xsched/cgroup.c
+++ b/kernel/xsched/cgroup.c
@@ -377,6 +377,7 @@ void xsched_group_xse_detach(struct xsched_entity *xse)
 
 	spin_lock(&xcg->lock);
 	list_del(&xse->group_node);
+	xse->parent_grp = NULL;
 	spin_unlock(&xcg->lock);
 }
 
diff --git a/kernel/xsched/core.c b/kernel/xsched/core.c
index 142f454fe8ad..40daa2f2d79e 100644
--- a/kernel/xsched/core.c
+++ b/kernel/xsched/core.c
@@ -121,10 +121,8 @@ void enqueue_ctx(struct xsched_entity *xse, struct xsched_cu *xcu)
 {
 	lockdep_assert_held(&xcu->xcu_lock);
 
-	if (xse_integrity_check(xse)) {
-		XSCHED_ERR("Fail to check xse integrity @ %s\n", __func__);
+	if (xse_integrity_check(xse))
 		return;
-	}
 
 	if (!xse->on_rq) {
 		xse->xcu = xcu;
@@ -135,13 +133,11 @@ void enqueue_ctx(struct xsched_entity *xse, struct xsched_cu *xcu)
 
 void dequeue_ctx(struct xsched_entity *xse)
 {
-	if (xse_integrity_check(xse)) {
-		XSCHED_ERR("Fail to check xse integrity @ %s\n", __func__);
-		return;
-	}
-
 	lockdep_assert_held(&xse->xcu->xcu_lock);
 
+	if (xse_integrity_check(xse))
+		return;
+
 	if (xse->on_rq) {
 		xse->class->dequeue_ctx(xse);
 		XSCHED_DEBUG("Dequeue xse %d @ %s\n", xse->tgid, __func__);
-- 
2.34.1
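The guard added by this patch can be modelled in user space with the
single-threaded sketch below. It replays the window in which the group's
runqueue pointer is NULL, without real locking or threads. All demo_*
types and functions are hypothetical simplifications; in particular,
demo_enqueue() returns an int so the outcome can be checked, whereas the
kernel's enqueue_ctx() returns void.

```c
#include <assert.h>	/* used by callers that want to check the results */
#include <errno.h>
#include <stddef.h>

struct demo_rq {
	int nr_running;
};

struct demo_group {
	struct demo_rq *cfs_rq;		/* NULL while the class is switched */
};

struct demo_entity {
	struct demo_group *parent_grp;	/* NULL once detached from a group */
	struct demo_rq *own_rq;		/* fallback, like xse->cfs.cfs_rq */
};

/* Picks the runqueue: the entity's own one when detached, else the group's. */
static struct demo_rq *demo_rq_of(const struct demo_entity *e)
{
	if (!e->parent_grp)
		return e->own_rq;
	return e->parent_grp->cfs_rq;
}

/* Rejects an entity whose runqueue is missing (-EINVAL) before any use. */
static int demo_integrity_check(const struct demo_entity *e)
{
	if (!e || !demo_rq_of(e))
		return -EINVAL;
	return 0;
}

/* Enqueues only after the integrity check has passed. */
static int demo_enqueue(struct demo_entity *e)
{
	int ret = demo_integrity_check(e);

	if (ret)
		return ret;

	demo_rq_of(e)->nr_running++;
	return 0;
}
```

While grp.cfs_rq is NULL, demo_enqueue() fails with -EINVAL instead of
dereferencing the pointer; once the runqueue is reallocated, enqueueing
works again. A detached entity falls back to its own runqueue.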
Feedback:
The patch(es) which you have sent to the kernel@openeuler.org mailing
list have been converted to a pull request successfully!
Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/22790
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/BMY...