[PATCH openEuler-1.0-LTS 0/2] nohz_full stall
Vincent Guittot (1): sched/fair: Prevent unlimited runtime on throttled group Zicheng Qu (1): sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups kernel/sched/core.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) -- 2.34.1
From: Vincent Guittot <vincent.guittot@linaro.org> mainline inclusion from mainline-v5.6-rc2 commit 2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13564 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When a running task is moved on a throttled task group and there is no other task enqueued on the CPU, the task can keep running using 100% CPU whatever the allocated bandwidth for the group and although its cfs rq is throttled. Furthermore, the group entity of the cfs_rq and its parents are not enqueued but only set as curr on their respective cfs_rqs. We have the following sequence: sched_move_task -dequeue_task: dequeue task and group_entities. -put_prev_task: put task and group entities. -sched_change_group: move task to new group. -enqueue_task: enqueue only task but not group entities because cfs_rq is throttled. -set_next_task : set task and group_entities as current sched_entity of their cfs_rq. Another impact is that the root cfs_rq runnable_load_avg at root rq stays null because the group_entities are not enqueued. This situation will stay the same until an "external" event triggers a reschedule. Let trigger it immediately instead. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Ben Segall <bsegall@google.com> Link: https://lkml.kernel.org/r/1579011236-31256-1-git-send-email-vincent.guittot@... Conflicts: kernel/sched/core.c [Conflicts with set_curr_task(rq, tsk) due to lacking of 03b7fad167ef ("sched: Add task_struct pointer to sched_class::set_curr_task"), but this patch has no relevance to make running state resched_curr(rq).] Signed-off-by: Zicheng Qu <quzicheng@huawei.com> --- kernel/sched/core.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f7fa066b05ba..765e437732be 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6565,8 +6565,15 @@ void sched_move_task(struct task_struct *tsk) if (queued) enqueue_task(rq, tsk, queue_flags); - if (running) + if (running) { set_curr_task(rq, tsk); + /* + * After changing group, the running task may have joined a + * throttled one but it's still the running task. Trigger a + * resched to make sure that task can still run. + */ + resched_curr(rq); + } task_rq_unlock(rq, tsk, &rf); } -- 2.34.1
hulk inclusion category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13564 CVE: NA ---------------------------------------- Consider the following sequence on a CPU configured with nohz_full: 1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS bandwidth control. The gse (cgroup A) where the task P attached is dequeued and the CPU switches to idle. 2) Before cgroup A is unthrottled, task P is migrated from cgroup A to another cgroup B (not throttled). During sched_move_task(), the task P is observed as queued but not running, and therefore no resched_curr() is triggered. 3) Since the CPU is nohz_full, it remains in do_idle() waiting for an explicit scheduling event, i.e., resched_curr(). 4) For kernel <= 5.10: Later, cgroup A is unthrottled. However, the task P has already been migrated out of cgroup A, so unthrottle_cfs_rq() may observe load_weight == 0 and return early without resched_curr() called. For kernel >= 6.6: The unthrottling path normally triggers `resched_curr()` almost cases even when no runnable tasks remain in the unthrottled cgroup, preventing the idle stall described above. However, if cgroup A is removed before it gets unthrottled, the unthrottling path for cgroup A is never executed. In a result, no `resched_curr()` can be called. 5) At this point, the task P is runnable in cgroup B (not throttled), but the CPU remains in do_idle() with no pending reschedule point. The system stays in this state until an unrelated event (e.g. a new task wakeup or any cases) that can trigger a resched_curr() breaks the nohz_full idle state, and then the task P finally gets scheduled. The root cause is that sched_move_task() may classify the task as only queued, not running, and therefore fails to trigger a resched_curr(), while the later unthrottling path no longer has visibility of the migrated task. Preserve the existing behavior for running tasks by issuing resched_curr(), and explicitly invoke check_preempt_curr() for tasks that were queued at the time of migration. This ensures that runnable tasks are reconsidered for scheduling even when nohz_full suppresses periodic ticks. Fixes: 29f59db3a74b ("sched: group-scheduler core") Signed-off-by: Zicheng Qu <quzicheng@huawei.com> --- kernel/sched/core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 765e437732be..9c4e280fb56f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6563,8 +6563,10 @@ void sched_move_task(struct task_struct *tsk) sched_change_group(tsk, TASK_MOVE_GROUP); - if (queued) + if (queued) { enqueue_task(rq, tsk, queue_flags); + check_preempt_curr(rq, tsk, 0); + } if (running) { set_curr_task(rq, tsk); /* -- 2.34.1
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://atomgit.com/openeuler/kernel/merge_requests/20618 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ADD... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/20618 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ADD...
participants (2)
-
patchwork bot -
Zicheng Qu