From: "Eric W. Biederman" ebiederm@xmission.com
mainline inclusion from mainline-5.4-rc1 commit 0ff7b2cfbae36ebcd216c6a5ad7f8534eebeaee2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I3UKOW CVE: NA
-------------------------------------------------
In the ordinary case today the RCU grace period for a task_struct is triggered when another process wait's for it's zombine and causes the kernel to call release_task(). As the waiting task has to receive a signal and then act upon it before this happens, typically this will occur after the original task as been removed from the runqueue.
Unfortunaty in some cases such as self reaping tasks it can be shown that release_task() will be called starting the grace period for task_struct long before the task leaves the runqueue.
Therefore use put_task_struct_rcu_user() in finish_task_switch() to guarantee that the there is a RCU lifetime after the task leaves the runqueue.
Besides the change in the start of the RCU grace period for the task_struct this change may cause perf_event_delayed_put and trace_sched_process_free. The function perf_event_delayed_put boils down to just a WARN_ON for cases that I assume never show happen. So I don't see any problem with delaying it.
The function trace_sched_process_free is a trace point and thus visible to user space. Occassionally userspace has the strangest dependencies so this has a miniscule chance of causing a regression. This change only changes the timing of when the tracepoint is called. The change in timing arguably gives userspace a more accurate picture of what is going on. So I don't expect there to be a regression.
In the case where a task self reaps we are pretty much guaranteed that the RCU grace period is delayed. So we should get quite a bit of coverage in of this worst case for the change in a normal threaded workload. So I expect any issues to turn up quickly or not at all.
I have lightly tested this change and everything appears to work fine.
Inspired-by: Linus Torvalds torvalds@linux-foundation.org Inspired-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Paul E. McKenney paulmck@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/87r24jdpl5.fsf_-_@x220.int.ebiederm.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Li Hua hucool.lihua@huawei.com Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com
Conflicts: kernel/fork.c Reviewed-by: Cheng Jian cj.chengjian@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/fork.c | 11 +++++++---- kernel/sched/core.c | 2 +- 2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c index 1aaf67e0f32ea..b4fee9799c153 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -896,10 +896,13 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) tsk->stack_canary = get_random_canary(); #endif
- /* One for the user space visible state that goes away when reaped. */ - refcount_set(&tsk->rcu_users, 1); - /* One for the rcu users, and one for the scheduler */ - atomic_set(&tsk->usage, 2); + /* + * One for the user space visible state that goes away when reaped. + * One for the scheduler. + */ + refcount_set(&tsk->rcu_users, 2); + /* One for the rcu users */ + atomic_set(&tsk->usage, 1); #ifdef CONFIG_BLK_DEV_IO_TRACE tsk->btrace_seq = 0; #endif diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 6aee9acd9e609..09c31975e667f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2716,7 +2716,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) /* Task is done with its stack. */ put_task_stack(prev);
- put_task_struct(prev); + put_task_struct_rcu_user(prev); }
tick_nohz_task_switch();