From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 45178ac0cea853fe0e405bf11e101bdebea57b15 ]
Paul reported a very sporadic, rcutorture induced, workqueue failure. When the planets align, the workqueue rescuer's self-migrate fails and then triggers a WARN for running a work on the wrong CPU.
Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call could be ignored! When stopper->enabled is false, stop_machine will insta complete the work, without actually doing the work. Worse, it will not WARN about this (we really should fix this).
It turns out there is a small window where a freshly online'ed CPU is marked 'online' but doesn't yet have the stopper task running:
BP AP
bringup_cpu() __cpu_up(cpu, idle) --> start_secondary() ... cpu_startup_entry() bringup_wait_for_ap() wait_for_ap_thread() <-- cpuhp_online_idle() while (1) do_idle()
... available to run kthreads ...
stop_machine_unpark() stopper->enable = true;
Close this by moving the stop_machine_unpark() into cpuhp_online_idle(), such that the stopper thread is ready before we start the idle loop and schedule.
Reported-by: "Paul E. McKenney" paulmck@kernel.org Debugged-by: Tejun Heo tj@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Tested-by: "Paul E. McKenney" paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/cpu.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c index f8c5a18..1f646ba 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -493,8 +493,7 @@ static int bringup_wait_for_ap(unsigned int cpu) if (WARN_ON_ONCE((!cpu_online(cpu)))) return -ECANCELED;
- /* Unpark the stopper thread and the hotplug thread of the target cpu */ - stop_machine_unpark(cpu); + /* Unpark the hotplug thread of the target cpu */ kthread_unpark(st->thread);
/* @@ -1050,8 +1049,8 @@ void notify_cpu_starting(unsigned int cpu)
/* * Called from the idle task. Wake up the controlling task which brings the - * stopper and the hotplug thread of the upcoming CPU up and then delegates - * the rest of the online bringup to the hotplug thread. + * hotplug thread of the upcoming CPU up and then delegates the rest of the + * online bringup to the hotplug thread. */ void cpuhp_online_idle(enum cpuhp_state state) { @@ -1061,6 +1060,12 @@ void cpuhp_online_idle(enum cpuhp_state state) if (state != CPUHP_AP_ONLINE_IDLE) return;
+ /* + * Unpart the stopper thread before we start the idle loop (and start + * scheduling); this ensures the stopper task is always available. + */ + stop_machine_unpark(smp_processor_id()); + st->state = CPUHP_AP_ONLINE_IDLE; complete_ap_thread(st, true); }