hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8WTPH
--------------------------------
When a CPU is taken offline by hotplug while qos_overload_timer_handler() is running concurrently, an ABBA deadlock can occur: qos_overload_timer_handler() needs the rq lock, while the CPU hotplug path has already taken the rq lock and waits for the QoS timer handler to finish. This can cause a hard LOCKUP like:
[359230.788754] Call trace:
[359230.788755]  hrtimer_active+0x7c/0xec
[359230.788757]  hrtimer_cancel+0x3c/0x60
[359230.788758]  unthrottle_qos_cfs_rqs+0xbc/0x110
[359230.788760]  unthrottle_offline_cfs_rqs+0x40/0x150
[359230.788762]  rq_offline_fair+0x60/0x70
[359230.788764]  set_rq_offline.part.0+0x54/0xf4
[359230.788765]  set_rq_offline+0x34/0x44
[359230.788767]  rq_attach_root+0x1e8/0x260
[359230.788768]  cpu_attach_domain+0x244/0x430
[359230.788770]  detach_destroy_domains+0xbc/0x140
[359230.788772]  partition_sched_domains_locked+0x23c/0x314
[359230.788774]  rebuild_sched_domains_locked+0x1f0/0x270
[359230.788776]  cpuset_hotplug_workfn+0x514/0x74c
[359230.788777]  process_one_work+0x34c/0x800
[359230.788779]  worker_thread+0xa8/0x500
[359230.788780]  kthread+0x1e0/0x220
[359230.788782]  ret_from_fork+0x10/0x18
[359230.788783] Kernel panic - not syncing: Hard LOCKUP
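The lock inversion can be reproduced in a small userspace model, sketched below under stated assumptions: a pthread mutex stands in for the rq lock, a thread stands in for qos_overload_timer_handler(), and the hypothetical cancel_timer_bounded() stands in for hrtimer_cancel(), which waits for a running callback. Unlike the kernel, the model gives up after a timeout so it reports the hang instead of spinning forever; none of these names are kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace model, not kernel code:
 *   rq_lock              ~ the per-CPU rq lock
 *   timer_handler()      ~ qos_overload_timer_handler()
 *   cancel_timer_bounded ~ hrtimer_cancel(), but with a timeout */
static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool handler_started;
static atomic_bool handler_running;

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Timer callback: it is already running, and then it needs the rq lock. */
static void *timer_handler(void *arg)
{
	(void)arg;
	atomic_store(&handler_running, true);
	atomic_store(&handler_started, true);
	pthread_mutex_lock(&rq_lock);	/* blocks while the offline path holds it */
	pthread_mutex_unlock(&rq_lock);
	atomic_store(&handler_running, false);
	return NULL;
}

/* Like hrtimer_cancel(): wait until the callback is no longer running.
 * Returns false if it is still running after timeout_ms. */
static bool cancel_timer_bounded(long timeout_ms)
{
	for (long waited = 0; waited < timeout_ms; waited++) {
		if (!atomic_load(&handler_running))
			return true;
		sleep_ms(1);
	}
	return false;
}

/* Offline path: takes the rq lock first (as rq_attach_root() does), then
 * optionally cancels the timer. Returns true if the cancel would hang. */
static bool offline_path(bool cancel_timer)
{
	pthread_t t;
	bool hung = false;

	atomic_store(&handler_started, false);
	atomic_store(&handler_running, false);

	pthread_mutex_lock(&rq_lock);
	pthread_create(&t, NULL, timer_handler, NULL);
	while (!atomic_load(&handler_started))	/* force the racy interleaving */
		sleep_ms(1);
	if (cancel_timer)
		hung = !cancel_timer_bounded(200);	/* waits on a handler that waits on us */
	pthread_mutex_unlock(&rq_lock);
	pthread_join(t, NULL);
	return hung;
}
```

offline_path(true) reports the hang (the cancel can never succeed while the rq lock is held), while offline_path(false) completes because the handler gets the lock as soon as the offline path releases it.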
Fix it by calling __unthrottle_qos_cfs_rqs() instead of unthrottle_qos_cfs_rqs() in unthrottle_offline_cfs_rqs(), so that cancel_qos_timer() is not triggered while the CPU is being taken offline.
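The call structure of the fix can be sketched as follows. Per the description above, unthrottle_qos_cfs_rqs() can trigger cancel_qos_timer() while __unthrottle_qos_cfs_rqs() does not; the function bodies below are hypothetical placeholders (the real wrapper may cancel only conditionally, and the real helpers operate on per-CPU cfs_rq lists), and cancel_calls is an illustrative counter, not a kernel variable.

```c
#include <assert.h>

/* Hypothetical stand-ins: only the call structure follows the patch
 * description; the bodies are placeholders, not the kernel code. */
static int cancel_calls;

static void cancel_qos_timer(int cpu)
{
	(void)cpu;
	cancel_calls++;	/* the real one may wait for a running handler */
}

/* Core work only: unthrottle the QoS-throttled cfs_rqs of @cpu. */
static int __unthrottle_qos_cfs_rqs(int cpu)
{
	(void)cpu;
	return 0;	/* placeholder result */
}

/* Full variant (simplified): core work, then cancel the QoS timer. */
static int unthrottle_qos_cfs_rqs(int cpu)
{
	int res = __unthrottle_qos_cfs_rqs(cpu);

	cancel_qos_timer(cpu);
	return res;
}

/* Offline path after the fix: the rq lock is held, so skip the cancel. */
static void unthrottle_offline_cfs_rqs(int cpu)
{
	__unthrottle_qos_cfs_rqs(cpu);
}
```

Calling unthrottle_offline_cfs_rqs() leaves cancel_calls unchanged, whereas unthrottle_qos_cfs_rqs() still cancels the timer for callers that do not hold the rq lock.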
Fixes: c62a5f1384b9 ("sched/qos: Add qos_tg_{throttle,unthrottle}_{up,down}")
Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 16f6720a244c..f39e7547523c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -131,6 +131,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct hrtimer, qos_overload_timer);
 static DEFINE_PER_CPU(int, qos_cpu_overload);
 unsigned int sysctl_overload_detect_period = 5000; /* in ms */
 unsigned int sysctl_offline_wait_interval = 100; /* in ms */
+static int __unthrottle_qos_cfs_rqs(int cpu);
 static int unthrottle_qos_cfs_rqs(int cpu);
 static bool qos_smt_expelled(int this_cpu);
 #endif
@@ -5733,7 +5734,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	lockdep_assert_rq_held(rq);
 
 #ifdef CONFIG_QOS_SCHED
-	unthrottle_qos_cfs_rqs(cpu_of(rq));
+	__unthrottle_qos_cfs_rqs(cpu_of(rq));
 #endif
 
 	rcu_read_lock();