From: Wenyu Huang <huangwenyu5@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB7GK5
--------------------------------
When a CPU goes offline through hotplug while qos_overload_timer_handler() is running concurrently, an ABBA deadlock can occur. qos_overload_timer_handler() needs the rq lock, but the CPU hotplug path has already taken it and is waiting for the qos_timer handler to finish. This can cause a hard lockup like:
[359230.788754] Call trace:
[359230.788755]  hrtimer_active+0x7c/0xec
[359230.788757]  hrtimer_cancel+0x3c/0x60
[359230.788758]  unthrottle_qos_cfs_rqs+0xbc/0x110
[359230.788760]  unthrottle_offline_cfs_rqs+0x40/0x150
[359230.788762]  rq_offline_fair+0x60/0x70
[359230.788764]  set_rq_offline.part.0+0x54/0xf4
[359230.788765]  set_rq_offline+0x34/0x44
[359230.788767]  rq_attach_root+0x1e8/0x260
[359230.788768]  cpu_attach_domain+0x244/0x430
[359230.788770]  detach_destroy_domains+0xbc/0x140
[359230.788772]  partition_sched_domains_locked+0x23c/0x314
[359230.788774]  rebuild_sched_domains_locked+0x1f0/0x270
[359230.788776]  cpuset_hotplug_workfn+0x514/0x74c
[359230.788777]  process_one_work+0x34c/0x800
[359230.788779]  worker_thread+0xa8/0x500
[359230.788780]  kthread+0x1e0/0x220
[359230.788782]  ret_from_fork+0x10/0x18
[359230.788783] Kernel panic - not syncing: Hard LOCKUP
Fix it by calling __unthrottle_qos_cfs_rqs() instead of unthrottle_qos_cfs_rqs() in unthrottle_offline_cfs_rqs(), so that cancel_qos_timer() is not triggered while the CPU is going offline.
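The wait-for cycle described above can be illustrated with a small userspace model. This is only a sketch and not kernel code: the actor names, waits_on_itself() and offline_path_deadlocks() are hypothetical, and the model assumes the offline path holds the rq lock while the timer handler is already running.

```c
#include <stdbool.h>

/* Hypothetical actors for this model; the names are illustrative only. */
enum actor { OFFLINE_PATH, TIMER_HANDLER, NR_ACTORS };
#define NOT_WAITING (-1)

/* Follow the wait-for chain from 'start'; reaching 'start' again is a deadlock. */
static bool waits_on_itself(const int waits_for[NR_ACTORS], int start)
{
	int cur = waits_for[start];
	int steps;

	for (steps = 0; steps < NR_ACTORS && cur != NOT_WAITING; steps++) {
		if (cur == start)
			return true;
		cur = waits_for[cur];
	}
	return false;
}

/*
 * Model the hotplug offline path racing with a running timer handler.
 * cancel_timer_sync == true models the old behaviour (synchronous timer
 * cancel under the rq lock); false models the path without the cancel.
 */
static bool offline_path_deadlocks(bool cancel_timer_sync)
{
	int waits_for[NR_ACTORS];

	/* The handler needs the rq lock, which the offline path holds. */
	waits_for[TIMER_HANDLER] = OFFLINE_PATH;
	/* A synchronous cancel waits for the running handler to return. */
	waits_for[OFFLINE_PATH] = cancel_timer_sync ? TIMER_HANDLER : NOT_WAITING;

	return waits_on_itself(waits_for, OFFLINE_PATH);
}
```

In this model offline_path_deadlocks(true) reports a cycle, while offline_path_deadlocks(false) does not, because the offline path no longer waits on the handler and the handler can take the rq lock once it is released.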
Fixes: 926b9b0cd97e ("sched: Throttle qos cfs_rq when current cpu is running online task")
Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Signed-off-by: Wenyu Huang <huangwenyu5@huawei.com>
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 kernel/sched/fair.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f1cd57e70f1f..2bf8b64182c5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -152,6 +152,7 @@ unsigned int sysctl_overload_detect_period = 5000; /* in ms */
 unsigned int sysctl_offline_wait_interval = 100; /* in ms */
 static int one_thousand = 1000;
 static int hundred_thousand = 100000;
+static int __unthrottle_qos_cfs_rqs(int cpu);
 static int unthrottle_qos_cfs_rqs(int cpu);
 static bool qos_smt_expelled(int this_cpu);
 #endif
@@ -6672,7 +6673,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	 */
 	rq_clock_start_loop_update(rq);
 #ifdef CONFIG_QOS_SCHED
-	unthrottle_qos_cfs_rqs(cpu_of(rq));
+	__unthrottle_qos_cfs_rqs(cpu_of(rq));
 #endif
 
 	rcu_read_lock();
@@ -6699,9 +6700,6 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	rcu_read_unlock();
 
 	rq_clock_stop_loop_update(rq);
-#ifdef CONFIG_QOS_SCHED
-	unthrottle_qos_cfs_rqs(cpu_of(rq));
-#endif
 }
 
 bool cfs_task_bw_constrained(struct task_struct *p)