From: Zhang Qiao <zhangqiao22@huawei.com>

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I9RMHW
CVE: NA
--------------------------------
When performing load balancing, a NUMA imbalance is allowed if the number of busy CPUs is below the maximum threshold. This keeps a pair of communicating tasks on the current node when the destination is lightly loaded.
1. However, calculate_imbalance() passes local->sum_nr_running, which may not be accurate because the communicating tasks are on the busiest group. It should use busiest->sum_nr_running instead.
2. In addition, idle CPUs are used to calculate the imbalance, but group_weight may differ between the local and busiest groups. In that case, even if both groups are mostly idle, the calculated imbalance can be very large. So calculate the imbalance from the difference in busy CPUs between the groups instead.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>

Conflicts:
	kernel/sched/fair.c
Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 kernel/sched/fair.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 45577cd1aa84..778fb388b2e8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11465,17 +11465,19 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			/*
 			 * If there is no overload, we just want to even the number of
-			 * idle cpus.
+			 * busy cpus.
 			 */
 			env->migration_type = migrate_task;
-			env->imbalance = max_t(long, 0, (local->idle_cpus -
-						 busiest->idle_cpus) >> 1);
+			env->imbalance = max_t(long, 0,
+				((busiest->group_weight - busiest->idle_cpus)
+				- (local->group_weight - local->idle_cpus)) >> 1);
 		}
 
 		/* Consider allowing a small imbalance between NUMA groups */
 		if (env->sd->flags & SD_NUMA) {
 			env->imbalance = adjust_numa_imbalance(env->imbalance,
-				local->sum_nr_running + 1, env->sd->imb_numa_nr);
+							       busiest->sum_nr_running,
+							       env->sd->imb_numa_nr);
 		}
 
 		return;