Zhang Qiao (2):
  sched/numa: Fix numa imbalance in load_balance()
  config: Disable CONFIG_ARCH_CUSTOM_NUMA_DISTANCE for arm64

 arch/arm64/configs/openeuler_defconfig |  2 +-
 kernel/sched/fair.c                    | 13 ++++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I9RMHW
CVE: NA
--------------------------------
When performing load balance, a NUMA imbalance is allowed if the number of busy CPUs is less than the maximum threshold; this keeps a pair of communicating tasks on the current node when the destination is lightly loaded.

1. However, calculate_imbalance() uses local->sum_nr_running, which may not be accurate: the communicating tasks run in the busiest group, so busiest->sum_nr_running should be used instead.

2. Additionally, idle CPUs are used to calculate the imbalance, but group_weight may differ between the local and busiest groups. In that case the calculated imbalance can be very large even though both groups are almost idle. Calculate the imbalance from the difference in busy CPUs between the groups instead.
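For illustration, here is a minimal userspace sketch (not the kernel code; the group sizes and task counts are made-up example values) showing how an idle-CPU based imbalance misleads when the groups have different weights, while a busy-CPU based imbalance does not:

/*
 * Minimal userspace sketch illustrating the difference between an
 * idle-cpu based imbalance and a busy-cpu based imbalance when the
 * local and busiest groups have different group_weight values.
 */
#include <stdio.h>
#include <stdlib.h>

struct sg_stats {
	int group_weight;	/* number of CPUs in the group */
	int idle_cpus;		/* idle CPUs in the group */
};

static int imbalance_by_idle(const struct sg_stats *local,
			     const struct sg_stats *busiest)
{
	return abs(local->idle_cpus - busiest->idle_cpus);
}

static int imbalance_by_busy(const struct sg_stats *local,
			     const struct sg_stats *busiest)
{
	int local_busy = local->group_weight - local->idle_cpus;
	int busiest_busy = busiest->group_weight - busiest->idle_cpus;

	return abs(busiest_busy - local_busy);
}

int main(void)
{
	/* Example: local node has 16 CPUs, busiest has 64; both run 2 tasks. */
	struct sg_stats local   = { .group_weight = 16, .idle_cpus = 14 };
	struct sg_stats busiest = { .group_weight = 64, .idle_cpus = 62 };

	/* Prints 48: looks hugely imbalanced although both nodes are nearly idle. */
	printf("idle-cpu based: %d\n", imbalance_by_idle(&local, &busiest));
	/* Prints 0: both nodes run the same number of tasks. */
	printf("busy-cpu based: %d\n", imbalance_by_busy(&local, &busiest));
	return 0;
}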
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
 kernel/sched/fair.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3dcbf50b677f..cce2bbdff154 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1493,7 +1493,7 @@ adjust_numa_imbalance(int imbalance, int dst_running, int imb_numa_nr)
 	 * Allow a small imbalance based on a simple pair of communicating
 	 * tasks that remain local when the destination is lightly loaded.
 	 */
-	if (imbalance <= NUMA_IMBALANCE_MIN)
+	if (imbalance <= imb_numa_nr)
 		return 0;
 
 	return imbalance;
@@ -12077,6 +12077,8 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 #ifdef CONFIG_NUMA
 	if (sd->flags & SD_NUMA) {
 		int imb_numa_nr = sd->imb_numa_nr;
+		int local_busy_cpus = local_sgs.group_weight - local_sgs.idle_cpus;
+		int idlest_busy_cpus = idlest_sgs.group_weight - idlest_sgs.idle_cpus;
 #ifdef CONFIG_NUMA_BALANCING
 		int idlest_cpu;
 		/*
@@ -12106,7 +12108,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 			imb_numa_nr = min(cpumask_weight(cpus), sd->imb_numa_nr);
 		}
 
-		imbalance = abs(local_sgs.idle_cpus - idlest_sgs.idle_cpus);
+		imbalance = abs(idlest_busy_cpus - local_busy_cpus);
 		if (!adjust_numa_imbalance(imbalance,
 					   local_sgs.sum_nr_running + 1,
 					   imb_numa_nr)) {
@@ -12382,18 +12384,19 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 		/*
 		 * If there is no overload, we just want to even the number of
-		 * idle cpus.
+		 * busy cpus.
 		 */
 		env->migration_type = migrate_task;
 		env->imbalance = max_t(long, 0,
-				       (local->idle_cpus - busiest->idle_cpus));
+				       ((busiest->group_weight - busiest->idle_cpus)
+					- (local->group_weight - local->idle_cpus)));
 	}
 
 #ifdef CONFIG_NUMA
 	/* Consider allowing a small imbalance between NUMA groups */
 	if (env->sd->flags & SD_NUMA) {
 		env->imbalance = adjust_numa_imbalance(env->imbalance,
-						       local->sum_nr_running + 1,
+						       busiest->sum_nr_running,
 						       env->sd->imb_numa_nr);
 	}
 #endif
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8PG0C
CVE: NA
--------------------------------
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
 arch/arm64/configs/openeuler_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 3b8d78f944a0..84873e64b32b 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -468,7 +468,7 @@ CONFIG_HOTPLUG_CPU=y
 CONFIG_NUMA=y
 CONFIG_NODES_SHIFT=8
 CONFIG_NUMA_AWARE_SPINLOCKS=y
-CONFIG_ARCH_CUSTOM_NUMA_DISTANCE=y
+# CONFIG_ARCH_CUSTOM_NUMA_DISTANCE is not set
 # CONFIG_HZ_100 is not set
 CONFIG_HZ_250=y
 # CONFIG_HZ_300 is not set