[PATCH OLK-6.6 v5 0/5] Support soft domain

v5: - Fix build error on ppc/x86/arm
v4: - Fix memleak issue
v3: - Fix x86 build error
v2: - Fix __SCHED_FEAT_SOFT_DOMAIN build error

Zhang Qiao (5):
  sched: topology: Build soft domain for LLC
  sched: Attach task group to soft domain
  sched: fair: Select idle cpu in soft domain
  sched: fair: Disable numa migrate for soft domain task
  arm64: Enable CONFIG_SCHED_SOFT_DOMAIN

 arch/arm64/configs/openeuler_defconfig |   2 +
 include/linux/sched/topology.h         |  21 ++
 init/Kconfig                           |  11 +
 kernel/sched/build_policy.c            |   3 +
 kernel/sched/core.c                    |  72 ++++++
 kernel/sched/fair.c                    |  91 +++++++
 kernel/sched/features.h                |   4 +
 kernel/sched/sched.h                   |  34 +++
 kernel/sched/soft_domain.c             | 334 +++++++++++++++++++++++++
 9 files changed, 572 insertions(+)
 create mode 100644 kernel/sched/soft_domain.c

--
2.18.0.huawei.25

FeedBack:
The patch(es) which you have sent to kernel@openeuler.org mailing list has
been converted to a pull request successfully!
Pull request link:
https://gitee.com/openeuler/kernel/pulls/16504
Mailing list address:
https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/BGF...

[PATCH OLK-6.6 v5 1/5] sched: topology: Build soft domain for LLC

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/IC8X6H

--------------------------------

On Kunpeng servers, each LLC domain contains multiple clusters. When
multiple services are deployed within the same LLC domain, their tasks
become distributed across all clusters. This results in:

1. High cache synchronization overhead between different tasks of the
   same service.
2. Severe cache contention among tasks from different services.

The soft domain architecture partitions resources by cluster. Under
low-load conditions, each service runs exclusively within its dedicated
domain, which prevents cross-service interference and improves both CPU
isolation and cache locality.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
 include/linux/sched/topology.h |  21 ++++++
 init/Kconfig                   |  12 ++++
 kernel/sched/build_policy.c    |   3 +
 kernel/sched/core.c            |   1 +
 kernel/sched/sched.h           |  12 ++++
 kernel/sched/soft_domain.c     | 113 +++++++++++++++++++++++++++++++++
 6 files changed, 162 insertions(+)
 create mode 100644 kernel/sched/soft_domain.c

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 8e4d9bbdaa40..7f37b5caad42 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -77,6 +77,27 @@ extern int sched_domain_level_max;
 
 struct sched_group;
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+
+struct soft_subdomain {
+	/* the count of task group attached this sub domain. */
+	int			attached;
+	struct list_head	node;
+	unsigned long		span[];
+};
+
+/*
+ * Each LLC builds a soft domain:
+ * A soft scheduling domain is divided into multiple subdomains,
+ * typically based on the physical structure of CPU clusters.
+ */
+struct soft_domain {
+	struct list_head	child_domain;
+	int			nr_available_cpus;
+	unsigned long		span[];
+};
+#endif
+
 struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
diff --git a/init/Kconfig b/init/Kconfig
index ac095bad73b5..62522e06fc2d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1148,6 +1148,18 @@ config QOS_SCHED_DYNAMIC_AFFINITY
 	  of taskgroup is below threshold setted, otherwise make taskgroup
 	  to use cpus allowed.
 
+config SCHED_SOFT_DOMAIN
+	bool "Soft domain scheduler"
+	depends on FAIR_GROUP_SCHED
+	depends on SCHED_CLUSTER
+	default n
+	help
+	  This feature builds a CPU soft domain for each task group. Tasks are
+	  prioritized and aggregated to execute within soft domains, which optimizes
+	  resource allocation and enhances cache locality.
+
+	  If in doubt, say N.
+
 config SCHED_MM_CID
 	def_bool n
 	depends on SMP && RSEQ
diff --git a/kernel/sched/build_policy.c b/kernel/sched/build_policy.c
index d9dc9ab3773f..6aff6482b1e7 100644
--- a/kernel/sched/build_policy.c
+++ b/kernel/sched/build_policy.c
@@ -52,3 +52,6 @@
 #include "cputime.c"
 #include "deadline.c"
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+#include "soft_domain.c"
+#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b41f3f30ef57..f891e06f14fd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9965,6 +9965,7 @@ void __init sched_init_smp(void)
 	sched_smp_initialized = true;
 
 	sched_grid_zone_init();
+	build_soft_domain();
 
 #ifdef CONFIG_QOS_SCHED_SMART_GRID
 	init_auto_affinity(&root_task_group);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 761870540a21..056a680ae9ed 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3734,4 +3734,16 @@ extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
 bool bpf_sched_is_cpu_allowed(struct task_struct *p, int cpu);
 #endif
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+void build_soft_domain(void);
+static inline struct cpumask *soft_domain_span(unsigned long span[])
+{
+	return to_cpumask(span);
+}
+#else
+
+static inline void build_soft_domain(void) { }
+
+#endif
+
 #endif /* _KERNEL_SCHED_SCHED_H */
diff --git a/kernel/sched/soft_domain.c b/kernel/sched/soft_domain.c
new file mode 100644
index 000000000000..1be52b056cad
--- /dev/null
+++ b/kernel/sched/soft_domain.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Common code for Soft Domain Scheduling
+ *
+ * Copyright (C) 2025-2025 Huawei Technologies Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ */
+
+
+static DEFINE_PER_CPU(struct soft_domain *, g_sf_d);
+
+static void free_sub_soft_domain(struct soft_domain *sf_d);
+
+static int build_soft_sub_domain(struct sched_domain *sd, struct cpumask *cpus)
+{
+	struct cpumask *span = sched_domain_span(sd);
+	int nid = cpu_to_node(cpumask_first(span));
+	struct soft_domain *sf_d = NULL;
+	int i;
+
+	sf_d = kzalloc_node(sizeof(struct soft_domain) + cpumask_size(),
+			    GFP_KERNEL, nid);
+	if (!sf_d)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&sf_d->child_domain);
+	sf_d->nr_available_cpus = cpumask_weight(span);
+	cpumask_copy(to_cpumask(sf_d->span), span);
+
+	for_each_cpu_and(i, sched_domain_span(sd), cpus) {
+		struct soft_subdomain *sub_d = NULL;
+
+		sub_d = kzalloc_node(sizeof(struct soft_subdomain) + cpumask_size(),
+				     GFP_KERNEL, nid);
+		if (!sub_d) {
+			free_sub_soft_domain(sf_d);
+			return -ENOMEM;
+		}
+
+		list_add_tail(&sub_d->node, &sf_d->child_domain);
+		cpumask_copy(soft_domain_span(sub_d->span), cpu_clustergroup_mask(i));
+		cpumask_andnot(cpus, cpus, cpu_clustergroup_mask(i));
+	}
+
+	for_each_cpu(i, sched_domain_span(sd)) {
+		rcu_assign_pointer(per_cpu(g_sf_d, i), sf_d);
+	}
+
+	return 0;
+}
+
+static void free_sub_soft_domain(struct soft_domain *sf_d)
+{
+	struct list_head *children = &sf_d->child_domain;
+	struct soft_subdomain *entry = NULL, *next = NULL;
+	int i;
+
+	list_for_each_entry_safe(entry, next, children, node) {
+		list_del(&entry->node);
+		kfree(entry);
+	}
+
+	for_each_cpu(i, to_cpumask(sf_d->span)) {
+		rcu_assign_pointer(per_cpu(g_sf_d, i), NULL);
+	}
+
+	kfree(sf_d);
+}
+
+static void free_soft_domain(void)
+{
+	struct soft_domain *sf_d = NULL;
+	int i;
+
+	for_each_cpu(i, cpu_active_mask) {
+		sf_d = rcu_dereference(per_cpu(g_sf_d, i));
+		if (sf_d)
+			free_sub_soft_domain(sf_d);
+	}
+}
+
+void build_soft_domain(void)
+{
+	struct sched_domain *sd;
+	static struct cpumask cpus;
+	int i, ret;
+
+	cpumask_copy(&cpus, cpu_active_mask);
+	rcu_read_lock();
+	for_each_cpu(i, &cpus) {
+		/* build soft domain for each llc domain. */
+		sd = rcu_dereference(per_cpu(sd_llc, i));
+		if (sd) {
+			ret = build_soft_sub_domain(sd, &cpus);
+			if (ret) {
+				free_soft_domain();
+				goto out;
+			}
+		}
+	}
+
+out:
+	rcu_read_unlock();
+}
--
2.18.0.huawei.25
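For illustration only (not part of the posted series), a minimal sketch of how
the structures introduced above fit together: it walks one CPU's soft domain
under RCU and prints each cluster-sized subdomain. It assumes it lives next to
the code above (g_sf_d is static to soft_domain.c); the dump_soft_domain() name
is made up for this example.

static void dump_soft_domain(int cpu)
{
	struct soft_domain *sf_d;
	struct soft_subdomain *sub_d;

	rcu_read_lock();
	sf_d = rcu_dereference(per_cpu(g_sf_d, cpu));
	if (sf_d) {
		/* One soft domain per LLC, spanning the whole LLC. */
		pr_info("LLC span %*pbl, %d CPUs available\n",
			cpumask_pr_args(to_cpumask(sf_d->span)),
			sf_d->nr_available_cpus);
		/* Each subdomain corresponds to one cluster of that LLC. */
		list_for_each_entry(sub_d, &sf_d->child_domain, node)
			pr_info("  cluster %*pbl (attached groups: %d)\n",
				cpumask_pr_args(soft_domain_span(sub_d->span)),
				sub_d->attached);
	}
	rcu_read_unlock();
}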

[PATCH OLK-6.6 v5 2/5] sched: Attach task group to soft domain

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/IC8X6H

--------------------------------

The previous patch introduced the soft domain. Now attach task groups to
soft domains, so that their tasks are preferentially scheduled into the
associated soft scheduling domain during low-load periods.

To enable soft domain scheduling for a task group, write '1' to the
cpu.soft_domain file in the CPU cgroup subsystem. This allocates
sub-soft-domains matching the CPU quota of the cgroup (if
cpu.cfs_quota_us is -1, it is treated as 1) to the task group, thereby
establishing a preferred scheduling domain dedicated to this group.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
 kernel/sched/core.c        |  71 ++++++++++
 kernel/sched/fair.c        |  20 +++
 kernel/sched/sched.h       |  22 ++++
 kernel/sched/soft_domain.c | 260 +++++++++++++++++++++++++++++++++
 4 files changed, 373 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f891e06f14fd..4b6188abe01f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11727,6 +11727,58 @@ static inline s64 cpu_tag_read(struct cgroup_subsys_state *css,
 }
 #endif
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+
+static int cpu_soft_domain_write_s64(struct cgroup_subsys_state *css,
+				     struct cftype *cftype,
+				     s64 val)
+{
+	return sched_group_set_soft_domain(css_tg(css), val);
+}
+
+static s64 cpu_soft_domain_read_s64(struct cgroup_subsys_state *css,
+				    struct cftype *cftype)
+{
+	struct task_group *tg = css_tg(css);
+
+	return (s64)tg->sf_ctx->policy;
+}
+
+static int cpu_soft_domain_quota_write_u64(struct cgroup_subsys_state *css,
+					   struct cftype *cftype, u64 val)
+{
+	struct task_group *tg = css_tg(css);
+
+	if (tg->sf_ctx->policy != 0)
+		return -EINVAL;
+
+	if (val > cpumask_weight(cpumask_of_node(0)))
+		return -EINVAL;
+
+	tg->sf_ctx->nr_cpus = (int)val;
+
+	return 0;
+}
+
+static u64 cpu_soft_domain_quota_read_u64(struct cgroup_subsys_state *css,
+					  struct cftype *cftype)
+{
+	struct task_group *tg = css_tg(css);
+
+	return (u64)tg->sf_ctx->nr_cpus;
+}
+
+static int soft_domain_cpu_list_seq_show(struct seq_file *sf, void *v)
+{
+	struct task_group *tg = css_tg(seq_css(sf));
+
+	seq_printf(sf, "%*pbl\n", cpumask_pr_args(to_cpumask(tg->sf_ctx->span)));
+
+	return 0;
+}
+
+#endif
+
 static struct cftype cpu_legacy_files[] = {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	{
@@ -11765,6 +11817,25 @@ static struct cftype cpu_legacy_files[] = {
 		.write_u64 = cpu_rebuild_affinity_domain_u64,
 	},
 #endif
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+	{
+		.name = "soft_domain",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_s64 = cpu_soft_domain_read_s64,
+		.write_s64 = cpu_soft_domain_write_s64,
+	},
+	{
+		.name = "soft_domain_nr_cpu",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_soft_domain_quota_read_u64,
+		.write_u64 = cpu_soft_domain_quota_write_u64,
+	},
+	{
+		.name = "soft_domain_cpu_list",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = soft_domain_cpu_list_seq_show,
+	},
+#endif
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
 		.name = "cfs_quota_us",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ddaa8dd71c3e..d5d6fd21842f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -14765,6 +14765,22 @@ void free_fair_sched_group(struct task_group *tg)
 	kfree(tg->se);
 }
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+int init_soft_domain(struct task_group *tg)
+{
+	struct soft_domain_ctx *sf_ctx = NULL;
+
+	sf_ctx = kzalloc(sizeof(*sf_ctx) + cpumask_size(), GFP_KERNEL);
+	if (!sf_ctx)
+		return -ENOMEM;
+
+	sf_ctx->policy = 0;
+	tg->sf_ctx = sf_ctx;
+
+	return 0;
+}
+#endif
+
 int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 {
 	struct sched_entity *se;
@@ -14785,6 +14801,10 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 	if (ret)
 		goto err;
 
+	ret = init_soft_domain(tg);
+	if (ret)
+		goto err;
+
 	for_each_possible_cpu(i) {
 		cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
 				      GFP_KERNEL, cpu_to_node(i));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 056a680ae9ed..0dc1fccde30b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -404,6 +404,16 @@ struct auto_affinity {
 };
 #endif
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+
+struct soft_domain_ctx {
+	int policy;
+	int nr_cpus;
+	struct soft_domain *sf_d;
+	unsigned long span[];
+};
+#endif
+
 /* Task group related information */
 struct task_group {
 	struct cgroup_subsys_state css;
@@ -469,7 +479,11 @@ struct task_group {
 	struct auto_affinity *auto_affinity;
 #endif
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+	KABI_USE(1, struct soft_domain_ctx *sf_ctx)
+#else
 	KABI_RESERVE(1)
+#endif
 	KABI_RESERVE(2)
 	KABI_RESERVE(3)
 	KABI_RESERVE(4)
@@ -3736,6 +3750,10 @@ bool bpf_sched_is_cpu_allowed(struct task_struct *p, int cpu);
 
 #ifdef CONFIG_SCHED_SOFT_DOMAIN
 void build_soft_domain(void);
+int init_soft_domain(struct task_group *tg);
+
+int sched_group_set_soft_domain(struct task_group *tg, long val);
+
 static inline struct cpumask *soft_domain_span(unsigned long span[])
 {
 	return to_cpumask(span);
@@ -3743,6 +3761,10 @@ static inline struct cpumask *soft_domain_span(unsigned long span[])
 #else
 
 static inline void build_soft_domain(void) { }
+static inline int init_soft_domain(struct task_group *tg)
+{
+	return 0;
+}
 
 #endif
 
diff --git a/kernel/sched/soft_domain.c b/kernel/sched/soft_domain.c
index 1be52b056cad..5c56428833d1 100644
--- a/kernel/sched/soft_domain.c
+++ b/kernel/sched/soft_domain.c
@@ -15,6 +15,7 @@
  *
  */
 
+#include <linux/sort.h>
 
 static DEFINE_PER_CPU(struct soft_domain *, g_sf_d);
 
@@ -111,3 +112,262 @@ void build_soft_domain(void)
 out:
 	rcu_read_unlock();
 }
+
+static DEFINE_MUTEX(soft_domain_mutex);
+
+#define NR_MAX_CLUSTER	16
+
+struct domain_node {
+	struct soft_subdomain	*sud_d;
+	unsigned int		attached;
+	unsigned long		util;
+};
+
+static int subdomain_cmp(const void *a, const void *b)
+{
+	struct domain_node *ca = (struct domain_node *)a;
+	struct domain_node *cb = (struct domain_node *)b;
+
+	if (ca->attached < cb->attached ||
+	    (ca->attached == cb->attached && ca->util < cb->util))
+		return -1;
+
+	return 1;
+}
+
+struct soft_domain_args {
+	int policy;
+	struct cpumask *cpus;
+};
+
+static int tg_set_soft_domain(struct task_group *tg, void *data)
+{
+	struct soft_domain_args *args = (struct soft_domain_args *)data;
+
+	tg->sf_ctx->policy = args->policy;
+	if (args->policy)
+		cpumask_copy(to_cpumask(tg->sf_ctx->span), args->cpus);
+	else
+		cpumask_clear(to_cpumask(tg->sf_ctx->span));
+
+	return 0;
+}
+
+static int __calc_cpu(struct task_group *tg)
+{
+	int nr_cpu = 1;
+
+	if (tg->sf_ctx->nr_cpus)
+		nr_cpu = tg->sf_ctx->nr_cpus;
+#ifdef CONFIG_CFS_BANDWIDTH
+	else if (tg->cfs_bandwidth.quota != RUNTIME_INF)
+		nr_cpu = DIV_ROUND_UP_ULL(tg->cfs_bandwidth.quota, tg->cfs_bandwidth.period);
+#endif
+
+	tg->sf_ctx->nr_cpus = nr_cpu;
+
+	return nr_cpu;
+}
+
+static unsigned long sum_util(struct cpumask *mask)
+{
+	unsigned long sum = 0;
+	int cpu;
+
+	for_each_cpu(cpu, mask)
+		sum += cpu_util_cfs(cpu);
+
+	return sum;
+}
+
+static int __check_policy(struct task_group *tg, void *data)
+{
+	return !!tg->sf_ctx->policy;
+}
+
+static int check_policy(struct task_group *tg, long policy)
+{
+	int ret;
+
+	rcu_read_lock();
+	ret = walk_tg_tree_from(tg, __check_policy, tg_nop, NULL);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static struct soft_domain *find_idlest_llc(long policy,
+				int nr_cpu, cpumask_var_t cpus)
+{
+	int cpu;
+	int max_cpu = 0;
+	struct soft_domain *idlest = NULL;
+
+	/* The user has specified the llc. */
+	if (policy > 0) {
+		cpu = cpumask_first(cpumask_of_node(policy-1));
+		idlest = rcu_dereference(per_cpu(g_sf_d, cpu));
+		return idlest;
+	}
+
+	cpumask_copy(cpus, cpu_active_mask);
+	for_each_cpu(cpu, cpus) {
+		struct soft_domain *sf_d = NULL;
+		unsigned long min_util = ULONG_MAX;
+
+		sf_d = rcu_dereference(per_cpu(g_sf_d, cpu));
+		if (sf_d == NULL)
+			continue;
+
+		/*
+		 * LLC selection order:
+		 * 1. When the number of idle cpus meet the requirements,
+		 *    the one with more idles cpus is better;
+		 * 2. Under the condition of insufficient idle cpus, util
+		 *    is lower, the better.
+		 */
+		if (sf_d->nr_available_cpus > max_cpu &&
+		    nr_cpu <= sf_d->nr_available_cpus) {
+			max_cpu = sf_d->nr_available_cpus;
+			idlest = sf_d;
+		} else if (max_cpu == 0) { /* No llc meets the demand */
+			unsigned long util = sum_util(to_cpumask(sf_d->span));
+
+			if (idlest == NULL || util < min_util) {
+				idlest = sf_d;
+				min_util = util;
+			}
+		}
+
+		cpumask_andnot(cpus, cpus, to_cpumask(sf_d->span));
+	}
+
+	return idlest;
+}
+
+static int __sched_group_set_soft_domain(struct task_group *tg, long policy)
+{
+	int cpu;
+	int ret = 0;
+	cpumask_var_t cpus;
+	int nr_cpu = __calc_cpu(tg);
+	struct soft_domain_args args;
+	struct domain_node nodes[NR_MAX_CLUSTER] = {0};
+
+	if (check_policy(tg, policy))
+		return -EINVAL;
+
+	if (!zalloc_cpumask_var(&cpus, GFP_KERNEL))
+		return -EINVAL;
+
+	scoped_guard (cpus_read_lock) {
+		struct soft_domain *sf_d = NULL;
+
+		rcu_read_lock();
+		/* 1. Find a idlest llc. */
+		sf_d = find_idlest_llc(policy, nr_cpu, cpus);
+		if (sf_d != NULL) {
+			/* 2. select idlest clusters. */
+			struct list_head *children = &sf_d->child_domain;
+			struct soft_subdomain *sub_d = NULL;
+			int nr = 0, i;
+			struct cpumask *tmpmask = NULL;
+			int tmp_cpu = nr_cpu;
+
+			list_for_each_entry(sub_d, children, node) {
+				nodes[nr].sud_d = sub_d;
+				nodes[nr].attached = sub_d->attached;
+				tmpmask = to_cpumask(sub_d->span);
+				cpu = cpumask_first(tmpmask);
+				nodes[nr].util = sum_util(tmpmask);
+				nr++;
+			}
+
+			cpumask_clear(cpus);
+
+			sort(nodes, nr, sizeof(struct domain_node), subdomain_cmp, NULL);
+			sf_d->nr_available_cpus -= min(sf_d->nr_available_cpus, tmp_cpu);
+			for (i = 0; i < nr; i++) {
+				sub_d = nodes[i].sud_d;
+				tmpmask = to_cpumask(sub_d->span);
+				cpumask_or(cpus, cpus, tmpmask);
+				sub_d->attached++;
+				nr_cpu -= cpumask_weight(tmpmask);
+				if (nr_cpu <= 0)
+					break;
+			}
+
+			/* 3. attach task group to softdomain. */
+			args.policy = policy;
+			args.cpus = cpus;
+			walk_tg_tree_from(tg, tg_set_soft_domain, tg_nop, &args);
+
+			/*
+			 * 4. TODO
+			 * add tg to llc domain task_groups list for load balance.
+			 */
+			tg->sf_ctx->sf_d = sf_d;
+		} else {
+			ret = -EINVAL;
+		}
+		rcu_read_unlock();
+	}
+
+	free_cpumask_var(cpus);
+
+	return ret;
+}
+
+static int __sched_group_unset_soft_domain(struct task_group *tg)
+{
+	struct soft_domain_args args = {
+		.policy = 0,
+	};
+	struct soft_domain *sf_d = NULL;
+	struct soft_subdomain *sub_d = NULL;
+	struct list_head *children = NULL;
+
+	/* If parent has set soft domain, child group can't unset itself. */
+	if (tg->parent->sf_ctx->policy != 0)
+		return -EINVAL;
+
+	sf_d = tg->sf_ctx->sf_d;
+	sf_d->nr_available_cpus += __calc_cpu(tg);
+	children = &sf_d->child_domain;
+
+	list_for_each_entry(sub_d, children, node) {
+		if (cpumask_intersects(to_cpumask(tg->sf_ctx->span), to_cpumask(sub_d->span)))
+			sub_d->attached--;
+	}
+
+	walk_tg_tree_from(tg, tg_set_soft_domain, tg_nop, &args);
+
+	return 0;
+}
+
+int sched_group_set_soft_domain(struct task_group *tg, long val)
+{
+	int ret = 0;
+
+	if (val < -1 || val > nr_node_ids)
+		return -EINVAL;
+
+	mutex_lock(&soft_domain_mutex);
+
+	/* If enable or disable is repeated, directly return. */
+	if (!!tg->sf_ctx->policy == !!val)
+		goto out;
+
+	if (val == 0)
+		ret = __sched_group_unset_soft_domain(tg);
+	else
+		ret = __sched_group_set_soft_domain(tg, val);
+
+	if (!ret)
+		tg->sf_ctx->policy = val;
+
+out:
+	mutex_unlock(&soft_domain_mutex);
+
+	return ret;
+}
--
2.18.0.huawei.25
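For context, a minimal userspace sketch of the cgroup interface added above
(illustrative only, not part of the patch). It assumes the cpu controller is
mounted in cgroup v1 legacy mode, which is where cpu_legacy_files[] is exposed;
the mount point /sys/fs/cgroup/cpu and the group name "svcA" are assumptions.

/* Enable soft domain scheduling for an existing CPU cgroup. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int cg_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* Optionally size the soft domain before enabling it (must be done
	 * while cpu.soft_domain is still 0); otherwise the CPU count is
	 * derived from cpu.cfs_quota_us / cpu.cfs_period_us. */
	cg_write("/sys/fs/cgroup/cpu/svcA/cpu.soft_domain_nr_cpu", "4");
	/* Per the commit message above, writing '1' enables soft domain
	 * scheduling for the group; writing '0' disables it again. */
	if (cg_write("/sys/fs/cgroup/cpu/svcA/cpu.soft_domain", "1"))
		perror("cpu.soft_domain");
	return 0;
}

After enabling, cpu.soft_domain_cpu_list can be read to see which cluster CPUs
were assigned to the group.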

[PATCH OLK-6.6 v5 3/5] sched: fair: Select idle cpu in soft domain

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/IC8X6H

--------------------------------

For task groups with a soft domain attached, prefer idle CPUs inside the
soft domain at wakeup: wake_soft_domain() redirects the wakeup target
into the group's soft domain span, and select_idle_cpu() scans that span
first before falling back to the rest of the LLC. The behaviour is gated
by the new SOFT_DOMAIN scheduler feature, which defaults to off.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
 kernel/sched/fair.c     | 63 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/features.h |  4 +++
 2 files changed, 67 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d5d6fd21842f..6bfaf9b2e0e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8222,6 +8222,40 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		}
 	}
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+	if (sched_feat(SOFT_DOMAIN)) {
+		struct task_group *tg = task_group(p);
+
+		if (tg->sf_ctx && tg->sf_ctx->policy != 0) {
+			struct cpumask *tmpmask = to_cpumask(tg->sf_ctx->span);
+
+			for_each_cpu_wrap(cpu, tmpmask, target + 1) {
+				if (!cpumask_test_cpu(cpu, tmpmask))
+					continue;
+
+				if (has_idle_core) {
+					i = select_idle_core(p, cpu, cpus, &idle_cpu);
+					if ((unsigned int)i < nr_cpumask_bits)
+						return i;
+
+				} else {
+					if (--nr <= 0)
+						return -1;
+					idle_cpu = __select_idle_cpu(cpu, p);
+					if ((unsigned int)idle_cpu < nr_cpumask_bits)
+						return idle_cpu;
+				}
+			}
+
+			if (idle_cpu != -1)
+				return idle_cpu;
+
+			cpumask_andnot(cpus, cpus, tmpmask);
+		}
+
+	}
+#endif
+
 	if (static_branch_unlikely(&sched_cluster_active)) {
 		struct sched_group *sg = sd->groups;
 
@@ -9132,6 +9166,30 @@ static void set_task_select_cpus(struct task_struct *p, int *idlest_cpu,
 }
 #endif
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+static int wake_soft_domain(struct task_struct *p, int target)
+{
+	struct cpumask *mask = NULL;
+	struct soft_domain_ctx *ctx = NULL;
+
+	rcu_read_lock();
+	ctx = task_group(p)->sf_ctx;
+	if (!ctx || ctx->policy == 0)
+		goto unlock;
+
+	mask = to_cpumask(ctx->span);
+	if (cpumask_test_cpu(target, mask))
+		goto unlock;
+	else
+		target = cpumask_any_distribute(mask);
+
+unlock:
+	rcu_read_unlock();
+
+	return target;
+}
+#endif
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAKE,
@@ -9186,6 +9244,11 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		new_cpu = prev_cpu;
 	}
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+	if (sched_feat(SOFT_DOMAIN))
+		new_cpu = prev_cpu = wake_soft_domain(p, prev_cpu);
+#endif
+
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
 	want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->select_cpus);
 #else
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 26b1a03bd3d2..02577ddf10bd 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -105,3 +105,7 @@ SCHED_FEAT(HZ_BW, true)
  */
 SCHED_FEAT(DA_UTIL_TASKGROUP, true)
 #endif
+
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+SCHED_FEAT(SOFT_DOMAIN, false)
+#endif
--
2.18.0.huawei.25

[PATCH OLK-6.6 v5 4/5] sched: fair: Disable numa migrate for soft domain task

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/IC8X6H

--------------------------------

NUMA migration is not yet implemented for soft domain tasks, so skip
them in can_migrate_task() when balancing across NUMA domains.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
 kernel/sched/fair.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6bfaf9b2e0e7..d3f8e6ce7e6b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10690,6 +10690,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 		return 0;
 
+#ifdef CONFIG_SCHED_SOFT_DOMAIN
+	/* Do not migrate soft domain tasks between numa. */
+	if (sched_feat(SOFT_DOMAIN)) {
+		if (task_group(p)->sf_ctx && task_group(p)->sf_ctx->policy &&
+		    (env->sd->flags & SD_NUMA) != 0)
+			return 0;
+	}
+#endif
+
 	/* Disregard pcpu kthreads; they are where they need to be. */
 	if (kthread_is_per_cpu(p))
 		return 0;
--
2.18.0.huawei.25

[PATCH OLK-6.6 v5 5/5] arm64: Enable CONFIG_SCHED_SOFT_DOMAIN

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/IC8X6H

--------------------------------

Enable CONFIG_SCHED_SOFT_DOMAIN in the arm64 openeuler_defconfig and
explicitly mark it as not set in the x86 openeuler_defconfig.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
 arch/arm64/configs/openeuler_defconfig | 2 ++
 arch/x86/configs/openeuler_defconfig   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index a263c6d16897..04babddcf126 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -181,6 +181,8 @@ CONFIG_RT_GROUP_SCHED=y
 CONFIG_QOS_SCHED_DYNAMIC_AFFINITY=y
 # CONFIG_SCHED_MM_CID is not set
 CONFIG_QOS_SCHED_SMART_GRID=y
+CONFIG_SCHED_SOFT_DOMAIN=y
+
 CONFIG_CGROUP_PIDS=y
 CONFIG_CGROUP_RDMA=y
 CONFIG_CGROUP_FREEZER=y
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index df4aae87c5c5..64a5880e18aa 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -199,6 +199,7 @@ CONFIG_CFS_BANDWIDTH=y
 CONFIG_RT_GROUP_SCHED=y
 CONFIG_QOS_SCHED_DYNAMIC_AFFINITY=y
 # CONFIG_QOS_SCHED_SMART_GRID is not set
+# CONFIG_SCHED_SOFT_DOMAIN is not set
 CONFIG_CGROUP_PIDS=y
 CONFIG_CGROUP_RDMA=y
 CONFIG_CGROUP_FREEZER=y
--
2.18.0.huawei.25