-----Original Message----- From: Valentin Schneider [mailto:valentin.schneider@arm.com] Sent: Thursday, February 4, 2021 4:55 AM To: linux-kernel@vger.kernel.org Cc: vincent.guittot@linaro.org; mgorman@suse.de; mingo@kernel.org; peterz@infradead.org; dietmar.eggemann@arm.com; morten.rasmussen@arm.com; linuxarm@openeuler.org; xuwei (O) xuwei5@huawei.com; Liguozhu (Kenneth) liguozhu@hisilicon.com; tiantao (H) tiantao6@hisilicon.com; wanghuiqiang wanghuiqiang@huawei.com; Zengtao (B) prime.zeng@hisilicon.com; Jonathan Cameron jonathan.cameron@huawei.com; guodong.xu@linaro.org; Song Bao Hua (Barry Song) song.bao.hua@hisilicon.com; Meelis Roos mroos@linux.ee Subject: [RFC PATCH 2/2] Revert "sched/topology: Warn when NUMA diameter > 2"
The scheduler topology code can now figure out what to do with such topologies.
This reverts commit b5b217346de85ed1b03fdecd5c5076b34fbb2f0b.
Signed-off-by: Valentin Schneider valentin.schneider@arm.com
Yes, this is fine. I actually have seen some other problems we need to consider.
The current code is probably well consolidated for machines with 2 hops or less. Thus, even after we fix the 3-hops span issue, I can still see some other issue.
For example, if we change the sd flags and remove the SD_BALANCE flags for the last hops in sd_init(), we are able to see large score increase in unixbench.
if (sched_domains_numa_distance[tl->numa_level] > node_reclaim_distance || is_3rd_hops_domain(...)) { sd->flags &= ~(SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE); }
So guess something needs to be tuned for machines with 3 hops or more.
But we need a kernel which has the fix of 3-hops issue before we can do more work.
kernel/sched/topology.c | 33 --------------------------------- 1 file changed, 33 deletions(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index a8f69f234258..0fa41aab74e0 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -688,7 +688,6 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) { struct rq *rq = cpu_rq(cpu); struct sched_domain *tmp;
int numa_distance = 0;
/* Remove the sched domains which do not contribute to scheduling. */ for (tmp = sd; tmp; ) {
@@ -720,38 +719,6 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) sd->child = NULL; }
for (tmp = sd; tmp; tmp = tmp->parent)
numa_distance += !!(tmp->flags & SD_NUMA);
/*
* FIXME: Diameter >=3 is misrepresented.
*
* Smallest diameter=3 topology is:
*
* node 0 1 2 3
* 0: 10 20 30 40
* 1: 20 10 20 30
* 2: 30 20 10 20
* 3: 40 30 20 10
*
* 0 --- 1 --- 2 --- 3
*
* NUMA-3 0-3 N/A N/A 0-3
* groups: {0-2},{1-3} {1-3},{0-2}
*
* NUMA-2 0-2 0-3 0-3 1-3
* groups: {0-1},{1-3} {0-2},{2-3} {1-3},{0-1} {2-3},{0-2}
*
* NUMA-1 0-1 0-2 1-3 2-3
* groups: {0},{1} {1},{2},{0} {2},{3},{1} {3},{2}
*
* NUMA-0 0 1 2 3
*
* The NUMA-2 groups for nodes 0 and 3 are obviously buggered, as the
* group span isn't a subset of the domain span.
*/
WARN_ONCE(numa_distance > 2, "Shortest NUMA path spans too many nodes\n");
sched_domain_debug(sd, cpu);
rq_attach_root(rq, rd);
-- 2.27.0
Thanks Barry