
Hi Barry, On 08/02/21 10:04, Song Bao Hua (Barry Song) wrote:
-----Original Message----- From: Valentin Schneider [mailto:valentin.schneider@arm.com]
Hi Valentin,
While I like your approach, this will require more time to evaluate possible influence as the approach also affects all machines without 3-hops issue. So x86 platforms need to be tested and benchmark is required.
What about we firstly finish the review of "grandchild" approach v2 and have a solution for kunpeng920 and Sun Fire X4600-M2 while not impacting other machines which haven't 3-hops issues first?
I figured I'd toss this out while the iron was hot (and I had the topology crud paged in), but I ultimately agree that it's better to first go with something that fixes the diameter > 2 topologies and leaves the other ones untouched, which is exactly what you have.
I would appreciate very much if you could comment on v2: https://lore.kernel.org/lkml/20210203111201.20720-1-song.bao.hua@hisilicon.c...
See my comment below on domain degeneration; with that taken care of I would say it's good to go. Have a look at what patch1+patch3 squashed together looks like, passing the right sd to init_overlap_sched_group() looks a bit neater IMO.
+static struct sched_domain *find_node_domain(struct sched_domain *sd) +{ + struct sched_domain *parent; + + BUG_ON(!(sd->flags & SD_NUMA)); + + /* Get to the level above NODE */ + while (sd && sd->child) { + parent = sd; + sd = sd->child; + + if (!(sd->flags & SD_NUMA)) + break; + } + /* + * We're going to create cross topology level sched_group_capacity + * references. This can only work if the domains resulting from said + * levels won't be degenerated, as we need said sgc to be periodically + * updated: it needs to be attached to the local group of a domain + * that didn't get degenerated. + * + * Of course, groups aren't available yet, so we can't call the usual + * sd_degenerate(). Checking domain spans is the closest we get. + * Start from NODE's parent, and keep going up until we get a domain + * we're sure won't be degenerated. + */ + while (sd->parent && + cpumask_equal(sched_domain_span(sd), sched_domain_span(parent))) { + sd = parent; + parent = sd->parent; + }
So this is because the sched_domain which doesn't contribute to scheduler will be destroyed during cpu_attach_domain() since sd and parent span the seam mask?
Yes; let's take your topology for instance: node 0 1 2 3 0: 10 12 20 22 1: 12 10 22 24 2: 20 22 10 12 3: 22 24 12 10 2 10 2 0 <---> 1 <---> 2 <---> 3 Domains for node1 will look like (before any fixes are applied): NUMA<=10: span=1 groups=(1) NUMA<=12: span=0-1 groups=(1)->(0) NUMA<=20: span=0-1 groups=(0,1) NUMA<=22: span=0-2 groups=(0,1)->(0,2-3) NUMA<=24: span=0-3 groups=(0-2)->(0,2-3) As you can see, the domain representing distance <= 20 will be degenerated (it has a single group). If we were to e.g. add some more nodes to the left of node0, then we would trigger the "grandchildren logic" for node1 and would end up creating a reference to node1 NUMA<=20's sgc, which is a mistake: that domain will be degenerated, and that sgc will never be updated. The right thing to do here would be reference node1 NUMA<=12's sgc, which the above snippet does.
+ + return parent; +} +