-----Original Message-----
From: Meelis Roos [mailto:mroos@linux.ee]
Sent: Thursday, February 4, 2021 12:58 AM
To: Song Bao Hua (Barry Song) song.bao.hua@hisilicon.com; valentin.schneider@arm.com; vincent.guittot@linaro.org; mgorman@suse.de; mingo@kernel.org; peterz@infradead.org; dietmar.eggemann@arm.com; morten.rasmussen@arm.com; linux-kernel@vger.kernel.org
Cc: linuxarm@openeuler.org; xuwei (O) xuwei5@huawei.com; Liguozhu (Kenneth) liguozhu@hisilicon.com; tiantao (H) tiantao6@hisilicon.com; wanghuiqiang wanghuiqiang@huawei.com; Zengtao (B) prime.zeng@hisilicon.com; Jonathan Cameron jonathan.cameron@huawei.com; guodong.xu@linaro.org
Subject: Re: [PATCH v2] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2
03.02.21 13:12 Barry Song wrote:
 kernel/sched/topology.c | 85 +++++++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5d3675c7a76b..964ed89001fe 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
This one still works on the Sun X4600-M2, on top of v5.11-rc6-55-g3aaf0a27ffc2.
Performance-wise - is there some simple benchmark to run to measure the impact? Compared to what - 5.10.0 or the kernel with the warning?
Hi Meelis, Thanks for retesting.
Comparing to the kernel with the warning is enough. As I mentioned here: https://lore.kernel.org/lkml/20210115203632.34396-1-song.bao.hua@hisilicon.c...
I have seen two major issues caused by the broken sched_group:
* In load_balance() and find_busiest_group(), the kernel calculates avg_load and group_type from:

      sum(load of cpus within sched_domain)
      -------------------------------------
        capacity of the whole sched_group

  Since the sched_group isn't a subset of the sched_domain, the load of the problematic group is severely underestimated.
            sched_domain
+----------------------------------+
|                                  |
|  +-------------------------------------------+
|  | +-------+  +------+           |           |
|  | | cpu0  |  | cpu1 |           |           |
|  | +-------+  +------+           |           |
+----------------------------------+           |
   |                                           |
   |                     +-------+  +-------+  |
   |                     |cpu2   |  |cpu3   |  |
   |                     +-------+  +-------+  |
   |                                           |
   +-------------------------------------------+
              problematic sched_group
For the above example, the kernel will divide "the sum of the loads of cpu0 and cpu1" by "the capacity of the whole group including cpu0, 1, 2 and 3".
* In select_task_rq_fair() and find_idlest_group(), the kernel could push a forked/exec-ed task outside the sched_domain but still inside the sched_group. In the above diagram, while the kernel wants to find the idlest cpu in the sched_domain, it can end up picking cpu2 or cpu3.
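To make the two issues above concrete, here is a rough userspace model in C. It is only a sketch under assumptions, not the code in kernel/sched/fair.c: struct cpu_stat, avg_load() and idlest_cpu() are hypothetical names made up for illustration, CPU sets are plain bitmasks, and the only thing borrowed from the kernel is the idea of scaling load by SCHED_CAPACITY_SCALE (1024) before dividing by capacity.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024UL /* capacity of one full CPU in scheduler units */

/* Hypothetical, simplified per-CPU state; this is not the kernel's struct rq. */
struct cpu_stat {
	unsigned long load;     /* load currently on this CPU */
	unsigned long capacity; /* compute capacity of this CPU */
};

/*
 * Issue 1: load is summed over the CPUs in load_mask, capacity over the CPUs
 * in cap_mask. Passing (group & domain) and (group) models the broken case.
 */
unsigned long avg_load(const struct cpu_stat *cpus, size_t nr_cpus,
		       unsigned int load_mask, unsigned int cap_mask)
{
	unsigned long load = 0, capacity = 0;

	assert(nr_cpus <= 32); /* masks are 32-bit in this sketch */
	for (size_t i = 0; i < nr_cpus; i++) {
		if (load_mask & (1u << i))
			load += cpus[i].load;
		if (cap_mask & (1u << i))
			capacity += cpus[i].capacity;
	}
	assert(capacity > 0);
	return load * SCHED_CAPACITY_SCALE / capacity;
}

/*
 * Issue 2: return the least-loaded CPU among those set in mask, or -1 if the
 * mask is empty. Scanning the whole group mask can leave the domain.
 */
int idlest_cpu(const struct cpu_stat *cpus, size_t nr_cpus, unsigned int mask)
{
	int best = -1;

	assert(nr_cpus <= 32); /* masks are 32-bit in this sketch */
	for (size_t i = 0; i < nr_cpus; i++) {
		if (!(mask & (1u << i)))
			continue;
		if (best < 0 || cpus[i].load < cpus[best].load)
			best = (int)i;
	}
	return best;
}
```

Taking the diagram as input (domain = cpu0-1, i.e. mask 0x3; group = cpu0-3, i.e. mask 0xf), with every CPU at capacity 1024 and made-up loads of 800, 800, 0 and 100: avg_load(cpus, 4, 0x3, 0xf) gives 400, while the same load divided by the capacity of only cpu0 and cpu1, avg_load(cpus, 4, 0x3, 0x3), gives 800. Likewise idlest_cpu(cpus, 4, 0xf) returns cpu2, which is outside the domain, while idlest_cpu(cpus, 4, 0x3) stays on cpu0.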
I guess these two issues can potentially affect many benchmarks. Our team has seen a 5% unixbench score increase with the fix on some machines, though the real impact might vary case by case.
Drop caches and time a build of the Linux kernel with make -j64?
-- Meelis Roos
Thanks Barry