-----Original Message-----
From: Meelis Roos [mailto:mroos@linux.ee]
Sent: Wednesday, February 10, 2021 1:40 AM
To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; valentin.schneider@arm.com; vincent.guittot@linaro.org; mgorman@suse.de; mingo@kernel.org; peterz@infradead.org; dietmar.eggemann@arm.com; morten.rasmussen@arm.com; linux-kernel@vger.kernel.org
Cc: linuxarm@openeuler.org; xuwei (O) <xuwei5@huawei.com>; Liguozhu (Kenneth) <liguozhu@hisilicon.com>; tiantao (H) <tiantao6@hisilicon.com>; wanghuiqiang <wanghuiqiang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Jonathan Cameron <jonathan.cameron@huawei.com>; guodong.xu@linaro.org
Subject: Re: [PATCH v3] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2
I did a rudimentary benchmark on the same 8-node Sun Fire X4600-M2, on top of today's 5.11.0-rc7-00002-ge0756cfc7d7c.
The test: building a clean kernel with make -j64 after make clean and dropping caches.
While running the clean kernel (3 tries):
real    2m38.574s
user    46m18.387s
sys     6m8.724s

real    2m37.647s
user    46m34.171s
sys     6m11.993s

real    2m37.832s
user    46m34.910s
sys     6m12.013s
While running the patched kernel:

real    2m40.072s
user    46m22.610s
sys     6m6.658s
For real time, it seems to be 1.5s-2s slower out of ~160s (noise?). User and system time are slightly lower, on the other hand, so it seems good to me.
I ran the same test on the machine with the below topology:

numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0-31
node 0 size: 64144 MB
node 0 free: 62356 MB
node 1 cpus: 32-63
node 1 size: 64509 MB
node 1 free: 62996 MB
node 2 cpus: 64-95
node 2 size: 64509 MB
node 2 free: 63020 MB
node 3 cpus: 96-127
node 3 size: 63991 MB
node 3 free: 62647 MB
node distances:
node   0   1   2   3
  0:  10  12  20  22
  1:  12  10  22  24
  2:  20  22  10  12
  3:  22  24  12  10
Basically, the influence on kernel build time is within the noise. I ran a couple of rounds with the below commands:
make clean
echo 3 > /proc/sys/vm/drop_caches
make Image -j100
w/ patch:                      w/o patch:
real    1m17.644s              real    1m19.510s
user    32m12.074s             user    32m14.133s
sys     4m35.827s              sys     4m38.198s

real    1m15.855s              real    1m17.303s
user    32m7.700s              user    32m14.128s
sys     4m35.868s              sys     4m40.094s

real    1m18.918s              real    1m19.583s
user    32m13.352s             user    32m13.205s
sys     4m40.161s              sys     4m40.696s

real    1m20.329s              real    1m17.819s
user    32m7.255s              user    32m11.753s
sys     4m36.706s              sys     4m41.371s

real    1m17.773s              real    1m16.763s
user    32m19.912s             user    32m15.607s
sys     4m36.989s              sys     4m41.297s

real    1m14.943s              real    1m18.551s
user    32m14.549s             user    32m18.521s
sys     4m38.670s              sys     4m41.392s

real    1m16.439s              real    1m18.154s
user    32m12.864s             user    32m14.540s
sys     4m39.424s              sys     4m40.364s
Our team members who ran unixbench with the 3-hops-fix patch reported the unixbench scores below (3 rounds):
w/o patch:    w/ patch:
1228.6        1254.9
1231.4        1265.7
1226.1        1266.1
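(For reference, that averages out to (1228.6 + 1231.4 + 1226.1) / 3 ≈ 1228.7 without the patch and (1254.9 + 1265.7 + 1266.1) / 3 ≈ 1262.2 with it, i.e. roughly a 2.7% improvement.)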
One interesting thing is that if we additionally change the kernel to clear the below balancing flags for the last hop,

	sd->flags &= ~(SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE);

we see a further increase in the unixbench score. So it sounds like that kind of balancing shouldn't go that far. But that is a different topic.
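For reference, a rough, untested sketch of what that experiment could look like in sd_init() in kernel/sched/topology.c on a 5.11-era tree is below. Treating tl->numa_level == sched_domains_numa_levels - 1 as "the last hop" is my assumption; sd_init() already drops these flags once the node distance exceeds node_reclaim_distance:

	/* excerpt from sd_init(), untested sketch */
	} else if (sd->flags & SD_NUMA) {
		sd->cache_nice_tries = 2;

		sd->flags &= ~SD_PREFER_SIBLING;
		sd->flags |= SD_SERIALIZE;

		/*
		 * Existing behaviour: stop exec/fork/wake-affine balancing
		 * once the distance is beyond node_reclaim_distance.
		 * The experiment above would also drop them for the
		 * outermost (last-hop) NUMA level unconditionally
		 * (the "last hop" condition here is an assumption).
		 */
		if (sched_domains_numa_distance[tl->numa_level] > node_reclaim_distance ||
		    tl->numa_level == sched_domains_numa_levels - 1) {
			sd->flags &= ~(SD_BALANCE_EXEC |
				       SD_BALANCE_FORK |
				       SD_WAKE_AFFINE);
		}
	}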
--
Meelis Roos <mroos@linux.ee>
Thanks
Barry