-----Original Message-----
From: Tim Chen [mailto:tim.c.chen@linux.intel.com]
Sent: Thursday, March 4, 2021 7:34 AM
To: Peter Zijlstra peterz@infradead.org; Song Bao Hua (Barry Song) song.bao.hua@hisilicon.com
Cc: catalin.marinas@arm.com; will@kernel.org; rjw@rjwysocki.net; vincent.guittot@linaro.org; bp@alien8.de; tglx@linutronix.de; mingo@redhat.com; lenb@kernel.org; dietmar.eggemann@arm.com; rostedt@goodmis.org; bsegall@google.com; mgorman@suse.de; msys.mizuma@gmail.com; valentin.schneider@arm.com; gregkh@linuxfoundation.org; Jonathan Cameron jonathan.cameron@huawei.com; juri.lelli@redhat.com; mark.rutland@arm.com; sudeep.holla@arm.com; aubrey.li@linux.intel.com; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-acpi@vger.kernel.org; x86@kernel.org; xuwei (O) xuwei5@huawei.com; Zengtao (B) prime.zeng@hisilicon.com; guodong.xu@linaro.org; yangyicong yangyicong@huawei.com; Liguozhu (Kenneth) liguozhu@hisilicon.com; linuxarm@openeuler.org; hpa@zytor.com
Subject: [Linuxarm] Re: [RFC PATCH v4 3/3] scheduler: Add cluster scheduler level for x86
On 3/2/21 2:30 AM, Peter Zijlstra wrote:
On Tue, Mar 02, 2021 at 11:59:40AM +1300, Barry Song wrote:
From: Tim Chen tim.c.chen@linux.intel.com
There are x86 CPU architectures (e.g. Jacobsville) where the L2 cache is shared among a cluster of cores instead of being exclusive to a single core.
Isn't that most atoms one way or another? Tremont seems to have it per 4 cores, but earlier it was per 2 cores.
Yes, older Atoms have 2 cores sharing L2. I probably should rephrase my comments to not leave the impression that sharing L2 among cores is new for Atoms.
Tremont-based Atom CPUs increase the possible load imbalance further, with 4 cores per L2 instead of 2. And with more cores overall on a die, the chance also increases that running tasks get packed onto a few clusters while other clusters are left empty on lightly/moderately loaded systems. We did see this effect on Jacobsville.
So load balancing between the L2 clusters is more useful on Tremont based Atom CPUs compared to the older Atoms.
It seems sensible that the more CPUs we get in a cluster, the more we need the kernel to be aware of its existence.
Tim, is it possible for you to bring up cpu_cluster_mask and cluster_sibling for x86, so that the topology can be represented in sysfs and used by the scheduler? It seems your patch lacks this part.
BTW, I wonder if x86 could also improve KMP_AFFINITY by leveraging the cluster topology level. https://software.intel.com/content/www/us/en/develop/documentation/cpp-compi...
KMP_AFFINITY has thread affinity modes like compact and scatter; it seems these "compact" and "scatter" modes could also use the cluster information, as you can see we are struggling with the same "compact" vs. "scatter" issues here in this patchset :-)
Thanks Barry