-----Original Message-----
From: Tim Chen [mailto:tim.c.chen@linux.intel.com]
Sent: Thursday, March 4, 2021 7:34 AM
To: Peter Zijlstra peterz@infradead.org; Song Bao Hua (Barry Song) song.bao.hua@hisilicon.com
Cc: catalin.marinas@arm.com; will@kernel.org; rjw@rjwysocki.net; vincent.guittot@linaro.org; bp@alien8.de; tglx@linutronix.de; mingo@redhat.com; lenb@kernel.org; dietmar.eggemann@arm.com; rostedt@goodmis.org; bsegall@google.com; mgorman@suse.de; msys.mizuma@gmail.com; valentin.schneider@arm.com; gregkh@linuxfoundation.org; Jonathan Cameron jonathan.cameron@huawei.com; juri.lelli@redhat.com; mark.rutland@arm.com; sudeep.holla@arm.com; aubrey.li@linux.intel.com; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-acpi@vger.kernel.org; x86@kernel.org; xuwei (O) xuwei5@huawei.com; Zengtao (B) prime.zeng@hisilicon.com; guodong.xu@linaro.org; yangyicong yangyicong@huawei.com; Liguozhu (Kenneth) liguozhu@hisilicon.com; linuxarm@openeuler.org; hpa@zytor.com
Subject: [Linuxarm] Re: [RFC PATCH v4 3/3] scheduler: Add cluster scheduler level for x86
On 3/2/21 2:30 AM, Peter Zijlstra wrote:
On Tue, Mar 02, 2021 at 11:59:40AM +1300, Barry Song wrote:
From: Tim Chen tim.c.chen@linux.intel.com
There are x86 CPU architectures (e.g. Jacobsville) where the L2 cache is shared among a cluster of cores instead of being exclusive to a single core.
Isn't that most atoms one way or another? Tremont seems to have it per 4 cores, but earlier it was per 2 cores.
Yes, older Atoms have 2 cores sharing L2. I probably should rephrase my comments to not leave the impression that sharing L2 among cores is new for Atoms.
Tremont-based Atom CPUs increase the possible load imbalance further, with 4 cores per L2 instead of 2. And with more cores overall on a die, the chance also increases that running tasks get packed onto a few clusters while other clusters are left empty on lightly/moderately loaded systems. We did see this effect on Jacobsville.
So load balancing between the L2 clusters is more useful on Tremont based Atom CPUs compared to the older Atoms.
It seems sensible that the more CPUs we get in a cluster, the more we need the kernel to be aware of its existence.
Tim, is it possible for you to bring up cpu_cluster_mask and cluster_sibling for x86, so that the topology can be represented in sysfs and used by the scheduler? It seems your patch lacks this part.
BTW, I wonder if x86 could also improve KMP_AFFINITY by leveraging the cluster topology level. https://software.intel.com/content/www/us/en/develop/documentation/cpp-compi...
KMP_AFFINITY has thread affinity modes like compact and scatter; it seems these "compact" and "scatter" modes could also use the cluster information, as you can see we are struggling with the same "compact" vs. "scatter" issues here in this patchset :-)
Thanks Barry