On Tue, Feb 09, 2021 at 08:58:15PM +0000, Song Bao Hua (Barry Song) wrote:
I've finally had a moment to think about this, would it make sense to also break up group: node0+1, such that we then end up with 3 groups of equal size?
Since the sched_domain[n-1] spans of a subset of node[m]'s siblings can already cover the whole span of node[m]'s sched_domain[n], there is no need to scan all of node[m]'s siblings: once sched_domain[n] of node[m] has been covered, we can stop making more sched_groups. So the number of sched_groups stays small.
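The covering argument above can be sketched as a toy model (Python, not the actual kernel C; build_groups and the span sets are illustrative names, not kernel identifiers):

```python
def build_groups(parent_span, sibling_child_spans):
    """Greedily add sibling child-domain spans as groups until the
    parent domain's span is fully covered, then stop scanning."""
    covered = set()
    groups = []
    for span in sibling_child_spans:
        if covered >= parent_span:      # parent span fully covered: stop
            break
        if span - covered:              # only add spans contributing new CPUs
            groups.append(span)
            covered |= span
    return groups

# Hypothetical example: parent domain spans CPUs 0-5; siblings' child
# domains overlap, so a subset of them suffices to cover the parent.
parent = {0, 1, 2, 3, 4, 5}
spans = [{0, 1}, {0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5}]
print(build_groups(parent, spans))  # stops after 3 groups; {4, 5} never scanned
```

This is only meant to show why the group count stays small: the scan terminates as soon as the union of the chosen spans covers the parent domain.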
So, historically, the code has never tried to make the sched_groups equal in size, and it permits the local group to overlap with remote groups.
Historically, groups have (typically) always been the same size though.
The reason I asked is that when you get one large group and a bunch of smaller groups, the load-balancing 'pull' towards the large group is relatively weaker.
That is, IIRC should_we_balance() ensures that only 1 CPU out of the group continues the load-balancing pass. So if, for example, we have one group of 4 CPUs and one group of 2 CPUs, then each CPU in the group of 2 will pull 1/2 of the time, while each CPU in the group of 4 will pull only 1/4 of the time.
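That arithmetic can be checked with a quick simulation. Here should_we_balance() is modeled simply as picking one CPU per group uniformly at random per balance pass, which is an assumption for illustration, not the kernel's actual selection logic:

```python
import random

def simulate(group_sizes, rounds=100_000, seed=0):
    """Per balance pass, exactly one CPU per group 'continues' the pass
    (a stand-in for should_we_balance()); return how often CPU 0 of
    each group turns out to be the one that pulls."""
    rng = random.Random(seed)
    wins = [0] * len(group_sizes)
    for _ in range(rounds):
        for g, size in enumerate(group_sizes):
            if rng.randrange(size) == 0:    # CPU 0 chosen in this group
                wins[g] += 1
    return [w / rounds for w in wins]

# One group of 4 CPUs and one group of 2 CPUs, as in the example above.
freqs = simulate([4, 2])
print(freqs)  # roughly [0.25, 0.5]: a CPU in the 2-CPU group pulls twice as often
```

So with unequal group sizes, per-CPU pull frequency is skewed toward the smaller groups, which is the imbalance the equal-size constraint avoids.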
By making sure all groups are of the same level, and thus of equal size, this doesn't happen.