hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I8MV01
--------------------------------
In the prologue of osq_lock(), the `cpu` member of the percpu struct optimistic_spin_node is set to the local CPU id, and the value never changes after that. We can therefore regard the `cpu` member as effectively constant.

Meanwhile, the other members (next, prev and locked) are frequently modified by osq_lock() and osq_unlock(), which are called from rwsem, mutex and so on.

So we can insert padding to split the `cpu` member onto its own cache line. This avoids cache misses when a CPU spins and repeatedly checks the previous node's cpu field via vcpu_is_preempted(), as illustrated by the layout sketch below.
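To make the layout change concrete, here is a hypothetical userspace sketch (not the kernel code): __attribute__((aligned(64))) stands in for the kernel's ____cacheline_aligned, and the 64-byte cache line size is an assumption that matches typical x86_64 parts.

  /*
   * Hypothetical userspace sketch of the before/after layouts.
   * Assumes a 64-byte cache line; not the actual kernel code.
   */
  #include <stdio.h>
  #include <stddef.h>

  struct node_before {                  /* original layout */
          struct node_before *next, *prev; /* frequently written */
          int locked;                      /* frequently written */
          int cpu;                         /* write-once, read by spinners */
  };

  struct node_after {                   /* patched layout */
          struct node_after *next, *prev;
          int locked;
          int cpu __attribute__((aligned(64))); /* pushed onto its own line */
  };

  int main(void)
  {
          printf("before: cpu at offset %zu, size %zu (shares a line with locked)\n",
                 offsetof(struct node_before, cpu), sizeof(struct node_before));
          printf("after:  cpu at offset %zu, size %zu (cpu on its own line)\n",
                 offsetof(struct node_after, cpu), sizeof(struct node_after));
          return 0;
  }

On a typical 64-bit build this prints an offset of 64 for the patched layout, showing that writes to next/prev/locked no longer invalidate the cache line that remote spinners read the cpu field from.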
The UnixBench full-core test results are shown below.

Machine: Intel(R) Xeon(R) Gold 6248 CPU, 40 cores, 80 threads
The command "./Run -c 80 -i 3" was run 10 times and the results averaged.
System Benchmarks Index Values           Without Patch   With Patch     Diff
Dhrystone 2 using register variables         185876.43    185945.41    0.04%
Double-Precision Whetstone                    79637.27     79659.29    0.03%
Execl Throughput                               9909.61     10576.06    6.73%
File Copy 1024 bufsize 2000 maxblocks          1723.01      2086.08   21.07%
File Copy 256 bufsize 500 maxblocks            1150.24      1338.21   16.34%
File Copy 4096 bufsize 8000 maxblocks          3719.19      4011.99    7.87%
Pipe Throughput                               66184.84     66025.25   -0.24%
Pipe-based Context Switching                  30606.18     31074.21    1.53%
Process Creation                               9442.48      9450.77    0.09%
Shell Scripts (1 concurrent)                  44526.52     46548.54    4.54%
Shell Scripts (8 concurrent)                  42903.96     45718.56    6.56%
System Call Overhead                           3645.20      3717.42    1.98%
                                             ========
System Benchmarks Index Score                 15126.87     15931.29    5.32%
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
 include/linux/osq_lock.h  | 2 +-
 kernel/locking/osq_lock.c | 8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 5581dbd3bd34..deb90ad5f560 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -9,7 +9,7 @@
 struct optimistic_spin_node {
 	struct optimistic_spin_node *next, *prev;
 	int locked; /* 1 if lock acquired */
-	int cpu; /* encoded CPU # + 1 value */
+	int cpu ____cacheline_aligned; /* encoded CPU # + 1 value */
 };
 
 struct optimistic_spin_queue {
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1de006ed3aa8..1e1a79580e96 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -96,7 +96,13 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 
 	node->locked = 0;
 	node->next = NULL;
-	node->cpu = curr;
+	/*
+	 * The cpu member never changes after it is first initialized,
+	 * so skip the redundant store: rewriting it would dirty the
+	 * cache line that other CPUs spin on to read the cpu field.
+	 */
+	if (!node->cpu)
+		node->cpu = curr;
 
 	/*
 	 * We need both ACQUIRE (pairs with corresponding RELEASE in
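For reference, the reader that benefits is the wait loop in osq_lock(). A simplified sketch of that loop follows; the exact mainline form varies by kernel version (newer kernels express it with smp_cond_load_relaxed()), but in either form the previous node's cpu field is re-read on every iteration.

  /*
   * Simplified sketch of the osq_lock() wait loop. While spinning on
   * node->locked, each iteration also reads prev->cpu through
   * vcpu_is_preempted(). Since prev's next/locked fields are written
   * by other CPUs, keeping cpu on a separate cache line avoids
   * repeated invalidations of the line the spinner reads cpu from.
   */
  while (!READ_ONCE(node->locked)) {
          if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
                  goto unqueue;   /* bail out to the unqueue slow path */
          cpu_relax();
  }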