hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA63KX?from=project-issue
--------------------------------
When running the context1 test case in the UnixBench test suite, the tool reports the presence of false sharing in task_group:

  Cache Line   PA           PC               Function                 Count   Percent
  2099e66900                                                          56345   24.2%
               2099e66918   ffd584306f7690   calc_group_shares+0x10
               2099e66940   ffd584306f7730   calc_group_shares+0xb0
               2099e66910   ffd584306e94b0   set_task_cpu+0xc8
               2099e66950   ffd584306e94f0   set_task_cpu+0x108
               2099e66908   ffd584306e94d8   set_task_cpu+0xf0

After analysis, it was found that some members of struct task_group, such as se and load_avg, share an LLC cacheline. load_avg is modified frequently, so the shared LLC cacheline is updated frequently, which hurts the read efficiency of the other members in that cacheline.
The UnixBench test results are as follows:
1) Run the command "perf stat -d -d -d -- ./Run -c `nproc` -i 10 context1" 10 times and take the average.

  Performance counter stats for './Run -c 128 -i 10 context1':

                                Without Patch   With Patch    Diff
  TopDownL1
      # retiring                0.08            0.18          125%
      # backend_bound           0.67            0.39          -41%
  L1-dcache-loads             # 257.714 M/s     594.973 M/s   130%
  L1-dcache-load-misses       # 0.86%           0.38%         55%

  System Benchmarks Partial Index                   INDEX      INDEX
  System Benchmarks Index Score (Partial Only)      10141.8    28338.5   179%

2) Run the command "./Run -c `nproc` -i 10" 10 times and take the average.

                                            Without Patch   With Patch   Diff
  Dhrystone 2 using register variables      477300.9        477476.9     0.03%
  Double-Precision Whetstone                102026.2        102025.3     -0.08%
  Execl Throughput                          6457.9          6369.6       -1.37%
  File Copy 1024 bufsize 2000 maxblocks     1260.4          1274.7       1.13%
  File Copy 256 bufsize 500 maxblocks       827.9           811.5        -1.98%
  File Copy 4096 bufsize 8000 maxblocks     2669.8          2427.0       -9.09%
  Pipe Throughput                           77295.8         68384.2      -11.5%
  Pipe-based Context Switching              6733.1          21636.8      221%
  Process Creation                          5466.4          5495.0       0.05%
  Shell Scripts (1 concurrent)              25669.0         25649.9      -0.07%
  Shell Scripts (8 concurrent)              24760.7         24544.6      -0.87%
  System Call Overhead                      1735.4          1747.2       0.68%
                                            ========        ========
  System Benchmarks Index Score             10879.3         11755.8      8.06%

Signed-off-by: Li Zetao <lizetao1@huawei.com>
---
 kernel/sched/sched.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9de2bac6444f0..1d19f4a9fd1d7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -419,6 +419,11 @@ struct task_group {
 	int idle;
 
 #ifdef CONFIG_SMP
+#ifdef CONFIG_ARCH_LLC_128_LINE_SIZE
+	/* load_avg is modified frequently, put it in a separate LLC cacheline. */
+	CACHELINE_PADDING(_pad1_);
+	u8 padding[1];
+#endif
 	/*
 	 * load_avg can be heavily contended at clock tick time, so put
 	 * it in its own cacheline separated from the fields above which