hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I9MSNE?from=project-issue
CVE: NA
--------------------------------
The high-frequency function cpu_util_without() is only called when a task is woken up or is placed for the first time after fork. In that scenario the task has no (or only a stale) contribution to the candidate CPU, so performance can be optimized by simplifying the function.
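For illustration, here is a minimal userspace C sketch of the before/after logic. struct cpu_stat, util_without_slow() and util_without_fast() are hypothetical names for this sketch, not kernel code: the generic path must subtract the task's own contribution before clamping, while the wakeup fast path can skip straight to the clamp.

        /* Hypothetical, simplified stand-in for the per-CPU PELT state. */
        struct cpu_stat {
                unsigned long util_avg;   /* running utilization average */
                unsigned long util_est;   /* enqueued utilization estimate */
                unsigned long capacity;   /* original CPU capacity */
        };

        /* Generic path: subtract the task's (possible) contribution first. */
        static unsigned long util_without_slow(const struct cpu_stat *c,
                                               unsigned long task_util)
        {
                unsigned long util = c->util_avg;

                /* underflow-safe subtraction, lsub_positive()-style */
                util -= (task_util < util) ? task_util : util;

                if (util < c->util_est)
                        util = c->util_est;

                return (util < c->capacity) ? util : c->capacity;
        }

        /* Fast path: a newly woken/forked task contributes nothing here. */
        static unsigned long util_without_fast(const struct cpu_stat *c)
        {
                unsigned long util = c->util_avg;

                if (util < c->util_est)
                        util = c->util_est;

                return (util < c->capacity) ? util : c->capacity;
        }

Dropping the per-task bookkeeping removes the task_cpu()/last_update_time checks and the call into the heavier generic helper from every wakeup.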
Here are the detailed test results of unixbench.
Command: ./Run -c 1 -i 3
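(With UnixBench's Run script, -c 1 runs a single copy of each benchmark and -i 3 runs three iterations per benchmark.)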
Without Patch
------------------------------------------------------------------------
System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   41898849.1   3590.3
Double-Precision Whetstone                       55.0       4426.3    804.8
Execl Throughput                                 43.0       2828.8    657.9
File Copy 1024 bufsize 2000 maxblocks          3960.0     837180.0   2114.1
File Copy 256 bufsize 500 maxblocks            1655.0     256669.0   1550.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    2264169.0   3903.7
Pipe Throughput                               12440.0    1101364.7    885.3
Pipe-based Context Switching                   4000.0     136573.4    341.4
Process Creation                                126.0       6031.7    478.7
Shell Scripts (1 concurrent)                     42.4       5875.9   1385.8
Shell Scripts (8 concurrent)                      6.0       2567.1   4278.5
System Call Overhead                          15000.0    1065481.3    710.3
                                                                   ========
System Benchmarks Index Score                                        1252.0
With Patch
------------------------------------------------------------------------
System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   41832459.9   3584.6
Double-Precision Whetstone                       55.0       4426.6    804.8
Execl Throughput                                 43.0       2675.8    622.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     919862.0   2322.9
File Copy 256 bufsize 500 maxblocks            1655.0     274966.0   1661.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    2350539.0   4052.7
Pipe Throughput                               12440.0    1182284.3    950.4
Pipe-based Context Switching                   4000.0     155034.4    387.6
Process Creation                                126.0       6371.9    505.7
Shell Scripts (1 concurrent)                     42.4       5797.9   1367.4
Shell Scripts (8 concurrent)                      6.0       2576.7   4294.4
System Call Overhead                          15000.0    1128173.1    752.1
                                                                   ========
System Benchmarks Index Score                                        1299.1
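Overall, the System Benchmarks Index Score improves from 1252.0 to 1299.1, i.e. by about 3.8%.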
The lmbench tests show a 0.8% ~ 6.5% performance improvement for fork_proc/exec_proc/shell_proc.
The test results are as follows:

               base       base+this patch
fork_proc      457ms      427ms  (6.5%)
exec_proc      2008ms     1991ms (0.8%)
shell_proc     3062ms     2985ms (2.5%)
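(The improvement percentages are computed as (base - patched) / base, e.g. fork_proc: (457 - 427) / 457 ≈ 6.5%.)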
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Li Zetao <lizetao1@huawei.com>
---
 kernel/sched/fair.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b0cb2f090da..010dbf2047e5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8530,13 +8530,23 @@ unsigned long cpu_util_cfs_boost(int cpu)
  * utilization of the specified task, whenever the task is currently
  * contributing to the CPU utilization.
  */
-static unsigned long cpu_util_without(int cpu, struct task_struct *p)
+static inline unsigned long cpu_util_without(int cpu, struct task_struct *p)
 {
-       /* Task has no contribution or is new */
-       if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
-               p = NULL;
+       struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+       unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
+       /*
+        * On the wakeup/fork fast path @p has no (or only a stale)
+        * contribution to this CPU, so there is nothing to subtract:
+        * the root cfs_rq utilization, clamped by the UTIL_EST value
+        * and the CPU capacity below, is a good enough estimate.
+        */
+       if (sched_feat(UTIL_EST)) {
+               unsigned long util_est;
+               util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
+               util = max(util, util_est);
+       }

-       return cpu_util(cpu, p, -1, 0);
+       return min(util, capacity_orig_of(cpu));
 }
/*