hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9COT9
-----------------------------
We have found significant differences in the latency of cpc_read() between regular scenarios and scenarios with high memory access pressure. Ignoring this latency can cause the get-rate interface to occasionally return absurd values.
Below is a sample test that generates high memory access pressure with stress-ng. My local test platform has 160 CPUs, the CPC registers are accessed via MMIO, and the cpuidle feature is disabled (so the AMU counters are always running):
~~~
./stress-ng --memrate 160 --timeout 180
~~~
The following data comes from ftrace function-graph statistics for cppc_get_perf_ctrs(). A "+" marks a duration above 10 us and a "!" marks one above 100 us:
    Regular scenarios                          || High memory access pressure scenarios
    104)              | cppc_get_perf_ctrs() { || 133)              | cppc_get_perf_ctrs() {
    104)     0.800 us |   cpc_read.isra.0();   || 133)     4.580 us |   cpc_read.isra.0();
    104)     0.640 us |   cpc_read.isra.0();   || 133)     7.780 us |   cpc_read.isra.0();
    104)     0.450 us |   cpc_read.isra.0();   || 133)     2.550 us |   cpc_read.isra.0();
    104)     0.430 us |   cpc_read.isra.0();   || 133)     0.570 us |   cpc_read.isra.0();
    104)     4.610 us | }                      || 133) ! 157.610 us | }
    104)              | cppc_get_perf_ctrs() { || 133)              | cppc_get_perf_ctrs() {
    104)     0.720 us |   cpc_read.isra.0();   || 133)     0.760 us |   cpc_read.isra.0();
    104)     0.720 us |   cpc_read.isra.0();   || 133)     4.480 us |   cpc_read.isra.0();
    104)     0.510 us |   cpc_read.isra.0();   || 133)     0.520 us |   cpc_read.isra.0();
    104)     0.500 us |   cpc_read.isra.0();   || 133) +  10.100 us |   cpc_read.isra.0();
    104)     3.460 us | }                      || 133) ! 120.850 us | }
    108)              | cppc_get_perf_ctrs() { ||  87)              | cppc_get_perf_ctrs() {
    108)     0.820 us |   cpc_read.isra.0();   ||  87) ! 255.200 us |   cpc_read.isra.0();
    108)     0.850 us |   cpc_read.isra.0();   ||  87)     2.910 us |   cpc_read.isra.0();
    108)     0.590 us |   cpc_read.isra.0();   ||  87)     5.160 us |   cpc_read.isra.0();
    108)     0.610 us |   cpc_read.isra.0();   ||  87)     4.340 us |   cpc_read.isra.0();
    108)     5.080 us | }                      ||  87) ! 315.790 us | }
    108)              | cppc_get_perf_ctrs() { ||  87)              | cppc_get_perf_ctrs() {
    108)     0.630 us |   cpc_read.isra.0();   ||  87)     0.800 us |   cpc_read.isra.0();
    108)     0.630 us |   cpc_read.isra.0();   ||  87)     6.310 us |   cpc_read.isra.0();
    108)     0.420 us |   cpc_read.isra.0();   ||  87)     1.190 us |   cpc_read.isra.0();
    108)     0.430 us |   cpc_read.isra.0();   ||  87) +  11.620 us |   cpc_read.isra.0();
    108)     3.780 us | }                      ||  87) ! 207.010 us | }
My local test platform runs at 3000000 kHz (3 GHz), but the cpuinfo_cur_freq interface returns values that are not even close to the actual frequency:
    [root@localhost ~]# cd /sys/devices/system/cpu
    [root@localhost cpu]# for i in {0..159}; do cat cpu$i/cpufreq/cpuinfo_cur_freq; done
    5127812
    2952127
    3069001
    3496183
    922989768
    2419194
    3427042
    2331869
    3594611
    8238499
    ...
The reason is that under heavy memory access pressure, the execution time of cpc_read() grows from sub-microsecond to several hundred microseconds. Wrapping the cpc_read() calls in an IRQ-disabled critical section barely changes the result.
           cppc_get_perf_ctrs()[0]                 cppc_get_perf_ctrs()[1]
                 /         \                             /         \
          cpc_read        cpc_read                cpc_read        cpc_read
            ref[0]      delivered[0]                ref[1]      delivered[1]
              |               |                       |               |
              v               v                       v               v
      -----------------------------------------------------------------------> time
              <---delta[0]---> <----sample_period----> <---delta[1]--->
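The reported frequency is computed as

    reported_freq = ref_freq * (delivered[1] - delivered[0]) / (ref[1] - ref[0])

where, with freq denoting the real CPU frequency,

    delivered[1] - delivered[0] = freq * (delta[1] + sample_period)
    ref[1] - ref[0] = ref_freq * (delta[0] + sample_period)

Substituting gives

    reported_freq = freq * (delta[1] + sample_period) / (delta[0] + sample_period)

As long as sample_period dominates delta[0] and delta[1], the ratio stays close to 1. When the two skews differ by tens or hundreds of microseconds and sample_period is only 2 us, the reported value can be off by orders of magnitude.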
A 2 us sampling period is therefore far too short to absorb the impact of memory access latency. Consequently, we propose two changes. First, cppc_cpufreq_get_rate() should only be called in process context. Second, it should use a longer sampling period to neutralize the impact of random latency.
We call cond_resched() rather than a sleeping function so that `taskset -c $i cat cpu$i/cpufreq/cpuinfo_cur_freq` still works when the cpuidle feature is enabled. A sleep could let the CPU switch to the idle task, and that would stop the AMU counters.
Reported-by: Yang Shi <yang@os.amperecomputing.com>
Link: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.co...
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
 drivers/cpufreq/cppc_cpufreq.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 3ab0eb8e2424..6f06e9a80816 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -357,12 +357,26 @@ static int cppc_get_perf_ctrs_sample(void *val)
 	struct fb_ctr_pair *fb_ctrs = val;
 	int cpu = fb_ctrs->cpu;
 	int ret;
+	unsigned long timeout;
 
 	ret = cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t0);
 	if (ret)
 		return ret;
 
-	udelay(2); /* 2usec delay between sampling */
+	if (likely(!irqs_disabled())) {
+		/*
+		 * Set 1ms as sampling interval, but never schedule
+		 * to the idle task to prevent the AMU counters from
+		 * stopping working.
+		 */
+		timeout = jiffies + msecs_to_jiffies(1);
+		while (!time_after(jiffies, timeout))
+			cond_resched();
+
+	} else {
+		pr_warn_once("CPU%d: Get rate in atomic context", cpu);
+		udelay(2); /* 2usec delay between sampling */
+	}
 
 	return cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t1);
 }