From: Alexandra Winter <wintera@linux.ibm.com>
stable inclusion
from stable-v5.10.224
commit c65f72eec60a34ace031426e04e9aff8e5f04895
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAKPRZ
CVE: CVE-2024-42271
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit f558120cd709682b739207b48cf7479fd9568431 ]
iucv_sever_path() is called from process context and from bh context.
iucv->path is used as an indicator of whether somebody else is taking care
of severing the path (or it has already been removed / never existed).
This needs to be done with an atomic compare and swap, otherwise there is
a small window where iucv_sock_close() will try to work with a path that
has already been severed and freed by iucv_callback_connrej() called by
iucv_tasklet_fn().
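For illustration, a minimal sketch of the claim-by-exchange pattern the
fix relies on (simplified; user data handling omitted, see the diff below
for the actual change):
~~~
/* Sketch: process context (iucv_sock_close) and bh context
 * (iucv_tasklet_fn -> iucv_callback_connrej) can both get here.
 * xchg() atomically claims the path pointer, so exactly one
 * caller ends up severing and freeing it.
 */
struct iucv_path *path = xchg(&iucv->path, NULL);

if (path) {
        /* We won the race and now own the path. */
        iucv_path_sever(path, NULL);
        iucv_path_free(path);
}
/* Otherwise somebody else already severed (or is severing) it. */
~~~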
Example:
[452744.123844] Call Trace:
[452744.123845] ([<0000001e87f03880>] 0x1e87f03880)
[452744.123966] [<00000000d593001e>] iucv_path_sever+0x96/0x138
[452744.124330] [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv]
[452744.124336] [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv]
[452744.124341] [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv]
[452744.124345] [<00000000d574794e>] __sock_release+0x5e/0xe8
[452744.124815] [<00000000d5747a0c>] sock_close+0x34/0x48
[452744.124820] [<00000000d5421642>] __fput+0xba/0x268
[452744.124826] [<00000000d51b382c>] task_work_run+0xbc/0xf0
[452744.124832] [<00000000d5145710>] do_notify_resume+0x88/0x90
[452744.124841] [<00000000d5978096>] system_call+0xe2/0x2c8
[452744.125319] Last Breaking-Event-Address:
[452744.125321] [<00000000d5930018>] iucv_path_sever+0x90/0x138
[452744.125324]
[452744.125325] Kernel panic - not syncing: Fatal exception in interrupt
Note that bh_lock_sock() does not serialize the tasklet context against
process context, because the check for sock_owned_by_user() and the
corresponding handling are missing.

Ideas for a future clean-up patch:
A) Correct usage of bh_lock_sock() in tasklet context, as described in
Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/
Re-enqueue, if needed (see the sketch after this list). This may require
adding return values to the tasklet functions and thus changes to all
users of iucv.
B) Change the iucv tasklet into a worker and use only lock_sock() in
af_iucv.
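For illustration, a sketch of what idea A could look like in the tasklet
path; handle_path_event() is a hypothetical placeholder for the per-event
callback work:
~~~
/* Sketch of idea A: before touching the socket from tasklet (bh)
 * context, check whether process context currently owns it
 * (i.e. is inside lock_sock()), and defer instead of racing.
 */
bh_lock_sock(sk);
if (!sock_owned_by_user(sk)) {
        handle_path_event(sk);  /* hypothetical event handler */
} else {
        /* Process context owns the socket: re-enqueue and retry. */
        tasklet_schedule(&iucv_tasklet);
}
bh_unlock_sock(sk);
~~~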
Fixes: 7d316b945352 ("af_iucv: remove IUCV-pathes completely")
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20240729122818.947756-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
---
net/iucv/af_iucv.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 7c73faa5336c..3d0424e4ae6c 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -359,8 +359,8 @@ static void iucv_sever_path(struct sock *sk, int with_user_data)
         struct iucv_sock *iucv = iucv_sk(sk);
         struct iucv_path *path = iucv->path;

-        if (iucv->path) {
-                iucv->path = NULL;
+        /* Whoever resets the path pointer, must sever and free it. */
+        if (xchg(&iucv->path, NULL)) {
                 if (with_user_data) {
                         low_nmcpy(user_data, iucv->src_name);
                         high_nmcpy(user_data, iucv->dst_name);
--
2.25.1
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9COT9
Reference: https://gitee.com/openeuler/kernel/commit/7404f3db35e32493b64696a2ba1457226…
----------------------------------------
We have found significant differences in the latency of cpc_read()
between regular scenarios and scenarios with high memory access pressure.
Ignoring this error can result in the get rate interface occasionally
returning absurd values.
Here is a high memory access pressure sample test using stress-ng. My
local testing platform has 160 CPUs, the CPC registers are accessed via
MMIO, and the cpuidle feature is disabled (so the AMU always stays
online):
~~~
./stress-ng --memrate 160 --timeout 180
~~~
The following data comes from ftrace function_graph statistics for
cppc_get_perf_ctrs():
Regular scenarios || High memory access pressure scenarios
104) | cppc_get_perf_ctrs() { || 133) | cppc_get_perf_ctrs() {
104) 0.800 us | cpc_read.isra.0(); || 133) 4.580 us | cpc_read.isra.0();
104) 0.640 us | cpc_read.isra.0(); || 133) 7.780 us | cpc_read.isra.0();
104) 0.450 us | cpc_read.isra.0(); || 133) 2.550 us | cpc_read.isra.0();
104) 0.430 us | cpc_read.isra.0(); || 133) 0.570 us | cpc_read.isra.0();
104) 4.610 us | } || 133) ! 157.610 us | }
104) | cppc_get_perf_ctrs() { || 133) | cppc_get_perf_ctrs() {
104) 0.720 us | cpc_read.isra.0(); || 133) 0.760 us | cpc_read.isra.0();
104) 0.720 us | cpc_read.isra.0(); || 133) 4.480 us | cpc_read.isra.0();
104) 0.510 us | cpc_read.isra.0(); || 133) 0.520 us | cpc_read.isra.0();
104) 0.500 us | cpc_read.isra.0(); || 133) + 10.100 us | cpc_read.isra.0();
104) 3.460 us | } || 133) ! 120.850 us | }
108) | cppc_get_perf_ctrs() { || 87) | cppc_get_perf_ctrs() {
108) 0.820 us | cpc_read.isra.0(); || 87) ! 255.200 us | cpc_read.isra.0();
108) 0.850 us | cpc_read.isra.0(); || 87) 2.910 us | cpc_read.isra.0();
108) 0.590 us | cpc_read.isra.0(); || 87) 5.160 us | cpc_read.isra.0();
108) 0.610 us | cpc_read.isra.0(); || 87) 4.340 us | cpc_read.isra.0();
108) 5.080 us | } || 87) ! 315.790 us | }
108) | cppc_get_perf_ctrs() { || 87) | cppc_get_perf_ctrs() {
108) 0.630 us | cpc_read.isra.0(); || 87) 0.800 us | cpc_read.isra.0();
108) 0.630 us | cpc_read.isra.0(); || 87) 6.310 us | cpc_read.isra.0();
108) 0.420 us | cpc_read.isra.0(); || 87) 1.190 us | cpc_read.isra.0();
108) 0.430 us | cpc_read.isra.0(); || 87) + 11.620 us | cpc_read.isra.0();
108) 3.780 us | } || 87) ! 207.010 us | }
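For reference, a function_graph trace like the one above can be collected
through tracefs; this assumes tracefs is mounted under /sys/kernel/debug:
~~~
cd /sys/kernel/debug/tracing
echo cppc_get_perf_ctrs > set_graph_function   # limit graph to this call tree
echo function_graph > current_tracer
cat trace_pipe
~~~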
My local testing platform runs at 3000000 kHz, but the cpuinfo_cur_freq
interface returns values that are not even close to the actual frequency:
[root@localhost ~]# cd /sys/devices/system/cpu
[root@localhost cpu]# for i in {0..159}; do cat cpu$i/cpufreq/cpuinfo_cur_freq; done
5127812
2952127
3069001
3496183
922989768
2419194
3427042
2331869
3594611
8238499
...
The reason is that under heavy memory access pressure, the execution
delay of cpc_read() increases from sub-microsecond to several hundred
microseconds. Moving the cpc_read() calls into a critical section via IRQ
disable/enable has minimal impact on the result.
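For illustration, the critical-section experiment mentioned above looks
roughly like this (a sketch, not part of this patch; return values
ignored for brevity):
~~~
unsigned long flags;

/* Disabling IRQs around the sampling window does not help: the
 * extra latency comes from the cpc_read() MMIO accesses themselves
 * under memory pressure, not from being interrupted in between.
 */
local_irq_save(flags);
cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t0);
udelay(2);              /* original 2usec sampling period */
cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t1);
local_irq_restore(flags);
~~~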
   cppc_get_perf_ctrs()[0]                   cppc_get_perf_ctrs()[1]
        /           \                             /           \
  cpc_read        cpc_read                   cpc_read        cpc_read
   ref[0]       delivered[0]                  ref[1]       delivered[1]
      |              |                           |              |
      v              v                           v              v
----------------------------------------------------------------------> time
      <--delta[0]--->                            <--delta[1]--->
                     <------sample_period------->
Given that
  freq = ref_freq * (delivered[1] - delivered[0]) / (ref[1] - ref[0])
and
  delivered[1] - delivered[0] = freq * (delta[1] + sample_period),
  ref[1] - ref[0] = ref_freq * (delta[0] + sample_period),
the measured frequency is skewed by the factor
(delta[1] + sample_period) / (delta[0] + sample_period).
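For illustration with numbers taken from the trace above (treated as
hypothetical inputs): with sample_period = 2us, delta[0] = 0.5us and
delta[1] = 255us (one cpc_read above took 255.200 us), the reported
frequency is skewed by
  (255 + 2) / (0.5 + 2) ~= 103,
i.e. roughly two orders of magnitude too high, the same ballpark as
absurd readings like 922989768 on a 3000000 kHz platform.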
To eliminate the impact of system memory access latency, a sampling
period of 2us is far from sufficient. Consequently, we suggest that
cppc_cpufreq_get_rate() only be called in process context, and adopt a
longer sampling period to neutralize the impact of random latency. Here
we call cond_resched() instead of a sleep-like function to ensure that
the AMU stays in the working state, rather than in a cstate, during the
sample period.
Fixes: 33477d84c26b ("cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC")
Reported-by: Yang Shi <yang@os.amperecomputing.com>
Link: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.c…
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
drivers/cpufreq/cppc_cpufreq.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 321a9dc9484d..66e8c850e6f2 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -851,12 +851,25 @@ static int cppc_get_perf_ctrs_pair(void *val)
         struct fb_ctr_pair *fb_ctrs = val;
         int cpu = fb_ctrs->cpu;
         int ret;
+        unsigned long timeout;

         ret = cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t0);
         if (ret)
                 return ret;

-        udelay(2); /* 2usec delay between sampling */
+        if (likely(!in_atomic() && !irqs_disabled())) {
+                /*
+                 * Set 1ms as sampling interval, but never schedule
+                 * to the idle task to prevent the AMU counters from
+                 * stopping working.
+                 */
+                timeout = jiffies + msecs_to_jiffies(1);
+                while (!time_after(jiffies, timeout))
+                        cond_resched();
+        } else {
+                pr_warn_once("CPU%d: Get rate in atomic context", cpu);
+                udelay(2); /* 2usec delay between sampling */
+        }
+
         return cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t1);
 }
--
2.25.1