From: Michael Ellerman <mpe@ellerman.id.au>
mainline inclusion
from mainline-v5.19-rc2
commit 8e1278444446fc97778a5e5c99bca1ce0bbc5ec9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5C43D
CVE: CVE-2022-32981
--------------------------------
The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process to read/write registers of another process.
To get/set a register, the API takes an index into an imaginary address space called the "USER area", where the registers of the process are laid out in some fashion.
The kernel then maps that index to a particular register in its own data structures and gets/sets the value.
The API only allows a single machine-word to be read/written at a time. So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels.
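For illustration (a minimal user-space sketch, not taken from the kernel tree; error handling omitted), a tracer reads one such word per call, assuming the glibc ptrace() wrapper and the powerpc PT_FPR0 index from asm/ptrace.h:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <asm/ptrace.h>	/* powerpc PT_* indexes into the USER area */

/* Read the machine word at FPR0's slot in the traced child's USER area.
 * The address is the register index scaled by the word size, so a single
 * call transfers 4 bytes on a 32-bit kernel and 8 bytes on a 64-bit one. */
static long peek_fpr0_word(pid_t child)
{
	return ptrace(PTRACE_PEEKUSER, child,
		      (void *)(PT_FPR0 * sizeof(long)), NULL);
}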
The way floating point registers (FPRs) are addressed is somewhat complicated, because double precision float values are 64-bit even on 32-bit CPUs. That means on 32-bit kernels each FPR occupies two word-sized locations in the USER area. On 64-bit kernels each FPR occupies one word-sized location in the USER area.
Internally the kernel stores the FPRs in an array of u64s, or if VSX is enabled, an array of pairs of u64s where one half of each pair stores the FPR. Which half of the pair stores the FPR depends on the kernel's endianness.
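A simplified sketch of that storage (the real definitions live in the powerpc asm/processor.h and depend on CONFIG_VSX; the values below are illustrative only):

typedef unsigned long long u64;	/* stand-in for the kernel's u64 */

#define TS_FPRWIDTH	2	/* u64s per slot: 2 with VSX, 1 without */
#define TS_FPROFFSET	0	/* which half of the pair holds the FPR;
				 * as noted above this depends on endianness */

struct thread_fp_state {
	u64	fpr[32][TS_FPRWIDTH];
	u64	fpscr;
};

/* TS_FPR() hides the VSX/no-VSX and endian details when selecting FPR 'i'. */
#define TS_FPR(i)	fp_state.fpr[(i)][TS_FPROFFSET]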
To handle the different layouts of the FPRs depending on VSX/no-VSX and big/little endian, the TS_FPR() macro was introduced.
Unfortunately the TS_FPR() macro does not take into account the fact that the addressing of each FPR differs between 32-bit and 64-bit kernels. It just takes the index into the "USER area" passed from userspace and indexes into the fp_state.fpr array.
On 32-bit there are 64 indexes that address FPRs, but only 32 entries in the fp_state.fpr array, meaning the user can read/write 256 bytes past the end of the array. Because the fp_state sits in the middle of the thread_struct there are various fields that can be overwritten, including some pointers. As such it may be exploitable.
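A stand-alone sketch of that arithmetic (hypothetical helpers, not kernel code), contrasting the broken u64 indexing with the 32-bit-word indexing the fix below uses:

#include <stdint.h>

static uint64_t fpr[32];	/* stand-in for fp_state.fpr on a 32-bit, no-VSX kernel */

/* Broken: fpidx is a 32-bit-word index (0..63) but is used as a u64 index,
 * so fpidx >= 32 reads/writes up to 32 * 8 == 256 bytes past the array. */
static uint64_t peek_buggy(unsigned int fpidx)
{
	return fpr[fpidx];
}

/* The fix's idea: index the same storage as 32-bit words, so fpidx 0..63
 * always stays within the 32 u64 entries. */
static uint32_t peek_fixed(unsigned int fpidx)
{
	return ((uint32_t *)fpr)[fpidx];
}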
It has also been observed to cause systems to hang or otherwise misbehave when using gdbserver, and is probably the root cause of this report which could not be easily reproduced: https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@ke...
Rather than trying to make the TS_FPR() macro even more complicated to fix the bug, or add more macros, instead add a special-case for 32-bit kernels. This is more obvious and hopefully avoids a similar bug happening again in future.
Note that because 32-bit kernels never have VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to ensure that 32-bit && VSX is never enabled.
Fixes: 87fec0514f61 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds")
Cc: stable@vger.kernel.org # v3.13+
Reported-by: Ariel Miculas <ariel.miculas@belden.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au
Conflicts:
	arch/powerpc/kernel/ptrace/ptrace-fpu.c
	arch/powerpc/kernel/ptrace/ptrace.c
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 arch/powerpc/kernel/ptrace.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index e08b32ccf1d9..d245f0af412a 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -2980,6 +2980,9 @@ long arch_ptrace(struct task_struct *child, long request,
 	void __user *datavp = (void __user *) data;
 	unsigned long __user *datalp = datavp;
 
+	// ptrace_get/put_fpr() rely on PPC32 and VSX being incompatible
+	BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC32) && IS_ENABLED(CONFIG_VSX));
+
 	switch (request) {
 	/* read the word at location addr in the USER area. */
 	case PTRACE_PEEKUSR: {
@@ -3006,10 +3009,13 @@ long arch_ptrace(struct task_struct *child, long request,
 			unsigned int fpidx = index - PT_FPR0;
 
 			flush_fp_to_thread(child);
-			if (fpidx < (PT_FPSCR - PT_FPR0))
-				memcpy(&tmp, &child->thread.TS_FPR(fpidx),
-				       sizeof(long));
-			else
+			if (fpidx < (PT_FPSCR - PT_FPR0)) {
+				if (IS_ENABLED(CONFIG_PPC32))
+					// On 32-bit the index we are passed refers to 32-bit words
+					tmp = ((u32 *)child->thread.fp_state.fpr)[fpidx];
+				else
+					memcpy(&tmp, &child->thread.TS_FPR(fpidx), sizeof(long));
+			} else
 				tmp = child->thread.fp_state.fpscr;
 		}
 		ret = put_user(tmp, datalp);
@@ -3039,10 +3045,13 @@ long arch_ptrace(struct task_struct *child, long request,
 			unsigned int fpidx = index - PT_FPR0;
 
 			flush_fp_to_thread(child);
-			if (fpidx < (PT_FPSCR - PT_FPR0))
-				memcpy(&child->thread.TS_FPR(fpidx), &data,
-				       sizeof(long));
-			else
+			if (fpidx < (PT_FPSCR - PT_FPR0)) {
+				if (IS_ENABLED(CONFIG_PPC32))
+					// On 32-bit the index we are passed refers to 32-bit words
+					((u32 *)child->thread.fp_state.fpr)[fpidx] = data;
+				else
+					memcpy(&child->thread.TS_FPR(fpidx), &data, sizeof(long));
+			} else
 				child->thread.fp_state.fpscr = data;
 			ret = 0;
 		}
From: Zhang Qiao <zhangqiao22@huawei.com>
hulk inclusion
category: bugfix
bugzilla: 186973, https://gitee.com/openeuler/kernel/issues/I5CA6K
CVE: NA
--------------------------------
This reverts commit af98db5ff58f3657d68ac5f744de3c9ad69388ac.

Commit af98db5ff58f ("sched: Fix yet more sched_fork()") may cause a process to sleep in cgroup_post_fork()->freezer_fork() while holding cgroup_threadgroup_rwsem for a long time. Other tasks forking children then wait on the lock, and the system can stall.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 include/linux/sched/task.h |  2 +-
 kernel/fork.c              | 12 +-----------
 kernel/sched/core.c        |  6 +-----
 3 files changed, 3 insertions(+), 17 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 1c2c099e393b..c8b52b3ec865 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -33,7 +33,7 @@ extern asmlinkage void schedule_tail(struct task_struct *prev);
 extern void init_idle(struct task_struct *idle, int cpu);
 
 extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
-extern void sched_cgroup_fork(struct task_struct *p);
+extern void sched_post_fork(struct task_struct *p);
 extern void sched_dead(struct task_struct *p);
 
 void __noreturn do_task_dead(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index 231b01eba6e1..88463fd56930 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2063,17 +2063,6 @@ static __latent_entropy struct task_struct *copy_process(
 	if (retval)
 		goto bad_fork_free_futex_mutex;
 
-	/*
-	 * Now that the cgroups are pinned, re-clone the parent cgroup and put
-	 * the new task on the correct runqueue. All this *before* the task
-	 * becomes visible.
-	 *
-	 * This isn't part of ->can_fork() because while the re-cloning is
-	 * cgroup specific, it unconditionally needs to place the task on a
-	 * runqueue.
-	 */
-	sched_cgroup_fork(p);
-
 	/*
 	 * From this point on we must avoid any synchronous user-space
 	 * communication until we take the tasklist-lock. In particular, we do
@@ -2182,6 +2171,7 @@ static __latent_entropy struct task_struct *copy_process(
 
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
+	sched_post_fork(p);
 	cgroup_threadgroup_change_end(current);
 	perf_event_fork(p);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b09153710259..496ce71f93a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2357,9 +2357,8 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	return 0;
 }
 
-void sched_cgroup_fork(struct task_struct *p)
+void sched_post_fork(struct task_struct *p)
 {
-
 	unsigned long flags;
 
 	/*
@@ -2370,9 +2369,6 @@ void sched_cgroup_fork(struct task_struct *p)
 	 * Silence PROVE_RCU.
 	 */
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-#ifdef CONFIG_CGROUP_SCHED
-	p->sched_task_group = task_group(current);
-#endif
 	rseq_migrate(p);
 	/*
 	 * We're setting the CPU for the first time, we don't migrate,
From: Zhang Qiao <zhangqiao22@huawei.com>
hulk inclusion
category: bugfix
bugzilla: 186973, https://gitee.com/openeuler/kernel/issues/I5CA6K
CVE: NA
--------------------------------
This reverts commit 74bd9b82dd7d3797be4b04f682dbdc7899ee3b23.
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 include/linux/sched/task.h |  1 -
 kernel/fork.c              |  1 -
 kernel/sched/core.c        | 36 ++++++++++++++++--------------------
 3 files changed, 16 insertions(+), 22 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c8b52b3ec865..8b02ee42348b 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -33,7 +33,6 @@ extern asmlinkage void schedule_tail(struct task_struct *prev);
 extern void init_idle(struct task_struct *idle, int cpu);
 
 extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
-extern void sched_post_fork(struct task_struct *p);
 extern void sched_dead(struct task_struct *p);
 
 void __noreturn do_task_dead(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index 88463fd56930..7608869f4f1e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2171,7 +2171,6 @@ static __latent_entropy struct task_struct *copy_process(
 
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
-	sched_post_fork(p);
 	cgroup_threadgroup_change_end(current);
 	perf_event_fork(p);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 496ce71f93a7..36d7422da0ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2298,6 +2298,8 @@ static inline void init_schedstats(void) {}
  */
 int sched_fork(unsigned long clone_flags, struct task_struct *p)
 {
+	unsigned long flags;
+
 	__sched_fork(clone_flags, p);
 	/*
 	 * We mark the process as NEW here. This guarantees that
@@ -2341,26 +2343,6 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	init_entity_runnable_average(&p->se);
 
-#ifdef CONFIG_SCHED_INFO
-	if (likely(sched_info_on()))
-		memset(&p->sched_info, 0, sizeof(p->sched_info));
-#endif
-#if defined(CONFIG_SMP)
-	p->on_cpu = 0;
-#endif
-	init_task_preempt_count(p);
-#ifdef CONFIG_SMP
-	plist_node_init(&p->pushable_tasks, MAX_PRIO);
-	RB_CLEAR_NODE(&p->pushable_dl_tasks);
-#endif
-
-	return 0;
-}
-
-void sched_post_fork(struct task_struct *p)
-{
-	unsigned long flags;
-
 	/*
 	 * The child is not yet in the pid-hash so no cgroup attach races,
 	 * and the cgroup is pinned to this child due to cgroup_fork()
@@ -2378,6 +2360,20 @@ void sched_post_fork(struct task_struct *p)
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+#ifdef CONFIG_SCHED_INFO
+	if (likely(sched_info_on()))
+		memset(&p->sched_info, 0, sizeof(p->sched_info));
+#endif
+#if defined(CONFIG_SMP)
+	p->on_cpu = 0;
+#endif
+	init_task_preempt_count(p);
+#ifdef CONFIG_SMP
+	plist_node_init(&p->pushable_tasks, MAX_PRIO);
+	RB_CLEAR_NODE(&p->pushable_dl_tasks);
+#endif
+	return 0;
 }
 
 unsigned long to_ratio(u64 period, u64 runtime)