[PATCH OLK-6.6 0/3] CVE-2024-50140

CVE-2024-50140

Liu Kai (1):
  task_work: Fix kabi breakage in enum task_work_notify_mode

Sebastian Andrzej Siewior (1):
  task_work: Add TWA_NMI_CURRENT as an additional notify mode.

Waiman Long (1):
  sched/core: Disable page allocation in task_tick_mm_cid()

 include/linux/task_work.h |  7 ++++++-
 kernel/sched/core.c       |  4 +++-
 kernel/task_work.c        | 35 ++++++++++++++++++++++++++++++++---
 3 files changed, 41 insertions(+), 5 deletions(-)

--
2.34.1

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

stable inclusion
from stable-v6.6.59
commit 380681a29066c1f5402053f62862931f1cf6b305
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IB2STY
CVE: CVE-2024-50140

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

--------------------------------

[ Upstream commit 466e4d801cd438a1ab2c8a2cce1bef6b65c31bbb ]

Adding task_work from NMI context requires the following:
- The kasan_record_aux_stack() is not NMI safe and must be avoided.
- Using TWA_RESUME is NMI safe. If the NMI occurs while the CPU is in
  userland then it will continue in userland and not invoke the `work'
  callback.

Add TWA_NMI_CURRENT as an additional notify mode. In this mode skip
kasan and use irq_work in hardirq-mode to force the needed interrupt. Set
TIF_NOTIFY_RESUME within the irq_work callback due to k[ac]san
instrumentation in test_and_set_bit() which does not look NMI safe in
case of a report.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240704170424.1466941-3-bigeasy@linutronix.de
Stable-dep-of: 73ab05aa46b0 ("sched/core: Disable page allocation in task_tick_mm_cid()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 include/linux/task_work.h |  1 +
 kernel/task_work.c        | 24 +++++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 26b8a47f41fca..cf5e7e891a776 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -18,6 +18,7 @@ enum task_work_notify_mode {
 	TWA_RESUME,
 	TWA_SIGNAL,
 	TWA_SIGNAL_NO_IPI,
+	TWA_NMI_CURRENT,
 };
 
 static inline bool task_work_pending(struct task_struct *task)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 2134ac8057a94..5c2daa7ad3f90 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -1,10 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/irq_work.h>
 #include <linux/spinlock.h>
 #include <linux/task_work.h>
 #include <linux/resume_user_mode.h>
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
+static void task_work_set_notify_irq(struct irq_work *entry)
+{
+	test_and_set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+}
+static DEFINE_PER_CPU(struct irq_work, irq_work_NMI_resume) =
+	IRQ_WORK_INIT_HARD(task_work_set_notify_irq);
+
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -12,7 +20,7 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
  * @notify: how to notify the targeted task
  *
  * Queue @work for task_work_run() below and notify the @task if @notify
- * is @TWA_RESUME, @TWA_SIGNAL, or @TWA_SIGNAL_NO_IPI.
+ * is @TWA_RESUME, @TWA_SIGNAL, @TWA_SIGNAL_NO_IPI or @TWA_NMI_CURRENT.
  *
  * @TWA_SIGNAL works like signals, in that the it will interrupt the targeted
  * task and run the task_work, regardless of whether the task is currently
@@ -24,6 +32,8 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
  * kernel anyway.
  * @TWA_RESUME work is run only when the task exits the kernel and returns to
  * user mode, or before entering guest mode.
+ * @TWA_NMI_CURRENT works like @TWA_RESUME, except it can only be used for the
+ * current @task and if the current context is NMI.
  *
  * Fails if the @task is exiting/exited and thus it can't process this @work.
  * Otherwise @work->func() will be called when the @task goes through one of
@@ -44,8 +54,13 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 {
 	struct callback_head *head;
 
-	/* record the work call stack in order to print it in KASAN reports */
-	kasan_record_aux_stack(work);
+	if (notify == TWA_NMI_CURRENT) {
+		if (WARN_ON_ONCE(task != current))
+			return -EINVAL;
+	} else {
+		/* record the work call stack in order to print it in KASAN reports */
+		kasan_record_aux_stack(work);
+	}
 
 	head = READ_ONCE(task->task_works);
 	do {
@@ -66,6 +81,9 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 	case TWA_SIGNAL_NO_IPI:
 		__set_notify_signal(task);
 		break;
+	case TWA_NMI_CURRENT:
+		irq_work_queue(this_cpu_ptr(&irq_work_NMI_resume));
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		break;
--
2.34.1
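The control flow that the TWA_NMI_CURRENT patch adds to task_work_add() can be modelled in plain userspace C. The sketch below is only an illustration: every model_* name is hypothetical, the structs are stand-ins for kernel state (TIF_NOTIFY_RESUME, the per-CPU irq_work, the KASAN aux-stack recording), and the real function's handling of exiting tasks is left out. It shows the three properties described in the commit message: the NMI mode is rejected for any task other than current, it skips KASAN recording, and it defers setting the resume flag to the irq_work callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only: hypothetical stand-ins, not kernel APIs. */
enum model_mode { MODEL_RESUME, MODEL_NMI_CURRENT };

struct model_task {
	bool notify_resume;	/* stands in for TIF_NOTIFY_RESUME */
	int queued_works;	/* stands in for the task_works list */
};

struct model_cpu {
	struct model_task *current_task;
	bool irq_work_pending;	/* stands in for the per-CPU irq_work */
	int kasan_records;	/* calls that would record a KASAN aux stack */
};

/* Mirrors the shape of the patched task_work_add(), heavily simplified. */
static int model_task_work_add(struct model_cpu *cpu, struct model_task *task,
			       enum model_mode mode)
{
	if (mode == MODEL_NMI_CURRENT) {
		/* NMI mode is only valid for the task running on this CPU. */
		if (task != cpu->current_task)
			return -EINVAL;
	} else {
		/* Non-NMI modes may record the caller's stack. */
		cpu->kasan_records++;
	}

	task->queued_works++;

	if (mode == MODEL_NMI_CURRENT)
		cpu->irq_work_pending = true;	/* flag is set later, in hardirq */
	else
		task->notify_resume = true;	/* flag is set immediately */
	return 0;
}

/* Models the irq_work callback that runs once the NMI has returned. */
static void model_run_irq_work(struct model_cpu *cpu)
{
	if (cpu->irq_work_pending) {
		cpu->irq_work_pending = false;
		cpu->current_task->notify_resume = true;
	}
}
```

In this model, a caller in "NMI context" sees notify_resume still clear right after model_task_work_add() returns; it only becomes set after model_run_irq_work(), which corresponds to the hardirq raised by irq_work_queue() in the patch.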

From: Waiman Long <longman@redhat.com>

stable inclusion
from stable-v6.6.59
commit 509c29d0d26f68a6f6d0a05cb1a89725237e2b87
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IB2STY
CVE: CVE-2024-50140

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

--------------------------------

[ Upstream commit 73ab05aa46b02d96509cb029a8d04fca7bbde8c7 ]

With KASAN and PREEMPT_RT enabled, calling task_work_add() in
task_tick_mm_cid() may cause the following splat.

[ 63.696416] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 63.696416] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 610, name: modprobe
[ 63.696416] preempt_count: 10001, expected: 0
[ 63.696416] RCU nest depth: 1, expected: 1

This problem is caused by the following call trace.

  sched_tick() [ acquire rq->__lock ]
   -> task_tick_mm_cid()
    -> task_work_add()
     -> __kasan_record_aux_stack()
      -> kasan_save_stack()
       -> stack_depot_save_flags()
        -> alloc_pages_mpol_noprof()
         -> __alloc_pages_noprof()
          -> get_page_from_freelist()
           -> rmqueue()
            -> rmqueue_pcplist()
             -> __rmqueue_pcplist()
              -> rmqueue_bulk()
               -> rt_spin_lock()

The rq lock is a raw_spinlock_t. We can't sleep while holding
it. IOW, we can't call alloc_pages() in stack_depot_save_flags().

The task_tick_mm_cid() function with its task_work_add() call was
introduced by commit 223baf9d17f2 ("sched: Fix performance regression
introduced by mm_cid") in v6.4 kernel.

Fortunately, there is a kasan_record_aux_stack_noalloc() variant that
calls stack_depot_save_flags() while not allowing it to allocate
new pages. To allow task_tick_mm_cid() to use task_work without
page allocation, a new TWAF_NO_ALLOC flag is added to enable calling
kasan_record_aux_stack_noalloc() instead of kasan_record_aux_stack()
if set. The task_tick_mm_cid() function is modified to add this new flag.

The possible downside is a missing stack trace in a KASAN report when
task_work_add() is called with TWAF_NO_ALLOC and recording the stack
would require a new page allocation, which should be rare.

Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241010014432.194742-1-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 include/linux/task_work.h |  5 ++++-
 kernel/sched/core.c       |  4 +++-
 kernel/task_work.c        | 15 +++++++++++++--
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index cf5e7e891a776..2964171856e00 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -14,11 +14,14 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 }
 
 enum task_work_notify_mode {
-	TWA_NONE,
+	TWA_NONE = 0,
 	TWA_RESUME,
 	TWA_SIGNAL,
 	TWA_SIGNAL_NO_IPI,
 	TWA_NMI_CURRENT,
+
+	TWA_FLAGS = 0xff00,
+	TWAF_NO_ALLOC = 0x0100,
 };
 
 static inline bool task_work_pending(struct task_struct *task)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17122e99b9330..48547182d9f57 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -12678,7 +12678,9 @@ void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
 		return;
 	if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
 		return;
-	task_work_add(curr, work, TWA_RESUME);
+
+	/* No page allocation under rq lock */
+	task_work_add(curr, work, TWA_RESUME | TWAF_NO_ALLOC);
 }
 
 void sched_mm_cid_exit_signals(struct task_struct *t)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 5c2daa7ad3f90..8aa43204cb7dd 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -53,13 +53,24 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		  enum task_work_notify_mode notify)
 {
 	struct callback_head *head;
+	int flags = notify & TWA_FLAGS;
 
+	notify &= ~TWA_FLAGS;
 	if (notify == TWA_NMI_CURRENT) {
 		if (WARN_ON_ONCE(task != current))
 			return -EINVAL;
 	} else {
-		/* record the work call stack in order to print it in KASAN reports */
-		kasan_record_aux_stack(work);
+		/*
+		 * Record the work call stack in order to print it in KASAN
+		 * reports.
+		 *
+		 * Note that stack allocation can fail if TWAF_NO_ALLOC flag
+		 * is set and new page is needed to expand the stack buffer.
+		 */
+		if (flags & TWAF_NO_ALLOC)
+			kasan_record_aux_stack_noalloc(work);
+		else
+			kasan_record_aux_stack(work);
 	}
 
 	head = READ_ONCE(task->task_works);
--
2.34.1
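The way this patch packs a flag into the notify argument can be checked in isolation. The following is a minimal userspace sketch: the enum values are copied from the diff, while struct notify_parts, split_notify() and uses_noalloc() are hypothetical helpers introduced only for illustration. It shows that masking with TWA_FLAGS recovers the flag bits and that clearing them leaves the original notify mode intact.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as introduced by the patch to include/linux/task_work.h. */
enum task_work_notify_mode {
	TWA_NONE = 0,
	TWA_RESUME,
	TWA_SIGNAL,
	TWA_SIGNAL_NO_IPI,
	TWA_NMI_CURRENT,

	TWA_FLAGS = 0xff00,	/* mask covering all flag bits */
	TWAF_NO_ALLOC = 0x0100,	/* request the non-allocating KASAN variant */
};

/* Hypothetical helper type: the two halves of a combined notify value. */
struct notify_parts {
	int mode;	/* one of the TWA_* modes */
	int flags;	/* zero or more TWAF_* flags */
};

/* Same two masking steps task_work_add() performs on entry. */
static struct notify_parts split_notify(int notify)
{
	struct notify_parts parts;

	parts.flags = notify & TWA_FLAGS;
	parts.mode = notify & ~TWA_FLAGS;
	return parts;
}

/* True when the caller asked for stack recording without page allocation. */
static bool uses_noalloc(int notify)
{
	return (split_notify(notify).flags & TWAF_NO_ALLOC) != 0;
}
```

Because the modes occupy the low byte and the flags the second byte, existing callers that pass a bare TWA_* value see no change in behaviour: their flags half is simply zero.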

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IB2STY

--------------------------------

Fix kabi breakage in enum task_work_notify_mode by wrapping the newly
added enumerators in the KABI_EXTEND_ENUM macro.

Fixes: bee1a0cf30b6 ("sched/core: Disable page allocation in task_tick_mm_cid()")
Signed-off-by: Liu Kai <liukai284@huawei.com>
---
 include/linux/task_work.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 2964171856e00..c3bdcf3c42ffd 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -4,6 +4,7 @@
 
 #include <linux/list.h>
 #include <linux/sched.h>
+#include <linux/kabi.h>
 
 typedef void (*task_work_func_t)(struct callback_head *);
 
@@ -18,10 +19,10 @@ enum task_work_notify_mode {
 	TWA_RESUME,
 	TWA_SIGNAL,
 	TWA_SIGNAL_NO_IPI,
-	TWA_NMI_CURRENT,
+	KABI_EXTEND_ENUM(TWA_NMI_CURRENT)
 
-	TWA_FLAGS = 0xff00,
-	TWAF_NO_ALLOC = 0x0100,
+	KABI_EXTEND_ENUM(TWA_FLAGS = 0xff00)
+	KABI_EXTEND_ENUM(TWAF_NO_ALLOC = 0x0100)
 };
 
 static inline bool task_work_pending(struct task_struct *task)
--
2.34.1
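The definition of KABI_EXTEND_ENUM lives in include/linux/kabi.h and is not shown in this series, so the sketch below rests on an assumption: that the macro expands to its argument followed by a comma in normal builds, and to nothing when the header is parsed for symbol-version (kABI) checksums. The absence of trailing commas at the call sites in the diff is consistent with that, but it is not confirmed here. All DEMO_* names are hypothetical; the point is only that the compiled enumerator values stay the same while the checksum pass would see the original, shorter enum.

```c
#include <assert.h>

/*
 * ASSUMPTION: stand-in for KABI_EXTEND_ENUM. DEMO_GENKSYMS models the
 * pass that computes kABI checksums; it is not a real kernel symbol.
 */
#ifdef DEMO_GENKSYMS
#define DEMO_EXTEND_ENUM(_new)		/* hidden from the checksum pass */
#else
#define DEMO_EXTEND_ENUM(_new) _new,	/* visible to the compiler */
#endif

enum demo_notify_mode {
	DEMO_TWA_NONE = 0,
	DEMO_TWA_RESUME,
	DEMO_TWA_SIGNAL,
	DEMO_TWA_SIGNAL_NO_IPI,
	DEMO_EXTEND_ENUM(DEMO_TWA_NMI_CURRENT)

	DEMO_EXTEND_ENUM(DEMO_TWA_FLAGS = 0xff00)
	DEMO_EXTEND_ENUM(DEMO_TWAF_NO_ALLOC = 0x0100)
};

#ifndef DEMO_GENKSYMS
/* Returns 1 when the wrapped enumerators keep the values from the patch. */
static int demo_values_unchanged(void)
{
	return DEMO_TWA_NMI_CURRENT == 4 &&
	       DEMO_TWA_FLAGS == 0xff00 &&
	       DEMO_TWAF_NO_ALLOC == 0x0100;
}
#endif
```

Under this assumption, wrapping changes nothing for code that uses the enumerators; it only affects what the checksum tooling sees.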

Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been successfully converted to a pull request.
Pull request link: https://gitee.com/openeuler/kernel/pulls/16415
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/Z2W...