From: Waiman Long longman@redhat.com
stable inclusion from stable-v6.6.59 commit 509c29d0d26f68a6f6d0a05cb1a89725237e2b87 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IB2STY CVE: CVE-2024-50140
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 73ab05aa46b02d96509cb029a8d04fca7bbde8c7 ]
With KASAN and PREEMPT_RT enabled, calling task_work_add() in task_tick_mm_cid() may cause the following splat.
[ 63.696416] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [ 63.696416] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 610, name: modprobe [ 63.696416] preempt_count: 10001, expected: 0 [ 63.696416] RCU nest depth: 1, expected: 1
This problem is caused by the following call trace.
sched_tick() [ acquire rq->__lock ] -> task_tick_mm_cid() -> task_work_add() -> __kasan_record_aux_stack() -> kasan_save_stack() -> stack_depot_save_flags() -> alloc_pages_mpol_noprof() -> __alloc_pages_noprof() -> get_page_from_freelist() -> rmqueue() -> rmqueue_pcplist() -> __rmqueue_pcplist() -> rmqueue_bulk() -> rt_spin_lock()
The rq lock is a raw_spinlock_t. We can't sleep while holding it. IOW, we can't call alloc_pages() in stack_depot_save_flags().
The task_tick_mm_cid() function with its task_work_add() call was introduced by commit 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") in v6.4 kernel.
Fortunately, there is a kasan_record_aux_stack_noalloc() variant that calls stack_depot_save_flags() while not allowing it to allocate new pages. To allow task_tick_mm_cid() to use task_work without page allocation, a new TWAF_NO_ALLOC flag is added to enable calling kasan_record_aux_stack_noalloc() instead of kasan_record_aux_stack() if set. The task_tick_mm_cid() function is modified to add this new flag.
The possible downside is the missing stack trace in a KASAN report due to new page allocation required when task_work_add_noallloc() is called which should be rare.
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20241010014432.194742-1-longman@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
Conflicts: include/linux/task_work.h kernel/task_work.c [Some contexts different. No functional impact.] Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com --- include/linux/task_work.h | 4 +++- kernel/sched/core.c | 4 +++- kernel/task_work.c | 13 +++++++++++-- 3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/include/linux/task_work.h b/include/linux/task_work.h index 26b8a47f41fc..8cb9f4a95d7a 100644 --- a/include/linux/task_work.h +++ b/include/linux/task_work.h @@ -14,10 +14,12 @@ init_task_work(struct callback_head *twork, task_work_func_t func) }
enum task_work_notify_mode { - TWA_NONE, + TWA_NONE = 0, TWA_RESUME, TWA_SIGNAL, TWA_SIGNAL_NO_IPI, + TWA_FLAGS = 0xff00, + TWAF_NO_ALLOC = 0x0100, };
static inline bool task_work_pending(struct task_struct *task) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 462571b26f88..0224089e1105 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -12678,7 +12678,9 @@ void task_tick_mm_cid(struct rq *rq, struct task_struct *curr) return; if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan))) return; - task_work_add(curr, work, TWA_RESUME); + + /* No page allocation under rq lock */ + task_work_add(curr, work, TWA_RESUME | TWAF_NO_ALLOC); }
void sched_mm_cid_exit_signals(struct task_struct *t) diff --git a/kernel/task_work.c b/kernel/task_work.c index 2134ac8057a9..45f9963aa2d4 100644 --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -43,9 +43,18 @@ int task_work_add(struct task_struct *task, struct callback_head *work, enum task_work_notify_mode notify) { struct callback_head *head; + int flags = notify & TWA_FLAGS; + notify &= ~TWA_FLAGS;
- /* record the work call stack in order to print it in KASAN reports */ - kasan_record_aux_stack(work); + /* + * Record the work call stack in order to print it in KASAN reports. + * Note that stack allocation can fail if TWAF_NO_ALLOC flag + * is set and new page is needed to expand the stack buffer. + */ + if (flags & TWAF_NO_ALLOC) + kasan_record_aux_stack_noalloc(work); + else + kasan_record_aux_stack(work);
head = READ_ONCE(task->task_works); do {