From: Tang Yizhou <tangyizhou@huawei.com>
ascend inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EUVI
CVE: NA
-------------------------------------------------
We encounter a problem as follows:
[ 3057. 75094] share pool: task add group failed, current thread is killed
[ 3057. 75152] [ascend] [drv_buff] [buff_mv_pid_node_to_recycle_list 872] rosnode:12273,12273 release empty list node pid 12273, group_id 1
[ 3057. 76380] [ascend] [ERROR] [drv_buff] [buff_req_ioctl_pid_add_group 443] rosnode:12297,12297 pid add group failed, pid:12297, grp_id:1, ret -512
[ 3057. 76382] [ascend] [drv_buff] [buff_ioctl 841] rosnode:12297,12297 buff_req_ioctl_handlers failed. ret:-512
[ 3057. 76452] Unable to handle kernel paging request at virtual address dead000000000108
[ 3057. 76454] Mem abort info:
[ 3057. 76456] ESR = 0x96000044
[ 3057. 76457] Exception class = DABT (current EL), IL = 32 bits
[ 3057. 76458] SET = 0, FnV = 0
[ 3057. 76459] EA = 0, S1PTW = 0
[ 3057. 76460] Data abort info:
[ 3057. 76461] ISV = 0, ISS = 0x00000044
[ 3057. 76462] CM = 0, WnR = 1
[ 3057. 76463] [dead000000000108] address between user and kernel address ranges
[ 3057. 76466] Internal error: Oops: 96000044 [#1] SMP
[ 3057. 76469] Process rosnode (pid: 12308, stack limit = 0x0000000012aa85df)
[ 3057. 76473] CPU: 10 PID: 12308 Comm: rosnode Tainted: P C O 4.19.95-1.h1.AOS2.0.aarch64 #1
[ 3057. 76474] Hardware name: evb (DT)
[ 3057. 76476] pstate: 20400009 (nzCv daif +PAN -UAO)
[ 3057. 76483] pc : sp_group_exit+0x94/0x130
[ 3057. 76486] lr : sp_group_exit+0x48/0x130
[ 3057. 76486] sp : ffff00001a163c10
[ 3057. 76487] pmr_save: 000000e0
[ 3057. 76489] x29: ffff00001a163c10 x28: ffff800887e2a940
[ 3057. 76491] x27: 0000000000000000 x26: ffff800d8098ca40
[ 3057. 76492] x25: ffff80089a879168 x24: ffff00001a163dd0
[ 3057. 76494] x23: 0000000000000000 x22: 0000000000000002
[ 3057. 76495] x21: ffff800896e73088 x20: ffff80089a879100
[ 3057. 76496] x19: ffff800896e73000 x18: ffff7e002ca9a4f4
[ 3057. 76498] x17: 0000000000000001 x16: 0000000000000001
[ 3057. 76499] x15: 0400000000000000 x14: ffff800bd5d0d050
[ 3057. 76500] x13: 0000000000000001 x12: 0000000000000000
[ 3057. 76502] x11: 0000000000000000 x10: 00000000000009e0
[ 3057. 76503] x9 : ffff00001a163a90 x8 : ffff800887e2b380
[ 3057. 76505] x7 : 00000000000000b4 x6 : 0000001b5b9081bb
[ 3057. 76506] x5 : dead000000000100 x4 : dead000000000200
[ 3057. 76507] x3 : dead000000000100 x2 : dead000000000200
[ 3057. 76508] x1 : ffff800d81365400 x0 : ffff800896e73088
[ 3057. 76510] Call trace:
[ 3057. 76513] sp_group_exit+0x94/0x130
[ 3057. 76517] mmput+0x20/0x170
[ 3057. 76519] do_exit+0x338/0xb38
[ 3057. 76520] do_group_exit+0x3c/0xe8
[ 3057. 76522] get_signal+0x14c/0x7d8
[ 3057. 76524] do_signal+0x88/0x290
[ 3057. 76525] do_notify_resume+0x150/0x3c8
[ 3057. 76528] work_pending+0x8/0x10
[ 3057. 76530] Code: d2804004 f2fbd5a5 f2fbd5a4 aa1503e0 (f9000462)
[ 3057. 76534] [kbox] unable to set sctrl register, maybe the domain is not SD, continue
[ 3057. 76535] [kbox] catch die event on cpu 10
[ 3057. 76537] [kbox] catch die event, start logging
[ 3057. 76540] [kbox] die info:Oops:0044
[ 3057. 76540] [kbox] start to collect
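If process A adds process B to an sp_group and B is killed in the meantime, the call to sp_group_add_task() for B fails and
list_del(&mm->sp_node);
is executed. sp_group_exit() executes the same list_del() for B when B exits, so mm->sp_node is deleted from spg->procs twice.
After the first list_del(), sp_node->next is LIST_POISON1 (dead000000000100 on arm64) and sp_node->prev is LIST_POISON2 (dead000000000200). The second list_del() stores to next->prev, which lies at offset 8 from LIST_POISON1, hence the fault at dead000000000108.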
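The double deletion and the faulting address can be reproduced in a small userspace sketch. The list helpers below are simplified copies of the kernel's, with the arm64 poison values hard-coded as an assumption. struct mm_demo, add_task_failed() and group_exit() are hypothetical stand-ins, not the share pool code, and the sketch assumes the failed-add path clears mm->sp_group, which is what the new check in sp_group_exit() relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: userspace copies of the kernel list poisoning, arm64 values assumed. */
static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 ((struct list_head *)(uintptr_t)(POISON_POINTER_DELTA + 0x100))
#define LIST_POISON2 ((struct list_head *)(uintptr_t)(POISON_POINTER_DELTA + 0x200))

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;	/* a second call faults on this store */
	entry->prev->next = entry->next;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Hypothetical stand-in for the two mm_struct fields involved. */
struct mm_demo {
	struct list_head sp_node;
	void *sp_group;
};

/* Address that another list_del() would store to first: &entry->next->prev. */
static uintptr_t next_del_store_addr(const struct list_head *entry)
{
	return (uintptr_t)entry->next + offsetof(struct list_head, prev);
}

/* Assumed shape of the failed-add path: unlink and drop the group pointer. */
static void add_task_failed(struct mm_demo *mm)
{
	list_del(&mm->sp_node);
	mm->sp_group = NULL;
}

/* Exit path with the guard: returns 1 if it unlinked, 0 if it skipped. */
static int group_exit(struct mm_demo *mm)
{
	if (!mm->sp_group)
		return 0;
	list_del(&mm->sp_node);
	return 1;
}
```

With one mm on the list, calling add_task_failed() and then next_del_store_addr(&mm.sp_node) gives 0xdead000000000108, the address in the oops, and group_exit() returns 0 instead of unlinking again.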
Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/share_pool.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/share_pool.c b/mm/share_pool.c
index 995db20a1d3b9..b44af9a7c233e 100644
--- a/mm/share_pool.c
+++ b/mm/share_pool.c
@@ -3089,22 +3089,22 @@ void sp_group_exit(struct mm_struct *mm)
 	 * because the last owner of this mm is in exiting procedure:
 	 * do_exit() -> exit_mm() -> mmput() -> THIS function.
 	 */
-	down_write(&spg->rw_lock);
-	if (spg_valid(spg) && atomic_read(&mm->mm_users) == MM_WOULD_FREE) {
+	if (atomic_read(&mm->mm_users) == MM_WOULD_FREE) {
+		down_write(&spg->rw_lock);
 		/* a dead group should NOT be reactive again */
-		if (list_is_singular(&spg->procs))
+		if (spg_valid(spg) && list_is_singular(&spg->procs))
 			is_alive = spg->is_alive = false;
-		list_del(&mm->sp_node); /* affect spg->procs */
+		if (mm->sp_group)  /* concurrency handle of sp_group_add_task */
+			list_del(&mm->sp_node); /* affect spg->procs */
 		up_write(&spg->rw_lock);
 
 		if (!is_alive)
 			blocking_notifier_call_chain(&sp_notifier_chain, 0,
 						     mm->sp_group);
 
+		/* match with get_task_mm() in sp_group_add_task() */
 		atomic_dec(&mm->mm_users);
-		return;
 	}
-	up_write(&spg->rw_lock);
 }
 
 void sp_group_post_exit(struct mm_struct *mm)