From: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
mainline inclusion
from mainline-5.11-rc1
commit c07e6719511e77c4b289f62bfe96423eb6ea061d
category: bugfix
bugzilla: 182869
CVE: NA

---------------------------
io_iopoll_complete() does not hold completion_lock to complete polled io, so in io_wq_submit_work() we can not call io_req_complete() directly to complete polled io, otherwise there may be concurrent access to cqring, defer_list, etc, which is not safe. Commit dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io") fixed this issue, but Pavel reported that, apart from rw, IOPOLL can also carry buffer registration/removal requests (IORING_OP_PROVIDE_BUFFERS or IORING_OP_REMOVE_BUFFERS), so that fix is not sufficient.
Given that io_iopoll_complete() is always called under uring_lock, for polled io we can also take uring_lock here to fix this issue.
Fixes: dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io")
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: don't deref 'req' after completing it]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/io_uring.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c944a27178a48..b5fb86e2ed597 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5782,19 +5782,28 @@ static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
 	}
 
 	if (ret) {
+		struct io_ring_ctx *lock_ctx = NULL;
+
+		if (req->ctx->flags & IORING_SETUP_IOPOLL)
+			lock_ctx = req->ctx;
+
 		/*
-		 * io_iopoll_complete() does not hold completion_lock to complete
-		 * polled io, so here for polled io, just mark it done and still let
-		 * io_iopoll_complete() complete it.
+		 * io_iopoll_complete() does not hold completion_lock to
+		 * complete polled io, so here for polled io, we can not call
+		 * io_req_complete() directly, otherwise there maybe concurrent
+		 * access to cqring, defer_list, etc, which is not safe. Given
+		 * that io_iopoll_complete() is always called under uring_lock,
+		 * so here for polled io, we also get uring_lock to complete
+		 * it.
 		 */
-		if (req->ctx->flags & IORING_SETUP_IOPOLL) {
-			struct kiocb *kiocb = &req->rw.kiocb;
-
-			kiocb_done(kiocb, ret, NULL);
-		} else {
-			req_set_fail_links(req);
-			io_req_complete(req, ret);
-		}
+		if (lock_ctx)
+			mutex_lock(&lock_ctx->uring_lock);
+
+		req_set_fail_links(req);
+		io_req_complete(req, ret);
+
+		if (lock_ctx)
+			mutex_unlock(&lock_ctx->uring_lock);
 	}
 
 	return io_steal_work(req);
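For readers unfamiliar with the locking model, the following is a minimal userspace analogy of the rule this patch enforces, not kernel code: ring_lock stands in for ctx->uring_lock, cq_tail for the cqring/defer_list state, and the two thread functions for io_iopoll_complete() and the io_wq_submit_work() error path. Since the iopoll side only ever completes requests under uring_lock, the io-wq side must take the same lock before touching the shared completion state.

/* Hypothetical userspace sketch of the serialisation rule; build with -lpthread. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;
static int cq_tail;				/* stands in for cqring/defer_list state */

static void complete_request(int res)
{
	cq_tail++;				/* shared state, must be serialised */
	printf("completed res=%d, cq_tail=%d\n", res, cq_tail);
}

static void *iopoll_side(void *arg)		/* io_iopoll_complete() analog */
{
	(void)arg;
	pthread_mutex_lock(&ring_lock);
	complete_request(0);
	pthread_mutex_unlock(&ring_lock);
	return NULL;
}

static void *io_wq_side(void *arg)		/* io_wq_submit_work() error path analog */
{
	(void)arg;
	pthread_mutex_lock(&ring_lock);		/* the fix: take the same lock */
	complete_request(-125);			/* -ECANCELED */
	pthread_mutex_unlock(&ring_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, iopoll_side, NULL);
	pthread_create(&b, NULL, io_wq_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}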
From: Pavel Begunkov <asml.silence@gmail.com>
mainline inclusion
from mainline-5.12-rc1
commit 792bb6eb862333658bf1bd2260133f0507e2da8d
category: bugfix
bugzilla: 182869
CVE: NA

---------------------------
[ 97.866748] a.out/2890 is trying to acquire lock:
[ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_wq_submit_work+0x155/0x240
[ 97.869735]
[ 97.869735] but task is already holding lock:
[ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
[ 97.873074]
[ 97.873074] other info that might help us debug this:
[ 97.874520]  Possible unsafe locking scenario:
[ 97.874520]
[ 97.875845]        CPU0
[ 97.876440]        ----
[ 97.877048]   lock(&ctx->uring_lock);
[ 97.877961]   lock(&ctx->uring_lock);
[ 97.878881]
[ 97.878881]  *** DEADLOCK ***
[ 97.878881]
[ 97.880341]  May be due to missing lock nesting notation
[ 97.880341]
[ 97.881952] 1 lock held by a.out/2890:
[ 97.882873]  #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
[ 97.885108]
[ 97.885108] stack backtrace:
[ 97.890457] Call Trace:
[ 97.891121]  dump_stack+0xac/0xe3
[ 97.891972]  __lock_acquire+0xab6/0x13a0
[ 97.892940]  lock_acquire+0x2c3/0x390
[ 97.894894]  __mutex_lock+0xae/0x9f0
[ 97.901101]  io_wq_submit_work+0x155/0x240
[ 97.902112]  io_wq_cancel_cb+0x162/0x490
[ 97.904126]  io_async_find_and_cancel+0x3b/0x140
[ 97.905247]  io_issue_sqe+0x86d/0x13e0
[ 97.909122]  __io_queue_sqe+0x10b/0x550
[ 97.913971]  io_queue_sqe+0x235/0x470
[ 97.914894]  io_submit_sqes+0xcce/0xf10
[ 97.917872]  __x64_sys_io_uring_enter+0x3fb/0x5b0
[ 97.921424]  do_syscall_64+0x2d/0x40
[ 97.922329]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
While holding uring_lock, e.g. from inline execution, an async cancel request may attempt cancellations through io_wq_submit_work(), which may try to take uring_lock again. Delay it to task_work, so we do it from a clean context and don't have to worry about locking.
Cc: stable@vger.kernel.org # 5.5+
Fixes: c07e6719511e ("io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()")
Reported-by: Abaci <abaci@linux.alibaba.com>
Reported-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
	fs/io_uring.c
[eab30c4d20dc76 ("io_uring: deduplicate failing task_work_add") is not applied.
 87ceb6a6b81ec ("io_uring: drop 'ctx' ref on task work cancelation") is not applied.
 91989c707884ec ("task_work: cleanup notification modes") is not applied.]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/io_uring.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b5fb86e2ed597..226e9f123f654 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1858,6 +1858,16 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb,
 	return ret;
 }
 
+static void io_req_task_work_add_fallback(struct io_kiocb *req,
+					   void (*cb)(struct callback_head *))
+{
+	struct task_struct *tsk = io_wq_get_task(req->ctx->io_wq);
+
+	init_task_work(&req->task_work, cb);
+	task_work_add(tsk, &req->task_work, 0);
+	wake_up_process(tsk);
+}
+
 static void __io_req_task_cancel(struct io_kiocb *req, int error)
 {
 	struct io_ring_ctx *ctx = req->ctx;
@@ -1875,8 +1885,11 @@ static void __io_req_task_cancel(struct io_kiocb *req, int error)
 static void io_req_task_cancel(struct callback_head *cb)
 {
 	struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);
+	struct io_ring_ctx *ctx = req->ctx;
 
+	mutex_lock(&ctx->uring_lock);
 	__io_req_task_cancel(req, -ECANCELED);
+	mutex_unlock(&ctx->uring_lock);
 }
 
 static void __io_req_task_submit(struct io_kiocb *req)
@@ -5764,7 +5777,10 @@ static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
 	/* if NO_CANCEL is set, we must still run the work */
 	if ((work->flags & (IO_WQ_WORK_CANCEL|IO_WQ_WORK_NO_CANCEL)) ==
 			IO_WQ_WORK_CANCEL) {
-		ret = -ECANCELED;
+		/* io-wq is going to take down one */
+		refcount_inc(&req->refs);
+		io_req_task_work_add_fallback(req, io_req_task_cancel);
+		return io_steal_work(req);
 	}
 
 	if (!ret) {
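As with the first patch, a small userspace sketch may make the second fix easier to follow; it is hypothetical code, not an excerpt. ring_lock again stands in for ctx->uring_lock, the one-slot deferred_cb/deferred_req pair for the req->task_work callback, and run_task_work() for the point where the kernel runs task_work with no locks held. Calling the cancellation directly from submit_inline() would self-deadlock on the non-recursive mutex, exactly as in the lockdep report above; deferring it sidesteps the problem.

/* Hypothetical userspace sketch of deferring cancellation to a clean context. */
#include <stdio.h>
#include <pthread.h>

static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;

typedef void (*task_cb)(int req);

static task_cb deferred_cb;			/* one-slot "task_work" list */
static int deferred_req;

static void cancel_req(int req)			/* io_req_task_cancel() analog */
{
	pthread_mutex_lock(&ring_lock);		/* safe: caller holds no locks */
	printf("req %d cancelled (-ECANCELED)\n", req);
	pthread_mutex_unlock(&ring_lock);
}

static void submit_inline(int req)		/* inline submission, lock already held */
{
	pthread_mutex_lock(&ring_lock);
	/* calling cancel_req(req) here would deadlock; defer it instead */
	deferred_cb = cancel_req;
	deferred_req = req;
	pthread_mutex_unlock(&ring_lock);
}

static void run_task_work(void)			/* clean context, no locks held */
{
	if (deferred_cb)
		deferred_cb(deferred_req);
	deferred_cb = NULL;
}

int main(void)
{
	submit_inline(1);
	run_task_work();
	return 0;
}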