
From: Jens Axboe <axboe@kernel.dk>

mainline inclusion
from mainline-v6.3-rc2
commit 01e68ce08a30db3d842ce7a55f7f6e0474a55f9a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IC6ES1
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Every now and then reports come in that are puzzled on why changing
affinity on the io-wq workers fails with EINVAL. This happens because
they set PF_NO_SETAFFINITY as part of their creation, as io-wq organizes
workers into groups based on what CPU they are running on.

However, this is purely an optimization and not a functional requirement.
We can allow setting affinity, and just lazily update our worker to wqe
mappings. If a given io-wq thread times out, it normally exits if there's
no more work to do. The exception is if it's the last worker available.
For the timeout case, check the affinity of the worker against group mask
and exit even if it's the last worker. New workers should be created with
the right mask and in the right location.

Reported-by: Daniel Dao <dqminh@cloudflare.com>
Link: https://lore.kernel.org/io-uring/CA+wXwBQwgxB3_UphSny-yAP5b26meeOu1W4TwYVcD_...
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	io_uring/io-wq.c
[Commit 8a565304927f ("io_uring/io-wq: Use set_bit() and test_bit() at
worker->flags") modified the way worker->flags is set; commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") moved io_acct_run_queue out of the protection of wqe->lock
in io_wqe_worker; commit e13fb1fe1483 ("io-wq: reduce acct->lock
crossing functions lock/unlock") removed the use of acct->lock in
io_wqe_worker.]
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
---
 io_uring/io-wq.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 066f9ab708c6..e25ab32414f4 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -622,7 +622,7 @@ static int io_wqe_worker(void *data)
 	struct io_wqe_acct *acct = io_wqe_get_acct(worker);
 	struct io_wqe *wqe = worker->wqe;
 	struct io_wq *wq = wqe->wq;
-	bool last_timeout = false;
+	bool exit_mask = false, last_timeout = false;
 	char buf[TASK_COMM_LEN];
 
 	set_mask_bits(&worker->flags, 0,
@@ -641,8 +641,11 @@ static int io_wqe_worker(void *data)
 			io_worker_handle_work(worker);
 			goto loop;
 		}
-		/* timed out, exit unless we're the last worker */
-		if (last_timeout && acct->nr_workers > 1) {
+		/*
+		 * Last sleep timed out. Exit if we're not the last worker,
+		 * or if someone modified our affinity.
+		 */
+		if (last_timeout && (exit_mask || acct->nr_workers > 1)) {
 			acct->nr_workers--;
 			raw_spin_unlock(&wqe->lock);
 			__set_current_state(TASK_RUNNING);
@@ -661,7 +664,11 @@ static int io_wqe_worker(void *data)
 				continue;
 			break;
 		}
-		last_timeout = !ret;
+		if (!ret) {
+			last_timeout = true;
+			exit_mask = !cpumask_test_cpu(raw_smp_processor_id(),
+					wqe->cpu_mask);
+		}
 	}
 
 	if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
@@ -718,7 +725,6 @@ static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
 	tsk->pf_io_worker = worker;
 	worker->task = tsk;
 	set_cpus_allowed_ptr(tsk, wqe->cpu_mask);
-	tsk->flags |= PF_NO_SETAFFINITY;
 
 	raw_spin_lock(&wqe->lock);
 	hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
-- 
2.31.1
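
The user-visible effect of dropping PF_NO_SETAFFINITY is that the affinity
of io-wq worker threads can now be changed from userspace. Below is a
minimal, illustrative userspace sketch (not part of the patch) that pins
one worker to CPU 0 with sched_setaffinity(2); it assumes the worker's TID
has already been located, e.g. by scanning /proc/<pid>/task for threads
whose comm starts with "iou-wrk-". Before this change the call failed with
EINVAL; with it applied, the call succeeds and io-wq lazily updates its
worker-to-wqe mapping.

	/*
	 * Illustrative only: pin one io-wq worker thread to CPU 0.
	 * The worker TID is passed on the command line; it is assumed to
	 * have been found by inspecting /proc/<pid>/task/<tid>/comm.
	 * Prior to this patch, sched_setaffinity() on such a thread
	 * returned -1 with errno == EINVAL due to PF_NO_SETAFFINITY.
	 */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <errno.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/types.h>

	int main(int argc, char **argv)
	{
		pid_t worker_tid;
		cpu_set_t set;

		if (argc != 2) {
			fprintf(stderr, "usage: %s <io-wq worker tid>\n", argv[0]);
			return 1;
		}
		worker_tid = atoi(argv[1]);

		CPU_ZERO(&set);
		CPU_SET(0, &set);	/* restrict the worker to CPU 0 */

		if (sched_setaffinity(worker_tid, sizeof(set), &set) < 0) {
			fprintf(stderr, "sched_setaffinity: %s\n", strerror(errno));
			return 1;
		}
		printf("worker %d now pinned to CPU 0\n", (int)worker_tid);
		return 0;
	}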