[PATCH OLK-6.6 v2 00/43] fuse/uring: support fuse over io_uring

fuse/uring: support fuse over io_uring.

Bernd Schubert (19):
  fuse: rename to fuse_dev_end_requests and make non-static
  fuse: Move fuse_get_dev to header file
  fuse: Move request bits
  fuse: Add fuse-io-uring design documentation
  fuse: make args->in_args[0] to be always the header
  fuse: {io-uring} Handle SQEs - register commands
  fuse: Make fuse_copy non static
  fuse: Add fuse-io-uring handling into fuse_copy
  fuse: {io-uring} Make hash-list req unique finding functions non-static
  fuse: Add io-uring sqe commit and fetch support
  fuse: {io-uring} Handle teardown of ring entries
  fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
  fuse: Allow to queue fg requests through io-uring
  fuse: Allow to queue bg requests through io-uring
  fuse: {io-uring} Prevent mount point hang on fuse-server termination
  fuse: block request allocation until io-uring init is complete
  fuse: enable fuse-over-io-uring
  fuse: prevent disabling io-uring on active connections
  fuse: {io-uring} Fix a possible req cancellation race

Jens Axboe (1):
  io_uring: remove dead struct io_submit_state member

Joanne Koong (2):
  fuse: fix uring race condition for null dereference of fc
  fuse: remove unneeded atomic set in uring creation

Luis Henriques (2):
  fuse: fix possible deadlock if rings are never initialized
  fuse: removed unused function fuse_uring_create() from header

Ming Lei (2):
  io_uring: retain top 8bits of uring_cmd flags for kernel internal use
  io_uring: cancelable uring_cmd

Pavel Begunkov (17):
  io_uring: split out cmd api into a separate header
  io_uring/cmd: inline io_uring_cmd_do_in_task_lazy
  io_uring/cmd: inline io_uring_cmd_get_task
  io_uring/cmd: expose iowq to cmds
  io_uring/cmd: give inline space in request to cmds
  io_uring/cmd: move io_uring_try_cancel_uring_cmd()
  io_uring/cmd: kill one issue_flags to tw conversion
  io_uring/cmd: fix tw <-> issue_flags conversion
  io_uring/cmd: document some uring_cmd related helpers
  io_uring: force tw ctx locking
  io_uring: remove struct io_tw_state::locked
  io_uring: refactor io_fill_cqe_req_aux
  io_uring: get rid of intermediate aux cqe caches
  io_uring: remove current check from complete_post
  io_uring: refactor io_req_complete_post()
  io_uring: clean up io_lockdep_assert_cq_locked
  io_uring/cmd: let cmds to know about dying task

 Documentation/filesystems/fuse-io-uring.rst |   99 ++
 Documentation/filesystems/index.rst         |    1 +
 MAINTAINERS                                 |    1 +
 drivers/block/ublk_drv.c                    |    2 +-
 drivers/nvme/host/ioctl.c                   |    2 +-
 fs/fuse/Kconfig                             |   12 +
 fs/fuse/Makefile                            |    1 +
 fs/fuse/dax.c                               |   11 +-
 fs/fuse/dev.c                               |  172 ++-
 fs/fuse/dev_uring.c                         | 1325 +++++++++++++++++++
 fs/fuse/dev_uring_i.h                       |  205 +++
 fs/fuse/dir.c                               |   32 +-
 fs/fuse/fuse_dev_i.h                        |   69 +
 fs/fuse/fuse_i.h                            |   35 +-
 fs/fuse/inode.c                             |   14 +-
 fs/fuse/xattr.c                             |    7 +-
 include/linux/io_uring.h                    |   72 +-
 include/linux/io_uring/cmd.h                |  110 ++
 include/linux/io_uring_types.h              |   43 +-
 include/uapi/linux/fuse.h                   |   77 +-
 include/uapi/linux/io_uring.h               |    5 +-
 io_uring/io_uring.c                         |  175 +--
 io_uring/io_uring.h                         |   34 +-
 io_uring/net.c                              |    6 +-
 io_uring/poll.c                             |    3 +-
 io_uring/rw.c                               |    8 +-
 io_uring/timeout.c                          |    8 +-
 io_uring/uring_cmd.c                        |  108 +-
 io_uring/uring_cmd.h                        |    3 +
 security/selinux/hooks.c                    |    2 +-
 security/smack/smack_lsm.c                  |    2 +-
 31 files changed, 2300 insertions(+), 344 deletions(-)
 create mode 100644 Documentation/filesystems/fuse-io-uring.rst
 create mode 100644 fs/fuse/dev_uring.c
 create mode 100644 fs/fuse/dev_uring_i.h
 create mode 100644 fs/fuse/fuse_dev_i.h
 create mode 100644 include/linux/io_uring/cmd.h

--
2.39.2

From: Ming Lei <ming.lei@redhat.com>

mainline inclusion
from mainline-v6.6-rc2
commit 528ce6781726e022bc5dc84034360e6e8f1b89bd
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Retain top 8bits of uring_cmd flags for kernel internal use, so that
we can move IORING_URING_CMD_POLLED out of uapi header.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 include/linux/io_uring.h      | 3 +++
 include/uapi/linux/io_uring.h | 5 ++---
 io_uring/io_uring.c           | 3 +++
 io_uring/uring_cmd.c          | 2 +-
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 108cffa3ae5b..4ed59118a74c 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -22,6 +22,9 @@ enum io_uring_cmd_flags {
         IO_URING_F_IOPOLL       = (1 << 10),
 };
 
+/* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
+#define IORING_URING_CMD_POLLED         (1U << 31)
+
 struct io_uring_cmd {
         struct file     *file;
         const struct io_uring_sqe *sqe;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8e61f8b7c2ce..de77ad08b123 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -246,13 +246,12 @@ enum io_uring_op {
 };
 
 /*
- * sqe->uring_cmd_flags
+ * sqe->uring_cmd_flags         top 8bits aren't available for userspace
  * IORING_URING_CMD_FIXED       use registered buffer; pass this flag
  *                              along with setting sqe->buf_index.
- * IORING_URING_CMD_POLLED      driver use only
  */
 #define IORING_URING_CMD_FIXED  (1U << 0)
-#define IORING_URING_CMD_POLLED (1U << 31)
+#define IORING_URING_CMD_MASK   IORING_URING_CMD_FIXED
 
 
 /*
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8b15e9dc340f..489ed84b39b8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -4807,6 +4807,9 @@ static int __init io_uring_init(void)
 
         BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
 
+        /* top 8bits are for internal use */
+        BUILD_BUG_ON((IORING_URING_CMD_MASK & 0xff000000) != 0);
+
         io_uring_optable_init();
 
         /*
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 2cbd1c24414c..15eeb8cdf2d0 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -91,7 +91,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
                 return -EINVAL;
 
         ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
-        if (ioucmd->flags & ~IORING_URING_CMD_FIXED)
+        if (ioucmd->flags & ~IORING_URING_CMD_MASK)
                 return -EINVAL;
 
         if (ioucmd->flags & IORING_URING_CMD_FIXED) {
--
2.39.2
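The effect of this split is that userspace may only set bits covered by
IORING_URING_CMD_MASK, while the top byte of sqe->uring_cmd_flags is
reserved for kernel-internal state such as IORING_URING_CMD_POLLED. A
minimal sketch of the resulting validation rule, mirroring
io_uring_cmd_prep() above (the helper name is hypothetical):

        /* Reject any user-set flag outside the uapi mask; the top 8 bits
         * are kernel-internal and must never be accepted from an SQE. */
        static int check_uring_cmd_flags(u32 uring_cmd_flags)
        {
                if (uring_cmd_flags & ~IORING_URING_CMD_MASK)
                        return -EINVAL;
                return 0;
        }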

From: Ming Lei <ming.lei@redhat.com>

mainline inclusion
from mainline-v6.6-rc2
commit 93b8cc60c37b9d17732b7a297e5dca29b50a990d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

A uring_cmd may never complete; with ublk, for example, a uring cmd
isn't completed until a new block request comes in from the ublk block
device.

Add cancelable uring_cmd to provide a mechanism for drivers to cancel
pending commands in their own way.

Add the API io_uring_cmd_mark_cancelable() for a driver to mark one
command as cancelable; io_uring will then cancel this command in
io_uring_cancel_generic(). The ->uring_cmd() callback is reused for
canceling the command in the driver's way, so the driver gets notified
of the cancelling by io_uring.

Add the API io_uring_cmd_get_task() to help the driver's cancel handler
deal with the canceling.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
        include/linux/io_uring.h
        io_uring/io_uring.c
[Context differences.]

Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 include/linux/io_uring.h       | 15 +++++++++++
 include/linux/io_uring_types.h |  6 +++++
 io_uring/io_uring.c            | 33 ++++++++++++++++++++++++
 io_uring/uring_cmd.c           | 47 ++++++++++++++++++++++++++++++++++
 4 files changed, 101 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 4ed59118a74c..058a6be303f5 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,9 +20,13 @@ enum io_uring_cmd_flags {
         IO_URING_F_SQE128       = (1 << 8),
         IO_URING_F_CQE32        = (1 << 9),
         IO_URING_F_IOPOLL       = (1 << 10),
+
+        /* set when uring wants to cancel a previously issued command */
+        IO_URING_F_CANCEL       = (1 << 11),
 };
 
 /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
+#define IORING_URING_CMD_CANCELABLE     (1U << 30)
 #define IORING_URING_CMD_POLLED         (1U << 31)
 
 struct io_uring_cmd {
@@ -83,6 +87,9 @@ static inline void io_uring_free(struct task_struct *tsk)
 }
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
 bool io_is_uring_fops(struct file *file);
+void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
+                unsigned int issue_flags);
+struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd);
 #else
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
                               struct iov_iter *iter, void *ioucmd)
@@ -123,6 +130,14 @@ static inline bool io_is_uring_fops(struct file *file)
 {
         return false;
 }
+static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
+                unsigned int issue_flags)
+{
+}
+static inline struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd)
+{
+        return NULL;
+}
 #endif
 
 #endif
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 8215e193178a..0cfea3980fe3 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -264,6 +264,12 @@ struct io_ring_ctx {
                  */
                 struct io_wq_work_list  iopoll_list;
                 bool                    poll_multi_queue;
+
+                /*
+                 * Any cancelable uring_cmd is added to this list in
+                 * ->uring_cmd() by io_uring_cmd_insert_cancelable()
+                 */
+                struct hlist_head       cancelable_uring_cmd;
         } ____cacheline_aligned_in_smp;
 
         struct {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 489ed84b39b8..a74cb5e4afc8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -340,6 +340,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
         INIT_WQ_LIST(&ctx->locked_free_list);
         INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
         INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
+        INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
         return ctx;
 err:
         kfree(ctx->cancel_table.hbs);
@@ -3386,6 +3387,37 @@ static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
         return ret;
 }
 
+static bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
+                struct task_struct *task, bool cancel_all)
+{
+        struct hlist_node *tmp;
+        struct io_kiocb *req;
+        bool ret = false;
+
+        lockdep_assert_held(&ctx->uring_lock);
+
+        hlist_for_each_entry_safe(req, tmp, &ctx->cancelable_uring_cmd,
+                        hash_node) {
+                struct io_uring_cmd *cmd = io_kiocb_to_cmd(req,
+                                struct io_uring_cmd);
+                struct file *file = req->file;
+
+                if (!cancel_all && req->task != task)
+                        continue;
+
+                if (cmd->flags & IORING_URING_CMD_CANCELABLE) {
+                        /* ->sqe isn't available if no async data */
+                        if (!req_has_async_data(req))
+                                cmd->sqe = NULL;
+                        file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL);
+                        ret = true;
+                }
+        }
+        io_submit_flush_completions(ctx);
+
+        return ret;
+}
+
 static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
                                                 struct task_struct *task,
                                                 bool cancel_all)
@@ -3433,6 +3465,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
                 ret |= io_cancel_defer_files(ctx, task, cancel_all);
         mutex_lock(&ctx->uring_lock);
         ret |= io_poll_remove_all(ctx, task, cancel_all);
+        ret |= io_uring_try_cancel_uring_cmd(ctx, task, cancel_all);
         mutex_unlock(&ctx->uring_lock);
         ret |= io_kill_timeouts(ctx, task, cancel_all);
         if (task)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 15eeb8cdf2d0..6dafc2ead008 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -13,6 +13,51 @@
 #include "rsrc.h"
 #include "uring_cmd.h"
 
+static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
+                unsigned int issue_flags)
+{
+        struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+        struct io_ring_ctx *ctx = req->ctx;
+
+        if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
+                return;
+
+        cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
+        io_ring_submit_lock(ctx, issue_flags);
+        hlist_del(&req->hash_node);
+        io_ring_submit_unlock(ctx, issue_flags);
+}
+
+/*
+ * Mark this command as concelable, then io_uring_try_cancel_uring_cmd()
+ * will try to cancel this issued command by sending ->uring_cmd() with
+ * issue_flags of IO_URING_F_CANCEL.
+ *
+ * The command is guaranteed to not be done when calling ->uring_cmd()
+ * with IO_URING_F_CANCEL, but it is driver's responsibility to deal
+ * with race between io_uring canceling and normal completion.
+ */
+void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
+                unsigned int issue_flags)
+{
+        struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+        struct io_ring_ctx *ctx = req->ctx;
+
+        if (!(cmd->flags & IORING_URING_CMD_CANCELABLE)) {
+                cmd->flags |= IORING_URING_CMD_CANCELABLE;
+                io_ring_submit_lock(ctx, issue_flags);
+                hlist_add_head(&req->hash_node, &ctx->cancelable_uring_cmd);
+                io_ring_submit_unlock(ctx, issue_flags);
+        }
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
+
+struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd)
+{
+        return cmd_to_io_kiocb(cmd)->task;
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_get_task);
+
 static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
 {
         struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
@@ -56,6 +101,8 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
 {
         struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
+        io_uring_cmd_del_cancelable(ioucmd, issue_flags);
+
         if (ret < 0)
                 req_set_fail(req);
 
--
2.39.2
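A minimal sketch of the driver side of this API, loosely modeled on the
ublk use case that motivated it (the my_drv_* names are hypothetical):

        /* Hypothetical ->uring_cmd() handler for a command that may stay
         * pending indefinitely. */
        static int my_drv_uring_cmd(struct io_uring_cmd *cmd,
                                    unsigned int issue_flags)
        {
                if (issue_flags & IO_URING_F_CANCEL) {
                        /* io_uring is cancelling: finish the command in
                         * the driver's own way; racing against a normal
                         * completion is the driver's responsibility. */
                        io_uring_cmd_done(cmd, -ECANCELED, 0, issue_flags);
                        return 0;
                }

                /* opt in to cancellation so io_uring_cancel_generic()
                 * can reach this command on task exit */
                io_uring_cmd_mark_cancelable(cmd, issue_flags);
                /* ... queue the command; a later io_uring_cmd_done()
                 * also drops it from the cancelable list ... */
                return -EIOCBQUEUED;
        }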

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.7-rc5
commit b66509b8497f2b002a2654e386a440f1274ddcc7
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

linux/io_uring.h is slowly becoming a rubbish bin where we put anything
exposed to other subsystems. For instance, the task exit hooks and
io_uring cmd infra are completely orthogonal and don't need each
other's definitions. Start cleaning it up by splitting out all command
bits into a new header file.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.170139195...
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
        include/linux/io_uring.h
        include/linux/io_uring_types.h
        drivers/nvme/host/ioctl.c
[Context differences.]

Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 MAINTAINERS                    |  1 +
 drivers/block/ublk_drv.c       |  2 +-
 drivers/nvme/host/ioctl.c      |  2 +-
 include/linux/io_uring.h       | 90 +---------------------------------
 include/linux/io_uring/cmd.h   | 81 ++++++++++++++++++++++++++++++
 include/linux/io_uring_types.h | 19 +++++++
 io_uring/io_uring.c            |  1 +
 io_uring/rw.c                  |  2 +-
 io_uring/uring_cmd.c           |  2 +-
 security/selinux/hooks.c       |  2 +-
 security/smack/smack_lsm.c     |  2 +-
 11 files changed, 110 insertions(+), 94 deletions(-)
 create mode 100644 include/linux/io_uring/cmd.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 61baf2cfc4e1..74eb865e3259 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11093,6 +11093,7 @@ L:      io-uring@vger.kernel.org
 S:      Maintained
 T:      git git://git.kernel.dk/linux-block
 T:      git git://git.kernel.dk/liburing
+F:      include/linux/io_uring/
 F:      include/linux/io_uring.h
 F:      include/linux/io_uring_types.h
 F:      include/trace/events/io_uring.h
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index df3e5aab4b5a..53b38b5dd0b1 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -36,7 +36,7 @@
 #include <linux/sched/mm.h>
 #include <linux/uaccess.h>
 #include <linux/cdev.h>
-#include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 #include <linux/blk-mq.h>
 #include <linux/delay.h>
 #include <linux/mm.h>
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 4ce31f9f0694..9eb856f7f990 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -6,7 +6,7 @@
 #include <linux/blk-integrity.h>
 #include <linux/ptrace.h>       /* for force_successful_syscall_return */
 #include <linux/nvme_ioctl.h>
-#include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 #include "nvme.h"
 
 enum {
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 058a6be303f5..d4264bea0e64 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -6,69 +6,13 @@
 #include <linux/xarray.h>
 #include <uapi/linux/io_uring.h>
 
-enum io_uring_cmd_flags {
-        IO_URING_F_COMPLETE_DEFER       = 1,
-        IO_URING_F_UNLOCKED             = 2,
-        /* the request is executed from poll, it should not be freed */
-        IO_URING_F_MULTISHOT            = 4,
-        /* executed by io-wq */
-        IO_URING_F_IOWQ                 = 8,
-        /* int's last bit, sign checks are usually faster than a bit test */
-        IO_URING_F_NONBLOCK             = INT_MIN,
-
-        /* ctx state flags, for URING_CMD */
-        IO_URING_F_SQE128               = (1 << 8),
-        IO_URING_F_CQE32                = (1 << 9),
-        IO_URING_F_IOPOLL               = (1 << 10),
-
-        /* set when uring wants to cancel a previously issued command */
-        IO_URING_F_CANCEL               = (1 << 11),
-};
-
-/* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
-#define IORING_URING_CMD_CANCELABLE     (1U << 30)
-#define IORING_URING_CMD_POLLED         (1U << 31)
-
-struct io_uring_cmd {
-        struct file     *file;
-        const struct io_uring_sqe *sqe;
-        union {
-                /* callback to defer completions to task context */
-                void (*task_work_cb)(struct io_uring_cmd *cmd, unsigned);
-                /* used for polled completion */
-                void *cookie;
-        };
-        u32             cmd_op;
-        u32             flags;
-        u8              pdu[32]; /* available inline for free use */
-};
-
-static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
-{
-        return sqe->cmd;
-}
-
 #if defined(CONFIG_IO_URING)
-int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-                              struct iov_iter *iter, void *ioucmd);
-void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
-                        unsigned issue_flags);
 void __io_uring_cancel(bool cancel_all);
 void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
-void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned),
-                        unsigned flags);
-/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
-void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned));
-
-static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
-{
-        __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
-}
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
+bool io_is_uring_fops(struct file *file);
 
 static inline void io_uring_files_cancel(void)
 {
@@ -85,29 +29,7 @@ static inline void io_uring_free(struct task_struct *tsk)
         if (tsk->io_uring)
                 __io_uring_free(tsk);
 }
-int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
-bool io_is_uring_fops(struct file *file);
-void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
-                unsigned int issue_flags);
-struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd);
 #else
-static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-                              struct iov_iter *iter, void *ioucmd)
-{
-        return -EOPNOTSUPP;
-}
-static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
-                ssize_t ret2, unsigned issue_flags)
-{
-}
-static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
-{
-}
-static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
-{
-}
 static inline void io_uring_task_cancel(void)
 {
 }
@@ -130,14 +52,6 @@ static inline bool io_is_uring_fops(struct file *file)
 {
         return false;
 }
-static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
-                unsigned int issue_flags)
-{
-}
-static inline struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd)
-{
-        return NULL;
-}
 #endif
 
 #endif
diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
new file mode 100644
index 000000000000..62fcfaf6fcc9
--- /dev/null
+++ b/include/linux/io_uring/cmd.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_CMD_H
+#define _LINUX_IO_URING_CMD_H
+
+#include <uapi/linux/io_uring.h>
+#include <linux/io_uring_types.h>
+
+/* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
+#define IORING_URING_CMD_CANCELABLE     (1U << 30)
+#define IORING_URING_CMD_POLLED         (1U << 31)
+
+struct io_uring_cmd {
+        struct file     *file;
+        const struct io_uring_sqe *sqe;
+        union {
+                /* callback to defer completions to task context */
+                void (*task_work_cb)(struct io_uring_cmd *cmd, unsigned);
+                /* used for polled completion */
+                void *cookie;
+        };
+        u32             cmd_op;
+        u32             flags;
+        u8              pdu[32]; /* available inline for free use */
+};
+
+static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
+{
+        return sqe->cmd;
+}
+
+#if defined(CONFIG_IO_URING)
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+                              struct iov_iter *iter, void *ioucmd);
+void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
+                        unsigned issue_flags);
+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+                        unsigned flags);
+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned));
+
+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+        __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
+}
+
+void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
+                unsigned int issue_flags);
+struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd);
+
+#else
+static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+                              struct iov_iter *iter, void *ioucmd)
+{
+        return -EOPNOTSUPP;
+}
+static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
+                ssize_t ret2, unsigned issue_flags)
+{
+}
+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+}
+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+}
+static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
+                unsigned int issue_flags)
+{
+}
+static inline struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd)
+{
+        return NULL;
+}
+#endif
+
+#endif /* _LINUX_IO_URING_CMD_H */
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 0cfea3980fe3..ac886196f421 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -7,6 +7,25 @@
 #include <linux/llist.h>
 #include <uapi/linux/io_uring.h>
 
+enum io_uring_cmd_flags {
+        IO_URING_F_COMPLETE_DEFER       = 1,
+        IO_URING_F_UNLOCKED             = 2,
+        /* the request is executed from poll, it should not be freed */
+        IO_URING_F_MULTISHOT            = 4,
+        /* executed by io-wq */
+        IO_URING_F_IOWQ                 = 8,
+        /* int's last bit, sign checks are usually faster than a bit test */
+        IO_URING_F_NONBLOCK             = INT_MIN,
+
+        /* ctx state flags, for URING_CMD */
+        IO_URING_F_SQE128               = (1 << 8),
+        IO_URING_F_CQE32                = (1 << 9),
+        IO_URING_F_IOPOLL               = (1 << 10),
+
+        /* set when uring wants to cancel a previously issued command */
+        IO_URING_F_CANCEL               = (1 << 11),
+};
+
 struct io_wq_work_node {
         struct io_wq_work_node *next;
 };
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a74cb5e4afc8..ec155bd242cd 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -68,6 +68,7 @@
 #include <linux/fadvise.h>
 #include <linux/task_work.h>
 #include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 #include <linux/audit.h>
 #include <linux/security.h>
 #include <linux/vmalloc.h>
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 75b001febb4d..780c14bf40a5 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -10,7 +10,7 @@
 #include <linux/poll.h>
 #include <linux/nospec.h>
 #include <linux/compat.h>
-#include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 
 #include <uapi/linux/io_uring.h>
 
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 6dafc2ead008..021fe6630471 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -2,7 +2,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/file.h>
-#include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 #include <linux/security.h>
 #include <linux/nospec.h>
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d4a99d98ec77..aac413c4c5f7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -91,7 +91,7 @@
 #include <uapi/linux/mount.h>
 #include <linux/fsnotify.h>
 #include <linux/fanotify.h>
-#include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 
 #include "avc.h"
 #include "objsec.h"
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 4625674f0e95..8f93a18194b0 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -42,7 +42,7 @@
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 #include <linux/watch_queue.h>
-#include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 #include "smack.h"
 
 #define TRANS_TRUE      "TRUE"
--
2.39.2
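After the split, command providers and consumers need only the new
header, while the task-exit hooks stay in linux/io_uring.h. A sketch of
what a converted driver's completion helper looks like (hypothetical
driver code):

        #include <linux/io_uring/cmd.h> /* io_uring_cmd, io_uring_cmd_done() */

        static void my_drv_finish(struct io_uring_cmd *cmd, int err,
                                  unsigned int issue_flags)
        {
                /* everything cmd-related now comes from io_uring/cmd.h;
                 * issue_flags must be the ones io_uring passed in */
                io_uring_cmd_done(cmd, err, 0, issue_flags);
        }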

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.7-rc5
commit 6b04a3737057ddfed396c954f9e4be4fe6d53c62
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Now as we can easily include io_uring_types.h, move IOU_F_TWQ_LAZY_WAKE
and inline io_uring_cmd_do_in_task_lazy().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/2ec9fb31dd192d1c5cf26d0a2dec5657d88a8e48.170139195...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 include/linux/io_uring/cmd.h   | 31 ++++++++++++++++---------------
 include/linux/io_uring_types.h | 11 +++++++++++
 io_uring/io_uring.h            | 10 ----------
 io_uring/uring_cmd.c           |  7 -------
 4 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 62fcfaf6fcc9..ee9b3bc3a4af 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -36,15 +36,6 @@ void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
 void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
                         void (*task_work_cb)(struct io_uring_cmd *, unsigned),
                         unsigned flags);
-/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
-void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned));
-
-static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
-{
-        __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
-}
 
 void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
                 unsigned int issue_flags);
@@ -60,12 +51,9 @@ static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
                 ssize_t ret2, unsigned issue_flags)
 {
 }
-static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
-{
-}
-static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+static inline void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+                        unsigned flags)
 {
 }
 static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
@@ -78,4 +66,17 @@ static inline struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd
 }
 #endif
 
+/* users must follow the IOU_F_TWQ_LAZY_WAKE semantics */
+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+        __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
+}
+
+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+        __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
+}
+
 #endif /* _LINUX_IO_URING_CMD_H */
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index ac886196f421..04928567b51a 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -7,6 +7,17 @@
 #include <linux/llist.h>
 #include <uapi/linux/io_uring.h>
 
+enum {
+        /*
+         * A hint to not wake right away but delay until there are enough of
+         * tw's queued to match the number of CQEs the task is waiting for.
+         *
+         * Must not be used wirh requests generating more than one CQE.
+         * It's also ignored unless IORING_SETUP_DEFER_TASKRUN is set.
+         */
+        IOU_F_TWQ_LAZY_WAKE             = 1,
+};
+
 enum io_uring_cmd_flags {
         IO_URING_F_COMPLETE_DEFER       = 1,
         IO_URING_F_UNLOCKED             = 2,
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index ea0b8acabc71..c5c454744d18 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -15,16 +15,6 @@
 #include <trace/events/io_uring.h>
 #endif
 
-enum {
-        /*
-         * A hint to not wake right away but delay until there are enough of
-         * tw's queued to match the number of CQEs the task is waiting for.
-         *
-         * Must not be used wirh requests generating more than one CQE.
-         * It's also ignored unless IORING_SETUP_DEFER_TASKRUN is set.
-         */
-        IOU_F_TWQ_LAZY_WAKE             = 1,
-};
 
 enum {
         IOU_OK                  = 0,
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 021fe6630471..41708788928a 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -78,13 +78,6 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
 }
 EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
 
-void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
-                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
-{
-        __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
-}
-EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
-
 static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
                                           u64 extra1, u64 extra2)
 {
--
2.39.2
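With the helper inlined, callers pick between immediate and lazy wakeup
explicitly. A sketch of a completion path using the lazy variant, which
per the IOU_F_TWQ_LAZY_WAKE contract is only safe for requests posting
a single CQE (hypothetical driver code):

        static void my_drv_cmd_tw(struct io_uring_cmd *cmd,
                                  unsigned int issue_flags)
        {
                io_uring_cmd_done(cmd, 0, 0, issue_flags);
        }

        /* e.g. from IRQ context: defer CQE posting to task context and
         * let lazy wake batch wakeups until enough CQEs are queued */
        static void my_drv_irq_complete(struct io_uring_cmd *cmd)
        {
                io_uring_cmd_do_in_task_lazy(cmd, my_drv_cmd_tw);
        }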

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.7-rc5
commit 055c15626a45b1ebc9f2f34981e705e1af171236
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

With io_uring_types.h we see all required definitions to inline
io_uring_cmd_get_task().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/aa8e317f09e651a5f3e72f8c0ad3902084c1f930.170139195...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 include/linux/io_uring/cmd.h | 10 +++++-----
 io_uring/uring_cmd.c         |  6 ------
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index ee9b3bc3a4af..d69b4038aa3e 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -39,7 +39,6 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
 
 void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
                 unsigned int issue_flags);
-struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd);
 
 #else
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
@@ -60,10 +59,6 @@ static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
                 unsigned int issue_flags)
 {
 }
-static inline struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd)
-{
-        return NULL;
-}
 #endif
 
 /* users must follow the IOU_F_TWQ_LAZY_WAKE semantics */
@@ -79,4 +74,9 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
         __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
 }
 
+static inline struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd)
+{
+        return cmd_to_io_kiocb(cmd)->task;
+}
+
 #endif /* _LINUX_IO_URING_CMD_H */
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 41708788928a..6420500a051a 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -52,12 +52,6 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
 
-struct task_struct *io_uring_cmd_get_task(struct io_uring_cmd *cmd)
-{
-        return cmd_to_io_kiocb(cmd)->task;
-}
-EXPORT_SYMBOL_GPL(io_uring_cmd_get_task);
-
 static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
 {
         struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
--
2.39.2

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.10-rc2
commit 6746ee4c3a189f8b60694f01e7e29bc5ff7972e0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

When an io_uring request needs blocking context we offload it to the
io_uring's thread pool called io-wq. We can get there off ->uring_cmd
by returning -EAGAIN, but there is no straightforward way of doing that
from an asynchronous callback. Add a helper that would transfer a
command to a blocking context.

Note that we do an extra hop via task_work before io_queue_iowq();
that's a limitation of the current io_uring infrastructure which can
likely be lifted later if it ever becomes a problem.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f735f807d7c8ba50c9452c69dfe5d3e9e535037b.172607208...
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
        io_uring/uring_cmd.c
        include/linux/io_uring/cmd.h
[Context differences.]

Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 include/linux/io_uring/cmd.h |  6 ++++++
 io_uring/io_uring.c          | 11 +++++++++++
 io_uring/io_uring.h          |  1 +
 io_uring/uring_cmd.c         |  7 +++++++
 4 files changed, 25 insertions(+)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index d69b4038aa3e..2597523dc6ad 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -40,6 +40,9 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
 void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
                 unsigned int issue_flags);
 
+/* Execute the request from a blocking context */
+void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
+
 #else
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
                               struct iov_iter *iter, void *ioucmd)
@@ -59,6 +62,9 @@ static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
                 unsigned int issue_flags)
 {
 }
+static inline void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
+{
+}
 #endif
 
 /* users must follow the IOU_F_TWQ_LAZY_WAKE semantics */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ec155bd242cd..ca6b6516f75c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -525,6 +525,17 @@ static void io_queue_iowq(struct io_kiocb *req)
                 io_queue_linked_timeout(link);
 }
 
+static void io_req_queue_iowq_tw(struct io_kiocb *req, struct io_tw_state *ts)
+{
+        io_queue_iowq(req);
+}
+
+void io_req_queue_iowq(struct io_kiocb *req)
+{
+        req->io_task_work.func = io_req_queue_iowq_tw;
+        io_req_task_work_add(req);
+}
+
 static __cold void io_queue_deferred(struct io_ring_ctx *ctx)
 {
         while (!list_empty(&ctx->defer_list)) {
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index c5c454744d18..c8d90d90b6bf 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -65,6 +65,7 @@ int io_uring_alloc_task_context(struct task_struct *task,
 int io_ring_add_registered_file(struct io_uring_task *tctx, struct file *file,
                                      int start, int end);
 
+void io_req_queue_iowq(struct io_kiocb *req);
 int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts);
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
 
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 6420500a051a..edc831cc34ed 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -199,6 +199,13 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
+void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
+{
+        struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+
+        io_req_queue_iowq(req);
+}
+
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
         struct socket *sock = cmd->file->private_data;
--
2.39.2
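A sketch of the intended use: an asynchronous callback that discovers
it must block can now punt the command to io-wq instead of failing,
something only ->uring_cmd() itself could previously do by returning
-EAGAIN (the my_drv_* names and the blocking predicate are
hypothetical):

        static void my_drv_cmd_tw(struct io_uring_cmd *cmd,
                                  unsigned int issue_flags)
        {
                if (my_drv_needs_to_block(cmd)) { /* hypothetical check */
                        /* re-issue ->uring_cmd() from an io-wq worker,
                         * where blocking is allowed */
                        io_uring_cmd_issue_blocking(cmd);
                        return;
                }
                io_uring_cmd_done(cmd, 0, 0, issue_flags);
        }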

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.10-rc2
commit a6ccb48e13662bcb98282e051512b9686b02d353
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Some io_uring commands can use some inline space in io_kiocb. We have
32 bytes in struct io_uring_cmd; expose it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ca779a61ee5e166e535d70df9c7f07b15d8a0ce.172607208...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 include/linux/io_uring/cmd.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 2597523dc6ad..7e0f1084815a 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -28,6 +28,15 @@ static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
         return sqe->cmd;
 }
 
+static inline void io_uring_cmd_private_sz_check(size_t cmd_sz)
+{
+        BUILD_BUG_ON(cmd_sz > sizeof_field(struct io_uring_cmd, pdu));
+}
+#define io_uring_cmd_to_pdu(cmd, pdu_type) ( \
+        io_uring_cmd_private_sz_check(sizeof(pdu_type)), \
+        ((pdu_type *)&(cmd)->pdu) \
+)
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
                               struct iov_iter *iter, void *ioucmd);
--
2.39.2
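A sketch of how a command provider might use the exposed space;
io_uring_cmd_private_sz_check() turns an oversized pdu type into a
build failure (struct my_drv_cmd_pdu and my_drv_prep are hypothetical):

        /* per-command driver state, must fit the 32-byte pdu area */
        struct my_drv_cmd_pdu {
                u32     tag;
                u32     queue_id;
                void    *private;
        };

        static void my_drv_prep(struct io_uring_cmd *cmd)
        {
                struct my_drv_cmd_pdu *pdu =
                        io_uring_cmd_to_pdu(cmd, struct my_drv_cmd_pdu);

                pdu->tag = 0;   /* ... fill in per-command state ... */
        }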

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.9-rc4
commit da12d9ab5889b87429d9375748dcd1485b6241f3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

io_uring_try_cancel_uring_cmd() is a part of the cmd handling so let's
move it closer to all cmd bits into uring_cmd.c

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/43a3937af4933655f0fd9362c381802f804f43de.171079918...
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
        io_uring/io_uring.c
[Context differences.]

Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 io_uring/io_uring.c  | 39 +--------------------------------------
 io_uring/io_uring.h  |  7 +++++++
 io_uring/uring_cmd.c | 30 ++++++++++++++++++++++++++++++
 io_uring/uring_cmd.h |  3 +++
 4 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ca6b6516f75c..a52f5b93dec7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -92,6 +92,7 @@
 #include "cancel.h"
 #include "net.h"
 #include "notif.h"
+#include "uring_cmd.h"
 #include "timeout.h"
 #include "poll.h"
 
@@ -176,13 +177,6 @@ static struct ctl_table kernel_io_uring_disabled_table[] = {
 };
 #endif
 
-static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
-{
-        if (!wq_list_empty(&ctx->submit_state.compl_reqs) ||
-            ctx->submit_state.cqes_count)
-                __io_submit_flush_completions(ctx);
-}
-
 static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx)
 {
         return ctx->cached_cq_tail - READ_ONCE(ctx->rings->cq.head);
@@ -3399,37 +3393,6 @@ static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
         return ret;
 }
 
-static bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
-                struct task_struct *task, bool cancel_all)
-{
-        struct hlist_node *tmp;
-        struct io_kiocb *req;
-        bool ret = false;
-
-        lockdep_assert_held(&ctx->uring_lock);
-
-        hlist_for_each_entry_safe(req, tmp, &ctx->cancelable_uring_cmd,
-                        hash_node) {
-                struct io_uring_cmd *cmd = io_kiocb_to_cmd(req,
-                                struct io_uring_cmd);
-                struct file *file = req->file;
-
-                if (!cancel_all && req->task != task)
-                        continue;
-
-                if (cmd->flags & IORING_URING_CMD_CANCELABLE) {
-                        /* ->sqe isn't available if no async data */
-                        if (!req_has_async_data(req))
-                                cmd->sqe = NULL;
-                        file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL);
-                        ret = true;
-                }
-        }
-        io_submit_flush_completions(ctx);
-
-        return ret;
-}
-
 static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
                                                 struct task_struct *task,
                                                 bool cancel_all)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index c8d90d90b6bf..9780b94eaed4 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -120,6 +120,13 @@ static inline void io_req_task_work_add(struct io_kiocb *req)
         __io_req_task_work_add(req, 0);
 }
 
+static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
+{
+        if (!wq_list_empty(&ctx->submit_state.compl_reqs) ||
+            ctx->submit_state.cqes_count)
+                __io_submit_flush_completions(ctx);
+}
+
 #define io_for_each_link(pos, head) \
         for (pos = (head); pos; pos = pos->link)
 
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index edc831cc34ed..ee812c539886 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -13,6 +13,36 @@
 #include "rsrc.h"
 #include "uring_cmd.h"
 
+bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
+                struct task_struct *task, bool cancel_all)
+{
+        struct hlist_node *tmp;
+        struct io_kiocb *req;
+        bool ret = false;
+
+        lockdep_assert_held(&ctx->uring_lock);
+
+        hlist_for_each_entry_safe(req, tmp, &ctx->cancelable_uring_cmd,
+                        hash_node) {
+                struct io_uring_cmd *cmd = io_kiocb_to_cmd(req,
+                                struct io_uring_cmd);
+                struct file *file = req->file;
+
+                if (!cancel_all && req->task != task)
+                        continue;
+
+                if (cmd->flags & IORING_URING_CMD_CANCELABLE) {
+                        /* ->sqe isn't available if no async data */
+                        if (!req_has_async_data(req))
+                                cmd->sqe = NULL;
+                        file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL);
+                        ret = true;
+                }
+        }
+        io_submit_flush_completions(ctx);
+        return ret;
+}
+
 static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
                 unsigned int issue_flags)
 {
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index 8117684ec3ca..7356bf9aa655 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -3,3 +3,6 @@
 int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags);
 int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_uring_cmd_prep_async(struct io_kiocb *req);
+
+bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
+                struct task_struct *task, bool cancel_all);
\ No newline at end of file
--
2.39.2

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.9-rc4
commit 6edd953b6ec758c98e9dba7234634831f1f6510d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

io_uring cmd converts struct io_tw_state to issue_flags and later back
to io_tw_state, it's awfully ill-fated, not to mention that the
intermediate issue_flags state is not correct. Get rid of the last
conversion, drag through tw everything that came with
IO_URING_F_UNLOCKED, and replace io_req_task_complete() with a direct
call to io_req_complete_defer(), at least for the time being.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/c53fa3df749752bd058cf6f824a90704822d6bcc.171079918...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 io_uring/uring_cmd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index ee812c539886..451f4c7bb423 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -129,11 +129,11 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
         if (req->ctx->flags & IORING_SETUP_IOPOLL) {
                 /* order with io_iopoll_req_issued() checking ->iopoll_complete */
                 smp_store_release(&req->iopoll_completed, 1);
+        } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+                io_req_complete_defer(req);
         } else {
-                struct io_tw_state ts = {
-                        .locked = !(issue_flags & IO_URING_F_UNLOCKED),
-                };
-                io_req_task_complete(req, &ts);
+                req->io_task_work.func = io_req_task_complete;
+                io_req_task_work_add(req);
         }
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_done);
--
2.39.2

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.9-rc4
commit e1eef2e56cb0db143c731b1cdc220980256d2d99
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

!IO_URING_F_UNLOCKED does not translate to availability of the deferred
completion infra; IO_URING_F_COMPLETE_DEFER does, and that is what we
should pass and look for to use io_req_complete_defer() and other
variants.

Luckily, it's not a real problem as two wrongs actually made it right,
at least as far as io_uring_cmd_work() goes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/aef76d34fe9410df8ecc42a14544fd76cd9d8b9e.171079918...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 io_uring/uring_cmd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 451f4c7bb423..d0a4c360478c 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -35,7 +35,8 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
                         /* ->sqe isn't available if no async data */
                         if (!req_has_async_data(req))
                                 cmd->sqe = NULL;
-                        file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL);
+                        file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL |
+                                                   IO_URING_F_COMPLETE_DEFER);
                         ret = true;
                 }
         }
@@ -85,7 +86,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
 static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
 {
         struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
-        unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
+        unsigned issue_flags = IO_URING_F_UNLOCKED;
+
+        /* locked task_work executor checks the deffered list completion */
+        if (ts->locked)
+                issue_flags = IO_URING_F_COMPLETE_DEFER;
 
         ioucmd->task_work_cb(ioucmd, issue_flags);
 }
@@ -129,7 +134,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
         if (req->ctx->flags & IORING_SETUP_IOPOLL) {
                 /* order with io_iopoll_req_issued() checking ->iopoll_complete */
                 smp_store_release(&req->iopoll_completed, 1);
-        } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+        } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
+                if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
+                        return;
                 io_req_complete_defer(req);
         } else {
                 req->io_task_work.func = io_req_task_complete;
--
2.39.2
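For command providers the practical rule after this fix: deferred
completion is gated by IO_URING_F_COMPLETE_DEFER, not by the absence of
IO_URING_F_UNLOCKED, and the only safe pattern is to forward the
issue_flags io_uring supplied rather than reconstructing them. A sketch
(hypothetical callback):

        static void my_drv_cmd_tw(struct io_uring_cmd *cmd,
                                  unsigned int issue_flags)
        {
                /* issue_flags carry IO_URING_F_COMPLETE_DEFER when the
                 * ctx is locked; io_uring_cmd_done() picks the right
                 * completion path from them */
                io_uring_cmd_done(cmd, 0, 0, issue_flags);
        }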

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.9-rc4
commit 36a005b9c66eade68f4235e612d733510a8d52ff
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Add comments warning users that they're only allowed to pass
issue_flags that were given from io_uring.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/82ff8a45f2c3eb5f3a04a33f0692e5e4a1320455.171079918...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 include/linux/io_uring/cmd.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 7e0f1084815a..753083a0c347 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -40,12 +40,25 @@ static inline void io_uring_cmd_private_sz_check(size_t cmd_sz)
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
                               struct iov_iter *iter, void *ioucmd);
+
+/*
+ * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
+ * and the corresponding io_uring request.
+ *
+ * Note: the caller should never hard code @issue_flags and is only allowed
+ * to pass the mask provided by the core io_uring code.
+ */
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
                         unsigned issue_flags);
+
 void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
                         void (*task_work_cb)(struct io_uring_cmd *, unsigned),
                         unsigned flags);
+
+/*
+ * Note: the caller should never hard code @issue_flags and only use the
+ * mask provided by the core io_uring code.
+ */
 void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
                 unsigned int issue_flags);
 
--
2.39.2

From: Pavel Begunkov <asml.silence@gmail.com>

mainline inclusion
from mainline-v6.9-rc4
commit 92219afb980e01d88a1ac6d8fe01dcae255c4279
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

We can run normal task_work without locking the ctx, however we try to
lock anyway and most handlers prefer or require it locked. It might
have been interesting for a multi-submitter ring with high contention
completing async read/write requests via task_work, however that will
still need to go through io_req_complete_post() and potentially take
the lock for rsrc node putting or some other case. In other words,
it's hard to care about it, so always force the locking. The described
case would also be affected by various io_uring caches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/6ae858f2ef562e6ed9f13c60978c0d48926954ba.171079918...
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
        io_uring/io_uring.c
[Context differences.]

Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 io_uring/io_uring.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a52f5b93dec7..4d72b14d8c1a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1213,8 +1213,9 @@ static unsigned int handle_tw_list(struct llist_node *node,
                 if (req->ctx != *ctx) {
                         ctx_flush_and_put(*ctx, ts);
                         *ctx = req->ctx;
-                        /* if not contended, grab and improve batching */
-                        ts->locked = mutex_trylock(&(*ctx)->uring_lock);
+
+                        ts->locked = true;
+                        mutex_lock(&(*ctx)->uring_lock);
                         percpu_ref_get(&(*ctx)->refs);
                 }
                 INDIRECT_CALL_2(req->io_task_work.func,
@@ -1439,11 +1440,9 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
 
         if (io_run_local_work_continue(ctx, ret, min_events))
                 goto again;
-        if (ts->locked) {
-                io_submit_flush_completions(ctx);
-                if (io_run_local_work_continue(ctx, ret, min_events))
-                        goto again;
-        }
+        io_submit_flush_completions(ctx);
+        if (io_run_local_work_continue(ctx, ret, min_events))
+                goto again;
 
         trace_io_uring_local_work_run(ctx, ret, loops);
         return ret;
@@ -1467,14 +1466,12 @@ static inline int io_run_local_work_locked(struct io_ring_ctx *ctx,
 
 static int io_run_local_work(struct io_ring_ctx *ctx, int min_events)
 {
-        struct io_tw_state ts = {};
+        struct io_tw_state ts = { .locked = true };
         int ret;
 
-        ts.locked = mutex_trylock(&ctx->uring_lock);
+        mutex_lock(&ctx->uring_lock);
         ret = __io_run_local_work(ctx, &ts, min_events);
-        if (ts.locked)
-                mutex_unlock(&ctx->uring_lock);
-
+        mutex_unlock(&ctx->uring_lock);
         return ret;
 }
 
--
2.39.2

From: Pavel Begunkov <asml.silence@gmail.com> mainline inclusion from mainline-v6.9-rc4 commit 8e5b3b89ecaf6d9295e561c225b35c574a5e0fe7 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- ctx is always locked for task_work now, so get rid of struct io_tw_state::locked. Note I'm stopping one step before removing io_tw_state altogether, which is not empty, because it still serves the purpose of indicating which function is a tw callback and forcing users not to invoke them carelessly out of a wrong context. The removal can always be done later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.171079918... Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: io_uring/io_uring.c [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- include/linux/io_uring_types.h | 2 -- io_uring/io_uring.c | 31 ++++++++----------------------- io_uring/io_uring.h | 5 +---- io_uring/poll.c | 2 +- io_uring/rw.c | 6 ++---- io_uring/timeout.c | 8 ++------ io_uring/uring_cmd.c | 8 ++------ 7 files changed, 16 insertions(+), 46 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 04928567b51a..123dbcc96d47 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -417,8 +417,6 @@ struct io_ring_ctx { }; struct io_tw_state { - /* ->uring_lock is taken, callbacks can use io_tw_lock to lock it */ - bool locked; }; enum { diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 4d72b14d8c1a..088d7cf4f77a 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -249,14 +249,12 @@ static __cold void io_fallback_req_func(struct work_struct *work) fallback_work.work); struct llist_node *node = llist_del_all(&ctx->fallback_llist); struct io_kiocb *req, *tmp; - struct io_tw_state ts = { .locked = true, }; + struct io_tw_state ts = {}; percpu_ref_get(&ctx->refs); mutex_lock(&ctx->uring_lock); llist_for_each_entry_safe(req, tmp, node, io_task_work.node) req->io_task_work.func(req, &ts); - if (WARN_ON_ONCE(!ts.locked)) - return; io_submit_flush_completions(ctx); mutex_unlock(&ctx->uring_lock); percpu_ref_put(&ctx->refs); @@ -1189,11 +1187,9 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts) return; if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); - if (ts->locked) { - io_submit_flush_completions(ctx); - mutex_unlock(&ctx->uring_lock); - ts->locked = false; - } + + io_submit_flush_completions(ctx); + mutex_unlock(&ctx->uring_lock); percpu_ref_put(&ctx->refs); } @@ -1213,8 +1209,6 @@ static unsigned int handle_tw_list(struct llist_node *node, if (req->ctx != *ctx) { ctx_flush_and_put(*ctx, ts); *ctx = req->ctx; - - ts->locked = true; mutex_lock(&(*ctx)->uring_lock); percpu_ref_get(&(*ctx)->refs); } @@ -1451,22 +1445,16 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts, static inline int io_run_local_work_locked(struct io_ring_ctx *ctx, int min_events) { - struct io_tw_state ts = { .locked = true, }; - int ret; + struct io_tw_state ts = {}; if (llist_empty(&ctx->work_llist)) return 0; - - ret = __io_run_local_work(ctx, &ts, min_events); - /* shouldn't happen! 
*/ - if (WARN_ON_ONCE(!ts.locked)) - mutex_lock(&ctx->uring_lock); - return ret; + return __io_run_local_work(ctx, &ts, min_events); } static int io_run_local_work(struct io_ring_ctx *ctx, int min_events) { - struct io_tw_state ts = { .locked = true }; + struct io_tw_state ts = {}; int ret; mutex_lock(&ctx->uring_lock); @@ -1696,10 +1684,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min) void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts) { - if (ts->locked) - io_req_complete_defer(req); - else - io_req_complete_post(req, IO_URING_F_UNLOCKED); + io_req_complete_defer(req); } /* diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 9780b94eaed4..0c0dc4abd6d0 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -315,10 +315,7 @@ static inline bool io_task_work_pending(struct io_ring_ctx *ctx) static inline void io_tw_lock(struct io_ring_ctx *ctx, struct io_tw_state *ts) { - if (!ts->locked) { - mutex_lock(&ctx->uring_lock); - ts->locked = true; - } + lockdep_assert_held(&ctx->uring_lock); } /* diff --git a/io_uring/poll.c b/io_uring/poll.c index cf8e86bc96de..b975b05c3f4e 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -318,7 +318,7 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts) __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events); - if (!io_fill_cqe_req_aux(req, ts->locked, mask, + if (!io_fill_cqe_req_aux(req, true, mask, IORING_CQE_F_MORE)) { io_req_set_res(req, mask, 0); return IOU_POLL_REMOVE_POLL_USE_RES; diff --git a/io_uring/rw.c b/io_uring/rw.c index 780c14bf40a5..efd9f0b63618 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -286,11 +286,9 @@ void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts) io_req_io_end(req); - if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) { - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED; + if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) + req->cqe.flags |= io_put_kbuf(req, 0); - req->cqe.flags |= io_put_kbuf(req, issue_flags); - } io_req_task_complete(req, ts); } diff --git a/io_uring/timeout.c b/io_uring/timeout.c index 277e22d55c61..7d31b70e4d57 100644 --- a/io_uring/timeout.c +++ b/io_uring/timeout.c @@ -72,10 +72,7 @@ static void io_timeout_complete(struct io_kiocb *req, struct io_tw_state *ts) struct io_ring_ctx *ctx = req->ctx; if (!io_timeout_finish(timeout, data)) { - bool filled; - filled = io_fill_cqe_req_aux(req, ts->locked, -ETIME, - IORING_CQE_F_MORE); - if (filled) { + if (io_fill_cqe_req_aux(req, true, -ETIME, IORING_CQE_F_MORE)) { /* re-arm timer */ spin_lock_irq(&ctx->timeout_lock); list_add(&timeout->list, ctx->timeout_list.prev); @@ -301,7 +298,6 @@ int io_timeout_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd) static void io_req_task_link_timeout(struct io_kiocb *req, struct io_tw_state *ts) { - unsigned issue_flags = ts->locked ? 
0 : IO_URING_F_UNLOCKED; struct io_timeout *timeout = io_kiocb_to_cmd(req, struct io_timeout); struct io_kiocb *prev = timeout->prev; int ret = -ENOENT; @@ -313,7 +309,7 @@ static void io_req_task_link_timeout(struct io_kiocb *req, struct io_tw_state *t .data = prev->cqe.user_data, }; - ret = io_try_cancel(req->task->io_uring, &cd, issue_flags); + ret = io_try_cancel(req->task->io_uring, &cd, 0); } io_req_set_res(req, ret ?: -ETIME, 0); io_req_task_complete(req, ts); diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index d0a4c360478c..8142488a1a3e 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -86,13 +86,9 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable); static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts) { struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); - unsigned issue_flags = IO_URING_F_UNLOCKED; - /* locked task_work executor checks the deffered list completion */ - if (ts->locked) - issue_flags = IO_URING_F_COMPLETE_DEFER; - - ioucmd->task_work_cb(ioucmd, issue_flags); + /* task_work executor checks the deffered list completion */ + ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER); } void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd, -- 2.39.2
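The contract established by this patch: every task_work callback now runs with ctx->uring_lock held, io_tw_lock() degrades to a lockdep assertion, and completions from task_work can always take the deferred path. A minimal sketch of an opcode callback written against the new rules (the opcode and function name are hypothetical, not part of the patch):

	/* sketch: a hypothetical tw callback under the always-locked contract */
	static void my_op_tw_complete(struct io_kiocb *req, struct io_tw_state *ts)
	{
		struct io_ring_ctx *ctx = req->ctx;

		io_tw_lock(ctx, ts);		/* now just lockdep_assert_held() */
		io_req_set_res(req, 0, 0);
		io_req_task_complete(req, ts);	/* unconditionally defers */
	}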

From: Pavel Begunkov <asml.silence@gmail.com> mainline inclusion from mainline-v6.9-rc4 commit e5c12945be5016d681ff305ea7306fef5902219d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The restriction on multishot execution context disallowing io-wq is driven by rules of io_fill_cqe_req_aux(), it should only be called in the master task context, either from the syscall path or in task_work. Since task_work now always takes the ctx lock implying IO_URING_F_COMPLETE_DEFER, we can just assume that the function is always called with its defer argument set to true. Kill the argument. Also rename the function for more consistency as "fill" in CQE related functions was usually meant for raw interfaces only copying data into the CQ without any locking, waking the user and other accounting "post" functions take care of. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.171079918... Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: io_uring/io_uring.c io_uring/net.c io_uring/rw.c [Context differences in first two files. And in rw.c, IORING_OP_READ_MULTISHOT is not supported yet.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- io_uring/io_uring.c | 15 +++------------ io_uring/io_uring.h | 2 +- io_uring/net.c | 6 ++---- io_uring/poll.c | 3 +-- io_uring/timeout.c | 2 +- 5 files changed, 8 insertions(+), 20 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 088d7cf4f77a..c3e11fcd5170 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -941,38 +941,29 @@ static void __io_flush_post_cqes(struct io_ring_ctx *ctx) state->cqes_count = 0; } -static bool __io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, - bool allow_overflow) +bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags) { bool filled; io_cq_lock(ctx); filled = io_fill_cqe_aux(ctx, user_data, res, cflags); - if (!filled && allow_overflow) + if (!filled) filled = io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0); io_cq_unlock_post(ctx); return filled; } -bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags) -{ - return __io_post_aux_cqe(ctx, user_data, res, cflags, true); -} - /* * A helper for multishot requests posting additional CQEs. * Should only be used from a task_work including IO_URING_F_MULTISHOT. 
*/ -bool io_fill_cqe_req_aux(struct io_kiocb *req, bool defer, s32 res, u32 cflags) +bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags) { struct io_ring_ctx *ctx = req->ctx; u64 user_data = req->cqe.user_data; struct io_uring_cqe *cqe; - if (!defer) - return __io_post_aux_cqe(ctx, user_data, res, cflags, false); - lockdep_assert_held(&ctx->uring_lock); if (ctx->submit_state.cqes_count == ARRAY_SIZE(ctx->completion_cqes)) { diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 0c0dc4abd6d0..ed7d66e6786b 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -41,7 +41,7 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx); void io_req_defer_failed(struct io_kiocb *req, s32 res); void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags); bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags); -bool io_fill_cqe_req_aux(struct io_kiocb *req, bool defer, s32 res, u32 cflags); +bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags); void __io_commit_cqring_flush(struct io_ring_ctx *ctx); struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages); diff --git a/io_uring/net.c b/io_uring/net.c index 1a0e98e19dc0..4906cace45b0 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -697,8 +697,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret, * Fill CQE for this receive and see if we should keep trying to * receive from this socket. */ - if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER, - *ret, cflags | IORING_CQE_F_MORE)) { + if (io_req_post_cqe(req, *ret, cflags | IORING_CQE_F_MORE)) { struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE; @@ -1433,8 +1432,7 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags) if (ret < 0) return ret; - if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER, - ret, IORING_CQE_F_MORE)) + if (io_req_post_cqe(req, ret, IORING_CQE_F_MORE)) goto retry; io_req_set_res(req, ret, 0); diff --git a/io_uring/poll.c b/io_uring/poll.c index b975b05c3f4e..35c6550b3ed2 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -318,8 +318,7 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts) __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events); - if (!io_fill_cqe_req_aux(req, true, mask, - IORING_CQE_F_MORE)) { + if (!io_req_post_cqe(req, mask, IORING_CQE_F_MORE)) { io_req_set_res(req, mask, 0); return IOU_POLL_REMOVE_POLL_USE_RES; } diff --git a/io_uring/timeout.c b/io_uring/timeout.c index 7d31b70e4d57..63c99962b9f3 100644 --- a/io_uring/timeout.c +++ b/io_uring/timeout.c @@ -72,7 +72,7 @@ static void io_timeout_complete(struct io_kiocb *req, struct io_tw_state *ts) struct io_ring_ctx *ctx = req->ctx; if (!io_timeout_finish(timeout, data)) { - if (io_fill_cqe_req_aux(req, true, -ETIME, IORING_CQE_F_MORE)) { + if (io_req_post_cqe(req, -ETIME, IORING_CQE_F_MORE)) { /* re-arm timer */ spin_lock_irq(&ctx->timeout_lock); list_add(&timeout->list, ctx->timeout_list.prev); -- 2.39.2
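After the rename every multishot poster follows one pattern. A hedged sketch of a hypothetical multishot handler, modeled on the io_accept() hunk above (IOU_OK and IOU_ISSUE_SKIP_COMPLETE are existing io_uring return codes; the handler itself is made up):

	static int my_multishot_issue(struct io_kiocb *req, int res)
	{
		/*
		 * io_req_post_cqe() must run in the submitter task context
		 * (syscall path or task_work); it posts an extra CQE while
		 * keeping the request armed.
		 */
		if (io_req_post_cqe(req, res, IORING_CQE_F_MORE))
			return IOU_ISSUE_SKIP_COMPLETE;	/* stay armed */

		/* posting failed (e.g. CQ overflow): end with a final CQE */
		io_req_set_res(req, res, 0);
		return IOU_OK;
	}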

From: Pavel Begunkov <asml.silence@gmail.com> mainline inclusion from mainline-v6.9-rc4 commit 902ce82c2aa130bea5e3feca2d4ae62781865da7 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- io_post_aux_cqe(), which is used for multishot requests, delays completions by putting CQEs into a temporary array for the purpose of completion lock/flush batching. DEFER_TASKRUN doesn't need any locking, so for it we can put completions directly into the CQ and defer post completion handling with a flag. That leaves !DEFER_TASKRUN, which is not that interesting / hot for multishot requests, so have conditional locking with deferred flush for them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/b1d05a81fd27aaa2a07f9860af13059e7ad7a890.171079918... Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: io_uring/io_uring.c [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- include/linux/io_uring_types.h | 3 +- io_uring/io_uring.c | 62 +++++++--------------------------- io_uring/io_uring.h | 2 +- 3 files changed, 15 insertions(+), 52 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 123dbcc96d47..dc81e9266313 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -203,6 +203,7 @@ struct io_submit_state { bool plug_started; bool need_plug; + bool cq_flush; unsigned short submit_nr; unsigned int cqes_count; struct blk_plug plug; @@ -336,8 +337,6 @@ struct io_ring_ctx { unsigned cq_last_tm_flush; } ____cacheline_aligned_in_smp; - struct io_uring_cqe completion_cqes[16]; - spinlock_t completion_lock; /* IRQ completion list, under ->completion_lock */ diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index c3e11fcd5170..595f470badf1 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -647,6 +647,12 @@ static inline void __io_cq_lock(struct io_ring_ctx *ctx) spin_lock(&ctx->completion_lock); } +static inline void __io_cq_unlock(struct io_ring_ctx *ctx) +{ + if (!ctx->lockless_cq) + spin_unlock(&ctx->completion_lock); +} + static inline void io_cq_lock(struct io_ring_ctx *ctx) __acquires(ctx->completion_lock) { @@ -916,31 +922,6 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, return false; } -static void __io_flush_post_cqes(struct io_ring_ctx *ctx) - __must_hold(&ctx->uring_lock) -{ - struct io_submit_state *state = &ctx->submit_state; - unsigned int i; - - lockdep_assert_held(&ctx->uring_lock); - for (i = 0; i < state->cqes_count; i++) { - struct io_uring_cqe *cqe = &ctx->completion_cqes[i]; - - if (!io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags)) { - if (ctx->lockless_cq) { - spin_lock(&ctx->completion_lock); - io_cqring_event_overflow(ctx, cqe->user_data, - cqe->res, cqe->flags, 0, 0); - spin_unlock(&ctx->completion_lock); - } else { - io_cqring_event_overflow(ctx, cqe->user_data, - cqe->res, cqe->flags, 0, 0); - } - } - } - state->cqes_count = 0; -} - bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags) { bool filled; @@ -961,30 +942,15 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags) { struct io_ring_ctx *ctx = req->ctx; - u64 user_data = 
req->cqe.user_data; - struct io_uring_cqe *cqe; + bool posted; lockdep_assert_held(&ctx->uring_lock); - if (ctx->submit_state.cqes_count == ARRAY_SIZE(ctx->completion_cqes)) { - __io_cq_lock(ctx); - __io_flush_post_cqes(ctx); - /* no need to flush - flush is deferred */ - __io_cq_unlock_post(ctx); - } - - /* For defered completions this is not as strict as it is otherwise, - * however it's main job is to prevent unbounded posted completions, - * and in that it works just as well. - */ - if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) - return false; - - cqe = &ctx->completion_cqes[ctx->submit_state.cqes_count++]; - cqe->user_data = user_data; - cqe->res = res; - cqe->flags = cflags; - return true; + __io_cq_lock(ctx); + posted = io_fill_cqe_aux(ctx, req->cqe.user_data, res, cflags); + ctx->submit_state.cq_flush = true; + __io_cq_unlock_post(ctx); + return posted; } static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) @@ -1538,9 +1504,6 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx) struct io_wq_work_node *node; __io_cq_lock(ctx); - /* must come first to preserve CQE ordering in failure cases */ - if (state->cqes_count) - __io_flush_post_cqes(ctx); __wq_list_for_each(node, &state->compl_reqs) { struct io_kiocb *req = container_of(node, struct io_kiocb, comp_list); @@ -1562,6 +1525,7 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx) io_free_batch_list(ctx, state->compl_reqs.first); INIT_WQ_LIST(&state->compl_reqs); } + ctx->submit_state.cq_flush = false; } static unsigned io_cqring_events(struct io_ring_ctx *ctx) diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index ed7d66e6786b..6ec351ffe55c 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -123,7 +123,7 @@ static inline void io_req_task_work_add(struct io_kiocb *req) static inline void io_submit_flush_completions(struct io_ring_ctx *ctx) { if (!wq_list_empty(&ctx->submit_state.compl_reqs) || - ctx->submit_state.cqes_count) + ctx->submit_state.cq_flush) __io_submit_flush_completions(ctx); } -- 2.39.2

From: Pavel Begunkov <asml.silence@gmail.com> mainline inclusion from mainline-v6.9-rc4 commit 23fbdde6205d9351bb52a4b8f11ec38bdbc8561a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- task_work execution is now always locked, and we shouldn't get into io_req_complete_post() from them. That means that complete_post() is always called out of the original task context and we don't even need to check current. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/24ec27f27db0d8f58c974d8118dca1d345314ddc.171079918... Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: io_uring/io_uring.c [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- io_uring/io_uring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 595f470badf1..7fd5eb81a8ba 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1003,7 +1003,7 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) { - if (req->ctx->task_complete && req->ctx->submitter_task != current) { + if (req->ctx->task_complete) { req->io_task_work.func = io_req_task_complete; io_req_task_work_add(req); } else if (!(issue_flags & IO_URING_F_UNLOCKED) || -- 2.39.2

From: Pavel Begunkov <asml.silence@gmail.com> mainline inclusion from mainline-v6.9-rc4 commit 0667db14e1f029d56243aa2509ebc5f944388200 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Make io_req_complete_post() to push all IORING_SETUP_IOPOLL requests to task_work, it's much cleaner and should normally happen. We couldn't do it before because there was a possibility of looping in complete_post() -> tw -> complete_post() -> ... Also, unexport the function and inline __io_req_complete_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/ea19c032ace3e0dd96ac4d991a063b0188037014.171079918... Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: io_uring/io_uring.c [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- io_uring/io_uring.c | 29 +++++++++++------------------ io_uring/io_uring.h | 1 - 2 files changed, 11 insertions(+), 19 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 7fd5eb81a8ba..eed7070323e9 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -953,11 +953,21 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags) return posted; } -static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) +static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) { struct io_ring_ctx *ctx = req->ctx; struct io_rsrc_node *rsrc_node = NULL; + /* + * Handle special CQ sync cases via task_work. DEFER_TASKRUN requires + * the submitter task context, IOPOLL protects with uring_lock. + */ + if (ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL)) { + req->io_task_work.func = io_req_task_complete; + io_req_task_work_add(req); + return; + } + io_cq_lock(ctx); if (!(req->flags & REQ_F_CQE_SKIP)) { if (!io_fill_cqe_req(ctx, req)) @@ -1001,23 +1011,6 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) } } -void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) -{ - if (req->ctx->task_complete) { - req->io_task_work.func = io_req_task_complete; - io_req_task_work_add(req); - } else if (!(issue_flags & IO_URING_F_UNLOCKED) || - !(req->ctx->flags & IORING_SETUP_IOPOLL)) { - __io_req_complete_post(req, issue_flags); - } else { - struct io_ring_ctx *ctx = req->ctx; - - mutex_lock(&ctx->uring_lock); - __io_req_complete_post(req, issue_flags & ~IO_URING_F_UNLOCKED); - mutex_unlock(&ctx->uring_lock); - } -} - void io_req_defer_failed(struct io_kiocb *req, s32 res) __must_hold(&ctx->uring_lock) { diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 6ec351ffe55c..cec1da0d8803 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -39,7 +39,6 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow); void io_req_cqe_overflow(struct io_kiocb *req); int io_run_task_work_sig(struct io_ring_ctx *ctx); void io_req_defer_failed(struct io_kiocb *req, s32 res); -void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags); bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags); bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags); void __io_commit_cqring_flush(struct io_ring_ctx *ctx); -- 2.39.2

From: Pavel Begunkov <asml.silence@gmail.com> mainline inclusion from mainline-v6.9-rc4 commit c133b3b06b0653036b0c07675c1db0c89467ccdb category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Move CONFIG_PROVE_LOCKING checks inside of io_lockdep_assert_cq_locked() and kill the else branch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/bbf33c429c9f6d7207a8fe66d1a5866ec2c99850.171079918... Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: io_uring/io_uring.h [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- io_uring/io_uring.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index cec1da0d8803..cc976cd03585 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -86,9 +86,9 @@ bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, void *io_mem_alloc(size_t size); void io_mem_free(void *ptr); -#if defined(CONFIG_PROVE_LOCKING) static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) { +#if defined(CONFIG_PROVE_LOCKING) lockdep_assert(in_task()); if (ctx->flags & IORING_SETUP_IOPOLL) { @@ -107,12 +107,8 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) else lockdep_assert(current == ctx->submitter_task); } -} -#else -static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) -{ -} #endif +} static inline void io_req_task_work_add(struct io_kiocb *req) { -- 2.39.2

From: Pavel Begunkov <asml.silence@gmail.com> mainline inclusion from mainline-v6.10-rc2 commit df3b8ca604f224eb4cd51669416ad4d607682273 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When the task that submitted a request is dying, a task work for that request might get run by a kernel thread or even worse by a half dismantled task. We can't just cancel the task work without running the callback as the cmd might need to do some clean up, so pass a flag instead. If set, it's not safe to access any task resources and the callback is expected to cancel the cmd ASAP. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com> Conflicts: include/linux/io_uring_types.h [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- include/linux/io_uring_types.h | 1 + io_uring/uring_cmd.c | 6 +++++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index dc81e9266313..968e7153d9b3 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -35,6 +35,7 @@ enum io_uring_cmd_flags { /* set when uring wants to cancel a previously issued command */ IO_URING_F_CANCEL = (1 << 11), + IO_URING_F_TASK_DEAD = (1 << 13), }; struct io_wq_work_node { diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 8142488a1a3e..e22cd9d2ba2c 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -86,9 +86,13 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable); static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts) { struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); + unsigned int flags = IO_URING_F_COMPLETE_DEFER; + + if (current->flags & (PF_EXITING | PF_KTHREAD)) + flags |= IO_URING_F_TASK_DEAD; /* task_work executor checks the deffered list completion */ - ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER); + ioucmd->task_work_cb(ioucmd, flags); } void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd, -- 2.39.2
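On the consumer side, a uring_cmd task-work callback now has to treat IO_URING_F_TASK_DEAD as "complete quickly, do not touch task resources". A hedged sketch (the callback name is made up; ublk and nvme are the real in-tree users of task_work_cb):

	static void my_cmd_tw_cb(struct io_uring_cmd *ioucmd,
				 unsigned int issue_flags)
	{
		if (issue_flags & IO_URING_F_TASK_DEAD) {
			/* submitter is exiting: cancel ASAP */
			io_uring_cmd_done(ioucmd, -ECANCELED, 0, issue_flags);
			return;
		}

		/* normal completion path of the hypothetical driver */
		io_uring_cmd_done(ioucmd, 0, 0, issue_flags);
	}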

From: Jens Axboe <axboe@kernel.dk> mainline inclusion from mainline-v6.10-rc1 commit 26b97668e5339434e5df8ddc7b1898a37a350112 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When the intermediate CQE aux cache got removed, any usage of this member went away. As it isn't used anymore, kill it. Fixes: 902ce82c2aa1 ("io_uring: get rid of intermediate aux cqe caches") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Long Li <leo.lilong@huawei.com> --- include/linux/io_uring_types.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 968e7153d9b3..b82badd42bfd 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -206,7 +206,6 @@ struct io_submit_state { bool need_plug; bool cq_flush; unsigned short submit_nr; - unsigned int cqes_count; struct blk_plug plug; }; -- 2.39.2

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 92270d076115ae81b83f0605703602d2d5a866e2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This function is needed by fuse_uring.c to clean ring queues, so make it non static. Especially in non-static mode the function name 'end_requests' should be prefixed with fuse_ Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: fs/fuse/dev.c [fuse resend not merged yet.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 11 +++++------ fs/fuse/fuse_dev_i.h | 14 ++++++++++++++ 2 files changed, 19 insertions(+), 6 deletions(-) create mode 100644 fs/fuse/fuse_dev_i.h diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 9d6158d2da46..c83e2b2e0312 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -7,6 +7,7 @@ */ #include "fuse_i.h" +#include "fuse_dev_i.h" #include <linux/init.h> #include <linux/module.h> @@ -33,8 +34,6 @@ MODULE_ALIAS("devname:fuse"); static struct kmem_cache *fuse_req_cachep; -static void end_requests(struct list_head *head); - static struct fuse_dev *fuse_get_dev(struct file *file) { /* @@ -1896,7 +1895,7 @@ static void fuse_resend(struct fuse_conn *fc) spin_unlock(&fiq->lock); list_for_each_entry(req, &to_queue, list) clear_bit(FR_PENDING, &req->flags); - end_requests(&to_queue); + fuse_dev_end_requests(&to_queue); return; } /* iq and pq requests are both oldest to newest */ @@ -2215,7 +2214,7 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait) } /* Abort all requests on the given list (pending or processing) */ -static void end_requests(struct list_head *head) +void fuse_dev_end_requests(struct list_head *head) { while (!list_empty(head)) { struct fuse_req *req; @@ -2318,7 +2317,7 @@ void fuse_abort_conn(struct fuse_conn *fc) wake_up_all(&fc->blocked_waitq); spin_unlock(&fc->lock); - end_requests(&to_end); + fuse_dev_end_requests(&to_end); } else { spin_unlock(&fc->lock); } @@ -2348,7 +2347,7 @@ int fuse_dev_release(struct inode *inode, struct file *file) list_splice_init(&fpq->processing[i], &to_end); spin_unlock(&fpq->lock); - end_requests(&to_end); + fuse_dev_end_requests(&to_end); /* Are we the last open device? */ if (atomic_dec_and_test(&fc->dev_count)) { diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h new file mode 100644 index 000000000000..4fcff2223fa6 --- /dev/null +++ b/fs/fuse/fuse_dev_i.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * FUSE: Filesystem in Userspace + * Copyright (C) 2001-2008 Miklos Szeredi <miklos@szeredi.hu> + */ +#ifndef _FS_FUSE_DEV_I_H +#define _FS_FUSE_DEV_I_H + +#include <linux/types.h> + +void fuse_dev_end_requests(struct list_head *head); + +#endif + -- 2.39.2

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 867d93dcdede5ea453543085d1ed0bf43cde60de category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Another preparation patch, as this function will be needed by fuse/dev.c and fuse/dev_uring.c. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 9 --------- fs/fuse/fuse_dev_i.h | 9 +++++++++ 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index c83e2b2e0312..9a1ba0693a5d 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -34,15 +34,6 @@ MODULE_ALIAS("devname:fuse"); static struct kmem_cache *fuse_req_cachep; -static struct fuse_dev *fuse_get_dev(struct file *file) -{ - /* - * Lockless access is OK, because file->private data is set - * once during mount and is valid until the file is released. - */ - return READ_ONCE(file->private_data); -} - static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req) { INIT_LIST_HEAD(&req->list); diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 4fcff2223fa6..e7ea1b21c182 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -8,6 +8,15 @@ #include <linux/types.h> +static inline struct fuse_dev *fuse_get_dev(struct file *file) +{ + /* + * Lockless access is OK, because file->private data is set + * once during mount and is valid until the file is released. + */ + return READ_ONCE(file->private_data); +} + void fuse_dev_end_requests(struct list_head *head); #endif -- 2.39.2

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 88be7aa98d91f5761f8764273ebd961915e4632d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- These are needed by fuse-over-io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 4 ---- fs/fuse/fuse_dev_i.h | 4 ++++ 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 9a1ba0693a5d..72e2a51df582 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -26,10 +26,6 @@ MODULE_ALIAS_MISCDEV(FUSE_MINOR); MODULE_ALIAS("devname:fuse"); -/* Ordinary requests have even IDs, while interrupts IDs are odd */ -#define FUSE_INT_REQ_BIT (1ULL << 0) -#define FUSE_REQ_ID_STEP (1ULL << 1) - #define DEFAULT_BG_QUEUE READ static struct kmem_cache *fuse_req_cachep; diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index e7ea1b21c182..08a7e88e0027 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -8,6 +8,10 @@ #include <linux/types.h> +/* Ordinary requests have even IDs, while interrupts IDs are odd */ +#define FUSE_INT_REQ_BIT (1ULL << 0) +#define FUSE_REQ_ID_STEP (1ULL << 1) + static inline struct fuse_dev *fuse_get_dev(struct file *file) { /* -- 2.39.2
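The two moved constants encode the request-ID scheme that fuse-over-io-uring will rely on as well. For illustration only (the reqctr handling mirrors fuse_get_unique() in fs/fuse/dev.c; the helper name is made up):

	/* demonstration: how a request ID and its interrupt ID relate */
	static u64 example_interrupt_id(struct fuse_iqueue *fiq)
	{
		/* IDs advance by 2, so ordinary requests get even IDs */
		u64 req_id = fiq->reqctr += FUSE_REQ_ID_STEP;

		/* the interrupt for that request reuses the ID, low bit set */
		return req_id | FUSE_INT_REQ_BIT;
	}

A reply's low bit therefore tells the kernel whether it answers the request itself or the interrupt sent for it.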

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit a7040a06e4bc6a550ddb2078d59fad311dc0ee3b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- [Add several documentation updates I had missed after renaming functions and also fixes 'make htmldocs'.] Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- Documentation/filesystems/fuse-io-uring.rst | 99 +++++++++++++++++++++ Documentation/filesystems/index.rst | 1 + 2 files changed, 100 insertions(+) create mode 100644 Documentation/filesystems/fuse-io-uring.rst diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst new file mode 100644 index 000000000000..d73dd0dbd238 --- /dev/null +++ b/Documentation/filesystems/fuse-io-uring.rst @@ -0,0 +1,99 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================================= +FUSE-over-io-uring design documentation +======================================= + +This documentation covers basic details how the fuse +kernel/userspace communication through io-uring is configured +and works. For generic details about FUSE see fuse.rst. + +This document also covers the current interface, which is +still in development and might change. + +Limitations +=========== +As of now not all requests types are supported through io-uring, userspace +is required to also handle requests through /dev/fuse after io-uring setup +is complete. Specifically notifications (initiated from the daemon side) +and interrupts. + +Fuse io-uring configuration +=========================== + +Fuse kernel requests are queued through the classical /dev/fuse +read/write interface - until io-uring setup is complete. + +In order to set up fuse-over-io-uring fuse-server (user-space) +needs to submit SQEs (opcode = IORING_OP_URING_CMD) to the /dev/fuse +connection file descriptor. Initial submit is with the sub command +FUSE_URING_REQ_REGISTER, which will just register entries to be +available in the kernel. + +Once at least one entry per queue is submitted, kernel starts +to enqueue to ring queues. +Note, every CPU core has its own fuse-io-uring queue. +Userspace handles the CQE/fuse-request and submits the result as +subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - kernel completes +the requests and also marks the entry available again. If there are +pending requests waiting the request will be immediately submitted +to the daemon again. + +Initial SQE +-----------:: + + | | FUSE filesystem daemon + | | + | | >io_uring_submit() + | | IORING_OP_URING_CMD / + | | FUSE_URING_CMD_REGISTER + | | [wait cqe] + | | >io_uring_wait_cqe() or + | | >io_uring_submit_and_wait() + | | + | >fuse_uring_cmd() | + | >fuse_uring_register() | + + +Sending requests with CQEs +--------------------------:: + + | | FUSE filesystem daemon + | | [waiting for CQEs] + | "rm /mnt/fuse/file" | + | | + | >sys_unlink() | + | >fuse_unlink() | + | [allocate request] | + | >fuse_send_one() | + | ... | + | >fuse_uring_queue_fuse_req | + | [queue request on fg queue] | + | >fuse_uring_add_req_to_ring_ent() | + | ... 
| + | >fuse_uring_copy_to_ring() | + | >io_uring_cmd_done() | + | >request_wait_answer() | + | [sleep on req->waitq] | + | | [receives and handles CQE] + | | [submit result and fetch next] + | | >io_uring_submit() + | | IORING_OP_URING_CMD/ + | | FUSE_URING_CMD_COMMIT_AND_FETCH + | >fuse_uring_cmd() | + | >fuse_uring_commit_fetch() | + | >fuse_uring_commit() | + | >fuse_uring_copy_from_ring() | + | [ copy the result to the fuse req] | + | >fuse_uring_req_end() | + | >fuse_request_end() | + | [wake up req->waitq] | + | >fuse_uring_next_fuse_req | + | [wait or handle next req] | + | | + | [req->waitq woken up] | + | <fuse_unlink() | + | <sys_unlink() | + + + diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 09cade7eaefc..763facd0bdbb 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -94,6 +94,7 @@ Documentation for filesystem implementations. hpfs fuse fuse-io + fuse-io-uring inotify isofs nilfs2 -- 2.39.2

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 7ccd86ba3a485a8bc33478776eb7053d9adb7816 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This change sets up FUSE operations to always have headers in args.in_args[0], even for opcodes without an actual header. This step prepares for a clean separation of payload from headers, initially it is used by fuse-over-io-uring. For opcodes without a header, we use a zero-sized struct as a placeholder. This approach: - Keeps things consistent across all FUSE operations - Will help with payload alignment later - Avoids future issues when header sizes change Op codes that already have an op code specific header do not need modification. Op codes that have neither payload nor op code headers are not modified either (FUSE_READLINK and FUSE_DESTROY). FUSE_BATCH_FORGET already has the header in the right place, but is not using fuse_copy_args - as -over-uring is currently not handling forgets it does not matter for now, but header separation will later need special attention for that op code. Correct the struct fuse_args->in_args array max size. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: fs/fuse/dir.c fs/fuse/fuse_i.h [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dax.c | 11 ++++++----- fs/fuse/dev.c | 9 +++++---- fs/fuse/dir.c | 32 ++++++++++++++++++-------------- fs/fuse/fuse_i.h | 15 ++++++++++++++- fs/fuse/xattr.c | 7 ++++--- 5 files changed, 47 insertions(+), 27 deletions(-) diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index 12ef91d170bb..44bd30d448e4 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -240,11 +240,12 @@ static int fuse_send_removemapping(struct inode *inode, args.opcode = FUSE_REMOVEMAPPING; args.nodeid = fi->nodeid; - args.in_numargs = 2; - args.in_args[0].size = sizeof(*inargp); - args.in_args[0].value = inargp; - args.in_args[1].size = inargp->count * sizeof(*remove_one); - args.in_args[1].value = remove_one; + args.in_numargs = 3; + fuse_set_zero_arg0(&args); + args.in_args[1].size = sizeof(*inargp); + args.in_args[1].value = inargp; + args.in_args[2].size = inargp->count * sizeof(*remove_one); + args.in_args[2].value = remove_one; return fuse_simple_request(fm, &args); } diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 72e2a51df582..5e44c39c6fc1 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1758,7 +1758,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, args = &ap->args; args->nodeid = outarg->nodeid; args->opcode = FUSE_NOTIFY_REPLY; - args->in_numargs = 2; + args->in_numargs = 3; args->in_pages = true; args->end = fuse_retrieve_end; @@ -1785,9 +1785,10 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, } ra->inarg.offset = outarg->offset; ra->inarg.size = total_len; - args->in_args[0].size = sizeof(ra->inarg); - args->in_args[0].value = &ra->inarg; - args->in_args[1].size = total_len; + fuse_set_zero_arg0(args); + args->in_args[1].size = sizeof(ra->inarg); + args->in_args[1].value = &ra->inarg; + args->in_args[2].size = total_len; err = fuse_simple_notify_reply(fm, args, outarg->notify_unique); if (err) diff --git a/fs/fuse/dir.c 
b/fs/fuse/dir.c index edf28f15de51..7570c3e1d8f9 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -175,9 +175,10 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args, memset(outarg, 0, sizeof(struct fuse_entry_out)); args->opcode = FUSE_LOOKUP; args->nodeid = nodeid; - args->in_numargs = 1; - args->in_args[0].size = name->len + 1; - args->in_args[0].value = name->name; + args->in_numargs = 2; + fuse_set_zero_arg0(args); + args->in_args[1].size = name->len + 1; + args->in_args[1].value = name->name; args->out_numargs = 1; args->out_args[0].size = sizeof(struct fuse_entry_out); args->out_args[0].value = outarg; @@ -920,11 +921,12 @@ static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir, FUSE_ARGS(args); args.opcode = FUSE_SYMLINK; - args.in_numargs = 2; - args.in_args[0].size = entry->d_name.len + 1; - args.in_args[0].value = entry->d_name.name; - args.in_args[1].size = len; - args.in_args[1].value = link; + args.in_numargs = 3; + fuse_set_zero_arg0(&args); + args.in_args[1].size = entry->d_name.len + 1; + args.in_args[1].value = entry->d_name.name; + args.in_args[2].size = len; + args.in_args[2].value = link; return create_new_entry(fm, &args, dir, entry, S_IFLNK); } @@ -984,9 +986,10 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry) args.opcode = FUSE_UNLINK; args.nodeid = get_node_id(dir); - args.in_numargs = 1; - args.in_args[0].size = entry->d_name.len + 1; - args.in_args[0].value = entry->d_name.name; + args.in_numargs = 2; + fuse_set_zero_arg0(&args); + args.in_args[1].size = entry->d_name.len + 1; + args.in_args[1].value = entry->d_name.name; err = fuse_simple_request(fm, &args); if (!err) { fuse_dir_changed(dir); @@ -1007,9 +1010,10 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry) args.opcode = FUSE_RMDIR; args.nodeid = get_node_id(dir); - args.in_numargs = 1; - args.in_args[0].size = entry->d_name.len + 1; - args.in_args[0].value = entry->d_name.name; + args.in_numargs = 2; + fuse_set_zero_arg0(&args); + args.in_args[1].size = entry->d_name.len + 1; + args.in_args[1].value = entry->d_name.name; err = fuse_simple_request(fm, &args); if (!err) { fuse_dir_changed(dir); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 665e89d8ea5b..b352b02c3a10 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -321,7 +321,7 @@ struct fuse_args { bool may_block:1; bool is_ext:1; bool invalidate_vmap:1; - struct fuse_in_arg in_args[3]; + struct fuse_in_arg in_args[4]; struct fuse_arg out_args[2]; void (*end)(struct fuse_mount *fm, struct fuse_args *args, int error); /* Used for kvec iter backed by vmalloc address */ @@ -978,6 +978,19 @@ struct fuse_mount { struct rcu_head rcu; }; +/* + * Empty header for FUSE opcodes without specific header needs. + * Used as a placeholder in args->in_args[0] for consistency + * across all FUSE operations, simplifying request handling. 
+ */ +struct fuse_zero_header {}; + +static inline void fuse_set_zero_arg0(struct fuse_args *args) +{ + args->in_args[0].size = sizeof(struct fuse_zero_header); + args->in_args[0].value = NULL; +} + static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb) { return sb->s_fs_info; diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c index 690b9aadceaa..981e6b63b0fd 100644 --- a/fs/fuse/xattr.c +++ b/fs/fuse/xattr.c @@ -164,9 +164,10 @@ int fuse_removexattr(struct inode *inode, const char *name) args.opcode = FUSE_REMOVEXATTR; args.nodeid = get_node_id(inode); - args.in_numargs = 1; - args.in_args[0].size = strlen(name) + 1; - args.in_args[0].value = name; + args.in_numargs = 2; + fuse_set_zero_arg0(&args); + args.in_args[1].size = strlen(name) + 1; + args.in_args[1].value = name; err = fuse_simple_request(fm, &args); if (err == -ENOSYS) { fm->fc->no_removexattr = 1; -- 2.39.2
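The conversion pattern for any opcode without a real header is mechanical, and since the placeholder has zero size, fuse_copy_args() puts no extra bytes on the wire, leaving the format seen by existing servers unchanged. A sketch for a hypothetical opcode FUSE_FOO (illustrative, not from the patch):

	args.opcode = FUSE_FOO;			/* hypothetical opcode */
	args.in_numargs = 2;			/* placeholder + 1 real arg */
	fuse_set_zero_arg0(&args);		/* in_args[0]: zero-sized header */
	args.in_args[1].size = sizeof(inarg);
	args.in_args[1].value = &inarg;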

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 24fe962c86f55347385933a1b06ca71b60854690 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD). For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue entries. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: fs/fuse/Makefile fs/fuse/fuse_i.h include/uapi/linux/fuse.h [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/Kconfig | 12 ++ fs/fuse/Makefile | 1 + fs/fuse/dev_uring.c | 326 ++++++++++++++++++++++++++++++++++++++ fs/fuse/dev_uring_i.h | 113 +++++++++++++ fs/fuse/fuse_i.h | 5 + fs/fuse/inode.c | 10 ++ include/uapi/linux/fuse.h | 77 ++++++++- 7 files changed, 543 insertions(+), 1 deletion(-) create mode 100644 fs/fuse/dev_uring.c create mode 100644 fs/fuse/dev_uring_i.h diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig index 8674dbfbe59d..ca215a3cba3e 100644 --- a/fs/fuse/Kconfig +++ b/fs/fuse/Kconfig @@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH to be performed directly on a backing file. If you want to allow passthrough operations, answer Y. + +config FUSE_IO_URING + bool "FUSE communication over io-uring" + default y + depends on FUSE_FS + depends on IO_URING + help + This allows sending FUSE requests over the io-uring interface and + also adds request core affinity. + + If you want to allow fuse server/client communication through io-uring, + answer Y diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile index dabc852a7ff1..dfc91845445f 100644 --- a/fs/fuse/Makefile +++ b/fs/fuse/Makefile @@ -11,6 +11,7 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o fuse-y += iomode.o fuse-$(CONFIG_FUSE_DAX) += dax.o fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o +fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o fuse-$(CONFIG_SYSCTL) += sysctl.o virtiofs-y := virtio_fs.o diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c new file mode 100644 index 000000000000..42092a234570 --- /dev/null +++ b/fs/fuse/dev_uring.c @@ -0,0 +1,326 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * FUSE: Filesystem in Userspace + * Copyright (c) 2023-2024 DataDirect Networks. 
+ */ + +#include "fuse_i.h" +#include "dev_uring_i.h" +#include "fuse_dev_i.h" + +#include <linux/fs.h> +#include <linux/io_uring/cmd.h> + +static bool __read_mostly enable_uring; +module_param(enable_uring, bool, 0644); +MODULE_PARM_DESC(enable_uring, + "Enable userspace communication through io-uring"); + +#define FUSE_URING_IOV_SEGS 2 /* header and payload */ + + +bool fuse_uring_enabled(void) +{ + return enable_uring; +} + +void fuse_uring_destruct(struct fuse_conn *fc) +{ + struct fuse_ring *ring = fc->ring; + int qid; + + if (!ring) + return; + + for (qid = 0; qid < ring->nr_queues; qid++) { + struct fuse_ring_queue *queue = ring->queues[qid]; + + if (!queue) + continue; + + WARN_ON(!list_empty(&queue->ent_avail_queue)); + WARN_ON(!list_empty(&queue->ent_commit_queue)); + + kfree(queue); + ring->queues[qid] = NULL; + } + + kfree(ring->queues); + kfree(ring); + fc->ring = NULL; +} + +/* + * Basic ring setup for this connection based on the provided configuration + */ +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc) +{ + struct fuse_ring *ring; + size_t nr_queues = num_possible_cpus(); + struct fuse_ring *res = NULL; + size_t max_payload_size; + + ring = kzalloc(sizeof(*fc->ring), GFP_KERNEL_ACCOUNT); + if (!ring) + return NULL; + + ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *), + GFP_KERNEL_ACCOUNT); + if (!ring->queues) + goto out_err; + + max_payload_size = max(FUSE_MIN_READ_BUFFER, fc->max_write); + max_payload_size = max(max_payload_size, fc->max_pages * PAGE_SIZE); + + spin_lock(&fc->lock); + if (fc->ring) { + /* race, another thread created the ring in the meantime */ + spin_unlock(&fc->lock); + res = fc->ring; + goto out_err; + } + + fc->ring = ring; + ring->nr_queues = nr_queues; + ring->fc = fc; + ring->max_payload_sz = max_payload_size; + + spin_unlock(&fc->lock); + return ring; + +out_err: + kfree(ring->queues); + kfree(ring); + return res; +} + +static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring, + int qid) +{ + struct fuse_conn *fc = ring->fc; + struct fuse_ring_queue *queue; + + queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT); + if (!queue) + return NULL; + queue->qid = qid; + queue->ring = ring; + spin_lock_init(&queue->lock); + + INIT_LIST_HEAD(&queue->ent_avail_queue); + INIT_LIST_HEAD(&queue->ent_commit_queue); + + spin_lock(&fc->lock); + if (ring->queues[qid]) { + spin_unlock(&fc->lock); + kfree(queue); + return ring->queues[qid]; + } + + /* + * write_once and lock as the caller mostly doesn't take the lock at all + */ + WRITE_ONCE(ring->queues[qid], queue); + spin_unlock(&fc->lock); + + return queue; +} + +/* + * Make a ring entry available for fuse_req assignment + */ +static void fuse_uring_ent_avail(struct fuse_ring_ent *ent, + struct fuse_ring_queue *queue) +{ + WARN_ON_ONCE(!ent->cmd); + list_move(&ent->list, &queue->ent_avail_queue); + ent->state = FRRS_AVAILABLE; +} + +/* + * fuse_uring_req_fetch command handling + */ +static void fuse_uring_do_register(struct fuse_ring_ent *ent, + struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + struct fuse_ring_queue *queue = ent->queue; + + spin_lock(&queue->lock); + ent->cmd = cmd; + fuse_uring_ent_avail(ent, queue); + spin_unlock(&queue->lock); +} + +/* + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1] + * the payload + */ +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe, + struct iovec iov[FUSE_URING_IOV_SEGS]) +{ + struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr)); + struct 
iov_iter iter; + ssize_t ret; + + if (sqe->len != FUSE_URING_IOV_SEGS) + return -EINVAL; + + /* + * Direction for buffer access will actually be READ and WRITE, + * using write for the import should include READ access as well. + */ + ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS, + FUSE_URING_IOV_SEGS, &iov, &iter); + if (ret < 0) + return ret; + + return 0; +} + +static struct fuse_ring_ent * +fuse_uring_create_ring_ent(struct io_uring_cmd *cmd, + struct fuse_ring_queue *queue) +{ + struct fuse_ring *ring = queue->ring; + struct fuse_ring_ent *ent; + size_t payload_size; + struct iovec iov[FUSE_URING_IOV_SEGS]; + int err; + + err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov); + if (err) { + pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n", + err); + return ERR_PTR(err); + } + + err = -EINVAL; + if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) { + pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len); + return ERR_PTR(err); + } + + payload_size = iov[1].iov_len; + if (payload_size < ring->max_payload_sz) { + pr_info_ratelimited("Invalid req payload len %zu\n", + payload_size); + return ERR_PTR(err); + } + + err = -ENOMEM; + ent = kzalloc(sizeof(*ent), GFP_KERNEL_ACCOUNT); + if (!ent) + return ERR_PTR(err); + + INIT_LIST_HEAD(&ent->list); + + ent->queue = queue; + ent->headers = iov[0].iov_base; + ent->payload = iov[1].iov_base; + + return ent; +} + +/* + * Register header and payload buffer with the kernel and puts the + * entry as "ready to get fuse requests" on the queue + */ +static int fuse_uring_register(struct io_uring_cmd *cmd, + unsigned int issue_flags, struct fuse_conn *fc) +{ + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe); + struct fuse_ring *ring = fc->ring; + struct fuse_ring_queue *queue; + struct fuse_ring_ent *ent; + int err; + unsigned int qid = READ_ONCE(cmd_req->qid); + + err = -ENOMEM; + if (!ring) { + ring = fuse_uring_create(fc); + if (!ring) + return err; + } + + if (qid >= ring->nr_queues) { + pr_info_ratelimited("fuse: Invalid ring qid %u\n", qid); + return -EINVAL; + } + + queue = ring->queues[qid]; + if (!queue) { + queue = fuse_uring_create_queue(ring, qid); + if (!queue) + return err; + } + + /* + * The created queue above does not need to be destructed in + * case of entry errors below, will be done at ring destruction time. 
+ */ + + ent = fuse_uring_create_ring_ent(cmd, queue); + if (IS_ERR(ent)) + return PTR_ERR(ent); + + fuse_uring_do_register(ent, cmd, issue_flags); + + return 0; +} + +/* + * Entry function from io_uring to handle the given passthrough command + * (op code IORING_OP_URING_CMD) + */ +int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + struct fuse_dev *fud; + struct fuse_conn *fc; + u32 cmd_op = cmd->cmd_op; + int err; + + if (!enable_uring) { + pr_info_ratelimited("fuse-io-uring is disabled\n"); + return -EOPNOTSUPP; + } + + /* This extra SQE size holds struct fuse_uring_cmd_req */ + if (!(issue_flags & IO_URING_F_SQE128)) + return -EINVAL; + + fud = fuse_get_dev(cmd->file); + if (!fud) { + pr_info_ratelimited("No fuse device found\n"); + return -ENOTCONN; + } + fc = fud->fc; + + if (fc->aborted) + return -ECONNABORTED; + if (!fc->connected) + return -ENOTCONN; + + /* + * fuse_uring_register() needs the ring to be initialized, + * we need to know the max payload size + */ + if (!fc->initialized) + return -EAGAIN; + + switch (cmd_op) { + case FUSE_IO_URING_CMD_REGISTER: + err = fuse_uring_register(cmd, issue_flags, fc); + if (err) { + pr_info_once("FUSE_IO_URING_CMD_REGISTER failed err=%d\n", + err); + return err; + } + break; + default: + return -EINVAL; + } + + return -EIOCBQUEUED; +} diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h new file mode 100644 index 000000000000..ae1536355b36 --- /dev/null +++ b/fs/fuse/dev_uring_i.h @@ -0,0 +1,113 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * FUSE: Filesystem in Userspace + * Copyright (c) 2023-2024 DataDirect Networks. + */ + +#ifndef _FS_FUSE_DEV_URING_I_H +#define _FS_FUSE_DEV_URING_I_H + +#include "fuse_i.h" + +#ifdef CONFIG_FUSE_IO_URING + +enum fuse_ring_req_state { + FRRS_INVALID = 0, + + /* The ring entry received from userspace and it is being processed */ + FRRS_COMMIT, + + /* The ring entry is waiting for new fuse requests */ + FRRS_AVAILABLE, + + /* The ring entry is in or on the way to user space */ + FRRS_USERSPACE, +}; + +/** A fuse ring entry, part of the ring queue */ +struct fuse_ring_ent { + /* userspace buffer */ + struct fuse_uring_req_header __user *headers; + void __user *payload; + + /* the ring queue that owns the request */ + struct fuse_ring_queue *queue; + + /* fields below are protected by queue->lock */ + + struct io_uring_cmd *cmd; + + struct list_head list; + + enum fuse_ring_req_state state; + + struct fuse_req *fuse_req; +}; + +struct fuse_ring_queue { + /* + * back pointer to the main fuse uring structure that holds this + * queue + */ + struct fuse_ring *ring; + + /* queue id, corresponds to the cpu core */ + unsigned int qid; + + /* + * queue lock, taken when any value in the queue changes _and_ also + * a ring entry state changes. 
+ */ + spinlock_t lock; + + /* available ring entries (struct fuse_ring_ent) */ + struct list_head ent_avail_queue; + + /* + * entries in the process of being committed or in the process + * to be sent to userspace + */ + struct list_head ent_commit_queue; +}; + +/** + * Describes if uring is for communication and holds alls the data needed + * for uring communication + */ +struct fuse_ring { + /* back pointer */ + struct fuse_conn *fc; + + /* number of ring queues */ + size_t nr_queues; + + /* maximum payload/arg size */ + size_t max_payload_sz; + + struct fuse_ring_queue **queues; +}; + +bool fuse_uring_enabled(void); +void fuse_uring_destruct(struct fuse_conn *fc); +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); + +#else /* CONFIG_FUSE_IO_URING */ + +struct fuse_ring; + +static inline void fuse_uring_create(struct fuse_conn *fc) +{ +} + +static inline void fuse_uring_destruct(struct fuse_conn *fc) +{ +} + +static inline bool fuse_uring_enabled(void) +{ + return false; +} + +#endif /* CONFIG_FUSE_IO_URING */ + +#endif /* _FS_FUSE_DEV_URING_I_H */ diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index b352b02c3a10..4143f2518ebf 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -950,6 +950,11 @@ struct fuse_conn { struct idr backing_files_map; #endif +#ifdef CONFIG_FUSE_IO_URING + /** uring connection information*/ + struct fuse_ring *ring; +#endif + KABI_RESERVE(1) KABI_RESERVE(2) KABI_RESERVE(3) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 93352ee8717a..19eff83a681a 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -7,6 +7,7 @@ */ #include "fuse_i.h" +#include "dev_uring_i.h" #include <linux/pagemap.h> #include <linux/slab.h> @@ -966,6 +967,8 @@ static void delayed_release(struct rcu_head *p) { struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu); + fuse_uring_destruct(fc); + put_user_ns(fc->user_ns); fc->release(fc); } @@ -1425,6 +1428,13 @@ void fuse_send_init(struct fuse_mount *fm) if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) flags |= FUSE_PASSTHROUGH; + /* + * This is just an information flag for fuse server. No need to check + * the reply - server is either sending IORING_OP_URING_CMD or not. 
+ */ + if (fuse_uring_enabled()) + flags |= FUSE_OVER_IO_URING; + ia->in.flags = flags; ia->in.flags2 = flags >> 32; diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index dd1809072e63..1e403b67f6a9 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -217,6 +217,16 @@ * - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag * - add FUSE_NO_EXPORT_SUPPORT init flag * - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag + + * + * 7.42 + * - Add FUSE_OVER_IO_URING and all other io-uring related flags and data + * structures: + * - struct fuse_uring_ent_in_out + * - struct fuse_uring_req_header + * - struct fuse_uring_cmd_req + * - FUSE_URING_IN_OUT_HEADER_SZ + * - FUSE_URING_OP_IN_OUT_SZ + * - enum fuse_uring_cmd */ #ifndef _LINUX_FUSE_H @@ -252,7 +262,7 @@ #define FUSE_KERNEL_VERSION 7 /** Minor version number of this interface */ -#define FUSE_KERNEL_MINOR_VERSION 40 +#define FUSE_KERNEL_MINOR_VERSION 42 /** The node ID of the root inode */ #define FUSE_ROOT_ID 1 @@ -424,6 +434,7 @@ struct fuse_file_lock { * FUSE_SEPARATE_BACKGROUND: separate background queue for WRITE requests and * the others * FUSE_WRITE_ALIGNMENT: write request is aligned on max_write boundary + * FUSE_OVER_IO_URING: Indicate that client supports io-uring */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) @@ -466,6 +477,7 @@ struct fuse_file_lock { #define FUSE_PASSTHROUGH (1ULL << 37) #define FUSE_NO_EXPORT_SUPPORT (1ULL << 38) #define FUSE_HAS_RESEND (1ULL << 39) +#define FUSE_OVER_IO_URING (1ULL << 41) #define FUSE_WRITE_ALIGNMENT (1ULL << 55) #define FUSE_SEPARATE_BACKGROUND (1ULL << 56) /* The 57th bit is left to FUSE_HAS_RECOVERY */ @@ -1192,4 +1204,67 @@ struct fuse_supp_groups { uint32_t groups[]; }; +/** + * Size of the ring buffer header + */ +#define FUSE_URING_IN_OUT_HEADER_SZ 128 +#define FUSE_URING_OP_IN_OUT_SZ 128 + +/* Used as part of the fuse_uring_req_header */ +struct fuse_uring_ent_in_out { + uint64_t flags; + + /* + * commit ID to be used in a reply to a ring request (see also + * struct fuse_uring_cmd_req) + */ + uint64_t commit_id; + + /* size of user payload buffer */ + uint32_t payload_sz; + uint32_t padding; + + uint64_t reserved; +}; + +/** + * Header for all fuse-io-uring requests + */ +struct fuse_uring_req_header { + /* struct fuse_in_header / struct fuse_out_header */ + char in_out[FUSE_URING_IN_OUT_HEADER_SZ]; + + /* per op code header */ + char op_in[FUSE_URING_OP_IN_OUT_SZ]; + + struct fuse_uring_ent_in_out ring_ent_in_out; +}; + +/** + * sqe commands to the kernel + */ +enum fuse_uring_cmd { + FUSE_IO_URING_CMD_INVALID = 0, + + /* register the request buffer and fetch a fuse request */ + FUSE_IO_URING_CMD_REGISTER = 1, + + /* commit fuse request result and fetch next request */ + FUSE_IO_URING_CMD_COMMIT_AND_FETCH = 2, +}; + +/** + * In the 80B command area of the SQE. + */ +struct fuse_uring_cmd_req { + uint64_t flags; + + /* entry identifier for commits */ + uint64_t commit_id; + + /* queue the command is for (queue index) */ + uint16_t qid; + uint8_t padding[6]; +}; + #endif /* _LINUX_FUSE_H */ -- 2.39.2
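From userspace, registration is one IORING_OP_URING_CMD SQE per ring entry. A hedged liburing sketch (error handling trimmed; uapi headers carrying the new FUSE_IO_URING_CMD_* definitions are assumed; the ring must be created with IORING_SETUP_SQE128 since struct fuse_uring_cmd_req lives in the big-SQE command area, and the payload buffer must be at least the connection's max payload, which the patch derives from max_write and max_pages):

	#include <liburing.h>
	#include <linux/fuse.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/uio.h>

	/*
	 * Sketch: register one ring entry for queue `qid` on a mounted
	 * /dev/fuse fd.  hdr and payload must stay allocated for the
	 * connection lifetime; the iovec array itself is read by the
	 * kernel when the command is issued.
	 */
	static int register_one_ent(struct io_uring *ring, int fuse_fd,
				    unsigned int qid, size_t payload_sz)
	{
		static struct fuse_uring_req_header hdr;
		static struct iovec iov[2];
		void *payload = malloc(payload_sz);
		struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
		struct fuse_uring_cmd_req *req;

		if (!payload || !sqe)
			return -1;

		iov[0] = (struct iovec){ &hdr, sizeof(hdr) };
		iov[1] = (struct iovec){ payload, payload_sz };

		memset(sqe, 0, 2 * sizeof(*sqe));	/* one SQE128 slot */
		sqe->opcode = IORING_OP_URING_CMD;
		sqe->fd = fuse_fd;
		sqe->cmd_op = FUSE_IO_URING_CMD_REGISTER;
		sqe->addr = (unsigned long)iov;		/* iovec array */
		sqe->len = 2;				/* FUSE_URING_IOV_SEGS */

		req = (struct fuse_uring_cmd_req *)sqe->cmd;
		req->qid = qid;

		/* no CQE until the kernel assigns a fuse request */
		return io_uring_submit(ring);
	}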

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit d0f9c62aaf7a98412b46a91fe7daad76b316b3b7 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Move 'struct fuse_copy_state' and fuse_copy_* functions to fuse_dev_i.h to make it available for fuse-io-uring. 'copy_out_args()' is renamed to 'fuse_copy_out_args'. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 30 ++++++++---------------------- fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++ 2 files changed, 33 insertions(+), 22 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 5e44c39c6fc1..557a7dabce63 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -698,22 +698,8 @@ static int unlock_request(struct fuse_req *req) return err; } -struct fuse_copy_state { - int write; - struct fuse_req *req; - struct iov_iter *iter; - struct pipe_buffer *pipebufs; - struct pipe_buffer *currbuf; - struct pipe_inode_info *pipe; - unsigned long nr_segs; - struct page *pg; - unsigned len; - unsigned offset; - unsigned move_pages:1; -}; - -static void fuse_copy_init(struct fuse_copy_state *cs, int write, - struct iov_iter *iter) +void fuse_copy_init(struct fuse_copy_state *cs, int write, + struct iov_iter *iter) { memset(cs, 0, sizeof(*cs)); cs->write = write; @@ -1067,9 +1053,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size) } /* Copy request arguments to/from userspace buffer */ -static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs, - unsigned argpages, struct fuse_arg *args, - int zeroing) +int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs, + unsigned argpages, struct fuse_arg *args, + int zeroing) { int err = 0; unsigned i; @@ -1944,8 +1930,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique) return NULL; } -static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, - unsigned nbytes) +int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, + unsigned nbytes) { unsigned reqsize = sizeof(struct fuse_out_header); @@ -2047,7 +2033,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, if (oh.error) err = nbytes != sizeof(oh) ? 
-EINVAL : 0; else - err = copy_out_args(cs, req->args, nbytes); + err = fuse_copy_out_args(cs, req->args, nbytes); fuse_copy_finish(cs); spin_lock(&fpq->lock); diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 08a7e88e0027..21eb1bdb492d 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -12,6 +12,23 @@ #define FUSE_INT_REQ_BIT (1ULL << 0) #define FUSE_REQ_ID_STEP (1ULL << 1) +struct fuse_arg; +struct fuse_args; + +struct fuse_copy_state { + int write; + struct fuse_req *req; + struct iov_iter *iter; + struct pipe_buffer *pipebufs; + struct pipe_buffer *currbuf; + struct pipe_inode_info *pipe; + unsigned long nr_segs; + struct page *pg; + unsigned int len; + unsigned int offset; + unsigned int move_pages:1; +}; + static inline struct fuse_dev *fuse_get_dev(struct file *file) { /* @@ -23,5 +40,13 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file) void fuse_dev_end_requests(struct list_head *head); +void fuse_copy_init(struct fuse_copy_state *cs, int write, + struct iov_iter *iter); +int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs, + unsigned int argpages, struct fuse_arg *args, + int zeroing); +int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, + unsigned int nbytes); + #endif -- 2.39.2
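The helpers made public here are consumed by the io-uring code added later in this series. Condensed, the pattern for copying reply arguments out of a user payload buffer looks like the sketch below; the function and parameter names are hypothetical, mirroring what fuse_uring_copy_from_ring does in a later patch, and the fuse_dev_i.h declarations above are assumed to be in scope.

    /* Hypothetical kernel-side sketch: fill req->args->out_args from a
     * server-provided user buffer of size payload_sz. */
    static int example_copy_reply(struct fuse_req *req, void __user *payload,
                                  size_t payload_sz, unsigned int nbytes)
    {
        struct fuse_copy_state cs;
        struct iov_iter iter;
        int err;

        /* wrap the user buffer into an iterator */
        err = import_ubuf(ITER_SOURCE, payload, payload_sz, &iter);
        if (err)
            return err;

        /* write == 0: the kernel reads from the user buffer */
        fuse_copy_init(&cs, 0, &iter);
        cs.req = req;

        /* copies the out args; nbytes is the size the server claims */
        return fuse_copy_out_args(&cs, req->args, nbytes);
    }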

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit f773a7c2c3d934d00542c6471170066d150d152f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Add special fuse-io-uring handling into the fuse argument copy handler. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 12 +++++++++++- fs/fuse/fuse_dev_i.h | 4 ++++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 557a7dabce63..b40de841389a 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -806,6 +806,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size) *size -= ncpy; cs->len -= ncpy; cs->offset += ncpy; + if (cs->is_uring) + cs->ring.copied_sz += ncpy; + return ncpy; } @@ -1933,7 +1936,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique) int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, unsigned nbytes) { - unsigned reqsize = sizeof(struct fuse_out_header); + + unsigned int reqsize = 0; + + /* + * Uring has all headers separated from args - args is payload only + */ + if (!cs->is_uring) + reqsize = sizeof(struct fuse_out_header); reqsize += fuse_len_args(args->out_numargs, args->out_args); diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 21eb1bdb492d..4a8a4feb2df5 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -27,6 +27,10 @@ struct fuse_copy_state { unsigned int len; unsigned int offset; unsigned int move_pages:1; + unsigned int is_uring:1; + struct { + unsigned int copied_sz; /* copied size into the user buffer */ + } ring; }; static inline struct fuse_dev *fuse_get_dev(struct file *file) -- 2.39.2
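The effect of the reqsize change is easiest to see with concrete sizes. A small userspace sketch, assuming a reply that carries a single fixed-size out-arg such as struct fuse_attr_out:

    #include <linux/fuse.h>
    #include <stdio.h>

    int main(void)
    {
        size_t args = sizeof(struct fuse_attr_out); /* one out-arg */

        /* legacy /dev/fuse reply: out header and args share one write() */
        printf("legacy nbytes: %zu\n",
               sizeof(struct fuse_out_header) + args);

        /* fuse-io-uring: headers travel separately in the ring buffer,
         * so the payload carries the args only */
        printf("uring payload_sz: %zu\n", args);
        return 0;
    }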

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 3821336530616776414aa7a879640052b89def28 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- fuse-over-io-uring uses existing functions to find requests based on their unique id - make these functions non-static. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 6 +++--- fs/fuse/fuse_dev_i.h | 5 +++++ fs/fuse/fuse_i.h | 5 +++++ fs/fuse/inode.c | 2 +- 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index b40de841389a..dc7b5e568145 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -200,7 +200,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq) } EXPORT_SYMBOL_GPL(fuse_get_unique); -static unsigned int fuse_req_hash(u64 unique) +unsigned int fuse_req_hash(u64 unique) { return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS); } @@ -1921,7 +1921,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code, } /* Look up request on processing list by unique ID */ -static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique) +struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique) { unsigned int hash = fuse_req_hash(unique); struct fuse_req *req; @@ -2005,7 +2005,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, spin_lock(&fpq->lock); req = NULL; if (fpq->connected) - req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT); + req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT); err = -ENOENT; if (!req) { diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 4a8a4feb2df5..b64ab84cbc0d 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -14,6 +14,8 @@ struct fuse_arg; struct fuse_args; +struct fuse_pqueue; +struct fuse_req; struct fuse_copy_state { int write; @@ -42,6 +44,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file) return READ_ONCE(file->private_data); } +unsigned int fuse_req_hash(u64 unique); +struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique); + void fuse_dev_end_requests(struct list_head *head); void fuse_copy_init(struct fuse_copy_state *cs, int write, diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 4143f2518ebf..ddc239411257 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1246,6 +1246,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o); */ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc); +/** + * Initialize the fuse processing queue + */ +void fuse_pqueue_init(struct fuse_pqueue *fpq); + /** * Initialize fuse_conn */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 19eff83a681a..4bcdeb94cec4 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -912,7 +912,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq, fiq->priv = priv; } -static void fuse_pqueue_init(struct fuse_pqueue *fpq) +void fuse_pqueue_init(struct fuse_pqueue *fpq) { unsigned int i; -- 2.39.2
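A userspace replica of the bucket calculation may help to see why masking FUSE_INT_REQ_BIT matters: a request and its interrupt always hash to the same processing-list bucket. The constants below are copied from fuse_i.h and include/linux/hash.h (64-bit case) and are assumptions of this sketch, not part of the patch:

    #include <stdint.h>
    #include <stdio.h>

    #define FUSE_INT_REQ_BIT  (1ULL << 0)
    #define FUSE_PQ_HASH_BITS 8                      /* fs/fuse/fuse_i.h */
    #define GOLDEN_RATIO_64   0x61C8864680B583EBull  /* include/linux/hash.h */

    static unsigned int req_hash(uint64_t unique)
    {
        /* userspace replica of fuse_req_hash() on a 64-bit kernel */
        return ((unique & ~FUSE_INT_REQ_BIT) * GOLDEN_RATIO_64)
               >> (64 - FUSE_PQ_HASH_BITS);
    }

    int main(void)
    {
        uint64_t unique = 42; /* hypothetical ID; FUSE_REQ_ID_STEP keeps IDs even */

        /* masking FUSE_INT_REQ_BIT puts a request and its interrupt
         * into the same bucket */
        printf("request bucket:   %u\n", req_hash(unique));
        printf("interrupt bucket: %u\n", req_hash(unique | FUSE_INT_REQ_BIT));
        return 0;
    }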

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit c090c8abae4b6b77a1bee116aa6c385456ebef96 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This adds support for fuse request completion through ring SQEs (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing the ring entry it becomes available for new fuse requests. Handling of requests through the ring (SQE/CQE handling) is complete now. Fuse request data are copied through the mmaped ring buffer, there is no support for any zero copy yet. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev_uring.c | 441 ++++++++++++++++++++++++++++++++++++++++++ fs/fuse/dev_uring_i.h | 12 ++ fs/fuse/fuse_i.h | 4 + 3 files changed, 457 insertions(+) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 42092a234570..1030c1720990 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -24,6 +24,17 @@ bool fuse_uring_enabled(void) return enable_uring; } +static void fuse_uring_req_end(struct fuse_ring_ent *ent, struct fuse_req *req, + int error) +{ + ent->fuse_req = NULL; + if (error) + req->out.h.error = error; + + clear_bit(FR_SENT, &req->flags); + fuse_request_end(req); +} + void fuse_uring_destruct(struct fuse_conn *fc) { struct fuse_ring *ring = fc->ring; @@ -39,8 +50,11 @@ void fuse_uring_destruct(struct fuse_conn *fc) continue; WARN_ON(!list_empty(&queue->ent_avail_queue)); + WARN_ON(!list_empty(&queue->ent_w_req_queue)); WARN_ON(!list_empty(&queue->ent_commit_queue)); + WARN_ON(!list_empty(&queue->ent_in_userspace)); + kfree(queue->fpq.processing); kfree(queue); ring->queues[qid] = NULL; } @@ -99,20 +113,34 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring, { struct fuse_conn *fc = ring->fc; struct fuse_ring_queue *queue; + struct list_head *pq; queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT); if (!queue) return NULL; + pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL); + if (!pq) { + kfree(queue); + return NULL; + } + queue->qid = qid; queue->ring = ring; spin_lock_init(&queue->lock); INIT_LIST_HEAD(&queue->ent_avail_queue); INIT_LIST_HEAD(&queue->ent_commit_queue); + INIT_LIST_HEAD(&queue->ent_w_req_queue); + INIT_LIST_HEAD(&queue->ent_in_userspace); + INIT_LIST_HEAD(&queue->fuse_req_queue); + + queue->fpq.processing = pq; + fuse_pqueue_init(&queue->fpq); spin_lock(&fc->lock); if (ring->queues[qid]) { spin_unlock(&fc->lock); + kfree(queue->fpq.processing); kfree(queue); return ring->queues[qid]; } @@ -126,6 +154,214 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring, return queue; } +/* + * Checks for errors and stores it into the request + */ +static int fuse_uring_out_header_has_err(struct fuse_out_header *oh, + struct fuse_req *req, + struct fuse_conn *fc) +{ + int err; + + err = -EINVAL; + if (oh->unique == 0) { + /* Not supported through io-uring yet */ + pr_warn_once("notify through fuse-io-uring not supported\n"); + goto err; + } + + if (oh->error <= -ERESTARTSYS || oh->error > 0) + goto err; + + if (oh->error) { + err = oh->error; + goto err; + } + + err = -ENOENT; + if 
((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) { + pr_warn_ratelimited("unique mismatch, expected: %llu got %llu\n", + req->in.h.unique, + oh->unique & ~FUSE_INT_REQ_BIT); + goto err; + } + + /* + * Is it an interrupt reply ID? + * XXX: Not supported through fuse-io-uring yet, it should not even + * find the request - should not happen. + */ + WARN_ON_ONCE(oh->unique & FUSE_INT_REQ_BIT); + + err = 0; +err: + return err; +} + +static int fuse_uring_copy_from_ring(struct fuse_ring *ring, + struct fuse_req *req, + struct fuse_ring_ent *ent) +{ + struct fuse_copy_state cs; + struct fuse_args *args = req->args; + struct iov_iter iter; + int err; + struct fuse_uring_ent_in_out ring_in_out; + + err = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out, + sizeof(ring_in_out)); + if (err) + return -EFAULT; + + err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz, + &iter); + if (err) + return err; + + fuse_copy_init(&cs, 0, &iter); + cs.is_uring = 1; + cs.req = req; + + return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz); +} + + /* + * Copy data from the req to the ring buffer + */ +static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req, + struct fuse_ring_ent *ent) +{ + struct fuse_copy_state cs; + struct fuse_args *args = req->args; + struct fuse_in_arg *in_args = args->in_args; + int num_args = args->in_numargs; + int err; + struct iov_iter iter; + struct fuse_uring_ent_in_out ent_in_out = { + .flags = 0, + .commit_id = req->in.h.unique, + }; + + err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter); + if (err) { + pr_info_ratelimited("fuse: Import of user buffer failed\n"); + return err; + } + + fuse_copy_init(&cs, 1, &iter); + cs.is_uring = 1; + cs.req = req; + + if (num_args > 0) { + /* + * Expectation is that the first argument is the per op header. + * Some op code have that as zero size. + */ + if (args->in_args[0].size > 0) { + err = copy_to_user(&ent->headers->op_in, in_args->value, + in_args->size); + if (err) { + pr_info_ratelimited( + "Copying the header failed.\n"); + return -EFAULT; + } + } + in_args++; + num_args--; + } + + /* copy the payload */ + err = fuse_copy_args(&cs, num_args, args->in_pages, + (struct fuse_arg *)in_args, 0); + if (err) { + pr_info_ratelimited("%s fuse_copy_args failed\n", __func__); + return err; + } + + ent_in_out.payload_sz = cs.ring.copied_sz; + err = copy_to_user(&ent->headers->ring_ent_in_out, &ent_in_out, + sizeof(ent_in_out)); + return err ? 
-EFAULT : 0; +} + +static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent, + struct fuse_req *req) +{ + struct fuse_ring_queue *queue = ent->queue; + struct fuse_ring *ring = queue->ring; + int err; + + err = -EIO; + if (WARN_ON(ent->state != FRRS_FUSE_REQ)) { + pr_err("qid=%d ring-req=%p invalid state %d on send\n", + queue->qid, ent, ent->state); + return err; + } + + err = -EINVAL; + if (WARN_ON(req->in.h.unique == 0)) + return err; + + /* copy the request */ + err = fuse_uring_args_to_ring(ring, req, ent); + if (unlikely(err)) { + pr_info_ratelimited("Copy to ring failed: %d\n", err); + return err; + } + + /* copy fuse_in_header */ + err = copy_to_user(&ent->headers->in_out, &req->in.h, + sizeof(req->in.h)); + if (err) { + err = -EFAULT; + return err; + } + + return 0; +} + +static int fuse_uring_prepare_send(struct fuse_ring_ent *ent, + struct fuse_req *req) +{ + int err; + + err = fuse_uring_copy_to_ring(ent, req); + if (!err) + set_bit(FR_SENT, &req->flags); + else + fuse_uring_req_end(ent, req, err); + + return err; +} + +/* + * Write data to the ring buffer and send the request to userspace, + * userspace will read it + * This is comparable with classical read(/dev/fuse) + */ +static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ent, + struct fuse_req *req, + unsigned int issue_flags) +{ + struct fuse_ring_queue *queue = ent->queue; + int err; + struct io_uring_cmd *cmd; + + err = fuse_uring_prepare_send(ent, req); + if (err) + return err; + + spin_lock(&queue->lock); + cmd = ent->cmd; + ent->cmd = NULL; + ent->state = FRRS_USERSPACE; + list_move(&ent->list, &queue->ent_in_userspace); + spin_unlock(&queue->lock); + + io_uring_cmd_done(cmd, 0, 0, issue_flags); + return 0; +} + /* * Make a ring entry available for fuse_req assignment */ @@ -137,6 +373,203 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ent, ent->state = FRRS_AVAILABLE; } +/* Used to find the request on SQE commit */ +static void fuse_uring_add_to_pq(struct fuse_ring_ent *ent, + struct fuse_req *req) +{ + struct fuse_ring_queue *queue = ent->queue; + struct fuse_pqueue *fpq = &queue->fpq; + unsigned int hash; + + req->ring_entry = ent; + hash = fuse_req_hash(req->in.h.unique); + list_move_tail(&req->list, &fpq->processing[hash]); +} + +/* + * Assign a fuse queue entry to the given entry + */ +static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent, + struct fuse_req *req) +{ + struct fuse_ring_queue *queue = ent->queue; + struct fuse_conn *fc = req->fm->fc; + struct fuse_iqueue *fiq = &fc->iq; + + lockdep_assert_held(&queue->lock); + + if (WARN_ON_ONCE(ent->state != FRRS_AVAILABLE && + ent->state != FRRS_COMMIT)) { + pr_warn("%s qid=%d state=%d\n", __func__, ent->queue->qid, + ent->state); + } + + spin_lock(&fiq->lock); + clear_bit(FR_PENDING, &req->flags); + spin_unlock(&fiq->lock); + ent->fuse_req = req; + ent->state = FRRS_FUSE_REQ; + list_move(&ent->list, &queue->ent_w_req_queue); + fuse_uring_add_to_pq(ent, req); +} + +/* Fetch the next fuse request if available */ +static struct fuse_req *fuse_uring_ent_assign_req(struct fuse_ring_ent *ent) + __must_hold(&queue->lock) +{ + struct fuse_req *req; + struct fuse_ring_queue *queue = ent->queue; + struct list_head *req_queue = &queue->fuse_req_queue; + + lockdep_assert_held(&queue->lock); + + /* get and assign the next entry while it is still holding the lock */ + req = list_first_entry_or_null(req_queue, struct fuse_req, list); + if (req) + fuse_uring_add_req_to_ring_ent(ent, req); + + return req; +} + +/* + * Read data from the 
ring buffer, which user space has written to + * This is comparible with handling of classical write(/dev/fuse). + * Also make the ring request available again for new fuse requests. + */ +static void fuse_uring_commit(struct fuse_ring_ent *ent, struct fuse_req *req, + unsigned int issue_flags) +{ + struct fuse_ring *ring = ent->queue->ring; + struct fuse_conn *fc = ring->fc; + ssize_t err = 0; + + err = copy_from_user(&req->out.h, &ent->headers->in_out, + sizeof(req->out.h)); + if (err) { + req->out.h.error = -EFAULT; + goto out; + } + + err = fuse_uring_out_header_has_err(&req->out.h, req, fc); + if (err) { + /* req->out.h.error already set */ + goto out; + } + + err = fuse_uring_copy_from_ring(ring, req, ent); +out: + fuse_uring_req_end(ent, req, err); +} + +/* + * Get the next fuse req and send it + */ +static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ent, + struct fuse_ring_queue *queue, + unsigned int issue_flags) +{ + int err; + struct fuse_req *req; + +retry: + spin_lock(&queue->lock); + fuse_uring_ent_avail(ent, queue); + req = fuse_uring_ent_assign_req(ent); + spin_unlock(&queue->lock); + + if (req) { + err = fuse_uring_send_next_to_ring(ent, req, issue_flags); + if (err) + goto retry; + } +} + +static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent) +{ + struct fuse_ring_queue *queue = ent->queue; + + lockdep_assert_held(&queue->lock); + + if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE)) + return -EIO; + + ent->state = FRRS_COMMIT; + list_move(&ent->list, &queue->ent_commit_queue); + + return 0; +} + +/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */ +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags, + struct fuse_conn *fc) +{ + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe); + struct fuse_ring_ent *ent; + int err; + struct fuse_ring *ring = fc->ring; + struct fuse_ring_queue *queue; + uint64_t commit_id = READ_ONCE(cmd_req->commit_id); + unsigned int qid = READ_ONCE(cmd_req->qid); + struct fuse_pqueue *fpq; + struct fuse_req *req; + + err = -ENOTCONN; + if (!ring) + return err; + + if (qid >= ring->nr_queues) + return -EINVAL; + + queue = ring->queues[qid]; + if (!queue) + return err; + fpq = &queue->fpq; + + spin_lock(&queue->lock); + /* Find a request based on the unique ID of the fuse request + * This should get revised, as it needs a hash calculation and list + * search. And full struct fuse_pqueue is needed (memory overhead). + * As well as the link from req to ring_ent. + */ + req = fuse_request_find(fpq, commit_id); + err = -ENOENT; + if (!req) { + pr_info("qid=%d commit_id %llu not found\n", queue->qid, + commit_id); + spin_unlock(&queue->lock); + return err; + } + list_del_init(&req->list); + ent = req->ring_entry; + req->ring_entry = NULL; + + err = fuse_ring_ent_set_commit(ent); + if (err != 0) { + pr_info_ratelimited("qid=%d commit_id %llu state %d", + queue->qid, commit_id, ent->state); + spin_unlock(&queue->lock); + req->out.h.error = err; + clear_bit(FR_SENT, &req->flags); + fuse_request_end(req); + return err; + } + + ent->cmd = cmd; + spin_unlock(&queue->lock); + + /* without the queue lock, as other locks are taken */ + fuse_uring_commit(ent, req, issue_flags); + + /* + * Fetching the next request is absolutely required as queued + * fuse requests would otherwise not get processed - committing + * and fetching is done in one step vs legacy fuse, which has separated + * read (fetch request) and write (commit result). 
+ */ + fuse_uring_next_fuse_req(ent, queue, issue_flags); + return 0; +} + /* * fuse_uring_req_fetch command handling */ @@ -318,6 +751,14 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd, return err; } break; + case FUSE_IO_URING_CMD_COMMIT_AND_FETCH: + err = fuse_uring_commit_fetch(cmd, issue_flags, fc); + if (err) { + pr_info_once("FUSE_IO_URING_COMMIT_AND_FETCH failed err=%d\n", + err); + return err; + } + break; default: return -EINVAL; } diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index ae1536355b36..44bf237f0d5a 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -20,6 +20,9 @@ enum fuse_ring_req_state { /* The ring entry is waiting for new fuse requests */ FRRS_AVAILABLE, + /* The ring entry got assigned a fuse req */ + FRRS_FUSE_REQ, + /* The ring entry is in or on the way to user space */ FRRS_USERSPACE, }; @@ -67,7 +70,16 @@ struct fuse_ring_queue { * entries in the process of being committed or in the process * to be sent to userspace */ + struct list_head ent_w_req_queue; struct list_head ent_commit_queue; + + /* entries in userspace */ + struct list_head ent_in_userspace; + + /* fuse requests waiting for an entry slot */ + struct list_head fuse_req_queue; + + struct fuse_pqueue fpq; }; /** diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index ddc239411257..156c86358a9c 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -449,6 +449,10 @@ struct fuse_req { /** fuse_mount this request belongs to */ struct fuse_mount *fm; + +#ifdef CONFIG_FUSE_IO_URING + void *ring_entry; +#endif }; struct fuse_iqueue; -- 2.39.2
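From the server's perspective, the commit-and-fetch cycle added here looks roughly like the sketch below. It pairs with the hypothetical server_register_ent() shown earlier; the names are again invented. Per the kernel code above, only oh.unique and oh.error are validated in the out header, and payload_sz drives the copy of the reply args out of the payload buffer.

    #include <liburing.h>
    #include <linux/fuse.h>
    #include <string.h>
    #include <sys/uio.h>

    /* Hypothetical helper: answer the request that was delivered into
     * 'hdr'/'iov[1]' and fetch the next one with the same SQE. */
    static int server_commit_and_fetch(struct io_uring *ring, int fd,
                                       unsigned int qid,
                                       struct fuse_uring_req_header *hdr,
                                       struct iovec iov[2],
                                       uint32_t reply_payload_sz,
                                       int reply_error)
    {
        struct fuse_in_header *in = (struct fuse_in_header *)hdr->in_out;
        struct fuse_out_header out = {
            .unique = in->unique,  /* must match, see the check above */
            .error  = reply_error,
        };
        /* the kernel handed out commit_id together with the request */
        uint64_t commit_id = hdr->ring_ent_in_out.commit_id;
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        struct fuse_uring_cmd_req *cmd;

        if (!sqe)
            return -EAGAIN;

        /* the out header replaces the in header in the shared buffer;
         * reply args were already written into the payload buffer */
        memcpy(hdr->in_out, &out, sizeof(out));
        hdr->ring_ent_in_out.payload_sz = reply_payload_sz;

        io_uring_prep_rw(IORING_OP_URING_CMD, sqe, fd, iov, 2, 0);
        sqe->cmd_op = FUSE_IO_URING_CMD_COMMIT_AND_FETCH;
        cmd = (struct fuse_uring_cmd_req *)sqe->cmd;
        memset(cmd, 0, sizeof(*cmd));
        cmd->qid = qid;
        cmd->commit_id = commit_id;

        return io_uring_submit(ring);
    }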

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 4a9bfb9b6850fec0685447aed280533cf980de70 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- On teardown, struct file_operations::uring_cmd requests need to be completed by calling io_uring_cmd_done(). Not completing all ring entries would result in busy io-uring tasks that print warning messages at intervals, and in an unreleased struct file. Additionally, the fuse connection - and with that the ring - can only get released when all io-uring commands are completed. Completion is done for ring entries that are a) in waiting state for new fuse requests - io_uring_cmd_done is needed b) already in userspace - io_uring_cmd_done through teardown is not needed; the request can just get released. If the fuse server is still active and commits such a ring entry, fuse_uring_cmd() already checks if the connection is active and then completes the io-uring command itself with -ENOTCONN, i.e. special handling is not needed. This scheme is basically represented by the ring entry states FRRS_AVAILABLE and FRRS_USERSPACE. Entries in state: - FRRS_INVALID: No action needed, they do not contribute to ring->queue_refs yet - All other states: Are currently processed by other tasks; async teardown is needed and it has to wait for the two states above. It could also be solved without an async teardown task, but that would require additional if conditions in hot code paths. Also, in my personal opinion the code looks cleaner with async teardown. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 9 ++ fs/fuse/dev_uring.c | 207 ++++++++++++++++++++++++++++++++++++++++++ fs/fuse/dev_uring_i.h | 51 +++++++++++ 3 files changed, 267 insertions(+) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index dc7b5e568145..8bbce43cf31a 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -6,6 +6,7 @@ See the file COPYING.
*/ +#include "dev_uring_i.h" #include "fuse_i.h" #include "fuse_dev_i.h" @@ -2302,6 +2303,12 @@ void fuse_abort_conn(struct fuse_conn *fc) spin_unlock(&fc->lock); fuse_dev_end_requests(&to_end); + + /* + * fc->lock must not be taken to avoid conflicts with io-uring + * locks + */ + fuse_uring_abort(fc); } else { spin_unlock(&fc->lock); } @@ -2313,6 +2320,8 @@ void fuse_wait_aborted(struct fuse_conn *fc) /* matches implicit memory barrier in fuse_drop_waiting() */ smp_mb(); wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0); + + fuse_uring_wait_stopped_queues(fc); } int fuse_dev_release(struct inode *inode, struct file *file) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 1030c1720990..1161b9aa5e11 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -35,6 +35,37 @@ static void fuse_uring_req_end(struct fuse_ring_ent *ent, struct fuse_req *req, fuse_request_end(req); } +/* Abort all list queued request on the given ring queue */ +static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue) +{ + struct fuse_req *req; + LIST_HEAD(req_list); + + spin_lock(&queue->lock); + list_for_each_entry(req, &queue->fuse_req_queue, list) + clear_bit(FR_PENDING, &req->flags); + list_splice_init(&queue->fuse_req_queue, &req_list); + spin_unlock(&queue->lock); + + /* must not hold queue lock to avoid order issues with fi->lock */ + fuse_dev_end_requests(&req_list); +} + +void fuse_uring_abort_end_requests(struct fuse_ring *ring) +{ + int qid; + struct fuse_ring_queue *queue; + + for (qid = 0; qid < ring->nr_queues; qid++) { + queue = READ_ONCE(ring->queues[qid]); + if (!queue) + continue; + + queue->stopped = true; + fuse_uring_abort_end_queue_requests(queue); + } +} + void fuse_uring_destruct(struct fuse_conn *fc) { struct fuse_ring *ring = fc->ring; @@ -94,10 +125,13 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc) goto out_err; } + init_waitqueue_head(&ring->stop_waitq); + fc->ring = ring; ring->nr_queues = nr_queues; ring->fc = fc; ring->max_payload_sz = max_payload_size; + atomic_set(&ring->queue_refs, 0); spin_unlock(&fc->lock); return ring; @@ -154,6 +188,175 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring, return queue; } +static void fuse_uring_stop_fuse_req_end(struct fuse_req *req) +{ + clear_bit(FR_SENT, &req->flags); + req->out.h.error = -ECONNABORTED; + fuse_request_end(req); +} + +/* + * Release a request/entry on connection tear down + */ +static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent) +{ + struct fuse_req *req; + struct io_uring_cmd *cmd; + + struct fuse_ring_queue *queue = ent->queue; + + spin_lock(&queue->lock); + cmd = ent->cmd; + ent->cmd = NULL; + req = ent->fuse_req; + ent->fuse_req = NULL; + if (req) { + /* remove entry from queue->fpq->processing */ + list_del_init(&req->list); + } + spin_unlock(&queue->lock); + + if (cmd) + io_uring_cmd_done(cmd, -ENOTCONN, 0, IO_URING_F_UNLOCKED); + + if (req) + fuse_uring_stop_fuse_req_end(req); + + list_del_init(&ent->list); + kfree(ent); +} + +static void fuse_uring_stop_list_entries(struct list_head *head, + struct fuse_ring_queue *queue, + enum fuse_ring_req_state exp_state) +{ + struct fuse_ring *ring = queue->ring; + struct fuse_ring_ent *ent, *next; + ssize_t queue_refs = SSIZE_MAX; + LIST_HEAD(to_teardown); + + spin_lock(&queue->lock); + list_for_each_entry_safe(ent, next, head, list) { + if (ent->state != exp_state) { + pr_warn("entry teardown qid=%d state=%d expected=%d", + queue->qid, ent->state, exp_state); + 
continue; + } + + list_move(&ent->list, &to_teardown); + } + spin_unlock(&queue->lock); + + /* no queue lock to avoid lock order issues */ + list_for_each_entry_safe(ent, next, &to_teardown, list) { + fuse_uring_entry_teardown(ent); + queue_refs = atomic_dec_return(&ring->queue_refs); + WARN_ON_ONCE(queue_refs < 0); + } +} + +static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue) +{ + fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue, + FRRS_USERSPACE); + fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue, + FRRS_AVAILABLE); +} + +/* + * Log state debug info + */ +static void fuse_uring_log_ent_state(struct fuse_ring *ring) +{ + int qid; + struct fuse_ring_ent *ent; + + for (qid = 0; qid < ring->nr_queues; qid++) { + struct fuse_ring_queue *queue = ring->queues[qid]; + + if (!queue) + continue; + + spin_lock(&queue->lock); + /* + * Log entries from the intermediate queue, the other queues + * should be empty + */ + list_for_each_entry(ent, &queue->ent_w_req_queue, list) { + pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n", + ring, qid, ent, ent->state); + } + list_for_each_entry(ent, &queue->ent_commit_queue, list) { + pr_info(" ent-commit-queue ring=%p qid=%d ent=%p state=%d\n", + ring, qid, ent, ent->state); + } + spin_unlock(&queue->lock); + } + ring->stop_debug_log = 1; +} + +static void fuse_uring_async_stop_queues(struct work_struct *work) +{ + int qid; + struct fuse_ring *ring = + container_of(work, struct fuse_ring, async_teardown_work.work); + + /* XXX code dup */ + for (qid = 0; qid < ring->nr_queues; qid++) { + struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]); + + if (!queue) + continue; + + fuse_uring_teardown_entries(queue); + } + + /* + * Some ring entries might be in the middle of IO operations, + * i.e. 
in process to get handled by file_operations::uring_cmd + * or on the way to userspace - we could handle that with conditions in + * run time code, but easier/cleaner to have an async tear down handler + * If there are still queue references left + */ + if (atomic_read(&ring->queue_refs) > 0) { + if (time_after(jiffies, + ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT)) + fuse_uring_log_ent_state(ring); + + schedule_delayed_work(&ring->async_teardown_work, + FUSE_URING_TEARDOWN_INTERVAL); + } else { + wake_up_all(&ring->stop_waitq); + } +} + +/* + * Stop the ring queues + */ +void fuse_uring_stop_queues(struct fuse_ring *ring) +{ + int qid; + + for (qid = 0; qid < ring->nr_queues; qid++) { + struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]); + + if (!queue) + continue; + + fuse_uring_teardown_entries(queue); + } + + if (atomic_read(&ring->queue_refs) > 0) { + ring->teardown_time = jiffies; + INIT_DELAYED_WORK(&ring->async_teardown_work, + fuse_uring_async_stop_queues); + schedule_delayed_work(&ring->async_teardown_work, + FUSE_URING_TEARDOWN_INTERVAL); + } else { + wake_up_all(&ring->stop_waitq); + } +} + /* * Checks for errors and stores it into the request */ @@ -525,6 +728,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags, return err; fpq = &queue->fpq; + if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped)) + return err; + spin_lock(&queue->lock); /* Find a request based on the unique ID of the fuse request * This should get revised, as it needs a hash calculation and list @@ -652,6 +858,7 @@ fuse_uring_create_ring_ent(struct io_uring_cmd *cmd, ent->headers = iov[0].iov_base; ent->payload = iov[1].iov_base; + atomic_inc(&ring->queue_refs); return ent; } diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index 44bf237f0d5a..a4316e118cbd 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -11,6 +11,9 @@ #ifdef CONFIG_FUSE_IO_URING +#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ) +#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20) + enum fuse_ring_req_state { FRRS_INVALID = 0, @@ -80,6 +83,8 @@ struct fuse_ring_queue { struct list_head fuse_req_queue; struct fuse_pqueue fpq; + + bool stopped; }; /** @@ -97,12 +102,51 @@ struct fuse_ring { size_t max_payload_sz; struct fuse_ring_queue **queues; + + /* + * Log ring entry states on stop when entries cannot be released + */ + unsigned int stop_debug_log : 1; + + wait_queue_head_t stop_waitq; + + /* async tear down */ + struct delayed_work async_teardown_work; + + /* log */ + unsigned long teardown_time; + + atomic_t queue_refs; }; bool fuse_uring_enabled(void); void fuse_uring_destruct(struct fuse_conn *fc); +void fuse_uring_stop_queues(struct fuse_ring *ring); +void fuse_uring_abort_end_requests(struct fuse_ring *ring); int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); +static inline void fuse_uring_abort(struct fuse_conn *fc) +{ + struct fuse_ring *ring = fc->ring; + + if (ring == NULL) + return; + + if (atomic_read(&ring->queue_refs) > 0) { + fuse_uring_abort_end_requests(ring); + fuse_uring_stop_queues(ring); + } +} + +static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc) +{ + struct fuse_ring *ring = fc->ring; + + if (ring) + wait_event(ring->stop_waitq, + atomic_read(&ring->queue_refs) == 0); +} + #else /* CONFIG_FUSE_IO_URING */ struct fuse_ring; @@ -120,6 +164,13 @@ static inline bool fuse_uring_enabled(void) return false; } +static inline void fuse_uring_abort(struct fuse_conn *fc) +{ +} + +static inline void 
fuse_uring_wait_stopped_queues(struct fuse_conn *fc) +{ +} #endif /* CONFIG_FUSE_IO_URING */ #endif /* _FS_FUSE_DEV_URING_I_H */ -- 2.39.2
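Stripped of the fuse specifics, the teardown above is a generic retry-until-references-drop pattern. A minimal kernel-style sketch with hypothetical names (not the actual fuse code):

    #include <linux/workqueue.h>
    #include <linux/wait.h>
    #include <linux/atomic.h>

    struct teardown_ctx {
        atomic_t refs;               /* like ring->queue_refs */
        wait_queue_head_t waitq;     /* like ring->stop_waitq */
        struct delayed_work work;
    };

    static void teardown_work(struct work_struct *work)
    {
        struct teardown_ctx *ctx =
            container_of(work, struct teardown_ctx, work.work);

        /* ... try to release entries here, dropping ctx->refs ... */

        if (atomic_read(&ctx->refs) > 0)
            schedule_delayed_work(&ctx->work, HZ / 20); /* re-arm */
        else
            wake_up_all(&ctx->waitq);   /* unblocks the waiter */
    }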

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit ba74ba571189668697a8d8da906ad6d44762ebc6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- These functions are also needed by fuse-over-io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 5 +++-- fs/fuse/fuse_dev_i.h | 5 +++++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 8bbce43cf31a..fa9425098126 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -217,7 +217,8 @@ __releases(fiq->lock) spin_unlock(&fiq->lock); } -static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget) +void fuse_dev_queue_forget(struct fuse_iqueue *fiq, + struct fuse_forget_link *forget) { spin_lock(&fiq->lock); if (fiq->connected) { @@ -230,7 +231,7 @@ static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_li } } -static void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) +void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) { spin_lock(&fiq->lock); if (list_empty(&req->intr_entry)) { diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index b64ab84cbc0d..3b2bfe1248d3 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -16,6 +16,8 @@ struct fuse_arg; struct fuse_args; struct fuse_pqueue; struct fuse_req; +struct fuse_iqueue; +struct fuse_forget_link; struct fuse_copy_state { int write; @@ -56,6 +58,9 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs, int zeroing); int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, unsigned int nbytes); +void fuse_dev_queue_forget(struct fuse_iqueue *fiq, + struct fuse_forget_link *forget); +void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req); #endif -- 2.39.2

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit c2c9af9a0b13261c36909036057a116f2edb5e1a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This prepares queueing and sending foreground requests through io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev_uring.c | 180 ++++++++++++++++++++++++++++++++++++++++++ fs/fuse/dev_uring_i.h | 8 ++ 2 files changed, 188 insertions(+) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 1161b9aa5e11..728000434589 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -24,6 +24,29 @@ bool fuse_uring_enabled(void) return enable_uring; } +struct fuse_uring_pdu { + struct fuse_ring_ent *ent; +}; + +static const struct fuse_iqueue_ops fuse_io_uring_ops; + +static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd, + struct fuse_ring_ent *ring_ent) +{ + struct fuse_uring_pdu *pdu = + io_uring_cmd_to_pdu(cmd, struct fuse_uring_pdu); + + pdu->ent = ring_ent; +} + +static struct fuse_ring_ent *uring_cmd_to_ring_ent(struct io_uring_cmd *cmd) +{ + struct fuse_uring_pdu *pdu = + io_uring_cmd_to_pdu(cmd, struct fuse_uring_pdu); + + return pdu->ent; +} + static void fuse_uring_req_end(struct fuse_ring_ent *ent, struct fuse_req *req, int error) { @@ -776,6 +799,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags, return 0; } +static bool is_ring_ready(struct fuse_ring *ring, int current_qid) +{ + int qid; + struct fuse_ring_queue *queue; + bool ready = true; + + for (qid = 0; qid < ring->nr_queues && ready; qid++) { + if (current_qid == qid) + continue; + + queue = ring->queues[qid]; + if (!queue) { + ready = false; + break; + } + + spin_lock(&queue->lock); + if (list_empty(&queue->ent_avail_queue)) + ready = false; + spin_unlock(&queue->lock); + } + + return ready; +} + /* * fuse_uring_req_fetch command handling */ @@ -784,11 +832,23 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent, unsigned int issue_flags) { struct fuse_ring_queue *queue = ent->queue; + struct fuse_ring *ring = queue->ring; + struct fuse_conn *fc = ring->fc; + struct fuse_iqueue *fiq = &fc->iq; spin_lock(&queue->lock); ent->cmd = cmd; fuse_uring_ent_avail(ent, queue); spin_unlock(&queue->lock); + + if (!ring->ready) { + bool ready = is_ring_ready(ring, queue->qid); + + if (ready) { + WRITE_ONCE(fiq->ops, &fuse_io_uring_ops); + WRITE_ONCE(ring->ready, true); + } + } } /* @@ -972,3 +1032,123 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd, return -EIOCBQUEUED; } + +static void fuse_uring_send(struct fuse_ring_ent *ent, struct io_uring_cmd *cmd, + ssize_t ret, unsigned int issue_flags) +{ + struct fuse_ring_queue *queue = ent->queue; + + spin_lock(&queue->lock); + ent->state = FRRS_USERSPACE; + list_move(&ent->list, &queue->ent_in_userspace); + ent->cmd = NULL; + spin_unlock(&queue->lock); + + io_uring_cmd_done(cmd, ret, 0, issue_flags); +} + +/* + * This prepares and sends the ring request in fuse-uring task context. 
+ * User buffers are not mapped yet - the application does not have permission + * to write to it - this has to be executed in ring task context. + */ +static void fuse_uring_send_in_task(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd); + struct fuse_ring_queue *queue = ent->queue; + int err; + + if (!(issue_flags & IO_URING_F_TASK_DEAD)) { + err = fuse_uring_prepare_send(ent, ent->fuse_req); + if (err) { + fuse_uring_next_fuse_req(ent, queue, issue_flags); + return; + } + } else { + err = -ECANCELED; + } + + fuse_uring_send(ent, cmd, err, issue_flags); +} + +static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring) +{ + unsigned int qid; + struct fuse_ring_queue *queue; + + qid = task_cpu(current); + + if (WARN_ONCE(qid >= ring->nr_queues, + "Core number (%u) exceeds nr queues (%zu)\n", qid, + ring->nr_queues)) + qid = 0; + + queue = ring->queues[qid]; + WARN_ONCE(!queue, "Missing queue for qid %d\n", qid); + + return queue; +} + +static void fuse_uring_dispatch_ent(struct fuse_ring_ent *ent) +{ + struct io_uring_cmd *cmd = ent->cmd; + + uring_cmd_set_ring_ent(cmd, ent); + io_uring_cmd_complete_in_task(cmd, fuse_uring_send_in_task); +} + +/* queue a fuse request and send it if a ring entry is available */ +void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req) +{ + struct fuse_conn *fc = req->fm->fc; + struct fuse_ring *ring = fc->ring; + struct fuse_ring_queue *queue; + struct fuse_ring_ent *ent = NULL; + int err; + + err = -EINVAL; + queue = fuse_uring_task_to_queue(ring); + if (!queue) + goto err; + + if (req->in.h.opcode != FUSE_NOTIFY_REPLY) + req->in.h.unique = fuse_get_unique(fiq); + + spin_lock(&queue->lock); + err = -ENOTCONN; + if (unlikely(queue->stopped)) + goto err_unlock; + + ent = list_first_entry_or_null(&queue->ent_avail_queue, + struct fuse_ring_ent, list); + if (ent) + fuse_uring_add_req_to_ring_ent(ent, req); + else + list_add_tail(&req->list, &queue->fuse_req_queue); + spin_unlock(&queue->lock); + + if (ent) + fuse_uring_dispatch_ent(ent); + + return; + +err_unlock: + spin_unlock(&queue->lock); +err: + req->out.h.error = err; + clear_bit(FR_PENDING, &req->flags); + fuse_request_end(req); +} + +static const struct fuse_iqueue_ops fuse_io_uring_ops = { + /* should be send over io-uring as enhancement */ + .send_forget = fuse_dev_queue_forget, + + /* + * could be send over io-uring, but interrupts should be rare, + * no need to make the code complex + */ + .send_interrupt = fuse_dev_queue_interrupt, + .send_req = fuse_uring_queue_fuse_req, +}; diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index a4316e118cbd..0517a6eafc91 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -117,6 +117,8 @@ struct fuse_ring { unsigned long teardown_time; atomic_t queue_refs; + + bool ready; }; bool fuse_uring_enabled(void); @@ -124,6 +126,7 @@ void fuse_uring_destruct(struct fuse_conn *fc); void fuse_uring_stop_queues(struct fuse_ring *ring); void fuse_uring_abort_end_requests(struct fuse_ring *ring); int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); +void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req); static inline void fuse_uring_abort(struct fuse_conn *fc) { @@ -147,6 +150,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc) atomic_read(&ring->queue_refs) == 0); } +static inline bool fuse_uring_ready(struct fuse_conn *fc) +{ + return fc->ring && fc->ring->ready; +} + #else /* 
CONFIG_FUSE_IO_URING */ struct fuse_ring; -- 2.39.2
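Since fuse_uring_task_to_queue() maps a request to the submitting task's CPU, a server wants one queue per core, each with its own io_uring instance. A hypothetical setup loop, reusing server_register_ent() from the earlier sketch; get_nprocs_conf() matching the kernel's queue count is an assumption here:

    #include <liburing.h>
    #include <linux/fuse.h>
    #include <string.h>
    #include <sys/sysinfo.h>

    /* Hypothetical: one SQE128 ring per CPU, one registered entry each.
     * The buffers in 'hdrs'/'payloads' are per-queue and long-lived. */
    static int server_setup_queues(int fuse_fd, struct io_uring *rings,
                                   struct fuse_uring_req_header *hdrs,
                                   void **payloads, size_t payload_sz)
    {
        int nr_queues = get_nprocs_conf(); /* assumed == kernel nr_queues */
        struct io_uring_params p;

        for (int qid = 0; qid < nr_queues; qid++) {
            memset(&p, 0, sizeof(p));
            p.flags = IORING_SETUP_SQE128;
            if (io_uring_queue_init_params(8, &rings[qid], &p) < 0)
                return -1;
            if (server_register_ent(&rings[qid], fuse_fd, &hdrs[qid],
                                    payloads[qid], payload_sz, qid) < 0)
                return -1;
        }
        return 0;
    }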

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 857b0263f30eebe13ab4b6a65156a0d6c8fc2210 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This prepares queueing and sending background requests through io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Conflicts: fs/fuse/dev.c fs/fuse/dev_uring.c fs/fuse/fuse_dev_i.h [Conflict due to 36f3bc2f5ebd ("anolis: fuse: separate bg_queue for write and other requests") support two background queue] Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 37 +++++++++++++--- fs/fuse/dev_uring.c | 99 +++++++++++++++++++++++++++++++++++++++++++ fs/fuse/dev_uring_i.h | 12 ++++++ fs/fuse/fuse_dev_i.h | 2 + 4 files changed, 145 insertions(+), 5 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index fa9425098126..e57f95749ed2 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -27,8 +27,6 @@ MODULE_ALIAS_MISCDEV(FUSE_MINOR); MODULE_ALIAS("devname:fuse"); -#define DEFAULT_BG_QUEUE READ - static struct kmem_cache *fuse_req_cachep; static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req) @@ -406,8 +404,13 @@ void fuse_request_end(struct fuse_req *req) } fc->num_background--; - fuse_dec_active_bg(fc, req); - flush_bg_queue(fc); + + if (fuse_uring_ready(fc)) { + fc->active_background[DEFAULT_BG_QUEUE]--; + } else { + fuse_dec_active_bg(fc, req); + flush_bg_queue(fc); + } spin_unlock(&fc->bg_lock); } else { /* Wake up waiter sleeping in request_wait_answer() */ @@ -588,7 +591,25 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args) return ret; } -static bool fuse_request_queue_background(struct fuse_req *req) +#ifdef CONFIG_FUSE_IO_URING +static bool fuse_request_queue_background_uring(struct fuse_conn *fc, + struct fuse_req *req) +{ + struct fuse_iqueue *fiq = &fc->iq; + + req->in.h.unique = fuse_get_unique(fiq); + req->in.h.len = sizeof(struct fuse_in_header) + + fuse_len_args(req->args->in_numargs, + (struct fuse_arg *) req->args->in_args); + + return fuse_uring_queue_bq_req(req); +} +#endif + +/* + * @return true if queued + */ +static int fuse_request_queue_background(struct fuse_req *req) { struct fuse_mount *fm = req->fm; struct fuse_conn *fc = fm->fc; @@ -600,6 +621,12 @@ static bool fuse_request_queue_background(struct fuse_req *req) atomic_inc(&fc->num_waiting); } __set_bit(FR_ISREPLY, &req->flags); + +#ifdef CONFIG_FUSE_IO_URING + if (fuse_uring_ready(fc)) + return fuse_request_queue_background_uring(fc, req); +#endif + spin_lock(&fc->bg_lock); if (likely(fc->connected)) { fc->num_background++; diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 728000434589..341d26d7e964 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -47,10 +47,53 @@ static struct fuse_ring_ent *uring_cmd_to_ring_ent(struct io_uring_cmd *cmd) return pdu->ent; } +static void fuse_uring_flush_bg(struct fuse_ring_queue *queue) +{ + struct fuse_ring *ring = queue->ring; + struct fuse_conn *fc = ring->fc; + + lockdep_assert_held(&queue->lock); + lockdep_assert_held(&fc->bg_lock); + + /* + * Allow one bg request per queue, ignoring global fc limits. 
+ * This prevents a single queue from consuming all resources and + * eliminates the need for remote queue wake-ups when global + * limits are met but this queue has no more waiting requests. + */ + while ((fc->active_background[DEFAULT_BG_QUEUE] < fc->max_background || + !queue->active_background) && + (!list_empty(&queue->fuse_req_bg_queue))) { + struct fuse_req *req; + + req = list_first_entry(&queue->fuse_req_bg_queue, + struct fuse_req, list); + fc->active_background[DEFAULT_BG_QUEUE]++; + queue->active_background++; + + list_move_tail(&req->list, &queue->fuse_req_queue); + } +} + static void fuse_uring_req_end(struct fuse_ring_ent *ent, struct fuse_req *req, int error) { + struct fuse_ring_queue *queue = ent->queue; + struct fuse_ring *ring = queue->ring; + struct fuse_conn *fc = ring->fc; + + lockdep_assert_not_held(&queue->lock); + spin_lock(&queue->lock); ent->fuse_req = NULL; + if (test_bit(FR_BACKGROUND, &req->flags)) { + queue->active_background--; + spin_lock(&fc->bg_lock); + fuse_uring_flush_bg(queue); + spin_unlock(&fc->bg_lock); + } + + spin_unlock(&queue->lock); + if (error) req->out.h.error = error; @@ -78,6 +121,7 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring) { int qid; struct fuse_ring_queue *queue; + struct fuse_conn *fc = ring->fc; for (qid = 0; qid < ring->nr_queues; qid++) { queue = READ_ONCE(ring->queues[qid]); @@ -85,6 +129,13 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring) continue; queue->stopped = true; + + WARN_ON_ONCE(ring->fc->max_background != UINT_MAX); + spin_lock(&queue->lock); + spin_lock(&fc->bg_lock); + fuse_uring_flush_bg(queue); + spin_unlock(&fc->bg_lock); + spin_unlock(&queue->lock); fuse_uring_abort_end_queue_requests(queue); } } @@ -190,6 +241,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring, INIT_LIST_HEAD(&queue->ent_w_req_queue); INIT_LIST_HEAD(&queue->ent_in_userspace); INIT_LIST_HEAD(&queue->fuse_req_queue); + INIT_LIST_HEAD(&queue->fuse_req_bg_queue); queue->fpq.processing = pq; fuse_pqueue_init(&queue->fpq); @@ -1141,6 +1193,53 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req) fuse_request_end(req); } +bool fuse_uring_queue_bq_req(struct fuse_req *req) +{ + struct fuse_conn *fc = req->fm->fc; + struct fuse_ring *ring = fc->ring; + struct fuse_ring_queue *queue; + struct fuse_ring_ent *ent = NULL; + + queue = fuse_uring_task_to_queue(ring); + if (!queue) + return false; + + spin_lock(&queue->lock); + if (unlikely(queue->stopped)) { + spin_unlock(&queue->lock); + return false; + } + + list_add_tail(&req->list, &queue->fuse_req_bg_queue); + + ent = list_first_entry_or_null(&queue->ent_avail_queue, + struct fuse_ring_ent, list); + spin_lock(&fc->bg_lock); + fc->num_background++; + if (fc->num_background == fc->max_background) + fc->blocked = 1; + fuse_uring_flush_bg(queue); + spin_unlock(&fc->bg_lock); + + /* + * Due to bg_queue flush limits there might be other bg requests + * in the queue that need to be handled first. Or no further req + * might be available. 
+ */ + req = list_first_entry_or_null(&queue->fuse_req_queue, struct fuse_req, + list); + if (ent && req) { + fuse_uring_add_req_to_ring_ent(ent, req); + spin_unlock(&queue->lock); + + fuse_uring_dispatch_ent(ent); + } else { + spin_unlock(&queue->lock); + } + + return true; +} + static const struct fuse_iqueue_ops fuse_io_uring_ops = { /* should be send over io-uring as enhancement */ .send_forget = fuse_dev_queue_forget, diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index 0517a6eafc91..0182be61778b 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -82,8 +82,13 @@ struct fuse_ring_queue { /* fuse requests waiting for an entry slot */ struct list_head fuse_req_queue; + /* background fuse requests */ + struct list_head fuse_req_bg_queue; + struct fuse_pqueue fpq; + unsigned int active_background; + bool stopped; }; @@ -127,6 +132,7 @@ void fuse_uring_stop_queues(struct fuse_ring *ring); void fuse_uring_abort_end_requests(struct fuse_ring *ring); int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req); +bool fuse_uring_queue_bq_req(struct fuse_req *req); static inline void fuse_uring_abort(struct fuse_conn *fc) { @@ -179,6 +185,12 @@ static inline void fuse_uring_abort(struct fuse_conn *fc) static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc) { } + +static inline bool fuse_uring_ready(struct fuse_conn *fc) +{ + return false; +} + #endif /* CONFIG_FUSE_IO_URING */ #endif /* _FS_FUSE_DEV_URING_I_H */ diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 3b2bfe1248d3..476f98db1bf1 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -12,6 +12,8 @@ #define FUSE_INT_REQ_BIT (1ULL << 0) #define FUSE_REQ_ID_STEP (1ULL << 1) +#define DEFAULT_BG_QUEUE READ + struct fuse_arg; struct fuse_args; struct fuse_pqueue; -- 2.39.2

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit b6236c8407cba5d7a108facb1bcfab24994d3814 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When the fuse-server terminates while the fuse-client or kernel still has queued URING_CMDs, these commands retain references to the struct file used by the fuse connection. This prevents fuse_dev_release() from being invoked, resulting in a hung mount point. This patch addresses the issue by making queued URING_CMDs cancelable, allowing fuse_dev_release() to proceed as expected and preventing the mount point from hanging. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev_uring.c | 69 +++++++++++++++++++++++++++++++++++++++++-- fs/fuse/dev_uring_i.h | 9 ++++++ 2 files changed, 75 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 341d26d7e964..de1510d1af6e 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -150,6 +150,7 @@ void fuse_uring_destruct(struct fuse_conn *fc) for (qid = 0; qid < ring->nr_queues; qid++) { struct fuse_ring_queue *queue = ring->queues[qid]; + struct fuse_ring_ent *ent, *next; if (!queue) continue; @@ -159,6 +160,12 @@ void fuse_uring_destruct(struct fuse_conn *fc) WARN_ON(!list_empty(&queue->ent_commit_queue)); WARN_ON(!list_empty(&queue->ent_in_userspace)); + list_for_each_entry_safe(ent, next, &queue->ent_released, + list) { + list_del_init(&ent->list); + kfree(ent); + } + kfree(queue->fpq.processing); kfree(queue); ring->queues[qid] = NULL; @@ -242,6 +249,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring, INIT_LIST_HEAD(&queue->ent_in_userspace); INIT_LIST_HEAD(&queue->fuse_req_queue); INIT_LIST_HEAD(&queue->fuse_req_bg_queue); + INIT_LIST_HEAD(&queue->ent_released); queue->fpq.processing = pq; fuse_pqueue_init(&queue->fpq); @@ -289,6 +297,15 @@ static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent) /* remove entry from queue->fpq->processing */ list_del_init(&req->list); } + + /* + * The entry must not be freed immediately, due to direct pointer + * access of entries through IO_URING_F_CANCEL - there is a risk of a + * race with daemon termination, which triggers IO_URING_F_CANCEL and + * accesses entries without checking the list state first. + */ + list_move(&ent->list, &queue->ent_released); + ent->state = FRRS_RELEASED; spin_unlock(&queue->lock); if (cmd) @@ -296,9 +313,6 @@ static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent) if (req) fuse_uring_stop_fuse_req_end(req); - - list_del_init(&ent->list); - kfree(ent); } static void fuse_uring_stop_list_entries(struct list_head *head, @@ -318,6 +332,7 @@ static void fuse_uring_stop_list_entries(struct list_head *head, continue; } + ent->state = FRRS_TEARDOWN; list_move(&ent->list, &to_teardown); } spin_unlock(&queue->lock); @@ -432,6 +447,46 @@ void fuse_uring_stop_queues(struct fuse_ring *ring) } } +/* + * Handle IO_URING_F_CANCEL, which typically comes on daemon termination.
+ * + * Releasing the last entry should trigger fuse_dev_release() if + * the daemon was terminated + */ +static void fuse_uring_cancel(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd); + struct fuse_ring_queue *queue; + bool need_cmd_done = false; + + /* + * direct access on ent - it must not be destructed as long as + * IO_URING_F_CANCEL might come up + */ + queue = ent->queue; + spin_lock(&queue->lock); + if (ent->state == FRRS_AVAILABLE) { + ent->state = FRRS_USERSPACE; + list_move(&ent->list, &queue->ent_in_userspace); + need_cmd_done = true; + ent->cmd = NULL; + } + spin_unlock(&queue->lock); + + if (need_cmd_done) { + /* no queue lock to avoid lock order issues */ + io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags); + } +} + +static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags, + struct fuse_ring_ent *ring_ent) +{ + uring_cmd_set_ring_ent(cmd, ring_ent); + io_uring_cmd_mark_cancelable(cmd, issue_flags); +} + /* * Checks for errors and stores it into the request */ @@ -839,6 +894,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags, spin_unlock(&queue->lock); /* without the queue lock, as other locks are taken */ + fuse_uring_prepare_cancel(cmd, issue_flags, ent); fuse_uring_commit(ent, req, issue_flags); /* @@ -888,6 +944,8 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent, struct fuse_conn *fc = ring->fc; struct fuse_iqueue *fiq = &fc->iq; + fuse_uring_prepare_cancel(cmd, issue_flags, ent); + spin_lock(&queue->lock); ent->cmd = cmd; fuse_uring_ent_avail(ent, queue); @@ -1038,6 +1096,11 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd, return -EOPNOTSUPP; } + if ((unlikely(issue_flags & IO_URING_F_CANCEL))) { + fuse_uring_cancel(cmd, issue_flags); + return 0; + } + /* This extra SQE size holds struct fuse_uring_cmd_req */ if (!(issue_flags & IO_URING_F_SQE128)) return -EINVAL; diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index 0182be61778b..2102b3d0c1ae 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -28,6 +28,12 @@ enum fuse_ring_req_state { /* The ring entry is in or on the way to user space */ FRRS_USERSPACE, + + /* The ring entry is in teardown */ + FRRS_TEARDOWN, + + /* The ring entry is released, but not freed yet */ + FRRS_RELEASED, }; /** A fuse ring entry, part of the ring queue */ @@ -79,6 +85,9 @@ struct fuse_ring_queue { /* entries in userspace */ struct list_head ent_in_userspace; + /* entries that are released */ + struct list_head ent_released; + /* fuse requests waiting for an entry slot */ struct list_head fuse_req_queue; -- 2.39.2
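On the server side, the cancel path above surfaces as a CQE with res == -ENOTCONN on the registered command, which a queue thread can treat as its shutdown signal. A hypothetical event-loop fragment:

    #include <liburing.h>
    #include <errno.h>

    static void server_queue_loop(struct io_uring *ring)
    {
        struct io_uring_cqe *cqe;

        while (io_uring_wait_cqe(ring, &cqe) == 0) {
            int res = cqe->res;

            io_uring_cqe_seen(ring, cqe);
            if (res == -ENOTCONN)
                break;  /* connection aborted / entry canceled */

            /* otherwise: a fuse request was delivered into the
             * registered buffers; process it, then reply via
             * server_commit_and_fetch() from the earlier sketch */
        }
    }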

From: Bernd Schubert <bernd@bsbernd.com> mainline inclusion from mainline-v6.10-rc2 commit 3393ff964e0fa5def66570c54a4612bf9df06b76 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Avoid races and block request allocation until io-uring queues are ready. This is especially important for background requests, as bg request completion might otherwise cause a lock order inversion of the typical queue->lock and then fc->bg_lock ordering:

fuse_request_end
   spin_lock(&fc->bg_lock);
   flush_bg_queue
      fuse_send_one
         fuse_uring_queue_fuse_req
            spin_lock(&queue->lock);

Signed-off-by: Bernd Schubert <bernd@bsbernd.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: fs/fuse/fuse_i.h fs/fuse/inode.c [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 3 ++- fs/fuse/dev_uring.c | 3 +++ fs/fuse/fuse_i.h | 3 +++ fs/fuse/inode.c | 2 ++ 4 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index e57f95749ed2..63073f93136a 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -73,7 +73,8 @@ void fuse_set_initialized(struct fuse_conn *fc) static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background) { - return !fc->initialized || (for_background && fc->blocked); + return !fc->initialized || (for_background && fc->blocked) || + (fc->io_uring && !fuse_uring_ready(fc)); } static void fuse_drop_waiting(struct fuse_conn *fc) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index de1510d1af6e..a918491f7348 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -957,6 +957,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent, if (ready) { WRITE_ONCE(fiq->ops, &fuse_io_uring_ops); WRITE_ONCE(ring->ready, true); + wake_up_all(&fc->blocked_waitq); } } } @@ -1130,6 +1131,8 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd, if (err) { pr_info_once("FUSE_IO_URING_CMD_REGISTER failed err=%d\n", err); + fc->io_uring = 0; + wake_up_all(&fc->blocked_waitq); return err; } break; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 156c86358a9c..aefd6f97275b 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -896,6 +896,9 @@ struct fuse_conn { /* write reques is aligned on max_write boundary */ unsigned int write_alignment:1; + /* Use io_uring for communication */ + unsigned int io_uring; + /** Maximum stack depth for passthrough backing files */ int max_stack_depth; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 4bcdeb94cec4..e7797a0e0003 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1362,6 +1362,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args, fc->separate_background = 1; if (flags & FUSE_WRITE_ALIGNMENT) fc->write_alignment = 1; + if (flags & FUSE_OVER_IO_URING && fuse_uring_enabled()) + fc->io_uring = 1; } else { ra_pages = fc->max_read / PAGE_SIZE; fc->no_lock = 1; -- 2.39.2
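[Editor's note] To make the mechanism concrete, here is a condensed, hypothetical rendering of the allocation-side wait this change plugs into (the real loop lives in fuse_get_req(); demo_wait_for_alloc() is not a function from the patch and assumes the fuse-internal headers):

static int demo_wait_for_alloc(struct fuse_conn *fc, bool for_background)
{
	if (fuse_block_alloc(fc, for_background)) {
		/*
		 * With this patch, fuse_block_alloc() also returns true
		 * while fc->io_uring is set but the ring is not ready,
		 * so allocation sleeps until fuse_uring_do_register()
		 * (success) or the registration error path wakes
		 * fc->blocked_waitq.
		 */
		if (wait_event_killable_exclusive(fc->blocked_waitq,
				!fuse_block_alloc(fc, for_background)))
			return -EINTR;	/* fatal signal while waiting */
	}
	return 0;
}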

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 786412a73e7ee5b00ef3437bbf2f3a250759b2ae category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- All required parts are handled now, so fuse-io-uring can be enabled. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 3 +++ fs/fuse/dev_uring.c | 3 +-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 63073f93136a..6f0d18a95129 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2510,6 +2510,9 @@ const struct file_operations fuse_dev_operations = { .fasync = fuse_dev_fasync, .unlocked_ioctl = fuse_dev_ioctl, .compat_ioctl = compat_ptr_ioctl, +#ifdef CONFIG_FUSE_IO_URING + .uring_cmd = fuse_uring_cmd, +#endif }; EXPORT_SYMBOL_GPL(fuse_dev_operations); diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index a918491f7348..8433bb669487 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -1084,8 +1084,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd, * Entry function from io_uring to handle the given passthrough command * (op code IORING_OP_URING_CMD) */ -int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd, - unsigned int issue_flags) +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct fuse_dev *fud; struct fuse_conn *fc; -- 2.39.2
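[Editor's note] With the .uring_cmd hook wired up, a fuse-server reaches fuse_uring_cmd() through a plain IORING_OP_URING_CMD submission. A hypothetical userspace sketch using liburing, assuming the uapi additions from this series in <linux/fuse.h> (including the struct fuse_uring_cmd_req layout); the ring must have been created with IORING_SETUP_SQE128, and the header/payload buffer setup a real server also needs is omitted here:

#include <liburing.h>
#include <linux/fuse.h>
#include <string.h>

static int demo_register_queue(struct io_uring *ring, int dev_fd,
			       unsigned int qid)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct fuse_uring_cmd_req *req;

	if (!sqe)
		return -1;
	/* SQE128 ring: each slot is two struct io_uring_sqe wide */
	memset(sqe, 0, 2 * sizeof(*sqe));
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = dev_fd;	/* the opened /dev/fuse clone */
	sqe->cmd_op = FUSE_IO_URING_CMD_REGISTER;

	/* command payload lives in the extra SQE space */
	req = (struct fuse_uring_cmd_req *)sqe->cmd;
	req->qid = qid;		/* typically one ring queue per CPU */

	return io_uring_submit(ring);
}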

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.10-rc2 commit 2d4fde59fd502a65c1698b61ad4d0f10a9ab665a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The enable_uring module parameter allows administrators to enable/disable io-uring support for FUSE at runtime. However, disabling io-uring while connections already have it enabled can lead to an inconsistent state. Fix this by keeping io-uring enabled on connections that were already using it, even if the module parameter is later disabled. This ensures active FUSE mounts continue to function correctly. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev_uring.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 8433bb669487..baf165d6e118 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -1091,11 +1091,6 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) u32 cmd_op = cmd->cmd_op; int err; - if (!enable_uring) { - pr_info_ratelimited("fuse-io-uring is disabled\n"); - return -EOPNOTSUPP; - } - if ((unlikely(issue_flags & IO_URING_F_CANCEL))) { fuse_uring_cancel(cmd, issue_flags); return 0; @@ -1112,6 +1107,12 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) } fc = fud->fc; + /* Once a connection has io-uring enabled on it, it can't be disabled */ + if (!enable_uring && !fc->io_uring) { + pr_info_ratelimited("fuse-io-uring is disabled\n"); + return -EOPNOTSUPP; + } + if (fc->aborted) return -ECONNABORTED; if (!fc->connected) -- 2.39.2
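[Editor's note] A hypothetical condensation of the resulting policy (demo_* names invented; in the real code the knob is the enable_uring module parameter in dev_uring.c and the latch is fc->io_uring):

#include <linux/module.h>

static bool enable_uring;
module_param(enable_uring, bool, 0644);	/* writable at runtime */

/* stand-in for the single fuse_conn bit this sketch needs */
struct demo_conn {
	unsigned int io_uring:1;	/* latched at FUSE_INIT time */
};

static bool demo_uring_allowed(const struct demo_conn *fc)
{
	/*
	 * A connection that negotiated io-uring keeps it, even if the
	 * administrator clears the module parameter afterwards.
	 */
	return enable_uring || fc->io_uring;
}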

From: Joanne Koong <joannelkoong@gmail.com> mainline inclusion from mainline-v6.14-rc7 commit d9ecc77193cad25402ff5517fb26fb22b4db0e10 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There is a race condition leading to a kernel crash from a null dereference when attempting to access fc->lock in fuse_uring_create_queue(). fc may be NULL in the case where another thread is creating the uring in fuse_uring_create() and has set fc->ring but has not yet set ring->fc when fuse_uring_create_queue() reads ring->fc. There is another race condition as well where in fuse_uring_register(), ring->nr_queues may still be 0 and not yet set to the new value when we compare qid against it. This fix sets fc->ring only after ring->fc and ring->nr_queues have been set, which now guarantees that ring->fc is a proper pointer when any queues are created and ring->nr_queues reflects the right number of queues if ring is not NULL. We must use smp_store_release() and smp_load_acquire() semantics to ensure the ordering will remain correct where fc->ring is assigned only after ring->fc and ring->nr_queues have been assigned. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20250318003028.3330599-1-joannelkoong@gmail.com Fixes: 24fe962c86f5 ("fuse: {io-uring} Handle SQEs - register commands") Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev_uring.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index baf165d6e118..4628d9327eda 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -208,11 +208,11 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc) init_waitqueue_head(&ring->stop_waitq); - fc->ring = ring; ring->nr_queues = nr_queues; ring->fc = fc; ring->max_payload_sz = max_payload_size; atomic_set(&ring->queue_refs, 0); + smp_store_release(&fc->ring, ring); spin_unlock(&fc->lock); return ring; @@ -1041,7 +1041,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd, unsigned int issue_flags, struct fuse_conn *fc) { const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe); - struct fuse_ring *ring = fc->ring; + struct fuse_ring *ring = smp_load_acquire(&fc->ring); struct fuse_ring_queue *queue; struct fuse_ring_ent *ent; int err; -- 2.39.2
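[Editor's note] The ordering guarantee at work, as a self-contained sketch with hypothetical demo_* types (the two barriers are the real kernel primitives): a release store publishes the pointer only after all initializing stores, and a paired acquire load guarantees a reader who sees the pointer also sees those stores.

#include <linux/atomic.h>

struct demo_ring {
	int nr_queues;
	void *fc;
};

static struct demo_ring *demo_published;

static void demo_publish(struct demo_ring *ring, void *fc, int nr)
{
	ring->nr_queues = nr;	/* plain initialization stores ... */
	ring->fc = fc;
	/* ... must not be reordered past the pointer publication */
	smp_store_release(&demo_published, ring);
}

static int demo_consume(void)
{
	/* pairs with the release store above */
	struct demo_ring *ring = smp_load_acquire(&demo_published);

	/* if ring is non-NULL, nr_queues and fc are fully visible */
	return ring ? ring->nr_queues : -1;
}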

From: Luis Henriques <luis@igalia.com> mainline inclusion from mainline-v6.14-rc7 commit d55011469b41d9da6c06cb1c4a4da7a87fe155bc category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When mounting a user-space filesystem using io_uring, the initialization of the rings is done separately on the server side. If for some reason (e.g. a server bug) this step is not performed, it will be impossible to unmount the filesystem if there are already requests waiting. This issue is easily reproduced with the libfuse passthrough_ll example if the queue depth is set to '0' and a request is queued before trying to unmount the filesystem. When trying to force the unmount, fuse_abort_conn() will try to wake up all tasks waiting in fc->blocked_waitq, but because the rings were never initialized, fuse_uring_ready() will never return 'true'. Fixes: 3393ff964e0f ("fuse: block request allocation until io-uring init is complete") Signed-off-by: Luis Henriques <luis@igalia.com> Link: https://lore.kernel.org/r/20250306111218.13734-1-luis@igalia.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 6f0d18a95129..c504f2a72b0d 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -74,7 +74,7 @@ void fuse_set_initialized(struct fuse_conn *fc) static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background) { return !fc->initialized || (for_background && fc->blocked) || - (fc->io_uring && !fuse_uring_ready(fc)); + (fc->io_uring && fc->connected && !fuse_uring_ready(fc)); } static void fuse_drop_waiting(struct fuse_conn *fc) -- 2.39.2
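[Editor's note] A hypothetical condensation of the abort-side pairing the extra fc->connected term relies on (demo_* names invented; in fuse this behavior is part of fuse_abort_conn()):

#include <linux/spinlock.h>
#include <linux/wait.h>

struct demo_conn {
	spinlock_t lock;
	unsigned int connected;
	wait_queue_head_t blocked_waitq;
};

static void demo_abort_wake(struct demo_conn *fc)
{
	spin_lock(&fc->lock);
	fc->connected = 0;	/* the ring can never become ready now */
	spin_unlock(&fc->lock);
	/*
	 * Woken allocators re-evaluate fuse_block_alloc(); with the
	 * fc->connected term in place the condition is now false, so
	 * they no longer go back to sleep waiting on a ring that was
	 * never initialized.
	 */
	wake_up_all(&fc->blocked_waitq);
}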

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.14-rc7 commit 09098e62e4be8f0755e58d6078aaf27cbd9a3a8d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- task-A (application) might be in request_wait_answer and try to remove the request when it has FR_PENDING set. task-B (a fuse-server io-uring task) might handle this request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH when fetching the next request, accessing the req from the pending list in fuse_uring_ent_assign_req(). That code path was not protected by fiq->lock and so might race with task-A. For scaling reasons it is better not to use fiq->lock; instead, add a handler to remove canceled requests from the queue. This also removes usage of fiq->lock from fuse_uring_add_req_to_ring_ent() altogether, as it was there just to protect against this race and was incomplete anyway. Also added is a comment explaining why FR_PENDING is not cleared. Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support") Cc: <stable@vger.kernel.org> # v6.14 Reported-by: Joanne Koong <joannelkoong@gmail.com> Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRj... Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev.c | 34 +++++++++++++++++++++++++--------- fs/fuse/dev_uring.c | 15 +++++++++++---- fs/fuse/dev_uring_i.h | 6 ++++++ fs/fuse/fuse_dev_i.h | 1 + fs/fuse/fuse_i.h | 3 +++ 5 files changed, 46 insertions(+), 13 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index c504f2a72b0d..8b52464d2e71 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -438,6 +438,24 @@ static int queue_interrupt(struct fuse_req *req) return 0; } +bool fuse_remove_pending_req(struct fuse_req *req, spinlock_t *lock) +{ + spin_lock(lock); + if (test_bit(FR_PENDING, &req->flags)) { + /* + * FR_PENDING does not get cleared as the request will end + * up in destruction anyway. 
+ */ + list_del(&req->list); + spin_unlock(lock); + __fuse_put_request(req); + req->out.h.error = -EINTR; + return true; + } + spin_unlock(lock); + return false; +} + static void request_wait_answer(struct fuse_req *req) { struct fuse_conn *fc = req->fm->fc; @@ -459,22 +477,20 @@ static void request_wait_answer(struct fuse_req *req) } if (!test_bit(FR_FORCE, &req->flags)) { + bool removed; + /* Only fatal signals may interrupt this */ err = wait_event_killable(req->waitq, test_bit(FR_FINISHED, &req->flags)); if (!err) return; - spin_lock(&fiq->lock); - /* Request is not yet in userspace, bail out */ - if (test_bit(FR_PENDING, &req->flags)) { - list_del(&req->list); - spin_unlock(&fiq->lock); - __fuse_put_request(req); - req->out.h.error = -EINTR; + if (test_bit(FR_URING, &req->flags)) + removed = fuse_uring_remove_pending_req(req); + else + removed = fuse_remove_pending_req(req, &fiq->lock); + if (removed) return; - } - spin_unlock(&fiq->lock); } /* diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 4628d9327eda..093b887e935a 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -726,8 +726,6 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent, struct fuse_req *req) { struct fuse_ring_queue *queue = ent->queue; - struct fuse_conn *fc = req->fm->fc; - struct fuse_iqueue *fiq = &fc->iq; lockdep_assert_held(&queue->lock); @@ -737,9 +735,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent, ent->state); } - spin_lock(&fiq->lock); clear_bit(FR_PENDING, &req->flags); - spin_unlock(&fiq->lock); ent->fuse_req = req; ent->state = FRRS_FUSE_REQ; list_move(&ent->list, &queue->ent_w_req_queue); @@ -1238,6 +1234,8 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req) if (unlikely(queue->stopped)) goto err_unlock; + set_bit(FR_URING, &req->flags); + req->ring_queue = queue; ent = list_first_entry_or_null(&queue->ent_avail_queue, struct fuse_ring_ent, list); if (ent) @@ -1276,6 +1274,8 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req) return false; } + set_bit(FR_URING, &req->flags); + req->ring_queue = queue; list_add_tail(&req->list, &queue->fuse_req_bg_queue); ent = list_first_entry_or_null(&queue->ent_avail_queue, @@ -1306,6 +1306,13 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req) return true; } +bool fuse_uring_remove_pending_req(struct fuse_req *req) +{ + struct fuse_ring_queue *queue = req->ring_queue; + + return fuse_remove_pending_req(req, &queue->lock); +} + static const struct fuse_iqueue_ops fuse_io_uring_ops = { /* should be send over io-uring as enhancement */ .send_forget = fuse_dev_queue_forget, diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index 2102b3d0c1ae..e5b39a92b7ca 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -142,6 +142,7 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring); int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req); bool fuse_uring_queue_bq_req(struct fuse_req *req); +bool fuse_uring_remove_pending_req(struct fuse_req *req); static inline void fuse_uring_abort(struct fuse_conn *fc) { @@ -200,6 +201,11 @@ static inline bool fuse_uring_ready(struct fuse_conn *fc) return false; } +static inline bool fuse_uring_remove_pending_req(struct fuse_req *req) +{ + return false; +} + #endif /* CONFIG_FUSE_IO_URING */ #endif /* _FS_FUSE_DEV_URING_I_H */ diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 476f98db1bf1..a20cbaca9ceb 100644 
--- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -63,6 +63,7 @@ int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget); void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req); +bool fuse_remove_pending_req(struct fuse_req *req, spinlock_t *lock); #endif diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index aefd6f97275b..d1ccbaa96f87 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -389,6 +389,7 @@ struct fuse_io_priv { * FR_FINISHED: request is finished * FR_PRIVATE: request is on private list * FR_ASYNC: request is asynchronous + * FR_URING: request is handled through fuse-io-uring */ enum fuse_req_flag { FR_ISREPLY, @@ -403,6 +404,7 @@ enum fuse_req_flag { FR_FINISHED, FR_PRIVATE, FR_ASYNC, + FR_URING, }; /** @@ -452,6 +454,7 @@ struct fuse_req { #ifdef CONFIG_FUSE_IO_URING void *ring_entry; + void *ring_queue; #endif }; -- 2.39.2
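[Editor's note] The resulting cancellation dispatch, condensed into one hypothetical helper (it assumes the fuse-internal headers; the two called functions are the ones this patch adds), showing how the transport that queued the request also names the lock guarding its pending list:

static bool demo_remove_canceled(struct fuse_req *req, struct fuse_iqueue *fiq)
{
	if (test_bit(FR_URING, &req->flags))
		/* per-queue lock, recorded in req->ring_queue at queue time */
		return fuse_uring_remove_pending_req(req);

	/* classic /dev/fuse transport: the shared input-queue lock */
	return fuse_remove_pending_req(req, &fiq->lock);
}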

From: Luis Henriques <luis@igalia.com> mainline inclusion from mainline-v6.14-rc7 commit 841c7b812c038661e4f659d1b9c9a366c6d24b71 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Function fuse_uring_create() is used only from dev_uring.c and does not need to be exposed in the header file. Furthermore, it has the wrong signature. While there, also remove the 'struct fuse_ring' forward declaration. Signed-off-by: Luis Henriques <luis@igalia.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev_uring_i.h | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index e5b39a92b7ca..d87d0a55cb5f 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -173,12 +173,6 @@ static inline bool fuse_uring_ready(struct fuse_conn *fc) #else /* CONFIG_FUSE_IO_URING */ -struct fuse_ring; - -static inline void fuse_uring_create(struct fuse_conn *fc) -{ -} - static inline void fuse_uring_destruct(struct fuse_conn *fc) { } -- 2.39.2

From: Joanne Koong <joannelkoong@gmail.com> mainline inclusion from mainline-v6.14-rc7 commit 2d066800a4276340a97acc75c148892eb6f8781a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICJPON CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When the ring is allocated, it is kzalloc-ed. ring->queue_refs will already be initialized to 0 by default. It does not need to be atomically set to 0. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Long Li <leo.lilong@huawei.com> --- fs/fuse/dev_uring.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 093b887e935a..dbc836c7069e 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -211,7 +211,6 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc) ring->nr_queues = nr_queues; ring->fc = fc; ring->max_payload_sz = max_payload_size; - atomic_set(&ring->queue_refs, 0); smp_store_release(&fc->ring, ring); spin_unlock(&fc->lock); -- 2.39.2
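[Editor's note] The underlying rule, as a tiny hypothetical sketch: kzalloc() hands back zeroed memory, and the all-zero bit pattern is a valid initial value for an atomic_t, so the removed store was redundant.

#include <linux/atomic.h>
#include <linux/slab.h>

struct demo_ring {
	atomic_t queue_refs;
};

static struct demo_ring *demo_alloc_ring(void)
{
	/* zeroed allocation: queue_refs already reads as 0 */
	return kzalloc(sizeof(struct demo_ring), GFP_KERNEL);
}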

Feedback: the patch(es) you sent to the kernel@openeuler.org mailing list have been successfully converted to a pull request! Pull request link: https://gitee.com/openeuler/kernel/pulls/17114 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/7E3...