[PATCH OLK-6.6 00/15] fuse: Backport Anolis Features and Performance patches

Hou Tao (1):
  fuse: set FR_PENDING atomically in fuse_resend()

Jiachen Zhang (1):
  fuse: remove an unnecessary if statement

Jingbo Xu (3):
  fuse: add support for explicit export disabling
  anolis: fuse: separate bg_queue for write and other requests
  anolis: fuse: introduce write alignment

Joanne Koong (2):
  fuse: check aborted connection before adding requests to pending list
    for resending
  fuse: enable dynamic configuration of fuse max pages limit
    (FUSE_MAX_MAX_PAGES)

Josef Bacik (1):
  fuse: use fuse_range_is_writeback() instead of iterating pages

Kemeng Shi (1):
  fuse: remove unneeded lock which protecting update of
    congestion_threshold

Miklos Szeredi (2):
  fuse: cleanup request queuing towards virtiofs
  fuse: clear FR_PENDING if abort is detected when sending request

Richard Fung (1):
  fuse: Add initial support for fs-verity

Zhao Chen (2):
  fuse: Introduce a new notification type for resend pending requests
  fuse: Use the high bit of request ID for indicating resend requests

yangyun (1):
  fuse: add fast path for fuse_range_is_writeback

 Documentation/admin-guide/sysctl/fs.rst |  10 +
 fs/fuse/Makefile                        |   1 +
 fs/fuse/control.c                       |   6 +-
 fs/fuse/dev.c                           | 291 +++++++++++++++++-------
 fs/fuse/file.c                          |  26 ++-
 fs/fuse/fuse_i.h                        |  58 +++--
 fs/fuse/inode.c                         |  39 +++-
 fs/fuse/ioctl.c                         |  62 +++++
 fs/fuse/sysctl.c                        |  40 ++++
 fs/fuse/virtio_fs.c                     |  41 ++--
 include/uapi/linux/fuse.h               |  22 ++
 11 files changed, 451 insertions(+), 145 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

-- 
2.34.3

From: Jingbo Xu <jefflexu@linux.alibaba.com>

mainline inclusion
from mainline-v6.9-rc1
commit e022f6a1c711ab6d76e9e59dce77e2b25df75076
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

open_by_handle_at(2) can fail with -ESTALE with a valid handle returned
by a previous name_to_handle_at(2) for evicted fuse inodes, which is
especially common when entry_valid_timeout is 0, e.g. when the fuse
daemon is in "cache=none" mode.

The time sequence is like:

    name_to_handle_at(2)	# succeed
    evict fuse inode
    open_by_handle_at(2)	# fail

The root cause is that, with 0 entry_valid_timeout, the dput() called
in name_to_handle_at(2) will trigger iput -> evict(), which will send
FUSE_FORGET to the daemon. The following open_by_handle_at(2) will send
a new FUSE_LOOKUP request upon inode cache miss since the previous
inode eviction. Then the fuse daemon may fail the FUSE_LOOKUP request
with -ENOENT as the cached metadata of the requested inode has already
been cleaned up during the previous FUSE_FORGET. The returned -ENOENT
is treated as -ESTALE when open_by_handle_at(2) returns.

This confuses the application somehow, as open_by_handle_at(2) fails
when the previous name_to_handle_at(2) succeeds. The returned errno is
also confusing as the requested file is not deleted and already there.

It is reasonable to fail name_to_handle_at(2) early in this case, after
which the application can fallback to open(2) to access files.

Since this issue typically appears when entry_valid_timeout is 0 which
is configured by the fuse daemon, the fuse daemon is the right person
to explicitly disable the export when required.

Also considering FUSE_EXPORT_SUPPORT actually indicates the support for
lookups of "." and "..", and there are existing fuse daemons supporting
export without FUSE_EXPORT_SUPPORT set, for compatibility, we add a new
INIT flag for such purpose.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Conflicts:
	fs/fuse/inode.c
[Conflicts with mainline commit 3ab394b363c5 ("fuse: disable the
combination of passthrough and writeback cache").]
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/inode.c           | 11 ++++++++++-
 include/uapi/linux/fuse.h |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d771ce993e3e..26b670daf204 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1148,10 +1148,15 @@ static struct dentry *fuse_get_parent(struct dentry *child)
 	fuse_invalidate_entry_cache(parent);
 	return parent;
 }
 
+/* only for fid encoding; no support for file handle */
+static const struct export_operations fuse_export_fid_operations = {
+	.encode_fh	= fuse_encode_fh,
+};
+
 static const struct export_operations fuse_export_operations = {
 	.fh_to_dentry	= fuse_fh_to_dentry,
 	.fh_to_parent	= fuse_fh_to_parent,
 	.encode_fh	= fuse_encode_fh,
 	.get_parent	= fuse_get_parent,
@@ -1345,10 +1350,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			    !(flags & FUSE_WRITEBACK_CACHE)) {
 				fc->passthrough = 1;
 				fc->max_stack_depth = arg->max_stack_depth;
 				fm->sb->s_stack_depth = arg->max_stack_depth;
 			}
+			if (flags & FUSE_NO_EXPORT_SUPPORT)
+				fm->sb->s_export_op = &fuse_export_fid_operations;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
 			fc->no_flock = 1;
 		}
@@ -1391,11 +1398,12 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_PARALLEL_DIROPS | FUSE_HANDLE_KILLPRIV | FUSE_POSIX_ACL |
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
 		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
-		FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP;
+		FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP |
+		FUSE_NO_EXPORT_SUPPORT;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
 	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
 		flags |= FUSE_HAS_INODE_DAX;
@@ -1588,10 +1596,11 @@ static int fuse_fill_super_submount(struct super_block *sb,
 
 	WARN_ON(sb->s_bdi != &noop_backing_dev_info);
 	sb->s_bdi = bdi_get(parent_sb->s_bdi);
 
 	sb->s_xattr = parent_sb->s_xattr;
+	sb->s_export_op = parent_sb->s_export_op;
 	sb->s_time_gran = parent_sb->s_time_gran;
 	sb->s_blocksize = parent_sb->s_blocksize;
 	sb->s_blocksize_bits = parent_sb->s_blocksize_bits;
 	sb->s_subtype = kstrdup(parent_sb->s_subtype, GFP_KERNEL);
 	if (parent_sb->s_subtype && !sb->s_subtype)
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 1162a47b6a42..a86c2cad65ad 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -213,10 +213,11 @@
  *  - add FUSE_STATX and related structures
  *
  *  7.40
  *  - add max_stack_depth to fuse_init_out, add FUSE_PASSTHROUGH init flag
  *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
+ *  - add FUSE_NO_EXPORT_SUPPORT init flag
  */
 
 #ifndef _LINUX_FUSE_H
 #define _LINUX_FUSE_H
 
@@ -414,10 +415,11 @@ struct fuse_file_lock {
 * FUSE_HAS_INODE_DAX:  use per inode DAX
 * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
 *			symlink and mknod (single group that matches parent)
 * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation
 * FUSE_DIRECT_IO_ALLOW_MMAP: allow shared mmap in FOPEN_DIRECT_IO mode.
+ * FUSE_NO_EXPORT_SUPPORT: explicitly disable export support
 */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
 #define FUSE_FILE_OPS		(1 << 2)
 #define FUSE_ATOMIC_O_TRUNC	(1 << 3)
@@ -454,10 +456,11 @@ struct fuse_file_lock {
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
 #define FUSE_CREATE_SUPP_GROUP	(1ULL << 34)
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
 #define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36)
 #define FUSE_PASSTHROUGH	(1ULL << 37)
+#define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
 
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
 
 /**
-- 
2.34.3

From: Zhao Chen <winters.zc@antgroup.com>

mainline inclusion
from mainline-v6.9-rc1
commit 760eac73f9f69aa28fcb3050b4946c2dcc656d12
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

When a FUSE daemon panics and fails over, we aim to minimize the impact
on applications by reusing the existing FUSE connection. During this
process, another daemon is employed to preserve the FUSE connection's
file descriptor. The newly started FUSE daemon will take over the fd
and continue to provide service.

However, it is possible for some inflight requests to be lost and never
returned. As a result, applications awaiting replies would become stuck
forever. To address this, we can resend these pending requests to the
newly started FUSE daemon.

This patch introduces a new notification type "FUSE_NOTIFY_RESEND",
which can trigger resending of the pending requests, ensuring they are
properly processed again.

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/dev.c             | 56 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fuse.h |  2 ++
 2 files changed, 58 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 3c842286cdbb..b68e38f66ef4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1775,10 +1775,63 @@ static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size,
 copy_finish:
 	fuse_copy_finish(cs);
 	return err;
 }
 
+/*
+ * Resending all processing queue requests.
+ *
+ * During a FUSE daemon panics and failover, it is possible for some inflight
+ * requests to be lost and never returned. As a result, applications awaiting
+ * replies would become stuck forever. To address this, we can use notification
+ * to trigger resending of these pending requests to the FUSE daemon, ensuring
+ * they are properly processed again.
+ *
+ * Please note that this strategy is applicable only to idempotent requests or
+ * if the FUSE daemon takes careful measures to avoid processing duplicated
+ * non-idempotent requests.
+ */
+static void fuse_resend(struct fuse_conn *fc)
+{
+	struct fuse_dev *fud;
+	struct fuse_req *req, *next;
+	struct fuse_iqueue *fiq = &fc->iq;
+	LIST_HEAD(to_queue);
+	unsigned int i;
+
+	spin_lock(&fc->lock);
+	if (!fc->connected) {
+		spin_unlock(&fc->lock);
+		return;
+	}
+
+	list_for_each_entry(fud, &fc->devices, entry) {
+		struct fuse_pqueue *fpq = &fud->pq;
+
+		spin_lock(&fpq->lock);
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
+			list_splice_tail_init(&fpq->processing[i], &to_queue);
+		spin_unlock(&fpq->lock);
+	}
+	spin_unlock(&fc->lock);
+
+	list_for_each_entry_safe(req, next, &to_queue, list) {
+		__set_bit(FR_PENDING, &req->flags);
+	}
+
+	spin_lock(&fiq->lock);
+	/* iq and pq requests are both oldest to newest */
+	list_splice(&to_queue, &fiq->pending);
+	fiq->ops->wake_pending_and_unlock(fiq);
+}
+
+static int fuse_notify_resend(struct fuse_conn *fc)
+{
+	fuse_resend(fc);
+	return 0;
+}
+
 static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 		       unsigned int size, struct fuse_copy_state *cs)
 {
 	/* Don't try to move pages (yet) */
 	cs->move_pages = 0;
@@ -1800,10 +1853,13 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 		return fuse_notify_retrieve(fc, size, cs);
 
 	case FUSE_NOTIFY_DELETE:
 		return fuse_notify_delete(fc, size, cs);
 
+	case FUSE_NOTIFY_RESEND:
+		return fuse_notify_resend(fc);
+
 	default:
 		fuse_copy_finish(cs);
 		return -EINVAL;
 	}
 }
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index a86c2cad65ad..659932eb35e1 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -214,10 +214,11 @@
  *
  *  7.40
  *  - add max_stack_depth to fuse_init_out, add FUSE_PASSTHROUGH init flag
  *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
  *  - add FUSE_NO_EXPORT_SUPPORT init flag
+ *  - add FUSE_NOTIFY_RESEND
  */
 
 #ifndef _LINUX_FUSE_H
 #define _LINUX_FUSE_H
 
@@ -643,10 +644,11 @@ enum fuse_notify_code {
 	FUSE_NOTIFY_INVAL_INODE = 2,
 	FUSE_NOTIFY_INVAL_ENTRY = 3,
 	FUSE_NOTIFY_STORE = 4,
 	FUSE_NOTIFY_RETRIEVE = 5,
 	FUSE_NOTIFY_DELETE = 6,
+	FUSE_NOTIFY_RESEND = 7,
 	FUSE_NOTIFY_CODE_MAX,
 };
 
 /* The read buffer is required to be at least 8k, but may be much larger */
#define FUSE_MIN_READ_BUFFER 8192
-- 
2.34.3
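A minimal userspace sketch of how a take-over daemon could trigger this
notification (hypothetical helper; assumes fuse_fd is the preserved
/dev/fuse descriptor, and relies on the FUSE device protocol in which a
write carrying unique == 0 is a notification whose code sits in the
error field):

    /* Ask the kernel to requeue all inflight (processing) requests. */
    #include <linux/fuse.h>
    #include <unistd.h>

    static int trigger_resend(int fuse_fd)
    {
        struct fuse_out_header out = {
            .len    = sizeof(out),        /* header-only message */
            .error  = FUSE_NOTIFY_RESEND, /* notify code, not an errno */
            .unique = 0,                  /* 0 marks a notification */
        };

        /* The kernel splices the processing queues back onto the
         * pending queue and wakes the reader; the daemon then simply
         * re-reads the requests from the device. */
        return write(fuse_fd, &out, sizeof(out)) == sizeof(out) ? 0 : -1;
    }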

From: Zhao Chen <winters.zc@antgroup.com>

mainline inclusion
from mainline-v6.9-rc1
commit 9e7f5296f475ba5ab887ae3e55b922e17e99752b
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Some FUSE daemons want to know if the received request is a resend
request. The high bit of the fuse request ID is utilized for indicating
this, enabling the receiver to perform appropriate handling.

The init flag "FUSE_HAS_RESEND" is added to indicate this feature.

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/dev.c             |  2 ++
 fs/fuse/inode.c           |  2 +-
 include/uapi/linux/fuse.h | 13 ++++++++++++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index b68e38f66ef4..f0a31da5587d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1814,10 +1814,12 @@ static void fuse_resend(struct fuse_conn *fc)
 	}
 	spin_unlock(&fc->lock);
 
 	list_for_each_entry_safe(req, next, &to_queue, list) {
 		__set_bit(FR_PENDING, &req->flags);
+		/* mark the request as resend request */
+		req->in.h.unique |= FUSE_UNIQUE_RESEND;
 	}
 
 	spin_lock(&fiq->lock);
 	/* iq and pq requests are both oldest to newest */
 	list_splice(&to_queue, &fiq->pending);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 26b670daf204..b67928a773c6 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1399,11 +1399,11 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
 		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
 		FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP |
-		FUSE_NO_EXPORT_SUPPORT;
+		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
 	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
 		flags |= FUSE_HAS_INODE_DAX;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 659932eb35e1..d08b99d60f6f 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -214,11 +214,11 @@
  *
  *  7.40
  *  - add max_stack_depth to fuse_init_out, add FUSE_PASSTHROUGH init flag
  *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
  *  - add FUSE_NO_EXPORT_SUPPORT init flag
- *  - add FUSE_NOTIFY_RESEND
+ *  - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag
  */
 
 #ifndef _LINUX_FUSE_H
 #define _LINUX_FUSE_H
 
@@ -417,10 +417,12 @@ struct fuse_file_lock {
 * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
 *			symlink and mknod (single group that matches parent)
 * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation
 * FUSE_DIRECT_IO_ALLOW_MMAP: allow shared mmap in FOPEN_DIRECT_IO mode.
 * FUSE_NO_EXPORT_SUPPORT: explicitly disable export support
+ * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
+ *		    of the request ID indicates resend requests
 */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
 #define FUSE_FILE_OPS		(1 << 2)
 #define FUSE_ATOMIC_O_TRUNC	(1 << 3)
@@ -458,10 +460,11 @@ struct fuse_file_lock {
 #define FUSE_CREATE_SUPP_GROUP	(1ULL << 34)
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
 #define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36)
 #define FUSE_PASSTHROUGH	(1ULL << 37)
 #define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
+#define FUSE_HAS_RESEND		(1ULL << 39)
 
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
 
 /**
@@ -971,10 +974,18 @@ struct fuse_fallocate_in {
 	uint64_t	length;
 	uint32_t	mode;
 	uint32_t	padding;
 };
 
+/**
+ * FUSE request unique ID flag
+ *
+ * Indicates whether this is a resend request. The receiver should handle this
+ * request accordingly.
+ */
+#define FUSE_UNIQUE_RESEND (1ULL << 63)
+
 struct fuse_in_header {
 	uint32_t	len;
 	uint32_t	opcode;
 	uint64_t	unique;
 	uint64_t	nodeid;
-- 
2.34.3
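Daemon-side handling is then a simple bit test on each request read
from the device; a sketch (hypothetical helper, assuming the
fuse_in_header has just been read into "in"):

    #include <linux/fuse.h>
    #include <stdbool.h>

    /* A resent request carries FUSE_UNIQUE_RESEND in its ID; the reply
     * must echo in->unique unchanged, including this flag bit. */
    static bool fuse_req_is_resend(const struct fuse_in_header *in)
    {
        return in->unique & FUSE_UNIQUE_RESEND;
    }

Non-idempotent requests seen with this bit set are candidates for
deduplication in the daemon, per the note in fuse_resend().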

From: Hou Tao <houtao1@huawei.com>

mainline inclusion
from mainline-v6.10-rc1
commit 42815f8ac54c5113bf450ec4b7ccc5b62af0f6a7
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

When fuse_resend() moves the requests from processing lists to pending
list, it uses __set_bit() to set FR_PENDING bit in req->flags. Using
__set_bit() is not safe, because other functions may update req->flags
concurrently (e.g., request_wait_answer() may call
set_bit(FR_INTERRUPTED, &flags)).

Fix it by using set_bit() instead.

Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f0a31da5587d..9bcb279d8440 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1813,11 +1813,11 @@ static void fuse_resend(struct fuse_conn *fc)
 		spin_unlock(&fpq->lock);
 	}
 	spin_unlock(&fc->lock);
 
 	list_for_each_entry_safe(req, next, &to_queue, list) {
-		__set_bit(FR_PENDING, &req->flags);
+		set_bit(FR_PENDING, &req->flags);
 		/* mark the request as resend request */
 		req->in.h.unique |= FUSE_UNIQUE_RESEND;
 	}
 
 	spin_lock(&fiq->lock);
-- 
2.34.3

From: Joanne Koong <joannelkoong@gmail.com>

mainline inclusion
from mainline-v6.11-rc7
commit 97f30876c94382d1b01d45c2c76be8911b196527
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

There is a race condition where inflight requests will not be aborted
if they are in the middle of being re-sent when the connection is
aborted. If fuse_resend has already moved all the requests in the
fpq->processing lists to its private queue ("to_queue") and then the
connection starts and finishes aborting, these requests will be added
to the pending queue and remain on it indefinitely.

Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v6.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/dev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9bcb279d8440..680c467f50b3 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -29,10 +29,12 @@ MODULE_ALIAS("devname:fuse");
 /* Ordinary requests have even IDs, while interrupts IDs are odd */
 #define FUSE_INT_REQ_BIT (1ULL << 0)
 #define FUSE_REQ_ID_STEP (1ULL << 1)
 
 static struct kmem_cache *fuse_req_cachep;
 
+static void end_requests(struct list_head *head);
+
 static struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*
 	 * Lockless access is OK, because file->private data is set
 	 * once during mount and is valid until the file is released.
@@ -1819,10 +1821,17 @@ static void fuse_resend(struct fuse_conn *fc)
 		/* mark the request as resend request */
 		req->in.h.unique |= FUSE_UNIQUE_RESEND;
 	}
 
 	spin_lock(&fiq->lock);
+	if (!fiq->connected) {
+		spin_unlock(&fiq->lock);
+		list_for_each_entry(req, &to_queue, list)
+			clear_bit(FR_PENDING, &req->flags);
+		end_requests(&to_queue);
+		return;
+	}
 	/* iq and pq requests are both oldest to newest */
 	list_splice(&to_queue, &fiq->pending);
 	fiq->ops->wake_pending_and_unlock(fiq);
 }
 
-- 
2.34.3

From: Jingbo Xu <jefflexu@linux.alibaba.com>

anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IC6CFR
CVE: NA

--------------------------------

ANBZ: #9340

Readahead may be starved by the writeback wave, since the writeback
routine sends forced background requests which are enqueued onto the
bg_queue list without considering the max_background limit, while the
background requests sent by the readahead routine are non-forced and
thus throttled by the max_background limit. There can be hundreds of
thousands of WRITE requests queued on the bg_queue list ahead of READ
requests, and thus asynchronous readahead can be starved by the
writeback wave.

Fix this by introducing two bg_queue lists and separating WRITE
requests from the others. Also make the readahead routine send forced
background requests.

Besides, introduce a FUSE_SEPARATE_BACKGROUND init flag. When the
FUSE_SEPARATE_BACKGROUND init flag is set, there are two separate
background queues, one for WRITE requests and one for the others. The
number of active background requests is also counted separately for
these two sorts of requests in this case, and thus there are at most
max_background in-flight background requests for each sort of request.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3352
Conflicts:
	fs/fuse/dev.c
	fs/fuse/fuse_i.h
	fs/fuse/inode.c
	include/uapi/linux/fuse.h
[Context conflict.]
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/dev.c             | 70 ++++++++++++++++++++++++++++++++++-----
 fs/fuse/file.c            |  5 +++
 fs/fuse/fuse_i.h          | 21 +++++++++---
 fs/fuse/inode.c           | 11 +++---
 include/uapi/linux/fuse.h |  4 +++
 5 files changed, 93 insertions(+), 18 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 680c467f50b3..92301dbe7f77 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -27,10 +27,12 @@ MODULE_ALIAS("devname:fuse");
 
 /* Ordinary requests have even IDs, while interrupts IDs are odd */
 #define FUSE_INT_REQ_BIT (1ULL << 0)
 #define FUSE_REQ_ID_STEP (1ULL << 1)
 
+#define DEFAULT_BG_QUEUE READ
+
 static struct kmem_cache *fuse_req_cachep;
 
 static void end_requests(struct list_head *head);
 
 static struct fuse_dev *fuse_get_dev(struct file *file)
@@ -252,25 +254,75 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
 		kfree(forget);
 		spin_unlock(&fiq->lock);
 	}
 }
 
-static void flush_bg_queue(struct fuse_conn *fc)
+static void fuse_add_bg_queue(struct fuse_conn *fc, struct fuse_req *req)
 {
-	struct fuse_iqueue *fiq = &fc->iq;
+	if (fc->separate_background) {
+		if (req->args->opcode == FUSE_WRITE)
+			list_add_tail(&req->list, &fc->bg_queue[WRITE]);
+		else
+			list_add_tail(&req->list, &fc->bg_queue[READ]);
+	} else {
+		/* default to one single background queue */
+		list_add_tail(&req->list, &fc->bg_queue[DEFAULT_BG_QUEUE]);
+	}
+}
 
-	while (fc->active_background < fc->max_background &&
-	       !list_empty(&fc->bg_queue)) {
-		struct fuse_req *req;
+static void fuse_dec_active_bg(struct fuse_conn *fc, struct fuse_req *req)
+{
+	if (fc->separate_background) {
+		if (req->args->opcode == FUSE_WRITE)
+			fc->active_background[WRITE]--;
+		else
+			fc->active_background[READ]--;
+	} else {
+		/* default to one single count */
+		fc->active_background[DEFAULT_BG_QUEUE]--;
+	}
+}
 
-		req = list_first_entry(&fc->bg_queue, struct fuse_req, list);
+/* bg_queue needs to be further flushed when true returned */
+static bool do_flush_bg_queue(struct fuse_conn *fc, unsigned int index,
+			      unsigned int batch)
+{
+	struct fuse_iqueue *fiq = &fc->iq;
+	struct fuse_req *req;
+	unsigned int count = 0;
+
+	while (fc->active_background[index] < fc->max_background &&
+	       !list_empty(&fc->bg_queue[index])) {
+		if (batch && count++ == batch)
+			return true;
+		req = list_first_entry(&fc->bg_queue[index],
+				       struct fuse_req, list);
 		list_del(&req->list);
-		fc->active_background++;
+		fc->active_background[index]++;
 		spin_lock(&fiq->lock);
 		req->in.h.unique = fuse_get_unique(fiq);
 		queue_request_and_unlock(fiq, req);
 	}
+	return false;
+}
+
+static void flush_bg_queue(struct fuse_conn *fc)
+{
+	if (!fc->separate_background) {
+		do_flush_bg_queue(fc, DEFAULT_BG_QUEUE, 0);
+	} else {
+		bool proceed_write = true, proceed_other = true;
+
+		do {
+			if (proceed_other)
+				proceed_other = do_flush_bg_queue(fc, READ,
+						FUSE_DEFAULT_MAX_BACKGROUND);
+			if (proceed_write)
+				proceed_write = do_flush_bg_queue(fc, WRITE,
+						FUSE_DEFAULT_MAX_BACKGROUND);
+		} while (proceed_other || proceed_write);
+	}
 }
 
 /*
  * This function is called when a request is finished.  Either a reply
  * has arrived or it was aborted (and not yet sent) or some error
@@ -316,11 +368,11 @@ void fuse_request_end(struct fuse_req *req)
 			if (waitqueue_active(&fc->blocked_waitq))
 				wake_up(&fc->blocked_waitq);
 		}
 
 		fc->num_background--;
-		fc->active_background--;
+		fuse_dec_active_bg(fc, req);
 		flush_bg_queue(fc);
 		spin_unlock(&fc->bg_lock);
 	} else {
 		/* Wake up waiter sleeping in request_wait_answer() */
 		wake_up(&req->waitq);
@@ -538,11 +590,11 @@ static bool fuse_request_queue_background(struct fuse_req *req)
 	spin_lock(&fc->bg_lock);
 	if (likely(fc->connected)) {
 		fc->num_background++;
 		if (fc->num_background == fc->max_background)
 			fc->blocked = 1;
-		list_add_tail(&req->list, &fc->bg_queue);
+		fuse_add_bg_queue(fc, req);
 		flush_bg_queue(fc);
 		queued = true;
 	}
 	spin_unlock(&fc->bg_lock);
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 25d6951ef2c0..7200b176ac79 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -980,10 +980,15 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
 	fuse_read_args_fill(ia, file, pos, count, FUSE_READ);
 	ia->read.attr_ver = fuse_get_attr_version(fm->fc);
 	if (fm->fc->async_read) {
 		ia->ff = fuse_file_get(ff);
 		ap->args.end = fuse_readpages_end;
+		/* force background request to avoid starvation from writeback */
+		if (fm->fc->separate_background) {
+			ap->args.force = true;
+			ap->args.nocreds = true;
+		}
 		err = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
 		if (!err)
 			return;
 	} else {
 		res = fuse_simple_request(fm, &ap->args);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0428658c499e..c9902fb877cb 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -37,10 +37,13 @@
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
 
 /** Maximum of max_pages received in init_out */
 #define FUSE_MAX_MAX_PAGES 256
 
+/** Maximum number of outstanding background requests */
+#define FUSE_DEFAULT_MAX_BACKGROUND 12
+
 /** Bias for fi->writectr, meaning new writepages must not be sent */
 #define FUSE_NOWRITE INT_MIN
 
 /** It could be as large as PATH_MAX, but would that have any uses? */
 #define FUSE_NAME_MAX 1024
@@ -662,15 +665,22 @@ struct fuse_conn {
 	unsigned congestion_threshold;
 
 	/** Number of requests currently in the background */
 	unsigned num_background;
 
-	/** Number of background requests currently queued for userspace */
-	unsigned active_background;
+	/*
+	 * Number of background requests currently queued for userspace.
+	 * active_background[WRITE] for WRITE requests, and
+	 * active_background[READ] for others.
+	 */
+	unsigned active_background[2];
 
-	/** The list of background requests set aside for later queuing */
-	struct list_head bg_queue;
+	/*
+	 * The list of background requests set aside for later queuing.
+	 * bg_queue[WRITE] for WRITE requests, bg_queue[READ] for others.
+	 */
+	struct list_head bg_queue[2];
 
 	/** Protects: max_background, congestion_threshold, num_background,
 	 * active_background, bg_queue, blocked */
 	spinlock_t bg_lock;
 
@@ -865,10 +875,13 @@ struct fuse_conn {
 	unsigned int no_tmpfile:1;
 
 	/* Relax restrictions to allow shared mmap in FOPEN_DIRECT_IO mode */
 	unsigned int direct_io_allow_mmap:1;
 
+	/* separate background queue for WRITE requests and the others */
+	unsigned int separate_background:1;
+
 	/* Is statx not implemented by fs? */
 	unsigned int no_statx:1;
 
 	/* Use pages instead of pointer for kernel I/O */
 	unsigned int use_pages_for_kvec_io:1;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b67928a773c6..a1669c498f5e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -51,13 +51,10 @@ MODULE_PARM_DESC(max_user_congthresh,
 		 "Global limit for the maximum congestion threshold an "
 		 "unprivileged user can set");
 
 #define FUSE_DEFAULT_BLKSIZE 512
 
-/** Maximum number of outstanding background requests */
-#define FUSE_DEFAULT_MAX_BACKGROUND 12
-
 /** Congestion starts at 75% of maximum */
 #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND * 3 / 4)
 
 #ifdef CONFIG_BLOCK
 static struct file_system_type fuseblk_fs_type;
@@ -933,11 +930,12 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	init_rwsem(&fc->killsb);
 	refcount_set(&fc->count, 1);
 	atomic_set(&fc->dev_count, 1);
 	init_waitqueue_head(&fc->blocked_waitq);
 	fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv);
-	INIT_LIST_HEAD(&fc->bg_queue);
+	INIT_LIST_HEAD(&fc->bg_queue[READ]);
+	INIT_LIST_HEAD(&fc->bg_queue[WRITE]);
 	INIT_LIST_HEAD(&fc->entry);
 	INIT_LIST_HEAD(&fc->devices);
 	atomic_set(&fc->num_waiting, 0);
 	fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
 	fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
@@ -1352,10 +1350,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				fc->max_stack_depth = arg->max_stack_depth;
 				fm->sb->s_stack_depth = arg->max_stack_depth;
 			}
 			if (flags & FUSE_NO_EXPORT_SUPPORT)
 				fm->sb->s_export_op = &fuse_export_fid_operations;
+			if (flags & FUSE_SEPARATE_BACKGROUND)
+				fc->separate_background = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
 			fc->no_flock = 1;
 		}
@@ -1399,11 +1399,12 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
 		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
 		FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP |
-		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND;
+		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND |
+		FUSE_SEPARATE_BACKGROUND;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
 	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
 		flags |= FUSE_HAS_INODE_DAX;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d08b99d60f6f..2a84cecf75a1 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -419,10 +419,12 @@ struct fuse_file_lock {
 * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation
 * FUSE_DIRECT_IO_ALLOW_MMAP: allow shared mmap in FOPEN_DIRECT_IO mode.
 * FUSE_NO_EXPORT_SUPPORT: explicitly disable export support
 * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
 *		    of the request ID indicates resend requests
+ * FUSE_SEPARATE_BACKGROUND: separate background queue for WRITE requests and
+ *			     the others
 */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
 #define FUSE_FILE_OPS		(1 << 2)
 #define FUSE_ATOMIC_O_TRUNC	(1 << 3)
@@ -461,10 +463,12 @@ struct fuse_file_lock {
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
 #define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36)
 #define FUSE_PASSTHROUGH	(1ULL << 37)
 #define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
 #define FUSE_HAS_RESEND		(1ULL << 39)
+#define FUSE_SEPARATE_BACKGROUND	(1ULL << 56)
+/* The 57th bit is left to FUSE_HAS_RECOVERY */
 
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
 
 /**
-- 
2.34.3
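Since this is an extended init flag (bit 56), a daemon opts in through
flags2 of fuse_init_out; a sketch (hypothetical helper, where
kernel_flags is the 64-bit flag set the kernel advertised in
fuse_init_in):

    #include <linux/fuse.h>
    #include <stdint.h>

    static void opt_in_separate_background(struct fuse_init_out *out,
                                           uint64_t kernel_flags)
    {
        if (kernel_flags & FUSE_SEPARATE_BACKGROUND) {
            /* bits >= 32 live in flags2 and require FUSE_INIT_EXT */
            out->flags  |= FUSE_INIT_EXT;
            out->flags2 |= (uint32_t)(FUSE_SEPARATE_BACKGROUND >> 32);
        }
    }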

From: Jingbo Xu <jefflexu@linux.alibaba.com>

anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IC6CFR
CVE: NA

--------------------------------

ANBZ: #9568

Sometimes file offset alignment needs to be opted in to achieve optimum
performance at the backend store. For example, when ErasureCode [1] is
used at the backend store, the optimum write performance is achieved
when the WRITE request is aligned with the stripe size of the
ErasureCode. Otherwise, a non-aligned WRITE request needs to be split
at the stripe size boundary. It is quite costly to handle these split
partial requests: first the whole stripe to which the split partial
request belongs needs to be read out, then the read stripe buffer is
overwritten with the request, and finally the whole stripe is written
back to persistent storage.

Thus the backend store can suffer severe performance degradation when
WRITE requests do not fit into one stripe exactly. Write performance
can be 10x slower when the request is 256KB in size given a 4MB stripe
size. There can also be 50% performance degradation in theory if the
request is not stripe boundary aligned.

Besides, testing indicates that the non-alignment issue becomes more
severe when decreasing fuse's max_ratio, maybe partly because the
background writeback is then more likely to run in parallel with the
dirtier.

    fuse's max_ratio    ratio of aligned WRITE requests
    ----------------    -------------------------------
    70                  99.9%
    40                  74%
    20                  45%
    10                  20%

With the patched version, which makes the alignment constraint opt-in
when constructing WRITE requests, the ratio of aligned WRITE requests
increases to 98% (previously 20%) when fuse's max_ratio is 10.

[1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux....

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3533
Conflicts:
	fs/fuse/file.c
	fs/fuse/fuse_i.h
	fs/fuse/inode.c
	include/uapi/linux/fuse.h
[Context conflict.]
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/file.c            |  4 ++++
 fs/fuse/fuse_i.h          |  6 ++++++
 fs/fuse/inode.c           | 10 +++++++++-
 include/uapi/linux/fuse.h |  2 ++
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7200b176ac79..37282b8363f0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2303,10 +2303,14 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
 
 	/* Need to grow the pages array?  If so, did the expansion fail? */
 	if (ap->num_pages == data->max_pages && !fuse_pages_realloc(data))
 		return true;
 
+	/* Reached alignment boundary */
+	if (fc->write_alignment && !(page->index % fc->write_align_pages))
+		return true;
+
 	return false;
 }
 
 static int fuse_writepages_fill(struct folio *folio,
 		struct writeback_control *wbc, void *_data)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index c9902fb877cb..7dd7471962b9 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -641,10 +641,13 @@ struct fuse_conn {
 	unsigned max_read;
 
 	/** Maximum write size */
 	unsigned max_write;
 
+	/* Maximum number of pages that write request should be aligned with */
+	unsigned int write_align_pages;
+
 	/** Maximum number of pages that can be used in a single request */
 	unsigned int max_pages;
 
 	/** Constrain ->max_pages to this value during feature negotiation */
 	unsigned int max_pages_limit;
@@ -887,10 +890,13 @@ struct fuse_conn {
 	unsigned int use_pages_for_kvec_io:1;
 
 	/** Passthrough support for read/write IO */
 	unsigned int passthrough:1;
 
+	/* write request is aligned on max_write boundary */
+	unsigned int write_alignment:1;
+
 	/** Maximum stack depth for passthrough backing files */
 	int max_stack_depth;
 
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index a1669c498f5e..9caa80973a5c 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1352,10 +1352,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			}
 			if (flags & FUSE_NO_EXPORT_SUPPORT)
 				fm->sb->s_export_op = &fuse_export_fid_operations;
 			if (flags & FUSE_SEPARATE_BACKGROUND)
 				fc->separate_background = 1;
+			if (flags & FUSE_WRITE_ALIGNMENT)
+				fc->write_alignment = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
 			fc->no_flock = 1;
 		}
@@ -1363,10 +1365,16 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 		fm->sb->s_bdi->ra_pages =
 				min(fm->sb->s_bdi->ra_pages, ra_pages);
 		fc->minor = arg->minor;
 		fc->max_write = arg->minor < 5 ? 4096 : arg->max_write;
 		fc->max_write = max_t(unsigned, 4096, fc->max_write);
+		if (fc->write_alignment) {
+			if (fc->max_write % PAGE_SIZE)
+				ok = false;
+			else
+				fc->write_align_pages = fc->max_write >> PAGE_SHIFT;
+		}
 		fc->conn_init = 1;
 	}
 	kfree(ia);
 
 	if (!ok) {
@@ -1400,11 +1408,11 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
 		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
 		FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP |
 		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND |
-		FUSE_SEPARATE_BACKGROUND;
+		FUSE_SEPARATE_BACKGROUND | FUSE_WRITE_ALIGNMENT;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
 	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
 		flags |= FUSE_HAS_INODE_DAX;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 2a84cecf75a1..dd1809072e63 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -421,10 +421,11 @@ struct fuse_file_lock {
 * FUSE_NO_EXPORT_SUPPORT: explicitly disable export support
 * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
 *		    of the request ID indicates resend requests
 * FUSE_SEPARATE_BACKGROUND: separate background queue for WRITE requests and
 *			     the others
+ * FUSE_WRITE_ALIGNMENT: write request is aligned on max_write boundary
 */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
 #define FUSE_FILE_OPS		(1 << 2)
 #define FUSE_ATOMIC_O_TRUNC	(1 << 3)
@@ -463,10 +464,11 @@ struct fuse_file_lock {
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
 #define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36)
 #define FUSE_PASSTHROUGH	(1ULL << 37)
 #define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
 #define FUSE_HAS_RESEND		(1ULL << 39)
+#define FUSE_WRITE_ALIGNMENT	(1ULL << 55)
 #define FUSE_SEPARATE_BACKGROUND	(1ULL << 56)
 /* The 57th bit is left to FUSE_HAS_RECOVERY */
 
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
 
 /**
-- 
2.34.3
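Opting in looks the same as for FUSE_SEPARATE_BACKGROUND, with the
extra rule enforced above that max_write must be a multiple of
PAGE_SIZE (the INIT reply is rejected otherwise); a sketch assuming a
hypothetical 4 MiB erasure-code stripe:

    #include <linux/fuse.h>
    #include <stdint.h>

    static void opt_in_write_alignment(struct fuse_init_out *out)
    {
        out->max_write = 4 << 20;   /* stripe size, page-aligned */
        out->flags    |= FUSE_INIT_EXT;
        out->flags2   |= (uint32_t)(FUSE_WRITE_ALIGNMENT >> 32);
    }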

From: Richard Fung <richardfung@google.com>

mainline inclusion
from mainline-v6.10-rc1
commit 9fe2a036a23ceeac402c4fde8ec37c02ab25f133
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

This adds support for the FS_IOC_ENABLE_VERITY and
FS_IOC_MEASURE_VERITY ioctls. The FS_IOC_READ_VERITY_METADATA is
missing but from the documentation, "This is a fairly specialized use
case, and most fs-verity users won't need this ioctl."

Signed-off-by: Richard Fung <richardfung@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/ioctl.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 726640fa439e..572ce8a82ceb 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -6,10 +6,11 @@
 #include "fuse_i.h"
 
 #include <linux/uio.h>
 #include <linux/compat.h>
 #include <linux/fileattr.h>
+#include <linux/fsverity.h>
 
 static ssize_t fuse_send_ioctl(struct fuse_mount *fm, struct fuse_args *args,
 			       struct fuse_ioctl_out *outarg)
 {
 	ssize_t ret;
@@ -115,10 +116,57 @@ static int fuse_copy_ioctl_iovec(struct fuse_conn *fc, struct iovec *dst,
 	}
 
 	return 0;
 }
 
+/* For fs-verity, determine iov lengths from input */
+static int fuse_setup_measure_verity(unsigned long arg, struct iovec *iov)
+{
+	__u16 digest_size;
+	struct fsverity_digest __user *uarg = (void __user *)arg;
+
+	if (copy_from_user(&digest_size, &uarg->digest_size, sizeof(digest_size)))
+		return -EFAULT;
+
+	if (digest_size > SIZE_MAX - sizeof(struct fsverity_digest))
+		return -EINVAL;
+
+	iov->iov_len = sizeof(struct fsverity_digest) + digest_size;
+
+	return 0;
+}
+
+static int fuse_setup_enable_verity(unsigned long arg, struct iovec *iov,
+				    unsigned int *in_iovs)
+{
+	struct fsverity_enable_arg enable;
+	struct fsverity_enable_arg __user *uarg = (void __user *)arg;
+	const __u32 max_buffer_len = FUSE_MAX_MAX_PAGES * PAGE_SIZE;
+
+	if (copy_from_user(&enable, uarg, sizeof(enable)))
+		return -EFAULT;
+
+	if (enable.salt_size > max_buffer_len || enable.sig_size > max_buffer_len)
+		return -ENOMEM;
+
+	if (enable.salt_size > 0) {
+		iov++;
+		(*in_iovs)++;
+
+		iov->iov_base = u64_to_user_ptr(enable.salt_ptr);
+		iov->iov_len = enable.salt_size;
+	}
+
+	if (enable.sig_size > 0) {
+		iov++;
+		(*in_iovs)++;
+
+		iov->iov_base = u64_to_user_ptr(enable.sig_ptr);
+		iov->iov_len = enable.sig_size;
+	}
+
+	return 0;
+}
+
 /*
  * For ioctls, there is no generic way to determine how much memory
  * needs to be read and/or written.  Furthermore, ioctls are allowed
  * to dereference the passed pointer, so the parameter requires deep
@@ -225,10 +273,22 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
 		if (_IOC_DIR(cmd) & _IOC_READ) {
 			out_iov = iov;
 			out_iovs = 1;
 		}
+
+		err = 0;
+		switch (cmd) {
+		case FS_IOC_MEASURE_VERITY:
+			err = fuse_setup_measure_verity(arg, iov);
+			break;
+		case FS_IOC_ENABLE_VERITY:
+			err = fuse_setup_enable_verity(arg, iov, &in_iovs);
+			break;
+		}
+		if (err)
+			goto out;
 	}
 
 retry:
 	inarg.in_size = in_size = iov_length(in_iov, in_iovs);
 	inarg.out_size = out_size = iov_length(out_iov, out_iovs);
-- 
2.34.3
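With this in place, a file on a fuse mount can be sealed and measured
through the generic fs-verity UAPI; a minimal sketch (error handling
elided, fd is an O_RDONLY descriptor on the file):

    #include <linux/fsverity.h>
    #include <string.h>
    #include <sys/ioctl.h>

    static int enable_verity(int fd)
    {
        struct fsverity_enable_arg arg;

        memset(&arg, 0, sizeof(arg));
        arg.version = 1;
        arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
        arg.block_size = 4096;
        /* no salt, no signature: salt_size/sig_size stay 0 */

        return ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
    }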

From: Joanne Koong <joannelkoong@gmail.com>

mainline inclusion
from mainline-v6.13-rc1
commit 2b3933b1e0a0a4b758fbc164bb31db0c113a7e2c
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Introduce the capability to dynamically configure the max pages limit
(FUSE_MAX_MAX_PAGES) through a sysctl. This allows system
administrators to dynamically set the maximum number of pages that can
be used for servicing requests in fuse.

Previously, this was gated by FUSE_MAX_MAX_PAGES, which is statically
set to 256 pages. One result of this is that the buffer size for a
write request is limited to 1 MiB on a 4k-page system.

The default value for this sysctl is the original limit (256 pages).

    $ sysctl -a | grep max_pages_limit
    fs.fuse.max_pages_limit = 256

    $ sysctl -n fs.fuse.max_pages_limit
    256

    $ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit
    1024

    $ sysctl -n fs.fuse.max_pages_limit
    1024

    $ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit
    tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

    $ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit
    tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

    $ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit
    65535

    $ sysctl -n fs.fuse.max_pages_limit
    65535

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 Documentation/admin-guide/sysctl/fs.rst | 10 +++++++
 fs/fuse/Makefile                        |  1 +
 fs/fuse/fuse_i.h                        | 14 +++++++--
 fs/fuse/inode.c                         | 11 ++++++-
 fs/fuse/ioctl.c                         |  4 ++-
 fs/fuse/sysctl.c                        | 40 +++++++++++++++++++++++++
 6 files changed, 75 insertions(+), 5 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index a321b84eccaa..297228f9f299 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -330,5 +330,15 @@ This configuration option sets the maximum number of "watches" that are
 allowed for each user.
 Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
 on a 64-bit one.
 The current default value for  ``max_user_watches`` is 4% of the
 available low memory, divided by the "watch" cost in bytes.
+
+5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
+=====================================================================
+
+This directory contains the following configuration options for FUSE
+filesystems:
+
+``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
+setting/getting the maximum number of pages that can be used for
+servicing requests in FUSE.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 6e0228c6d0cb..dabc852a7ff1 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
 fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
 fuse-y += iomode.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
+fuse-$(CONFIG_SYSCTL) += sysctl.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7dd7471962b9..879111e1e9d4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -34,13 +34,10 @@
 #include <linux/kabi.h>
 
 /** Default max number of pages that can be used in a single read request */
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
 
-/** Maximum of max_pages received in init_out */
-#define FUSE_MAX_MAX_PAGES 256
-
 /** Maximum number of outstanding background requests */
 #define FUSE_DEFAULT_MAX_BACKGROUND 12
 
 /** Bias for fi->writectr, meaning new writepages must not be sent */
 #define FUSE_NOWRITE INT_MIN
@@ -49,10 +46,13 @@
 #define FUSE_NAME_MAX 1024
 
 /** Number of dentries for each connection in the control filesystem */
 #define FUSE_CTL_NUM_DENTRIES 5
 
+/** Maximum of max_pages received in init_out */
+extern unsigned int fuse_max_pages_limit;
+
 /** List of active connections */
 extern struct list_head fuse_conn_list;
 
 /** Global mutex protecting fuse_conn_list and the control filesystem */
 extern struct mutex fuse_mutex;
@@ -1510,6 +1510,14 @@ ssize_t fuse_passthrough_splice_read(struct file *in, loff_t *ppos,
 ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
 				      struct file *out, loff_t *ppos,
 				      size_t len, unsigned int flags);
 ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
 
+#ifdef CONFIG_SYSCTL
+extern int fuse_sysctl_register(void);
+extern void fuse_sysctl_unregister(void);
+#else
+#define fuse_sysctl_register()		(0)
+#define fuse_sysctl_unregister()	do { } while (0)
+#endif /* CONFIG_SYSCTL */
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9caa80973a5c..0c95552179e5 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -33,10 +33,12 @@ static struct kmem_cache *fuse_inode_cachep;
 struct list_head fuse_conn_list;
 DEFINE_MUTEX(fuse_mutex);
 
 static int set_global_limit(const char *val, const struct kernel_param *kp);
 
+unsigned int fuse_max_pages_limit = 256;
+
 unsigned max_user_bgreq;
 module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
 		  &max_user_bgreq, 0644);
 __MODULE_PARM_TYPE(max_user_bgreq, "uint");
 MODULE_PARM_DESC(max_user_bgreq,
@@ -947,11 +949,11 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	atomic64_set(&fc->attr_version, 1);
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 	fc->user_ns = get_user_ns(user_ns);
 	fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
-	fc->max_pages_limit = FUSE_MAX_MAX_PAGES;
+	fc->max_pages_limit = fuse_max_pages_limit;
 
 	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
 		fuse_backing_files_init(fc);
 
 	INIT_LIST_HEAD(&fc->mounts);
@@ -2067,22 +2069,29 @@ static int __init fuse_fs_init(void)
 
 	err = register_filesystem(&fuse_fs_type);
 	if (err)
 		goto out3;
 
+	err = fuse_sysctl_register();
+	if (err)
+		goto out4;
+
 	return 0;
 
+ out4:
+	unregister_filesystem(&fuse_fs_type);
  out3:
 	unregister_fuseblk();
  out2:
 	kmem_cache_destroy(fuse_inode_cachep);
  out:
 	return err;
 }
 
 static void fuse_fs_cleanup(void)
 {
+	fuse_sysctl_unregister();
 	unregister_filesystem(&fuse_fs_type);
 	unregister_fuseblk();
 
 	/*
 	 * Make sure all delayed rcu free inodes are flushed before we
diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 572ce8a82ceb..a6c8ee551635 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -8,10 +8,12 @@
 #include <linux/uio.h>
 #include <linux/compat.h>
 #include <linux/fileattr.h>
 #include <linux/fsverity.h>
 
+#define FUSE_VERITY_ENABLE_ARG_MAX_PAGES 256
+
 static ssize_t fuse_send_ioctl(struct fuse_mount *fm, struct fuse_args *args,
 			       struct fuse_ioctl_out *outarg)
 {
 	ssize_t ret;
 
@@ -138,11 +140,11 @@ static int fuse_setup_measure_verity(unsigned long arg, struct iovec *iov)
 static int fuse_setup_enable_verity(unsigned long arg, struct iovec *iov,
 				    unsigned int *in_iovs)
 {
 	struct fsverity_enable_arg enable;
 	struct fsverity_enable_arg __user *uarg = (void __user *)arg;
-	const __u32 max_buffer_len = FUSE_MAX_MAX_PAGES * PAGE_SIZE;
+	const __u32 max_buffer_len = FUSE_VERITY_ENABLE_ARG_MAX_PAGES * PAGE_SIZE;
 
 	if (copy_from_user(&enable, uarg, sizeof(enable)))
 		return -EFAULT;
 
 	if (enable.salt_size > max_buffer_len || enable.sig_size > max_buffer_len)
diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
new file mode 100644
index 000000000000..b272bb333005
--- /dev/null
+++ b/fs/fuse/sysctl.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/fs/fuse/fuse_sysctl.c
+ *
+ * Sysctl interface to fuse parameters
+ */
+#include <linux/sysctl.h>
+
+#include "fuse_i.h"
+
+static struct ctl_table_header *fuse_table_header;
+
+/* Bound by fuse_init_out max_pages, which is a u16 */
+static unsigned int sysctl_fuse_max_pages_limit = 65535;
+
+static struct ctl_table fuse_sysctl_table[] = {
+	{
+		.procname	= "max_pages_limit",
+		.data		= &fuse_max_pages_limit,
+		.maxlen		= sizeof(fuse_max_pages_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= SYSCTL_ONE,
+		.extra2		= &sysctl_fuse_max_pages_limit,
+	},
+};
+
+int fuse_sysctl_register(void)
+{
+	fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
+	if (!fuse_table_header)
+		return -ENOMEM;
+	return 0;
+}
+
+void fuse_sysctl_unregister(void)
+{
+	unregister_sysctl_table(fuse_table_header);
+	fuse_table_header = NULL;
+}
-- 
2.34.3

From: yangyun <yangyun50@huawei.com>

mainline inclusion
from mainline-v6.12-rc1
commit ac5cffec53be0b0231b89470a357bd3a5814f599
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

In some cases, the fi->writepages may be empty. And there is no need to
check fi->writepages with spin_lock, which may have an impact on
performance due to lock contention. For example, in scenarios where
multiple readers read the same file without any writers, or where the
page cache is not enabled.

Also remove the outdated comment, since commit 6b2fb79963fb ("fuse:
optimize writepages search") has optimized the situation by replacing
the list with an rb-tree.

Signed-off-by: yangyun <yangyun50@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 37282b8363f0..8d8e2e66634c 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -446,20 +446,20 @@ static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
 	return NULL;
 }
 
 /*
  * Check if any page in a range is under writeback
- *
- * This is currently done by walking the list of writepage requests
- * for the inode, which can be pretty inefficient.
  */
 static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
 				    pgoff_t idx_to)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	bool found;
 
+	if (RB_EMPTY_ROOT(&fi->writepages))
+		return false;
+
 	spin_lock(&fi->lock);
 	found = fuse_find_writeback(fi, idx_from, idx_to);
 	spin_unlock(&fi->lock);
 
 	return found;
-- 
2.34.3

From: Miklos Szeredi <mszeredi@redhat.com>

mainline inclusion
from mainline-v6.12-rc1
commit 5de8acb41c86f1d335d165e0a350441ea3a1f480
category: cleanup
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Virtiofs has its own queuing mechanism, but still requests are first
queued on fiq->pending to be immediately dequeued and queued onto the
virtio queue.

The queuing on fiq->pending is unnecessary and might even have some
performance impact due to being a contention point.

Forget requests are handled similarly.

Move the queuing of requests and forgets into the fiq->ops->*.
fuse_iqueue_ops are renamed to reflect the new semantics.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Fixed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Tested-by: Peter-Jan Gootzen <pgootzen@nvidia.com>
Reviewed-by: Peter-Jan Gootzen <pgootzen@nvidia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Conflicts:
	fs/fuse/dev.c
[Conflicts with "anolis: fuse: separate bg_queue for write and other
requests"]
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
---
 fs/fuse/dev.c       | 159 ++++++++++++++++++++++++--------------------
 fs/fuse/fuse_i.h    |  19 ++----
 fs/fuse/virtio_fs.c |  41 ++++--------
 3 files changed, 106 insertions(+), 113 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 92301dbe7f77..be92d418a50e 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -194,15 +194,26 @@ unsigned int fuse_len_args(unsigned int numargs, struct fuse_arg *args)
 
 	return nbytes;
 }
 EXPORT_SYMBOL_GPL(fuse_len_args);
 
-u64 fuse_get_unique(struct fuse_iqueue *fiq)
+static u64 fuse_get_unique_locked(struct fuse_iqueue *fiq)
 {
 	fiq->reqctr += FUSE_REQ_ID_STEP;
 	return fiq->reqctr;
 }
+
+u64 fuse_get_unique(struct fuse_iqueue *fiq)
+{
+	u64 ret;
+
+	spin_lock(&fiq->lock);
+	ret = fuse_get_unique_locked(fiq);
+	spin_unlock(&fiq->lock);
+
+	return ret;
+}
 EXPORT_SYMBOL_GPL(fuse_get_unique);
 
 static unsigned int fuse_req_hash(u64 unique)
 {
 	return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
@@ -217,45 +228,83 @@ __releases(fiq->lock)
 	wake_up(&fiq->waitq);
 	kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
 	spin_unlock(&fiq->lock);
 }
 
+static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget)
+{
+	spin_lock(&fiq->lock);
+	if (fiq->connected) {
+		fiq->forget_list_tail->next = forget;
+		fiq->forget_list_tail = forget;
+		fuse_dev_wake_and_unlock(fiq);
+	} else {
+		kfree(forget);
+		spin_unlock(&fiq->lock);
+	}
+}
+
+static void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
+{
+	spin_lock(&fiq->lock);
+	if (list_empty(&req->intr_entry)) {
+		list_add_tail(&req->intr_entry, &fiq->interrupts);
+		/*
+		 * Pairs with smp_mb() implied by test_and_set_bit()
+		 * from fuse_request_end().
+		 */
+		smp_mb();
+		if (test_bit(FR_FINISHED, &req->flags)) {
+			list_del_init(&req->intr_entry);
+			spin_unlock(&fiq->lock);
+		} else {
+			fuse_dev_wake_and_unlock(fiq);
+		}
+	} else {
+		spin_unlock(&fiq->lock);
+	}
+}
+
+static void fuse_dev_queue_req(struct fuse_iqueue *fiq, struct fuse_req *req)
+{
+	spin_lock(&fiq->lock);
+	if (fiq->connected) {
+		if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
+			req->in.h.unique = fuse_get_unique_locked(fiq);
+		list_add_tail(&req->list, &fiq->pending);
+		fuse_dev_wake_and_unlock(fiq);
+	} else {
+		spin_unlock(&fiq->lock);
+		req->out.h.error = -ENOTCONN;
+		fuse_request_end(req);
+	}
+}
+
 const struct fuse_iqueue_ops fuse_dev_fiq_ops = {
-	.wake_forget_and_unlock		= fuse_dev_wake_and_unlock,
-	.wake_interrupt_and_unlock	= fuse_dev_wake_and_unlock,
-	.wake_pending_and_unlock	= fuse_dev_wake_and_unlock,
+	.send_forget	= fuse_dev_queue_forget,
+	.send_interrupt	= fuse_dev_queue_interrupt,
+	.send_req	= fuse_dev_queue_req,
 };
 EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops);
 
-static void queue_request_and_unlock(struct fuse_iqueue *fiq,
-				     struct fuse_req *req)
-__releases(fiq->lock)
+static void fuse_send_one(struct fuse_iqueue *fiq, struct fuse_req *req)
 {
 	req->in.h.len = sizeof(struct fuse_in_header) +
 		fuse_len_args(req->args->in_numargs,
 			      (struct fuse_arg *) req->args->in_args);
-	list_add_tail(&req->list, &fiq->pending);
-	fiq->ops->wake_pending_and_unlock(fiq);
+	fiq->ops->send_req(fiq, req);
 }
 
 void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
 		       u64 nodeid, u64 nlookup)
 {
 	struct fuse_iqueue *fiq = &fc->iq;
 
 	forget->forget_one.nodeid = nodeid;
 	forget->forget_one.nlookup = nlookup;
 
-	spin_lock(&fiq->lock);
-	if (fiq->connected) {
-		fiq->forget_list_tail->next = forget;
-		fiq->forget_list_tail = forget;
-		fiq->ops->wake_forget_and_unlock(fiq);
-	} else {
-		kfree(forget);
-		spin_unlock(&fiq->lock);
-	}
+	fiq->ops->send_forget(fiq, forget);
 }
 
 static void fuse_add_bg_queue(struct fuse_conn *fc, struct fuse_req *req)
 {
 	if (fc->separate_background) {
@@ -296,13 +345,11 @@ static bool do_flush_bg_queue(struct fuse_conn *fc, unsigned int index,
 			return true;
 		req = list_first_entry(&fc->bg_queue[index],
 				       struct fuse_req, list);
 		list_del(&req->list);
 		fc->active_background[index]++;
-		spin_lock(&fiq->lock);
-		req->in.h.unique = fuse_get_unique(fiq);
-		queue_request_and_unlock(fiq, req);
+		fuse_send_one(fiq, req);
 	}
 	return false;
 }
 
 static void flush_bg_queue(struct fuse_conn *fc)
@@ -387,33 +434,16 @@ EXPORT_SYMBOL_GPL(fuse_request_end);
 
 static int queue_interrupt(struct fuse_req *req)
 {
 	struct fuse_iqueue *fiq = &req->fm->fc->iq;
 
-	spin_lock(&fiq->lock);
 	/* Check for we've sent request to interrupt this req */
-	if (unlikely(!test_bit(FR_INTERRUPTED, &req->flags))) {
-		spin_unlock(&fiq->lock);
+	if (unlikely(!test_bit(FR_INTERRUPTED, &req->flags)))
 		return -EINVAL;
-	}
 
-	if (list_empty(&req->intr_entry)) {
-		list_add_tail(&req->intr_entry, &fiq->interrupts);
-		/*
-		 * Pairs with smp_mb() implied by test_and_set_bit()
-		 * from fuse_request_end().
-		 */
-		smp_mb();
-		if (test_bit(FR_FINISHED, &req->flags)) {
-			list_del_init(&req->intr_entry);
-			spin_unlock(&fiq->lock);
-			return 0;
-		}
-		fiq->ops->wake_interrupt_and_unlock(fiq);
-	} else {
-		spin_unlock(&fiq->lock);
-	}
+	fiq->ops->send_interrupt(fiq, req);
+
 	return 0;
 }
 
 static void request_wait_answer(struct fuse_req *req)
 {
@@ -464,25 +494,19 @@ static void request_wait_answer(struct fuse_req *req)
 
 static void __fuse_request_send(struct fuse_req *req)
 {
 	struct fuse_iqueue *fiq = &req->fm->fc->iq;
 
 	BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
-	spin_lock(&fiq->lock);
-	if (!fiq->connected) {
-		spin_unlock(&fiq->lock);
-		req->out.h.error = -ENOTCONN;
-	} else {
-		req->in.h.unique = fuse_get_unique(fiq);
-		/* acquire extra reference, since request is still needed
-		   after fuse_request_end() */
-		__fuse_get_request(req);
-		queue_request_and_unlock(fiq, req);
 
-		request_wait_answer(req);
-		/* Pairs with smp_wmb() in fuse_request_end() */
-		smp_rmb();
-	}
+	/* acquire extra reference, since request is still needed after
+	   fuse_request_end() */
+	__fuse_get_request(req);
+	fuse_send_one(fiq, req);
+
+	request_wait_answer(req);
+	/* Pairs with smp_wmb() in fuse_request_end() */
+	smp_rmb();
 }
 
 static void fuse_adjust_compat(struct fuse_conn *fc, struct fuse_args *args)
 {
 	if (fc->minor < 4 && args->opcode == FUSE_STATFS)
@@ -633,31 +657,23 @@ EXPORT_SYMBOL_GPL(fuse_simple_background);
 static int fuse_simple_notify_reply(struct fuse_mount *fm,
 				    struct fuse_args *args, u64 unique)
 {
 	struct fuse_req *req;
 	struct fuse_iqueue *fiq = &fm->fc->iq;
-	int err = 0;
 
 	req = fuse_get_req(fm, false);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
 	__clear_bit(FR_ISREPLY, &req->flags);
 	req->in.h.unique = unique;
 
 	fuse_args_to_req(req, args);
 
-	spin_lock(&fiq->lock);
-	if (fiq->connected) {
-		queue_request_and_unlock(fiq, req);
-	} else {
-		err = -ENODEV;
-		spin_unlock(&fiq->lock);
-		fuse_put_request(req);
-	}
+	fuse_send_one(fiq, req);
 
-	return err;
+	return 0;
 }
 
 /*
  * Lock the request.  Up to the next unlock_request() there mustn't be
  * anything that could cause a page-fault.  If the request was already
@@ -1128,13 +1144,13 @@ __releases(fiq->lock)
 	fuse_copy_finish(cs);
 
 	return err ? err : reqsize;
 }
 
-struct fuse_forget_link *fuse_dequeue_forget(struct fuse_iqueue *fiq,
-					     unsigned int max,
-					     unsigned int *countp)
+static struct fuse_forget_link *fuse_dequeue_forget(struct fuse_iqueue *fiq,
+						    unsigned int max,
+						    unsigned int *countp)
 {
 	struct fuse_forget_link *head = fiq->forget_list_head.next;
 	struct fuse_forget_link **newhead = &head;
 	unsigned count;
 
@@ -1149,11 +1165,10 @@ struct fuse_forget_link *fuse_dequeue_forget(struct fuse_iqueue *fiq,
 	if (countp != NULL)
 		*countp = count;
 
 	return head;
 }
-EXPORT_SYMBOL(fuse_dequeue_forget);
 
 static int fuse_read_single_forget(struct fuse_iqueue *fiq,
 				   struct fuse_copy_state *cs,
 				   size_t nbytes)
 __releases(fiq->lock)
@@ -1164,11 +1179,11 @@ __releases(fiq->lock)
 		.nlookup = forget->forget_one.nlookup,
 	};
 	struct fuse_in_header ih = {
 		.opcode = FUSE_FORGET,
 		.nodeid = forget->forget_one.nodeid,
-		.unique = fuse_get_unique(fiq),
+		.unique = fuse_get_unique_locked(fiq),
 		.len = sizeof(ih) + sizeof(arg),
 	};
 
 	spin_unlock(&fiq->lock);
 	kfree(forget);
@@ -1195,11 +1210,11 @@ __releases(fiq->lock)
 	unsigned count;
 	struct fuse_forget_link *head;
 	struct fuse_batch_forget_in arg = { .count = 0 };
 	struct fuse_in_header ih = {
 		.opcode = FUSE_BATCH_FORGET,
-		.unique = fuse_get_unique(fiq),
+		.unique = fuse_get_unique_locked(fiq),
 		.len = sizeof(ih) + sizeof(arg),
 	};
 
 	if (nbytes < ih.len) {
 		spin_unlock(&fiq->lock);
@@ -1882,11 +1897,11 @@ static void fuse_resend(struct fuse_conn *fc)
 		end_requests(&to_queue);
 		return;
 	}
 	/* iq and pq requests are both oldest to newest */
 	list_splice(&to_queue, &fiq->pending);
-	fiq->ops->wake_pending_and_unlock(fiq);
+	fuse_dev_wake_and_unlock(fiq);
 }
 
 static int fuse_notify_resend(struct fuse_conn *fc)
 {
 	fuse_resend(fc);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 879111e1e9d4..665e89d8ea5b 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -461,26 +461,23 @@ struct fuse_iqueue;
  * readiness.  These callbacks allow other device types to respond to input
  * queue activity.
*/ struct fuse_iqueue_ops { /** - * Signal that a forget has been queued + * Send one forget */ - void (*wake_forget_and_unlock)(struct fuse_iqueue *fiq) - __releases(fiq->lock); + void (*send_forget)(struct fuse_iqueue *fiq, struct fuse_forget_link *link); /** - * Signal that an INTERRUPT request has been queued + * Send interrupt for request */ - void (*wake_interrupt_and_unlock)(struct fuse_iqueue *fiq) - __releases(fiq->lock); + void (*send_interrupt)(struct fuse_iqueue *fiq, struct fuse_req *req); /** - * Signal that a request has been queued + * Send one request */ - void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq) - __releases(fiq->lock); + void (*send_req)(struct fuse_iqueue *fiq, struct fuse_req *req); /** * Clean up when fuse_iqueue is destroyed */ void (*release)(struct fuse_iqueue *fiq); @@ -1091,14 +1088,10 @@ int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget, u64 nodeid, u64 nlookup); struct fuse_forget_link *fuse_alloc_forget(void); -struct fuse_forget_link *fuse_dequeue_forget(struct fuse_iqueue *fiq, - unsigned int max, - unsigned int *countp); - /* * Initialize READ or READDIR request */ struct fuse_io_args { union { diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index ba816b50c1d4..a6e31e68906a 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1081,26 +1081,17 @@ static struct virtio_driver virtio_fs_driver = { .freeze = virtio_fs_freeze, .restore = virtio_fs_restore, #endif }; -static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq) -__releases(fiq->lock) +static void virtio_fs_send_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *link) { - struct fuse_forget_link *link; struct virtio_fs_forget *forget; struct virtio_fs_forget_req *req; - struct virtio_fs *fs; - struct virtio_fs_vq *fsvq; - u64 unique; - - link = fuse_dequeue_forget(fiq, 1, NULL); - unique = fuse_get_unique(fiq); - - fs = fiq->priv; - fsvq = &fs->vqs[VQ_HIPRIO]; - spin_unlock(&fiq->lock); + struct virtio_fs *fs = fiq->priv; + struct virtio_fs_vq *fsvq = &fs->vqs[VQ_HIPRIO]; + u64 unique = fuse_get_unique(fiq); /* Allocate a buffer for the request */ forget = kmalloc(sizeof(*forget), GFP_NOFS | __GFP_NOFAIL); req = &forget->req; @@ -1116,21 +1107,19 @@ __releases(fiq->lock) send_forget_request(fsvq, forget, false); kfree(link); } -static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq) -__releases(fiq->lock) +static void virtio_fs_send_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) { /* * TODO interrupts. * * Normal fs operations on a local filesystems aren't interruptible. * Exceptions are blocking lock operations; for example fcntl(F_SETLKW) * with shared lock between host and guest. 
*/ - spin_unlock(&fiq->lock); } /* Count number of scatter-gather elements required */ static unsigned int sg_count_fuse_pages(struct fuse_page_desc *page_descs, unsigned int num_pages, @@ -1331,25 +1320,21 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq, } return ret; } -static void virtio_fs_wake_pending_and_unlock(struct fuse_iqueue *fiq) -__releases(fiq->lock) +static void virtio_fs_send_req(struct fuse_iqueue *fiq, struct fuse_req *req) { unsigned int queue_id; struct virtio_fs *fs; - struct fuse_req *req; struct virtio_fs_vq *fsvq; int ret; - WARN_ON(list_empty(&fiq->pending)); - req = list_last_entry(&fiq->pending, struct fuse_req, list); + if (req->in.h.opcode != FUSE_NOTIFY_REPLY) + req->in.h.unique = fuse_get_unique(fiq); + clear_bit(FR_PENDING, &req->flags); - list_del_init(&req->list); - WARN_ON(!list_empty(&fiq->pending)); - spin_unlock(&fiq->lock); fs = fiq->priv; queue_id = VQ_REQUEST + fs->mq_map[raw_smp_processor_id()]; pr_debug("%s: opcode %u unique %#llx nodeid %#llx in.len %u out.len %u queue_id %u\n", @@ -1385,14 +1370,14 @@ __releases(fiq->lock) return; } } static const struct fuse_iqueue_ops virtio_fs_fiq_ops = { - .wake_forget_and_unlock = virtio_fs_wake_forget_and_unlock, - .wake_interrupt_and_unlock = virtio_fs_wake_interrupt_and_unlock, - .wake_pending_and_unlock = virtio_fs_wake_pending_and_unlock, - .release = virtio_fs_fiq_release, + .send_forget = virtio_fs_send_forget, + .send_interrupt = virtio_fs_send_interrupt, + .send_req = virtio_fs_send_req, + .release = virtio_fs_fiq_release, }; static inline void virtio_fs_ctx_set_defaults(struct fuse_fs_context *ctx) { ctx->rootmode = S_IFDIR; -- 2.34.3
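For context, after this change a transport implements the three send_* callbacks directly instead of draining fiq->pending. A minimal sketch of a .send_req implementation, modeled on virtio_fs_send_req() in the diff above; transport_dispatch() is a hypothetical stand-in for the transport's own queuing, not a kernel API:

    /* Sketch only: mirrors the bookkeeping every .send_req must do
     * now that requests bypass fiq->pending entirely. */
    static void example_send_req(struct fuse_iqueue *fiq, struct fuse_req *req)
    {
            /* FUSE_NOTIFY_REPLY requests reuse the unique id of the
             * notification they answer; everything else gets a fresh one. */
            if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
                    req->in.h.unique = fuse_get_unique(fiq);

            /* The request never sits on fiq->pending, so FR_PENDING is
             * cleared at send time rather than at dequeue time. */
            clear_bit(FR_PENDING, &req->flags);

            transport_dispatch(fiq->priv, req);     /* hypothetical hook */
    }

The forget and interrupt callbacks follow the same shape: each receives the object to send directly, and any locking is now the transport's own business, since fiq->lock is no longer held on entry.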

From: Miklos Szeredi <mszeredi@redhat.com> mainline inclusion from mainline-v6.12-rc1 commit fcd2d9e1fdcd7cada612f2e8737fb13a2bce7d0e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBT4CQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=... -------------------------------- The (!fiq->connected) check was moved into the queuing method. When an aborted connection is detected there, the request is failed with -ENOTCONN but FR_PENDING was left set on it; clear FR_PENDING in that path as well. Fixes: 5de8acb41c86 ("fuse: cleanup request queuing towards virtiofs") Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/all/ZvFEAM6JfrBKsOU0@ly-workstation/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> --- fs/fuse/dev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index be92d418a50e..0d9c6e604cd9 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -273,10 +273,11 @@ static void fuse_dev_queue_req(struct fuse_iqueue *fiq, struct fuse_req *req) list_add_tail(&req->list, &fiq->pending); fuse_dev_wake_and_unlock(fiq); } else { spin_unlock(&fiq->lock); req->out.h.error = -ENOTCONN; + clear_bit(FR_PENDING, &req->flags); fuse_request_end(req); } } const struct fuse_iqueue_ops fuse_dev_fiq_ops = { -- 2.34.3
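Why the stale flag is harmful: request_wait_answer() treats FR_PENDING as "this request is still linked on fiq->pending" and unlinks it on interrupt. Abridged sketch of that interrupted-request path in fs/fuse/dev.c (shown for context only, not part of this diff):

    /* Abridged: if FR_PENDING is still set, the request is assumed
     * to be on fiq->pending and is unlinked from it. A request that
     * was failed with -ENOTCONN but left FR_PENDING would be treated
     * as queued even though it never was. */
    spin_lock(&fiq->lock);
    if (test_bit(FR_PENDING, &req->flags)) {
            list_del(&req->list);
            spin_unlock(&fiq->lock);
            __fuse_put_request(req);
            req->out.h.error = -EINTR;
            return;
    }
    spin_unlock(&fiq->lock);

Clearing the bit in the disconnected path restores the invariant that FR_PENDING implies membership on the pending list.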

From: Kemeng Shi <shikemeng@huaweicloud.com> mainline inclusion from mainline-v6.9-rc1 commit efc4105a4cf9e300b8e9150147415fa235059293 category: performance bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Commit 670d21c6e17f6 ("fuse: remove reliance on bdi congestion") changed how congestion_threshold is used, so the locking in fuse_conn_congestion_threshold_write() is no longer needed: 1. Access to the super_block was removed along with bdi congestion, so the down_read(&fc->killsb) that protected that access is no longer needed. 2. num_background and congestion_threshold are compared without holding bg_lock, so there is no need to hold bg_lock to update congestion_threshold either. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> --- fs/fuse/control.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/fs/fuse/control.c b/fs/fuse/control.c index ab62e4624256..1bf928e277fe 100644 --- a/fs/fuse/control.c +++ b/fs/fuse/control.c @@ -172,15 +172,11 @@ static ssize_t fuse_conn_congestion_threshold_write(struct file *file, goto out; fc = fuse_ctl_file_conn_get(file); if (!fc) goto out; - down_read(&fc->killsb); - spin_lock(&fc->bg_lock); - fc->congestion_threshold = val; - spin_unlock(&fc->bg_lock); - up_read(&fc->killsb); + WRITE_ONCE(fc->congestion_threshold, val); fuse_conn_put(fc); out: return ret; } -- 2.34.3
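The WRITE_ONCE() pairs with a lockless reader on the writeback side. A sketch of the pattern; the reader approximates the check that commit 670d21c6e17f6 added on the writeback path (the in-tree read may be a plain load, READ_ONCE() shown here as the canonical pairing):

    /* Writer (this patch): a single plain store is enough;
     * WRITE_ONCE() guards against store tearing. */
    WRITE_ONCE(fc->congestion_threshold, val);

    /* Reader (approximate): back off from non-sync writeback while
     * too many background requests are in flight. A momentarily
     * stale threshold only shifts the cutoff by one request, so no
     * lock is needed. */
    if (wbc->sync_mode == WB_SYNC_NONE &&
        fc->num_background >= READ_ONCE(fc->congestion_threshold))
            return 0;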

From: Josef Bacik <josef@toxicpanda.com> mainline inclusion from mainline-v6.13-rc1 commit aaa32429da09a9afa0f54a197733d757334ed169 category: performance bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- fuse_send_readpages() waits for writeback on each page. This can be replaced by a single call to fuse_range_is_writeback(). [SzM: split this off from "fuse: convert readahead to use folios"] Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> --- fs/fuse/file.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 8d8e2e66634c..a20c0e449042 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -998,16 +998,21 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file) } static void fuse_readahead(struct readahead_control *rac) { struct inode *inode = rac->mapping->host; + struct fuse_inode *fi = get_fuse_inode(inode); struct fuse_conn *fc = get_fuse_conn(inode); unsigned int i, max_pages, nr_pages = 0; + pgoff_t first = readahead_index(rac); + pgoff_t last = first + readahead_count(rac) - 1; if (fuse_is_bad(inode)) return; + wait_event(fi->page_waitq, !fuse_range_is_writeback(inode, first, last)); + max_pages = min_t(unsigned int, fc->max_pages, fc->max_read / PAGE_SIZE); for (;;) { struct fuse_io_args *ia; @@ -1030,12 +1035,10 @@ static void fuse_readahead(struct readahead_control *rac) if (!ia) return; ap = &ia->ap; nr_pages = __readahead_batch(rac, ap->pages, nr_pages); for (i = 0; i < nr_pages; i++) { - fuse_wait_on_page_writeback(inode, - readahead_index(rac) + i); ap->descs[i].length = PAGE_SIZE; } ap->num_pages = nr_pages; fuse_send_readpages(ia, rac->file); } -- 2.34.3
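For reference, the helper the readahead path now leans on does one walk of the inode's in-flight writeback requests for the whole range, instead of one wait per page. Roughly (a sketch of the existing helper in fs/fuse/file.c, shown for context):

    /* Sketch: true if any page in [idx_from, idx_to] is under fuse
     * writeback; fi->lock protects the in-flight writepages state. */
    static bool fuse_range_is_writeback(struct inode *inode,
                                        pgoff_t idx_from, pgoff_t idx_to)
    {
            struct fuse_inode *fi = get_fuse_inode(inode);
            bool found;

            spin_lock(&fi->lock);
            found = fuse_find_writeback(fi, idx_from, idx_to);
            spin_unlock(&fi->lock);

            return found;
    }

Combined with wait_event() on fi->page_waitq, this turns N per-page waits into a single wait for the whole readahead window.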

From: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> mainline inclusion from mainline-v6.9-rc1 commit 8a5fb186431326886ccc7b71d40aaf5e53b5d91a category: cleanup bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC6CFR Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- FUSE remote locking code paths never add any locking state to inode->i_flctx, so the locks_remove_posix() function called on file close will return without calling fuse_setlk(). Therefore, since the if statement removed by this commit is always false, remove it for clarity. Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> --- fs/fuse/file.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index a20c0e449042..4c1922e8c3f8 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2743,14 +2743,10 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock) if (fl->fl_lmops && fl->fl_lmops->lm_grant) { /* NLM needs asynchronous locks, which we don't support yet */ return -ENOLCK; } - /* Unlock on close is handled by the flush method */ - if ((fl->fl_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX) - return 0; fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg); err = fuse_simple_request(fm, &args); /* locking is restartable */ if (err == -EINTR) -- 2.34.3
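The "always false" claim rests on the early return in locks_remove_posix(): with no POSIX lock state recorded for the inode, the unlock request is never built, so fuse_setlk() is never entered with FL_CLOSE_POSIX set. Abridged sketch from fs/locks.c (for context, not part of this diff):

    /* Abridged: bail out before constructing the unlock request when
     * the inode has no POSIX lock context, which is always the case
     * for FUSE since remote locks are never recorded in i_flctx. */
    ctx = locks_inode_context(inode);
    if (!ctx || list_empty_careful(&ctx->flc_posix))
            return;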

FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/16412 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/UNQ...